In this post, we will give an introduction to the fundamentals of image processing. The content is based on the book Introduction to Autonomous Mobile Robots.
This post will cover the following topics
- image filtering
- edge detection
- line extraction algorithms
- feature extraction
- corner detection
- Harris corner detection
- SIFT features
- image matching
There are two types of image filtering
- filtering in the frequency domain
- filtering in the spatial domain
We will focus on the filterings in the spatial domain in this post. To perform image filtering in the spatial domain, we need to define a mask (or a kernel). A mask is a weight matrix that characterizes the "impact level" of the neighbors of a given pixel.
Mathematically, we can write
The first operation is called correlation with the kernel \(w\) and the second operation is called convolution with the kernel \(w\).
Examples of image filers are:
- Identify filter
- Average filter
- Ridge detection filter
- Sharpen filter
- Box blur filter
- Gaussian blur filter
- Unsharp masking filter
Image filters can be used to blur images or reduce noise.
Kernel convolution usually requires values from pixels outside of the image boundaries. There are a variety of methods for handling image edges.
- Crop/Avoid overlap
- Kernel Crop
The figure below illustrates how image filtering works. The image on the left is the original image and the image on the right is the result after applying the filtering. As we can see, each pixel in the new image is calculated based on the pixels in the rectangles in the original image.
The basic idea is to search for significant changes in the image. A perfect edge can be represented by a step function but we are far from this ideal representation in any practical applications. One of the challenges to detect edges in an image is the noise. As we can see in the figure below, when the noise is present, it becomes super hard to identify the spike that corresponds to the value change in the step function.
A good edge detector would want to achieve the following goals:
- Maximizing the signal-to-noise ratio.
- Achieving the highest precision possible on the location of edges.
- Minimizing the number of edge responses associated with each edge.
The basic idea of edge detection is to first smooth the image and then calculate the derivative s of intensity. Two classic edge detectors are
As described earlier, we want to minimize the number of edge responses associated with each edge, a technique called non maximum suppression is used. For example, in the Canny edge detection algorithm, after we find the magnitude and orientation of gradient, we check if pixel is local maximum along gradient direction (this may require checking interpolated pixels).
Line Extraction Algorithm
In this section, we briefly describe some of the classic line extraction algorithms. For a robotic application, especially the one with man-make structures, straight lines in an image can provide valuable information about the surrounding environment. For instance, if a robot is operating in an indoor environment, straight lines may be more likely to be associated with hallways and walls. If we can detect straight lines in the captured images, it will help the robot to locate itself.
Most of the time in a robotic application, we need to deal with range data from distance sensors. It is more convenient to use polar coordinates in this case and a point will be represented by \( x_i = (\rho_i, \theta_i) \). A line in polar coordinate can be written as
The equation is a direct translation of "the projected length of the line segment \(OP\) on the normal direction of the straight line is \(r\)".
Once we have the line representation, it's easy to do line fitting. For example, we can use the least square error to find the optimal values for \(r\) and \(\alpha\).
1. Split-and-Merge Algorithm
q = Queue() q.add(dataPoints) candidates =  while not q.isEmpty(): s = q.pop() line = fit_line(s) point, maxDistance = detect_max_distance(line, s) if maxDistance > threshold: s1, s2 = split(s, line, point) q.add(s1) q.add(s2) else: candidates.append(s, line) output = merge_collinear_segments(candidates)
2. Line Regression Algorithm
3. Incremental Algorithm
Note: It seems that this algorithm requires the data points to be sorted by the angle because there is an implicit ordering involved when we select the "next two points" in step 6.
RANSAC (Random Sample Consensus) is an algorithm to estimate robustly the parameters of a model from a given data in the presence of outliers. This algorithm is probabilistic and involves repeated sampling. The idea is that if we try enough times, we would get the right result with a high probability.
In the above figure, the inliers are the black points and the outliers are the red points. The question is how many times we need to try so that we can select two black points with a given probability \(p\). Once we select a pair of two black points, then the line will become a candidate because it has many inliners (step 6 and 7).
If we know the fraction of the inliners in the database (i.e. numOfBlackPoints / totalNumberOfPoints), we can calculate the minimal number of iterations \(k\) so that we have a probability \(p\) of having selected a pair of two black points at least once in the \(k\) iterations.
5. Hough Transform
Hough Transform is a simple yet beautiful algorithm. The core idea is voting. A line can be represented by a pair of parameters \( (m_k, b+k) \), which represents the line \( y = m_k x + b_k \). Each point in the image can vote for the lines that pass it and we report the lines with the most votes.
6. Histogram of Angles
We can also identify straight lines by using the histogram of angles. For example, we can visit data points in clockwise order and calculate the angle of the line segment that connects two adjacent data points. Then we build a histogram of these angles. If there are straight lines in the environment, we will see spikes in the histogram, which correspond to the slopes of the straight lines.
Feature extraction has many applications:
- Image alignment and building panoramas
- 3D reconstruction
- Motion tracking
- Object recognition
- Indexing and database retrieval
- Robot navigation
The objective is to find a systematic way to construct a representation of a local area of interest in an image. Here is a list of important properties of a good feature:
- Localization accuracy
- Quantity of features
- Computational efficiency
The feature extraction algorithm should be able to run independently in different areas in an image or in different images.
There are three categories of features
- Feature with a semantic interpretation. This type of feature is associated with a high level concept such as doors, and houses in an image.
- Feature with a location that can be determined accurately and robustly over time.
- Feature that does not have semantic interpretation at individual level but can be used to recognize a scene or an object if taken all together.
Two classic feature detection algorithms are
- Harris Corner Detector
- SIFT features
Harris corner detection uses the first order approximation to measure the variation in a local area. Then, it calculates the eigen values and eigen vectors and uses them as a local coordinate system. The directions of the two eigen vectors are the directions that have the most variation in the local area.
SIFT features are more complex. The objective is to constructor descriptors for the keypoints in an image. The output looks like the following image:
For detailed explanation of the algorithm, please refer to
The figure below collects some important notes.
Once we have the descriptors in two images, it's straightforward to figure out the matching point. The matching points should have similar descriptors so one simple method is to calculate the Euclidean distance of the descriptor vectors in the two images. The pair with the minimal distance is considered a match.
----- END -----
©2019 - 2022 all rights reserved