Introduction to Image Processing Fundamentals

Subscribe Send me a message home page tags

#image processing  #computer vision  #robotics  #line detection  #feature detection  #image matching 


In this post, we will give an introduction to the fundamentals of image processing. The content is based on the book Introduction to Autonomous Mobile Robots.

This post will cover the following topics

Image Filtering

There are two types of image filtering

We will focus on the filterings in the spatial domain in this post. To perform image filtering in the spatial domain, we need to define a mask (or a kernel). A mask is a weight matrix that characterizes the "impact level" of the neighbors of a given pixel.

Mathematically, we can write

$$ I'(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) I(x + s, y + t) $$


$$ I'(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s,t) I(x - s, y - t) $$

The first operation is called correlation with the kernel \(w\) and the second operation is called convolution with the kernel \(w\).

Examples of image filers are:

Image filters can be used to blur images or reduce noise.

Kernel convolution usually requires values from pixels outside of the image boundaries. There are a variety of methods for handling image edges.

The figure below illustrates how image filtering works. The image on the left is the original image and the image on the right is the result after applying the filtering. As we can see, each pixel in the new image is calculated based on the pixels in the rectangles in the original image.


Edge Detection

The basic idea is to search for significant changes in the image. A perfect edge can be represented by a step function but we are far from this ideal representation in any practical applications. One of the challenges to detect edges in an image is the noise. As we can see in the figure below, when the noise is present, it becomes super hard to identify the spike that corresponds to the value change in the step function.


A good edge detector would want to achieve the following goals:

The basic idea of edge detection is to first smooth the image and then calculate the derivative s of intensity. Two classic edge detectors are

As described earlier, we want to minimize the number of edge responses associated with each edge, a technique called non maximum suppression is used. For example, in the Canny edge detection algorithm, after we find the magnitude and orientation of gradient, we check if pixel is local maximum along gradient direction (this may require checking interpolated pixels).


Line Extraction Algorithm

In this section, we briefly describe some of the classic line extraction algorithms. For a robotic application, especially the one with man-make structures, straight lines in an image can provide valuable information about the surrounding environment. For instance, if a robot is operating in an indoor environment, straight lines may be more likely to be associated with hallways and walls. If we can detect straight lines in the captured images, it will help the robot to locate itself.

Line Representation

Most of the time in a robotic application, we need to deal with range data from distance sensors. It is more convenient to use polar coordinates in this case and a point will be represented by \( x_i = (\rho_i, \theta_i) \). A line in polar coordinate can be written as

$$ \rho \cos \theta \cos \alpha + \rho \sin \theta \sin \alpha = r $$


$$ \rho \cos ( \theta - \alpha ) - r = 0 $$

The equation is a direct translation of "the projected length of the line segment \(OP\) on the normal direction of the straight line is \(r\)".


Once we have the line representation, it's easy to do line fitting. For example, we can use the least square error to find the optimal values for \(r\) and \(\alpha\).


1. Split-and-Merge Algorithm

q = Queue()
candidates = []

while not q.isEmpty():
    s = q.pop()
    line = fit_line(s)
    point, maxDistance = detect_max_distance(line, s)

    if maxDistance > threshold:
        s1, s2 = split(s, line, point)
        candidates.append(s, line)

output = merge_collinear_segments(candidates)

2. Line Regression Algorithm


3. Incremental Algorithm


Note: It seems that this algorithm requires the data points to be sorted by the angle because there is an implicit ordering involved when we select the "next two points" in step 6.


RANSAC (Random Sample Consensus) is an algorithm to estimate robustly the parameters of a model from a given data in the presence of outliers. This algorithm is probabilistic and involves repeated sampling. The idea is that if we try enough times, we would get the right result with a high probability.


In the above figure, the inliers are the black points and the outliers are the red points. The question is how many times we need to try so that we can select two black points with a given probability \(p\). Once we select a pair of two black points, then the line will become a candidate because it has many inliners (step 6 and 7).

If we know the fraction of the inliners in the database (i.e. numOfBlackPoints / totalNumberOfPoints), we can calculate the minimal number of iterations \(k\) so that we have a probability \(p\) of having selected a pair of two black points at least once in the \(k\) iterations.

5. Hough Transform

Hough Transform is a simple yet beautiful algorithm. The core idea is voting. A line can be represented by a pair of parameters \( (m_k, b+k) \), which represents the line \( y = m_k x + b_k \). Each point in the image can vote for the lines that pass it and we report the lines with the most votes.

6. Histogram of Angles

We can also identify straight lines by using the histogram of angles. For example, we can visit data points in clockwise order and calculate the angle of the line segment that connects two adjacent data points. Then we build a histogram of these angles. If there are straight lines in the environment, we will see spikes in the histogram, which correspond to the slopes of the straight lines.


Feature Extraction

Feature extraction has many applications:

The objective is to find a systematic way to construct a representation of a local area of interest in an image. Here is a list of important properties of a good feature:

The feature extraction algorithm should be able to run independently in different areas in an image or in different images.

There are three categories of features

Two classic feature detection algorithms are

Harris corner detection uses the first order approximation to measure the variation in a local area. Then, it calculates the eigen values and eigen vectors and uses them as a local coordinate system. The directions of the two eigen vectors are the directions that have the most variation in the local area.


SIFT features are more complex. The objective is to constructor descriptors for the keypoints in an image. The output looks like the following image:


For detailed explanation of the algorithm, please refer to

The figure below collects some important notes.


Once we have the descriptors in two images, it's straightforward to figure out the matching point. The matching points should have similar descriptors so one simple method is to calculate the Euclidean distance of the descriptor vectors in the two images. The pair with the minimal distance is considered a match.

----- END -----

Welcome to join reddit self-learning community.
Send me a message Subscribe to blog updates

Want some fun stuff?