In this post, we briefly describe some classic geometric transformations in image processing. Most of the content comes from the lecture notes listed in the Related Reading section. The following topics are covered in this post:
- Homogeneous Coordinates
- Matrix Representation of 2D Geometric Transformation
- Image Geometry and Perspective Transformation
- Image Transformation
- OpenCV Code Example
Before presenting the topics, we have a quick note on the topic of "projective transformation vs. perspective transformation". Quoted from Projective Transformation:
A transformation that maps lines to lines (but does not necessarily preserve parallelism) is a projective transformation. Any plane projective transformation can be expressed by an invertible 3×3 matrix in homogeneous coordinates; conversely, any invertible 3×3 matrix defines a projective transformation of the plane. Projective transformations (if not affine) are not defined on all of the plane, but only on the complement of a line (the missing line is "mapped to infinity").
A common example of a projective transformation is given by a perspective transformation.
According to a comment on StackExchange, OpenCV's
getPerspectiveTransform function seems to be incorrectly named: what it computes is a general projective transformation, so a name like getProjectiveTransform would arguably be more accurate.
Related Reading
- Linear Algebra & Geometry [local copy]
- Lecture 12: Camera Projection Part 1 [local copy]
- Lecture 13: Camera Projection Part 2 [local copy]
- Lecture 2: Geometric Image Transformations
Homogeneous Coordinates
Homogeneous coordinates are a mathematical trick that unifies the representation of different geometric transformations. The main idea is to add an extra dimension to the point vector and introduce an equivalence relation.
The homogeneous coordinates of a 2D point \( (x, y) \) are \( (xz, yz, z) \) with \( z \neq 0 \), and the homogeneous coordinates of a 3D point \( (x, y, z) \) are \( (xw, yw, zw, w) \) with \( w \neq 0 \).
Homogeneous coordinates essentially express the equivalence relation of points on different scaling planes. The points with the same color in the figure below are equivalent.
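As a tiny illustration of this equivalence (pure Python; the helper names are my own), we can lift a Cartesian point to homogeneous coordinates at different scales and check that both represent the same 2D point after normalizing by the last component:

```python
def to_homogeneous(x, y, z=1.0):
    """Lift a 2D point (x, y) to homogeneous coordinates (xz, yz, z)."""
    assert z != 0
    return (x * z, y * z, z)

def to_cartesian(h):
    """Recover the Cartesian point by dividing by the last component."""
    x, y, z = h
    return (x / z, y / z)

# (3, 4, 1) and (6, 8, 2) lie on the same line through the origin,
# so they are equivalent: both normalize to the point (3, 4).
p1 = to_homogeneous(3, 4, z=1)
p2 = to_homogeneous(3, 4, z=2)
print(to_cartesian(p1))  # (3.0, 4.0)
print(to_cartesian(p2))  # (3.0, 4.0)
```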
Matrix Representation of 2D Geometric Transformation
Geometric transformations can be represented in matrix form using homogeneous coordinates. Note that translation uses the additional dimension of the homogeneous coordinate.
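The matrices themselves appeared as a figure in the original post; for completeness, here are the standard homogeneous-coordinate forms of the basic 2D transformations (the symbols \(t_x, t_y, \theta, s_x, s_y\) are my own notation):

```latex
% Translation by (t_x, t_y) -- note the use of the extra dimension:
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} =
\begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
% Rotation by angle \theta about the origin:
R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
% Scaling by (s_x, s_y):
S = \begin{pmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{pmatrix}
```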
Image Geometry and Perspective Transformation
There are four coordinate systems involved in image geometry:
- World coordinates
- Camera coordinates
- Film coordinates
- Pixel coordinates
Note that the world and camera coordinates have 3 dimensions, while the film and pixel coordinates have 2 dimensions.
From World Coordinate to Camera Coordinate
This is a relatively straightforward transformation. To convert world coordinates to camera coordinates, we need the camera location \( C \) in world coordinates. The transformation is a translation followed by a rotation.
In matrix form, we have
Recall that world and camera coordinates are 3D vectors, therefore the corresponding homogeneous coordinates are 4D vectors.
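The matrix itself was shown as a figure; a reconstruction of the standard form (assuming \(R\) denotes the camera's rotation matrix and \(C\) the camera center, as is conventional in such lecture notes):

```latex
\mathbf{p}_c = R\,(\mathbf{p}_w - C)
\quad\Longleftrightarrow\quad
\begin{pmatrix} \mathbf{p}_c \\ 1 \end{pmatrix} =
\begin{pmatrix} R & -RC \\ \mathbf{0}^T & 1 \end{pmatrix}
\begin{pmatrix} \mathbf{p}_w \\ 1 \end{pmatrix}
```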
From Camera Coordinate to Film Coordinate
This change of coordinate is a perspective projection. We project a set of 3D points to a 2D plane. Let \( (x, y) \) denote the projected point on the film plane. According to geometry, we have
This Cartesian coordinate is equivalent to the following Homogeneous coordinate:
The matrix form of the perspective projection is given by
Note that \( (x, y, z)^T \) is the equivalent homogeneous coordinate of a point on the film plane.
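The projection equations above appeared as figures; in the standard pinhole model with focal length \(f\) and camera coordinates \((X, Y, Z)\) (symbol names assumed), they read:

```latex
x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}
\qquad\Longleftrightarrow\qquad
\begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} =
\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
\qquad (fX,\, fY,\, Z)^T \sim (x,\, y,\, 1)^T
```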
From Film Coordinate to Pixel Coordinate
This change of coordinate is related to intrinsic parameters of the camera. Mathematically, it's a scaling transformation plus a translation transformation. Here is the formula:
In Homogeneous coordinate, we have
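A standard form of this scaling-plus-translation step (with \(s_x, s_y\) the pixel sizes and \((o_x, o_y)\) the pixel coordinates of the image center; these symbol names are assumptions):

```latex
u = \frac{x}{s_x} + o_x, \qquad v = \frac{y}{s_y} + o_y
\qquad\Longleftrightarrow\qquad
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} =
\begin{pmatrix} 1/s_x & 0 & o_x \\ 0 & 1/s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```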
Put Everything Together
Image Transformation
Suppose the original image uses \( (x, y) \) coordinates and the new image uses \( (u, v) \) coordinates. If the coordinates are continuous, image transformation means copying the pixel at \( (x, y) \) in the original image to \( (u = f_1(x,y), v = f_2(x, y)) \) in the new image.
When the image coordinates are discrete rather than continuous, additional work is needed. The central question of the image transformation task is: what pixel value should we put at \( (u, v) \) in the new image, where \(u\) and \(v\) are integers? To answer this question, we first figure out the coordinate in the original image that maps to \( (u, v) \); this is an inverse mapping problem. The issue is that the resulting \( (x, y) \) coordinates in the original image may not be integers, which means we don't have a pixel value at that exact location in the original image. So the next step is to estimate this value. Interpolation methods are commonly used for this task.
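A minimal sketch of this inverse-mapping idea with bilinear interpolation (pure Python on a list-of-lists grayscale image; the helper names are my own, and real code would use OpenCV's warp functions instead):

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a non-integer (x, y)
    by bilinearly interpolating the four surrounding pixels."""
    x0, y0 = int(x), int(y)  # top-left neighbor
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    dx, dy = x - x0, y - y0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def warp(image, inverse_map, width, height, fill=0):
    """For every integer (u, v) in the new image, ask the inverse map
    which (x, y) in the original it came from, then interpolate there."""
    out = []
    for v in range(height):
        row = []
        for u in range(width):
            x, y = inverse_map(u, v)
            if 0 <= x <= len(image[0]) - 1 and 0 <= y <= len(image) - 1:
                row.append(bilinear_sample(image, x, y))
            else:
                row.append(fill)  # (u, v) maps outside the source image
        out.append(row)
    return out

# Example: shift the image half a pixel to the right; the inverse map
# asks for the source coordinate (u - 0.5, v).
image = [[0, 100], [0, 100]]
shifted = warp(image, lambda u, v: (u - 0.5, v), 2, 2)
print(shifted)  # [[0, 50.0], [0, 50.0]]
```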
OpenCV Code Example
Example: Rotate an image
To rotate an image, we need to specify a rotation matrix by calling
getRotationMatrix2D. In this method, we specify the center of rotation, the angle of rotation, and the scaling factor. According to the OpenCV documentation, the transformation maps the rotation center to itself; if this is not the target, the shift should be adjusted. This essentially means
Note that this equation is also used to derive the third column of the rotation matrix. There are three columns because we are operating in homogeneous coordinates, and the third column is related to translation.
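To make the fixed-point property concrete, here is a small pure-Python sketch that builds the 2x3 matrix following the formula in the OpenCV documentation for getRotationMatrix2D (the helper names are my own) and checks that the center maps to itself:

```python
import math

def rotation_matrix_2d(center, angle_deg, scale):
    """Build the 2x3 affine matrix that cv2.getRotationMatrix2D returns,
    per the OpenCV documentation: alpha = scale*cos(angle),
    beta = scale*sin(angle). The third column is the translation chosen
    so that `center` is a fixed point of the transformation."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def apply_affine(m, x, y):
    """Apply a 2x3 affine matrix to the homogeneous point (x, y, 1)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# The rotation center maps to itself, as the documentation states.
m = rotation_matrix_2d((100, 50), 45, 1.0)
print(apply_affine(m, 100, 50))  # ≈ (100.0, 50.0)
```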
Here is the code
import cv2
from os import path

def rotateImage(image, angle, center):
    row, col = image.shape[:2]
    rot_mat = cv2.getRotationMatrix2D(center, angle, 1.0)
    new_image = cv2.warpAffine(image, rot_mat, (col, row),
                               borderMode=cv2.BORDER_CONSTANT)
    return new_image

image = cv2.imread(path.join(tmp_directory, 'test_image_1.png'))
height, width = image.shape[:2]
rotated = rotateImage(image, 45, (int(width / 2), int(height / 2)))
Example: Apply projective transformation
To apply a projective transformation, we need to specify a mapping of 4 points. For example, in the figure below, the red points in the left image are the source points and the green points are the destination points. Then we call
getPerspectiveTransform to get the projective transformation matrix and pass it into warpPerspective.
Here is the code (without code that generates the grid):
import cv2
import numpy as np
from os import path

image = cv2.imread(path.join(tmp_directory, 'sample_3.JPG'))
height, width = image.shape[:2]
copied = np.copy(image)

def drawCircle(image, p, color):
    cv2.circle(image, p, 15, color=color, thickness=-1)

# Draw red points on the image. Red points are source points.
p1 = (946, 787)
p2 = (2448, 820)
p3 = (652, 1680)
p4 = (2712, 1762)
for p in [p1, p2, p3, p4]:
    drawCircle(copied, p, color=(0, 0, 255))

# Draw green points on the image. Green points are destination points.
d1 = (652, 734)
d2 = (2448, 734)
d3 = (652, 1713)
d4 = (2448, 1713)
for p in [d1, d2, d3, d4]:
    drawCircle(copied, p, color=(0, 255, 0))

# Apply projective transformation.
source = np.float32([p1, p2, p3, p4])
destination = np.float32([d1, d2, d3, d4])
M = cv2.getPerspectiveTransform(source, destination)
correctedImage = cv2.warpPerspective(image, M, (width, height),
                                     borderValue=(255, 255, 255))
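For intuition only, here is a rough pure-Python sketch of the linear system that getPerspectiveTransform solves under the hood: each of the 4 point correspondences contributes two linear equations in the 8 unknown entries of the 3x3 matrix (with the bottom-right entry fixed to 1). The helper names are my own; real code should keep using OpenCV.

```python
def homography_from_points(src, dst):
    """Solve for the 3x3 projective matrix H (with H[2][2] = 1) mapping
    four source points to four destination points, via Gaussian
    elimination with partial pivoting on the 8x8 system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), similarly v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    n = 8
    for col in range(n):  # forward elimination
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        h[r] = (b[r] - sum(A[r][c] * h[c] for c in range(r + 1, n))) / A[r][r]
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def project(H, x, y):
    """Apply H to (x, y, 1) and normalize by the last component."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Using the same point pairs as the OpenCV example above:
src = [(946, 787), (2448, 820), (652, 1680), (2712, 1762)]
dst = [(652, 734), (2448, 734), (652, 1713), (2448, 1713)]
H = homography_from_points(src, dst)
```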
©2019 - 2022 all rights reserved