Part of CS180 : Intro to Computer Vision and Computational Photography

In the process of attaining photos that will be used for photo mosaics, I took a variety of photos of landscapes and landmarks. These photos were taken in different locations, but followed the same principle to keep the lens as fixed as possible and take multiple photos whilst rotating the camera.

To stitch the photos, we first need to determine the relationship between them. This is done by calculating the homography, a 3x3 matrix that describes the transformation between two images. We compute the homography by using correspondence points between both images, manually (for now) marked locations that are the same between both images. By constructing a system of equations, we can solve for the homography matrix.

$$ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} $$

Because we are using projective geometry, we will set \(i = 1\), which enforces consistent scaling, and solve for the remaining 8 unknowns. We can now solve for these 8 unknowns, let us start by expanding the above equation:

$$ \begin{align*} ax + by + c = wx' \\ dx + ey + f = wy' \\ gx + hy + 1 = w \\ \end{align*} $$

We can now solve for the unknowns by constructing a system of equations. We can do this because our 3rd equation contains just w on the right side, meaning we can substitute the other two equations.

$$ \begin{align*} ax + by + c = (gx + hy + 1) x' \\ dx + ey + f = (gx + hy + 1) y' \\ \\ \text{And further simplifying, we get:} \\ \\ ax + by + c - gxx' - hyx' = x' \\ dx + ey + f - gxy' - hyy' = y' \\ \end{align*} $$

Now we can construct a matrix equation to solve for the unknowns:

$$ \begin{align*} &\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \end{align*} $$

With 8 unknowns, and 2 equations per correspondence point, we need at least 4 correspondence points to solve for the homography matrix. Using least squares, we can solve for the homography matrix with additional points to reduce errors from minor misalignments.

Here is an example of what the point correspondences used to calculate the homography matrix look like on the images:

Homography matrices are useful to warp all the points from one image to another. Going forward, we will refer to images in their own coordinate system, with the origin at the top left corner, as well as the mosaic coordinate system, which is a grid large enough to contain all images in their warped form. The mosaic coordinate system relies on the presence of a root image, which all other images will be warped to. In my implementation, I constructed a graph of where images were connected by their provided correspondence points, and used the closeness centrality metric to determine which image had the smallest distance to all other images. This image was then chosen as the root image.

For each image that would be warped, warp_im, I computed the shortest path to the root image, root_im. Then by traversing from warp_im to root_im, I could construct a homography matrix that would warp from warp_im to the intermediary nodes, and finally to root_im.

Once the homography matrices were computed, I could apply inverse warping from the pixels within the polygon that bounded the warp_im in the mosaic coordinate system, and interpolate the pixel values to fill in the mosaic image.

It is a good check to ensure that the images are correctly positioned atop each other at this point. If we simply change the alpha values and combine, we can see that our warping does work.

However, simply overlaying the images will not suffice. We need to blend the images together to create a seamless mosaic. To do this, we can use a simple linear blending technique. We can calculate the alpha value for each pixel in the mosaic image by taking the distance from the center of the image and dividing by the maximum distance from the center. This will create a gradient that we can use to blend the images together.

By stacking each image’s distance transform, and normalizing the values by channel, we can now blend the images together to create a seamless mosaic using the channel-wise values as alpha values in a weighted sum.

The warping process can be used to rectify images as well. By using the homography matrix to warp the images, we can correct for perspective distortion. This is useful in cases where the camera is not perfectly aligned with the scene, and we want to correct for the distortion. The applications are truly boundless, but one scenario where it is useful is when you must take a picture of an important document, but you are unable to take it head-on. By taking a picture at an angle, the document will appear distorted, and non-legible. By using the homography matrix to warp the image, we can correct for this distortion and make the document legible.

We start by marking the corners of the document, and then the respective location in our new image where these corners should be. In rectification, this is often the corners of the new image, but it can be any location. We then calculate the homography matrix, and using this perform an inverse warp over our new image to fill in the pixels from the original image. Interpolation is extremely important here, as we are filling in pixels that may not have a direct correspondence in the original image.

Someone left their secret password on the table, but I wasn’t able to get a good picture. Let us see if we can rectify the image to make it legible.

This example is quite notable because of the orientation of the document. The document was not only at a steep angle, but it was also not facing the camera. By correctly defining the corners of the document as relative to its orientation, it can be correctly projected into the new image and is now legible without having to mirror the text!

I took a picture of a fun Pikachu graphic, and I want to use it as my background! But the picture is at an angle, and I want to rectify it to make it straight.

Rectified picture of the Pokemon graphic

</div>

In this section, we computed a Harris matrix for our image, and chose the peaks with a limitation of a growing maximum in a region until we had a reasonable number of peaks. This was necessary to keep our values feasible for vectorization. I found around 45,000 to be the vectorization limit, and about 10,000 was a good compromise between speed and accuracy.

We then used the ANMS algorithm to select the best corners. This was done by computing the minimum radius before suppression and keeping the top `n`

with the highest radius. There is an optional robustness variable that helps limit suppression, where a value of 1 enforces separation between points but loses some strong corners.

Now we can sample a 40x40 region around corners and downsample to 8x8 to get a descriptor for each corner. We assume the images are already axis-aligned.

Here are the descriptors after normalizing and standardizing the values, making them more robust to lighting changes and other factors.

We compute the difference between how well a descriptor matches to its closest match and its second-closest match. This helps determine if a match is good, filtering out bad matches.

Finally, we use RANSAC to find a good homography matrix, which allows us to warp the images together and blend them into a seamless mosaic.

The coolest thing I learned in this project was definitely the rectification process. It was fascinating to see how a skewed image of a document could be rectified to make it legible—something useful in many applications!