(Auto)Stitching and Photo Mosaics

Part of CS180 : Intro to Computer Vision and Computational Photography

Photos

In the process of attaining photos that will be used for photo mosaics, I took a variety of photos of landscapes and landmarks. These photos were taken in different locations, but followed the same principle to keep the lens as fixed as possible and take multiple photos whilst rotating the camera.

Mission San Francisco
Left view of Mission San Francisco
Left view of Mission San Francisco
Middle view of Mission San Francisco
Middle view of Mission San Francisco
Right view of Mission San Francisco
Right view of Mission San Francisco
Top Right view of Mission San Francisco
Top Right view of Mission San Francisco
Pong
Left view of Pong landscape
Left view of Pong landscape
Center view of Pong landscape
Center view of Pong landscape
Right view of Pong landscape
Right view of Pong landscape
Bay Bridge
View 2 of Bay Bridge
View 2 of Bay Bridge
View 1 of Bay Bridge
View 1 of Bay Bridge
View 3 of Bay Bridge
View 3 of Bay Bridge
View 4 of Bay Bridge
View 4 of Bay Bridge
Tokyo Tower
Tower view of Tokyo Tower
Tower view of Tokyo Tower
Bridge view of Tokyo Tower
Bridge view of Tokyo Tower

Recovering Homography

To stitch the photos, we first need to determine the relationship between them. This is done by calculating the homography, a 3x3 matrix that describes the transformation between two images. We compute the homography by using correspondence points between both images, manually (for now) marked locations that are the same between both images. By constructing a system of equations, we can solve for the homography matrix.

$$ \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} $$

Because we are using projective geometry, we will set \(i = 1\), which enforces consistent scaling, and solve for the remaining 8 unknowns. We can now solve for these 8 unknowns, let us start by expanding the above equation:

$$ \begin{align*} ax + by + c = wx' \\ dx + ey + f = wy' \\ gx + hy + 1 = w \\ \end{align*} $$

We can now solve for the unknowns by constructing a system of equations. We can do this because our 3rd equation contains just w on the right side, meaning we can substitute the other two equations.

$$ \begin{align*} ax + by + c = (gx + hy + 1) x' \\ dx + ey + f = (gx + hy + 1) y' \\ \\ \text{And further simplifying, we get:} \\ \\ ax + by + c - gxx' - hyx' = x' \\ dx + ey + f - gxy' - hyy' = y' \\ \end{align*} $$

Now we can construct a matrix equation to solve for the unknowns:

$$ \begin{align*} &\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \end{align*} $$

With 8 unknowns, and 2 equations per correspondence point, we need at least 4 correspondence points to solve for the homography matrix. Using least squares, we can solve for the homography matrix with additional points to reduce errors from minor misalignments.

Here is an example of what the point correspondences used to calculate the homography matrix look like on the images:

Bay Bridge 1 and 2 Correspondence Points
Bay Bridge 1 and 2 Correspondence Points

Warping Images

Homography matrices are useful to warp all the points from one image to another. Going forward, we will refer to images in their own coordinate system, with the origin at the top left corner, as well as the mosaic coordinate system, which is a grid large enough to contain all images in their warped form. The mosaic coordinate system relies on the presence of a root image, which all other images will be warped to. In my implementation, I constructed a graph of where images were connected by their provided correspondence points, and used the closeness centrality metric to determine which image had the smallest distance to all other images. This image was then chosen as the root image.

For each image that would be warped, warp_im, I computed the shortest path to the root image, root_im. Then by traversing from warp_im to root_im, I could construct a homography matrix that would warp from warp_im to the intermediary nodes, and finally to root_im.

Once the homography matrices were computed, I could apply inverse warping from the pixels within the polygon that bounded the warp_im in the mosaic coordinate system, and interpolate the pixel values to fill in the mosaic image.

Results

Mission San Francisco
Left view of Mission San Francisco in mosaic coordinates
Left view of Mission San Francisco in mosaic coordinates
Middle view of Mission San Francisco in mosaic coordinates
Middle view of Mission San Francisco in mosaic coordinates
Right view of Mission San Francisco in mosaic coordinates
Right view of Mission San Francisco in mosaic coordinates
Top Right view of Mission San Francisco in mosaic coordinates
Top Right view of Mission San Francisco in mosaic coordinates
Pong
Left view of Pong landscape in mosaic coordinates
Left view of Pong landscape in mosaic coordinates
Center view of Pong landscape in mosaic coordinates
Center view of Pong landscape in mosaic coordinates
Right view of Pong landscape in mosaic coordinates
Right view of Pong landscape in mosaic coordinates
Bay Bridge
View 2 of Bay Bridge in mosaic coordinates
View 2 of Bay Bridge in mosaic coordinates
View 1 of Bay Bridge in mosaic coordinates
View 1 of Bay Bridge in mosaic coordinates
View 3 of Bay Bridge in mosaic coordinates
View 3 of Bay Bridge in mosaic coordinates
View 4 of Bay Bridge in mosaic coordinates
View 4 of Bay Bridge in mosaic coordinates
Tokyo Tower
Tower view of Tokyo Tower in mosaic coordinates
Tower view of Tokyo Tower in mosaic coordinates
Bridge view of Tokyo Tower in mosaic coordinates
Bridge view of Tokyo Tower in mosaic coordinates

Blending

It is a good check to ensure that the images are correctly positioned atop each other at this point. If we simply change the alpha values and combine, we can see that our warping does work.

Bay Bridge 1 and 2 Overlap
Bay Bridge 1 and 2 Overlap

However, simply overlaying the images will not suffice. We need to blend the images together to create a seamless mosaic. To do this, we can use a simple linear blending technique. We can calculate the alpha value for each pixel in the mosaic image by taking the distance from the center of the image and dividing by the maximum distance from the center. This will create a gradient that we can use to blend the images together.

Mission San Francisco
Left view of Mission San Francisco's euclidean distance mask
Left view of Mission San Francisco's euclidean distance mask
Middle view of Mission San Francisco's euclidean distance mask
Middle view of Mission San Francisco's euclidean distance mask
Right view of Mission San Francisco's euclidean distance mask
Right view of Mission San Francisco's euclidean distance mask
Top Right view of Mission San Francisco's euclidean distance mask
Top Right view of Mission San Francisco's euclidean distance mask
Bay Bridge
View 2 of Bay Bridge's euclidean distance mask
View 2 of Bay Bridge's euclidean distance mask
View 1 of Bay Bridge's euclidean distance mask
View 1 of Bay Bridge's euclidean distance mask
View 3 of Bay Bridge's euclidean distance mask
View 3 of Bay Bridge's euclidean distance mask
View 4 of Bay Bridge's euclidean distance mask
View 4 of Bay Bridge's euclidean distance mask

By stacking each image’s distance transform, and normalizing the values by channel, we can now blend the images together to create a seamless mosaic using the channel-wise values as alpha values in a weighted sum.

Results

Mission San Francisco Mosaic
Mission San Francisco Mosaic
Pong Mosaic
Pong Mosaic
Bay Bridge Mosaic
Bay Bridge's Mosaic
Tokyo Tower Mosaic
Tokyo Tower Mosaic

Rectification

The warping process can be used to rectify images as well. By using the homography matrix to warp the images, we can correct for perspective distortion. This is useful in cases where the camera is not perfectly aligned with the scene, and we want to correct for the distortion. The applications are truly boundless, but one scenario where it is useful is when you must take a picture of an important document, but you are unable to take it head-on. By taking a picture at an angle, the document will appear distorted, and non-legible. By using the homography matrix to warp the image, we can correct for this distortion and make the document legible.

We start by marking the corners of the document, and then the respective location in our new image where these corners should be. In rectification, this is often the corners of the new image, but it can be any location. We then calculate the homography matrix, and using this perform an inverse warp over our new image to fill in the pixels from the original image. Interpolation is extremely important here, as we are filling in pixels that may not have a direct correspondence in the original image.

Results

Someone left their secret password on the table, but I wasn’t able to get a good picture. Let us see if we can rectify the image to make it legible.

Secret Password
Original picture of the secret password
Original picture of the secret password
Rectified picture of the secret password
Rectified picture of the secret password
Comparison of the original and rectified secret password
Comparison of the original and rectified secret password

This example is quite notable because of the orientation of the document. The document was not only at a steep angle, but it was also not facing the camera. By correctly defining the corners of the document as relative to its orientation, it can be correctly projected into the new image and is now legible without having to mirror the text!

I took a picture of a fun Pikachu graphic, and I want to use it as my background! But the picture is at an angle, and I want to rectify it to make it straight.

Pokemon Graphic
Original picture of the Pokemon graphic
Original picture of the Pokemon graphic
Rectified picture of the Pokemon graphic
Rectified picture of the Pokemon graphic

</div>

Comparison of the original and rectified Pokemon graphic
Comparison of the original and rectified Pokemon graphic

Detecting Corner Features

In this section, we computed a Harris matrix for our image, and chose the peaks with a limitation of a growing maximum in a region until we had a reasonable number of peaks. This was necessary to keep our values feasible for vectorization. I found around 45,000 to be the vectorization limit, and about 10,000 was a good compromise between speed and accuracy.

Tokyo Tower Harris Values
Harris values for Tokyo Tower
Tokyo Bridge Harris Values
Harris values for Tokyo Bridge

ANMS

We then used the ANMS algorithm to select the best corners. This was done by computing the minimum radius before suppression and keeping the top n with the highest radius. There is an optional robustness variable that helps limit suppression, where a value of 1 enforces separation between points but loses some strong corners.

Tokyo Tower ANMS Robustness Comparison
ANMS Robustness Comparison on Tokyo Tower
Tokyo Bridge ANMS Robustness Comparison
ANMS Robustness Comparison on Tokyo Bridge

Corner Features

Now we can sample a 40x40 region around corners and downsample to 8x8 to get a descriptor for each corner. We assume the images are already axis-aligned.

Tokyo Tower Corner Features
Tokyo Tower Corner Features
Tokyo Bridge Corner Features
Tokyo Bridge Corner Features

Here are the descriptors after normalizing and standardizing the values, making them more robust to lighting changes and other factors.

Tokyo Bridge Corner Features Normalized/Standardized
Tokyo Bridge Corner Features (Normalized/Standardized)
Tokyo Tower Corner Features Normalized/Standardized
Tokyo Tower Corner Features (Normalized/Standardized)

Matches

We compute the difference between how well a descriptor matches to its closest match and its second-closest match. This helps determine if a match is good, filtering out bad matches.

Tokyo Tower and Bridge Matches
Feature Matches between Tokyo Tower and Bridge

RANSAC

Finally, we use RANSAC to find a good homography matrix, which allows us to warp the images together and blend them into a seamless mosaic.

Tokyo Tower RANSAC
RANSAC Result for Tokyo Tower
Tokyo Tower Mosaic
Tokyo Tower Mosaic
Bay Bridge RANSAC
RANSAC Result for Bay Bridge
Bay Bridge Mosaic
Bay Bridge Mosaic
Mission San Francisco RANSAC
RANSAC Result for Mission San Francisco
Mission San Francisco Mosaic
Mission San Francisco Mosaic

The coolest thing I learned in this project was definitely the rectification process. It was fascinating to see how a skewed image of a document could be rectified to make it legible—something useful in many applications!