In this section, we capture several views of a single scene and run feature detection, description, and matching on pairs of images. The following images were taken in order with the calibrated camera.
Four items can be seen, each in a different color and shape to provide a wider variety of textures. The background is split into white (cabinet) and brown (floor), which contrasts with the objects. These choices were made to aid the matching. In addition, the images were downscaled by a factor of 0.5 to reduce the computational cost. As expected, this affected the number of detected points: the more we downscale the image, the fewer points we get. However, this can be compensated for with parameters such as the number of octaves and scales. The final values used are 8 and 15, respectively.
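To make the link between image size and scale-space depth concrete, here is a minimal sketch. It is not the toolchain used for this post: the 480x640 input size, the `min_side` cutoff, and both helper names are hypothetical, chosen only to illustrate that halving the image leaves one octave fewer to detect points in.

```python
def downscaled_shape(height, width, factor):
    """Return the (height, width) of an image after uniform rescaling."""
    return round(height * factor), round(width * factor)


def max_octaves(height, width, min_side=8):
    """Count how many times the image can be halved before its shorter
    side drops below min_side pixels (an assumed cutoff, not a standard)."""
    octaves = 0
    side = min(height, width)
    while side >= min_side:
        octaves += 1
        side //= 2
    return octaves


# Hypothetical original resolution, used for illustration only.
full = (480, 640)
half = downscaled_shape(*full, 0.5)

print(half)                # shape after downscaling by 0.5
print(max_octaves(*full))  # octaves available at full resolution
print(max_octaves(*half))  # one octave fewer after downscaling
```

Under these assumptions the downscaled image supports one octave fewer than the original, which is one reason the octave and scale settings need retuning after rescaling.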
For this task, we compare four detector + descriptor combinations: DoH + SIFT, SURF + SURF, KAZE + KAZE, SIFT + DSP-SIFT.
In the report, two pairs of images (Images 1&2 and 1&4) are tested and shown, although all possible pairs were analyzed.
For this blog post, only pair 1 (Images 1&2) is shown.
From the image above, we see that the SIFT + DSP-SIFT combination was the most effective, yielding the most points. We back this up by quantifying the inliers, shown in the table below:
Combination | Detected | Matched | Ratio |
---|---|---|---|
DoH + SIFT | 61 | 31 | 0.5082 |
SURF + SURF | 207 | 103 | 0.5024 |
KAZE + KAZE | 207 | 104 | 0.5024 |
SIFT + DSP-SIFT | 1617 | 809 | 0.5000 |
The other combinations were also able to match points, but far fewer, which would not be helpful for reconstruction. To compare the quality of the matches, we compute the ratio for each combination. The ratios show that their performance is similar; the differences lie in the number of detected points.
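As a small illustration of how such a ratio can be computed, the sketch below assumes that the Ratio column is the number of matched points divided by the number of detected points. That reading is an assumption; it is consistent with the DoH + SIFT and KAZE + KAZE rows, which are the two used here. The function name `match_ratio` is hypothetical.

```python
def match_ratio(detected, matched):
    """Fraction of detected points that ended up matched."""
    if detected <= 0:
        raise ValueError("detected must be positive")
    return matched / detected


# (detected, matched) counts copied from two rows of the table above.
rows = {
    "DoH + SIFT": (61, 31),
    "KAZE + KAZE": (207, 104),
}

ratios = {name: round(match_ratio(d, m), 4) for name, (d, m) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f}")
```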
As a further evaluation, we checked whether the epipolar line of a point passes through the same point in the other image. To make this test more challenging, we chose a point that has no matching feature: the corner of the small square. For consistency, the same point was checked for all combinations. Because the translation and rotation between the views are small, the fundamental matrix performed well.
The warped images shown above give decent results in all cases, with the projection landing where expected.
The dataset has little illumination variation. The challenge instead comes from the texture of the floor: since we aim to reconstruct the objects, the detector might not be able to tell them apart from the floor. Looking at the matches, however, most points are concentrated on the objects. We will see in the next section whether this challenge shows up during reconstruction.
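The epipolar check can be sketched as follows, assuming the convention x2ᵀ F x1 = 0, so that the epipolar line in the second image is F x1. The fundamental matrix below is a synthetic one (pure horizontal translation with identity intrinsics), not the matrix estimated for this post, and the helper names and pixel coordinates are hypothetical.

```python
import numpy as np


def epipolar_line(F, point):
    """Line (a, b, c) in image 2, with a*x + b*y + c = 0, for a pixel in image 1."""
    x = np.array([point[0], point[1], 1.0])
    return F @ x


def point_line_distance(line, point):
    """Perpendicular distance in pixels from a point to the line (a, b, c)."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / np.hypot(a, b)


# Synthetic F for a camera that only translates along the x axis:
# corresponding points must then share the same image row.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])

line = epipolar_line(F, (10.0, 5.0))
on_line = point_line_distance(line, (3.0, 5.0))   # same row: lies on the line
off_line = point_line_distance(line, (3.0, 7.0))  # two rows below the line

print(on_line, off_line)
```

With an estimated F, a small distance for the chosen corner indicates that the epipolar geometry is consistent even at a point that was not used as a match.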
For this section, a good combination is one that maximizes the number of good matches and detected points, since reconstruction needs as many points as possible. The combinations of detectors and descriptors, along with their results, are summarized in the following table:
Combo | Quality | Quantity |
---|---|---|
DoH + SIFT | - | - |
SURF + SURF | - | + |
KAZE + KAZE | - | + |
SIFT + DSP-SIFT | + | + |
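The selection criterion can be written down in a few lines. This is only a sketch: it ranks the combinations by matched points first and detected points second, using the counts reported earlier for pair 1, and the variable names are hypothetical.

```python
# (detected, matched) counts for pair 1, copied from the earlier table.
results = {
    "DoH + SIFT": (61, 31),
    "SURF + SURF": (207, 103),
    "KAZE + KAZE": (207, 104),
    "SIFT + DSP-SIFT": (1617, 809),
}

# Prefer more matched points, then more detected points as a tie-breaker.
best = max(results, key=lambda name: (results[name][1], results[name][0]))

print(best)
```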
For the next section, we will use the SIFT + DSP-SIFT combination. The pair chosen for reconstruction is Images 1 and 2, since it has the largest number of matching points. Despite the limited number of photos and their reduced resolution, we obtain a decent number of matching points, which will hopefully be enough for reconstruction.