In this section, we focus on reconstructing the scene with four objects. Following the conclusion of Section 2, we use SIFT+DSP-SIFT features and images 1 and 2 for the reconstruction.
First, we use the n_view_matching function to compute consistent point matches among the views. As can be seen in the collage of images below, the point locations and descriptors were obtained from the previously performed experiment. The points returned by this function are the ones to be reconstructed.
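As a rough illustration of what such a matching step does (this is our own OpenCV sketch, not the n_view_matching implementation, and the image paths are placeholders), pairwise SIFT matching with Lowe's ratio test can be written as:

```python
import cv2

# Pairwise SIFT matching with Lowe's ratio test; image paths are
# placeholders. The actual n_view_matching step additionally keeps
# only the matches that are mutually consistent across all views.
sift = cv2.SIFT_create()
img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
```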
For the initial step, the reconstruction was computed from images 1 and 2, as this pair has the largest number of matching points.
The projective reconstruction obtained a reprojection error of 0.1. A small error is expected here, as only two cameras are involved and the chosen pair has a small difference in viewing angle.
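A minimal sketch of this two-view step, assuming Nx2 arrays of matched points x1 and x2 (hypothetical names), could look as follows: the fundamental matrix yields a canonical projective camera pair, the matches are triangulated, and the mean reprojection error is measured.

```python
import cv2
import numpy as np

def two_view_projective(x1, x2):
    # Estimate the fundamental matrix with RANSAC and keep the inliers.
    F, mask = cv2.findFundamentalMat(x1, x2, cv2.FM_RANSAC, 1.0, 0.999)
    inl = mask.ravel() == 1
    x1, x2 = x1[inl], x2[inl]

    # Canonical projective pair: P1 = [I | 0], P2 = [[e']_x F | e'],
    # where e' is the second epipole (null vector of F^T).
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    e2 = np.linalg.svd(F.T)[2][-1]
    e2x = np.array([[0.0, -e2[2], e2[1]],
                    [e2[2], 0.0, -e2[0]],
                    [-e2[1], e2[0], 0.0]])
    P2 = np.hstack([e2x @ F, e2.reshape(3, 1)])

    # Triangulate the matches and compute the mean reprojection error.
    Xh = cv2.triangulatePoints(P1, P2, x1.T, x2.T)  # 4xN homogeneous
    err = 0.0
    for P, x in ((P1, x1), (P2, x2)):
        proj = P @ Xh
        err += np.linalg.norm((proj[:2] / proj[2]).T - x, axis=1).mean()
    return Xh, err / 2
```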
Next, we use all the images to improve the reconstruction. To do so, we perform a resectioning step, whose goal is to obtain the projection matrix of each additional view from the already-triangulated 3D points.
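A sketch of this step using the direct linear transform (DLT), assuming the function name and inputs below (they are not from the course toolbox), is:

```python
import numpy as np

def resection_dlt(X, x):
    """Estimate a 3x4 projection matrix for a new view by DLT.

    X: Nx4 homogeneous 3D points already reconstructed,
    x: Nx2 corresponding image points in the new view (N >= 6).
    """
    A = []
    for Xi, (u, v) in zip(X, x):
        # Each correspondence contributes two rows of the DLT system.
        A.append(np.concatenate([np.zeros(4), -Xi, v * Xi]))
        A.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
    # The solution is the right singular vector with the smallest
    # singular value, reshaped into the 3x4 projection matrix.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```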
Compared to the reprojection error obtained in the first step, we get a higher error after resectioning, which is expected given the additional views. The next step is projective bundle adjustment.
After this step, we expect the error to drop back toward the initial two-view value, since the cameras and 3D points are jointly refined using all views. The error indeed went down by approximately 93%, a good indication that bundle adjustment is effective.
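A minimal sketch of such a joint refinement, assuming SciPy and the variable names below (all hypothetical), minimizes the stacked reprojection residuals over all cameras and points at once:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, observations):
    """Stacked reprojection residuals for all observed image points.

    params: flattened 3x4 camera matrices followed by flattened
    homogeneous 4-vectors for the 3D points.
    observations: list of (camera_index, point_index, (u, v)) tuples.
    """
    Ps = params[:12 * n_cams].reshape(n_cams, 3, 4)
    Xs = params[12 * n_cams:].reshape(n_pts, 4)
    res = []
    for c, p, (u, v) in observations:
        proj = Ps[c] @ Xs[p]
        res.extend([proj[0] / proj[2] - u, proj[1] / proj[2] - v])
    return np.asarray(res)

# Jointly refine all cameras and points, starting from the values
# produced by the resectioning step:
# x0 = np.concatenate([Ps.ravel(), Xs.ravel()])
# sol = least_squares(reprojection_residuals, x0,
#                     args=(n_cams, n_pts, observations), method="lm")
```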
Finally, we use the properties of the essential matrix between two cameras to obtain a Euclidean reconstruction of the scene. Decomposing the essential matrix yields four candidate solutions in which the first camera is fixed while the second one varies; the correct solution is the one that places the triangulated points in front of both cameras. After this, the final reconstruction is computed as a cloud of points.
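A short OpenCV sketch of this selection, assuming a shared calibration matrix K and the input names below (hypothetical), looks like this:

```python
import cv2

def euclidean_pose(F, K, x1, x2):
    """Recover the Euclidean camera pose from the essential matrix.

    F: fundamental matrix, K: (shared) calibration matrix,
    x1, x2: Nx2 matched points. Input names are hypothetical.
    """
    E = K.T @ F @ K

    # Two rotations and a translation direction; together with the
    # sign of t, these give the four candidate second cameras.
    R1, R2, t = cv2.decomposeEssentialMat(E)

    # recoverPose runs the cheirality test: it triangulates the
    # matches for each candidate and keeps the pose that puts the
    # most points in front of both cameras.
    _, R, t, mask = cv2.recoverPose(E, x1, x2, K)
    return R, t
```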
As mentioned before, the challenge here is not the illumination changes but the texture of the floor, which can be treated as another object or as a continuation of the pink box (the image printed on the box has the same texture and color as the floor). The figure below shows the cloud of points arranged just as in the picture, together with the two camera views. For the purposes of this report, we zoom in on the points to examine their grouping: on close inspection, the objects are well differentiated into clusters.
Comparing against the previous images, the 3D reconstruction is quite precise: it shows the correct location and distribution of the points. It also captures the angle between the two layers (first layer: the boxes; second layer: the mask and album on the floor). In the actual scene the second layer is perpendicular to the first, and this relationship is exhibited in the projected scene.
We improve the visualization by removing the cameras from the plot and viewing the points more closely. In this view, the layers mentioned above are more evident, and the clusters of points are well grouped.
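For reference, a camera-free view of the cloud can be produced with a simple Matplotlib sketch (the variable Xh, the 4xN homogeneous point cloud from the steps above, is a hypothetical name):

```python
import matplotlib.pyplot as plt

def plot_point_cloud(Xh):
    """Scatter-plot a 4xN homogeneous point cloud without cameras."""
    X = Xh[:3] / Xh[3]  # convert to inhomogeneous coordinates
    ax = plt.figure().add_subplot(projection="3d")
    ax.scatter(X[0], X[1], X[2], s=2)
    ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
    plt.show()
```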
Out of curiosity, I remembered that in the theory class we discussed COLMAP, a general-purpose reconstruction pipeline with both a graphical and a command-line interface. I tried reconstructing my images with it. Interestingly, it reconstructed the scene well, recovering the edges and the precise locations of the objects with respect to each other. Looking closer, the mask is well structured in the reconstructed image; what amazed me most is that it captured enough points on the mask to reproduce its bulge. COLMAP generated all of this in a matter of seconds.
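For reference, one way to run the same pipeline from Python is to invoke COLMAP's automatic reconstructor from the command line (the workspace and image paths below are placeholders for this report's dataset):

```python
import subprocess

# Run COLMAP's end-to-end pipeline (feature extraction, matching,
# and reconstruction) on a folder of images.
subprocess.run([
    "colmap", "automatic_reconstructor",
    "--workspace_path", "colmap_workspace",
    "--image_path", "images",
], check=True)
```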