|
1. Introduction

Dense stereo matching is one of the most challenging problems in computer vision and an important requirement for many applications, such as three-dimensional (3-D) reconstruction and virtual view synthesis. Generally, the purpose of stereo matching is to find the corresponding pixels between stereo images of the same scene captured by two or more cameras, and to obtain the disparity map composed of the coordinate differences of corresponding pixels in the stereo image pair. Many algorithms are available to solve the dense stereo problem, and they can be classified as either global or local. Typical global algorithms, such as graph cuts,1 belief propagation,2,3 and dynamic programming,4,5 can generate a precise dense disparity map based on a global energy function and suitable constraints. However, graph cuts and belief propagation usually consume a great deal of time and memory, and dynamic programming needs specific constraints in different situations. Local matching algorithms are known for their simplicity and efficiency, and they can also produce accurate disparity estimates. The basic idea of local matching is to estimate the disparity of a pixel in the target image by correlating a support window around the pixel with a similar support window in the reference image. One of the typical local matching algorithms is adaptive support weight (ASW), proposed by Yoon.6 The method in Ref. 6 adopts fixed-size square windows and allocates a support weight to each pixel in the window according to its color and position similarities. The disparity maps generated by the method in Ref. 6 are comparable in quality to those obtained by global algorithms.
The gradient information can indicate the variation between neighboring pixels and the structural information,7 and it can also reduce the noise present in the disparity map. A method that uses gradient similarity and local ASW to compute the disparity was proposed in Ref. 8. Considering that information is lost when converting the stereo images from RGB vector space to the CIELab color space, an ASW approach in RGB vector space was proposed in Ref. 9, in which the gradient similarity is used to compute the support weight. However, the main difficulties in stereo matching still lie at object boundaries and in finely textured areas, both of which are reflected by high-frequency information. In this paper, we propose to utilize the illumination normal similarity of the two-dimensional (2-D) gray image to compute the support weight, based on the ASW in RGB vector space. The experimental results show that the proposed method can improve the accuracy of the disparity map.

This paper is organized as follows. Section 2 gives the definition of the illumination normal in the image space. Section 3 provides a specific explanation of the proposed method. In Sec. 4, experimental results of the proposed method are compared to those of other methods. Conclusions and future work are provided in Sec. 5.

2. Illumination Normal of Pixels in a 2-D Image Plane

A normal vector exists for almost every point of an object in 3-D space. Given a 2-D gray image, the gray value of every pixel can reflect the illumination information of the object. In order to obtain the illumination normal vector of pixels in a 2-D image, each pixel of the image is regarded as a point in 3-D space. This can be expressed as $P(x, y, I(x, y))$, where $x$ and $y$ are the horizontal and vertical coordinates, respectively, and $I(x, y)$ is the pixel value at the position $(x, y)$. The current point and the points located below and to the right of it are used to compute its normal vector. Figure 1 illustrates how the illumination normal vector is calculated.
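As a rough illustration of this construction, the following NumPy sketch treats each pixel as the 3-D point (x, y, gray value) and takes the cross product of the vectors to its right and lower neighbors. The function name, the order of the cross product (which fixes the sign of the normal), and the choice to drop the last row and column (which lack one of the two neighbors) are assumptions made for the example, not details confirmed by the text.

```python
import numpy as np

def illumination_normals(gray):
    """Sketch: per-pixel illumination normals of a 2-D gray image.

    Returns (unit, modulus): unit normals of shape (H-1, W-1, 3) and the
    modulus of the unnormalized normal of shape (H-1, W-1).
    """
    I = np.asarray(gray, dtype=np.float64)
    # Gray-value change toward the right neighbor and toward the lower neighbor.
    dx = I[:-1, 1:] - I[:-1, :-1]
    dy = I[1:, :-1] - I[:-1, :-1]
    # Assumed order: (1, 0, dx) x (0, 1, dy) = (-dx, -dy, 1).
    normal = np.stack([-dx, -dy, np.ones_like(dx)], axis=-1)
    modulus = np.linalg.norm(normal, axis=-1)
    unit = normal / modulus[..., None]
    return unit, modulus
```

On a perfectly flat image every normal points along the intensity axis, and the modulus grows wherever the gray value changes quickly, which is consistent with the high-frequency interpretation given in this section.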
Point $P_0 = (x, y, I(x, y))$ is the current point, and $P_1 = (x+1, y, I(x+1, y))$ and $P_2 = (x, y+1, I(x, y+1))$ are the neighboring points used to compute the normal vector of point $P_0$. The 3-D vectors from $P_0$ to $P_1$ and from $P_0$ to $P_2$ are computed as follows:

$$\mathbf{v}_1 = P_1 - P_0 = \bigl(1,\, 0,\, I(x+1, y) - I(x, y)\bigr), \qquad \mathbf{v}_2 = P_2 - P_0 = \bigl(0,\, 1,\, I(x, y+1) - I(x, y)\bigr).$$

The illumination normal vector of point $P_0$ is obtained by the cross-product of $\mathbf{v}_1$ and $\mathbf{v}_2$:

$$\mathbf{N} = \mathbf{v}_1 \times \mathbf{v}_2 = \bigl(-[I(x+1, y) - I(x, y)],\, -[I(x, y+1) - I(x, y)],\, 1\bigr).$$

Normalize the illumination normal vector of point $P_0$:

$$\mathbf{n} = \frac{\mathbf{N}}{\|\mathbf{N}\|},$$

where

$$\|\mathbf{N}\| = \sqrt{[I(x+1, y) - I(x, y)]^2 + [I(x, y+1) - I(x, y)]^2 + 1}.$$

The modulus images of the illumination normal vector of the image pairs, which are used to analyze the illumination normal similarity of the image pairs, are shown in Fig. 2. The features of the illumination normal vector in Figs. 2(b) and 2(d) reflect the high-frequency information of the gray image pairs. The high-frequency information reflects small-scale details of the image, which is useful for searching for matching pixels in the stereo pair. In this paper, this characteristic is utilized, and the illumination normal similarity of the gray image is combined into the ASW method to compute the weights in the support window.

3. Proposed Algorithm

To assign the support weight more accurately for each pixel in the support window, several similarity measurements are considered. Geng et al.9 added the gradient similarity in RGB vector space to the gestalt grouping used by Yoon.6 Here, we propose to compute the support weight using multiple similarity measures, including color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The support weight $w(p, q)$ of a pixel in a support window is computed from these four measures. Here, $p$ and $q$ are pixels in the reference image, which have RGB components; $q$ is a pixel in the support window centered at $p$; $\Delta c_{pq}$, $\Delta s_{pq}$, and $\Delta g_{pq}$ are the color difference, spatial distance, and gradient difference between pixels $p$ and $q$, respectively; $\Delta n_{pq}$ is the illumination normal difference between $p$ and $q$; $I_p$ and $I_q$ are the gray values of $p$ and $q$, respectively; and the remaining parameters are all constant.
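As a rough sketch of how such a multi-similarity support weight could be evaluated, the Python code below assumes the exponential form of Yoon's ASW,6 extended with one term per similarity measure. The exact expression, the way each difference is measured (Euclidean RGB distance, Euclidean pixel distance, absolute difference of a precomputed gradient magnitude, Euclidean distance between unit normals), the function name, and the default values of the four constants are all assumptions made for illustration.

```python
import numpy as np

def support_weights(rgb, grad, normals, cy, cx, radius,
                    gamma_c=7.0, gamma_s=36.0, gamma_g=10.0, gamma_n=1.0):
    """Sketch: weights of all pixels q in the window centered at p = (cy, cx).

    rgb: (H, W, 3) image, grad: (H, W) gradient magnitude,
    normals: (H, W, 3) unit illumination normals.
    The window must lie fully inside the image. The gamma_* values are
    hypothetical placeholders, not the constants used in the paper.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    grad = np.asarray(grad, dtype=np.float64)
    normals = np.asarray(normals, dtype=np.float64)
    y0, y1 = cy - radius, cy + radius + 1
    x0, x1 = cx - radius, cx + radius + 1

    # Color difference: Euclidean distance in RGB vector space (assumed).
    d_c = np.linalg.norm(rgb[y0:y1, x0:x1] - rgb[cy, cx], axis=-1)
    # Spatial distance: Euclidean distance between pixel positions.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    d_s = np.hypot(ys, xs)
    # Gradient difference: absolute difference of gradient magnitudes (assumed).
    d_g = np.abs(grad[y0:y1, x0:x1] - grad[cy, cx])
    # Illumination normal difference: distance between unit normals (assumed).
    d_n = np.linalg.norm(normals[y0:y1, x0:x1] - normals[cy, cx], axis=-1)

    # Assumed exponential combination, one constant per similarity term.
    return np.exp(-(d_c / gamma_c + d_s / gamma_s + d_g / gamma_g + d_n / gamma_n))
```

In a uniform region only the spatial term contributes, so the weight decays smoothly with distance from the window center; any color, gradient, or normal discontinuity lowers the weight of the pixels beyond it.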
The difference terms entering the support weight are calculated for each pixel pair $(p, q)$ in the window. The weights calculated for the pixels in the windows of the reference image and the target image are combined in the aggregation step. The dissimilarity can be expressed by

$$E(p, p_d) = \frac{\sum_{q \in N_p,\, q_d \in N_{p_d}} w(p, q)\, w(p_d, q_d)\, e(q, q_d)}{\sum_{q \in N_p,\, q_d \in N_{p_d}} w(p, q)\, w(p_d, q_d)},$$

where $p_d$ and $q_d$ are the corresponding pixels in the target image for $p$ and $q$ with the disparity $d$; $N_p$ and $N_{p_d}$ are the support windows centered at $p$ and $p_d$, respectively; and $e(q, q_d)$ is the pixel-based matching cost between $q$ and $q_d$, which is obtained from three terms: the color difference $e_c(q, q_d)$, the gradient difference $e_g(q, q_d)$, and the illumination normal difference $e_n(q, q_d)$. The quantities used in these terms are obtained by Eqs. (9)–(12), and the four associated parameters are constant.

The best disparity of pixel $p$ is found by minimizing the dissimilarity function $E(p, p_d)$:

$$d_p = \arg\min_{d \in S_d} E(p, p_d),$$

where $S_d$ is the search range of possible disparities, which varies for different image pairs.

In order to refine the disparity, a consistency check is used to detect matching errors by comparing $d_L$ and $d_R$, where $d_L$ is the disparity of the pixels when the left image is regarded as the reference image and $d_R$ is the disparity of the pixels when the right image is regarded as the reference image. Here, $d_L$ and $d_R$ are computed separately. The pixels that fail the consistency check are classified as bad. The support weight for each neighboring pixel in the fixed-size support window centered on the bad pixel is recomputed using the proposed method. The disparity of the pixel with the largest recomputed support weight is taken as the disparity of the bad pixel.

4. Experimental Results

4.1. Performance Comparison

The stereo image pairs “tsukuba,” “venus,” “teddy,” and “cones,” which are provided by the Middlebury stereo benchmark, were used in our experiments. The size of the support window and the constants of the algorithm (cf. Ref. 9) were fixed for all the test stereo image pairs. To evaluate the proposed algorithm, we obtained the ground truth provided by Scharstein and Szeliski10 and the disparity maps of the ASW method by Yoon6 from the Middlebury stereo benchmark.
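The disparity selection and consistency check of Sec. 3 can be sketched in Python as follows. The sketch assumes that the aggregated dissimilarity is stored as a cost volume indexed by (disparity, row, column), that the pair is rectified so a left pixel at column x matches the right pixel at column x - d, and that a pixel passes the check when the two disparities differ by at most `tol`. The function names, the layout of the cost volume, and the `tol` parameter are hypothetical.

```python
import numpy as np

def winner_take_all(cost_volume):
    """Pick, for every pixel, the disparity index with the lowest dissimilarity."""
    return np.argmin(cost_volume, axis=0)

def left_right_check(disp_left, disp_right, tol=0):
    """Sketch of a left-right consistency check; returns True where a pixel is bad."""
    disp_left = np.asarray(disp_left).astype(np.int64)
    disp_right = np.asarray(disp_right).astype(np.int64)
    h, w = disp_left.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Column of the matching pixel in the right image (assumed convention).
    xr = xs - disp_left
    inside = (xr >= 0) & (xr < w)
    xr_clipped = np.clip(xr, 0, w - 1)
    diff = np.abs(disp_left - disp_right[ys, xr_clipped])
    # A pixel is bad if its match falls outside the image or the disparities disagree.
    return ~(inside & (diff <= tol))
```

The returned mask marks the pixels whose disparity would then be refilled by the refinement step described above.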
The subjective quality comparison of the disparity maps is shown in Fig. 3. Figures 3(a) and 3(b) are the color image and the ground truth, respectively, while Figs. 3(c), 3(e), and 3(g) and Figs. 3(d), 3(f), and 3(h) are the disparity maps and the corresponding bad-pixel images produced by our algorithm, ASW,6 and ASW-RGB,9 respectively. The error threshold in our experiment was 0.5. In the bad-pixel images, the smaller the gray and black area is, the more accurate the disparity map is. Figures 3(d), 3(f), and 3(h) show that the disparity map of our algorithm is more accurate than those of ASW6 and ASW-RGB.9

In order to measure the objective quality of the disparity map, the Middlebury stereo benchmark provides quality metrics that are evaluated over three regions: all pixels (“all”), nonoccluded regions (“nonocc”), and pixels near depth discontinuities (“disc”). When the absolute difference between the generated disparity and the ground truth is less than the error threshold $\delta$, the generated disparity value is considered correct. Tables 1 and 2 show the two cases of $\delta = 1.0$ and $\delta = 0.5$. To evaluate the proposed algorithm objectively, it is compared to the results of the other local matching ASW methods.6,8,9,11–13 The comparison of results is shown in Tables 1 and 2; the proposed algorithm (ASW-MS) improves the matching accuracy to varying degrees.

Table 1. Performance comparison of the proposed method on the Middlebury stereo benchmark (error threshold: 1.0).

Table 2. Performance comparison of the proposed method on the Middlebury stereo benchmark (error threshold: 0.5).
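The bad-pixel percentage reported in these tables can be sketched as follows. The sketch follows the criterion stated in the text (a disparity is correct when its absolute error is below the threshold) and takes an optional boolean mask to restrict the evaluation to a region such as “nonocc” or “disc”. The function name and the mask argument are assumptions, and the benchmark's own evaluation code may treat the boundary case differently.

```python
import numpy as np

def bad_pixel_percentage(disp, ground_truth, threshold=1.0, mask=None):
    """Sketch: percentage of pixels whose absolute disparity error is not below threshold."""
    disp = np.asarray(disp, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    if mask is None:
        # Evaluate over all pixels when no region mask is given.
        mask = np.ones(ground_truth.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    # Correct when |error| < threshold, as stated in the text; bad otherwise.
    bad = ~(np.abs(disp - ground_truth) < threshold)
    return 100.0 * np.count_nonzero(bad & mask) / np.count_nonzero(mask)
```

Calling the function once per region mask and once per threshold (1.0 and 0.5) would give one row of figures in the layout that the table captions describe.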
4.2. Influence of the Illumination Normal

In order to analyze the influence of the illumination normal on the algorithm, the proposed method was tested with the illumination normal (ASW-MS) and without it (ASW-MS-outN). The comparison results are shown in Tables 3 and 4. The data in these tables are the percentages of bad pixels; the lower the percentage, the more accurate the disparity map. Tables 3 and 4 show that including the illumination normal similarity yields a more accurate result than omitting it.

Table 3. Performance comparison of the proposed method with and without illumination normal on the Middlebury stereo benchmark (error threshold: 1.0).

Table 4. Performance comparison of the proposed method with and without illumination normal on the Middlebury stereo benchmark (error threshold: 0.5).
4.3. Performance Analysis of the Proposed Method with Different Sizes of the Support Window

The size of the support window of the proposed method is the same as that of Refs. 8, 9, and 12. In order to compare our results with those of Refs. 6 and 11, which used different support window sizes, we tested the proposed method using the same window sizes as those studies. The other relevant constants in the algorithm were kept the same as for the default support window. The size of the support window in our algorithm is odd, so the central point of the window is a pixel, while the size of the support window in Ref. 13 is even, so there is no pixel at the central point of the window; the comparison between Ref. 13 and the proposed algorithm is therefore not listed. Tables 5 and 6 show the comparative results between the proposed method and those of Refs. 6 and 11 with support window sizes of 33×33 and 21×21 at error thresholds of 1.0 and 0.5, respectively. The results show that the size of the support window affects the matching precision when the other relevant constants in the algorithm are fixed. When the error threshold is 1.0, the average percentage of bad pixels of the proposed method is lower than those of the other references, except Ref. 11. For the error threshold of 0.5, however, all results of the proposed method are better than those of Refs. 6 and 11.

Table 5. Performance comparison of the proposed method with Refs. 6 and 11 in terms of the size (33×33, 21×21) of the support window on the Middlebury stereo benchmark (error threshold: 1.0).

Table 6. Performance comparison of the proposed method with Refs. 6 and 11 in terms of the size (33×33, 21×21) of the support window on the Middlebury stereo benchmark (error threshold: 0.5).
5. Conclusions and Future Work

In this paper, we present a new ASW matching algorithm based on a multi-similarity measure that includes color similarity, Euclidean distance similarity, gradient similarity, and illumination normal similarity. The experimental results show that the proposed algorithm can improve the matching precision compared to other local ASW matching algorithms. In future research, we plan to investigate other similarity measures to improve our method further.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61271315 and 61171078, and in part by the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20110061110084.

References

1. L. Hong and G. Chen, “Segment-based stereo matching using graph cuts,” in Proc. IEEE Conf. Comp. Vision Patt. Rec., I74–I78 (2004).
2. Q. Yang, L. Wang, and N. Ahuja, “A constant-space belief propagation algorithm for stereo matching,” in IEEE Conf. Comp. Vision Patt. Rec., 1458–1465 (2010).
3. J. Sun, N.-N. Zheng, and H.-Y. Shum, “Stereo matching using belief propagation,” IEEE Trans. Pattern Anal. Machine Intell., 25(7), 787–800 (2003). http://dx.doi.org/10.1109/TPAMI.2003.1206509
4. C. Lei, J. Selzer, and Y. Yang, “Region-tree based stereo using dynamic programming optimization,” in IEEE Comp. Soc. Conf. Comp. Vision Pattern Recogn., 2378–2385 (2006). http://dx.doi.org/10.1109/CVPR.2006.251
5. X. Chang et al., “Real-time accurate stereo matching using modified two-pass aggregation and winner-take-all guided dynamic programming,” in Intl. Conf. 3D Imag., Model., Process., Visual., and Trans., 73–79 (2011).
6. K.-J. Yoon and I.-S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Mach. Intell., 28(4), 650–656 (2006). http://dx.doi.org/10.1109/TPAMI.2006.70
7. I. Park and H. Byun, “Depth map refinement using multiple patch-based depth image completion via local stereo warping,” Opt. Eng., 49(7), 077003 (2010). http://dx.doi.org/10.1117/1.3463013
8. L. De-Maeztu, A. Villanueva, and R. Cabeza, “Stereo matching using gradient similarity and locally adaptive support-weight,” Pattern Recogn. Lett., 32(13), 1643–1651 (2011). http://dx.doi.org/10.1016/j.patrec.2011.06.027
9. Y. Geng, Y. Zhao, and H. Chen, “Stereo matching based on adaptive support-weight approach in RGB vector space,” Appl. Opt., 51(16), 3538–3545 (2012). http://dx.doi.org/10.1364/AO.51.003538
10. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” in Proc. IEEE Work. Stereo Multi-Baseline Vision, 131–140 (2002).
11. Z. Gu et al., “Local stereo matching with adaptive support-weight, rank transform and disparity calibration,” Pattern Recogn. Lett., 29(9), 1230–1235 (2008). http://dx.doi.org/10.1016/j.patrec.2008.01.032
12. W. Hu et al., “Virtual support window for adaptive-weight stereo matching,” in Vis. Commun. Image Proc., 1–4 (2011).
13. L. Nalpantidis and A. Gasteratos, “Biologically and psychophysically inspired adaptive support weight algorithm for stereo correspondence,” Robot. Auton. Syst., 58(5), 457–464 (2010). http://dx.doi.org/10.1016/j.robot.2010.02.002
Biography

Kai Gao received a BS degree in electronics and information engineering from Changchun University of Science and Technology in 2006 and an MS degree in detection technology and automation devices from Changchun University of Science and Technology in 2009. He is now pursuing his PhD at the College of Communication Engineering at Jilin University. His research interests include image and video coding, stereo matching, and virtual view synthesis.

He-xin Chen received MS and PhD degrees in communication and electronics in 1982 and 1990, respectively, from the Jilin University of Technology. He was a visiting scholar at the University of Alberta from 1987 to 1988. In 1993, he was a visiting professor at the Tampere University of Technology in Finland. He is currently a professor of communication engineering at Jilin University. His research interests include image and video coding, multidimensional signal processing, image and video retrieval, and audio and video synchronization.

Yan Zhao received a BS degree in communication engineering in 1993 from Changchun Institute of Posts and Telecommunications, an MS degree in communications and electronics in 1999 from the Jilin University of Technology, and a PhD in communications and information systems in 2003 from Jilin University. She was a postdoctoral researcher at the Digital Media Institute of the Tampere University of Technology in Finland in 2003. In 2008, she was a visiting professor at the Institute of Communications and Radio-Frequency Engineering at the Vienna University of Technology. She is currently an associate professor of communication engineering. Her research interests include image and video coding, multimedia signal processing, and error concealment for audio and video transmitted over unreliable networks. She is a member of IEEE.

Ying-nan Geng received BS and MS degrees from the College of Communication Engineering at Jilin University. At present, she is working toward her PhD at the College of Communication Engineering at Jilin University. Her research interests are stereo matching, image and video coding, and virtual view synthesis.

Gang Wang received a BS degree in electronics engineering from Changchun University of Technology in 1999 and an MS degree in signal processing from Jilin University in 2005. He is now pursuing his PhD at the College of Geo-Exploration Science and Technology of Jilin University. His research interests include wireless communication applications in geo-exploration and hyperspectral image communication. |