Integrating affinity propagation clustering method with linear discriminant analysis for face recognition

Chunhua Du; Jie Yang; Qiang Wu; Feng Li

doi:10.1117/1.2801735

1 November 2007 Integrating affinity propagation clustering method with linear discriminant analysis for face recognition

Chunhua Du, Jie Yang, Qiang Wu, Feng Li

Author Affiliations +

Optical Engineering, Vol. 46, Issue 11, 110501 (November 2007). https://doi.org/10.1117/1.2801735

Abstract

The Fisherface method suffers from the problem of using all training face images to recognize a test face image. To tackle this problem, we propose combining a novel clustering method, affinity propagation (AP), recently reported in the journal Science, with linear discriminant analysis (LDA) to form a new method, AP-LDA, for face recognition. By using AP, a representative face image for each subject can be obtained. Our AP-LDA method uses only these representative face images rather than all training images for recognition. Thus, it is more computationally efficient than Fisherface. Experimental results on several benchmark face databases also show that AP-LDA outperforms Fisherface in terms of recognition rate.

1. Introduction

Face recognition (FR) has recently received extensive attention due to its broad applications. Many techniques have been developed over the past decades for FR. Subspace analysis is one of the most efficient techniques. The most popular subspace techniques are principal component analysis (PCA),¹ linear discriminant analysis (LDA),² and independent component analysis (ICA).³ Besides directly processing image appearance, subspace methods can also be combined with the Gabor feature⁴ to derive the Gabor-based methods.^{5, 6, 7} However, the recognition time of these methods depends heavily on the size of the training set because a test image is compared to all training images. Such a recognition scheme is not efficient, especially when the training set is too large. Thus, we propose integrating a novel clustering method, affinity propagation (AP),⁸ with LDA to form AP-LDA for FR. By using AP on the low-dimensional features obtained from LDA, a representative face image for each subject can be achieved. Thus, AP-LDA needs to use only representative face images rather than all training face images for recognition.

We combine AP with LDA for the following reasons: AP can cluster data points into different clusters and detect a representative example for each cluster. By using AP, we intend to achieve a representative face image for each subject. If so, we need to use only representative face images for recognition. However, directly using AP on gray pixel values is computationally expensive and inefficient. For example, the difference between two pixels on the same positions in two images of the same subject is obvious when the illumination is different. However, they should be considered to be similar to each other in fact. Thus, more discriminating and efficient low-dimensional features should be extracted before using AP for clustering. Therefore, LDA is adopted, as it can not only be used for dimensionality reduction but also extract discriminative features.

2. Review of the AP Clustering Method and the LDA Method

2.1.

AP Clustering Method

AP⁸ first builds a similarity matrix $s$ , in which $s (i, k)$ between data points $x_{i}$ and $x_{k}$ is their negative Euclidean distance $[s (i, k) = - ∥ x_{i} - x_{k} ∥^{2}]$ . Before clustering, each data point also needs to be assigned a number $P (B)$ , which describes the a priori knowledge of how good point $B$ is as a representative example. The data points with larger values of $P (B)$ are more likely to be chosen as representative examples. These values are referred to as preferences. In fact, the probability of each point being the representative example is the same; thus, the preferences should be set to the same value, which can be varied to produce different numbers of clusters. Generally, such a value takes the median of the $s$ . After the construction of a similarity matrix and the setting of preferences, two kinds of messages (responsibility and availability) are passed between data points. The responsibility $r (i, k)$ , sent from $x_{i}$ to candidate representative example $x_{k}$ , reflects the accumulated evidence for how proper it would be for $x_{k}$ to serve as the representative example for $x_{i}$ . It is updated using the rule:

Eq. 1

r (r, k) \leftarrow s (i, k) - \underset{k^{'} s . t . k^{'} \neq k}{\max [a (i, k^{'}) + s (i, k^{'})]} .

The availability $a (i, k)$ , sent from candidate representative example $x_{k}$ to $x_{i}$ , reflects the accumulated evidence for how well-suited $x_{i}$ is to choose $x_{k}$ as its representative example. It is computed by the rule:

Eq. 2

a (i, k) \leftarrow \min {0, r (k, k) + \sum_{i^{'} s . t . i^{'} \notin {i, k}} [0, r (i^{'}, k)]} .

It is clear that availabilities and responsibilities can be combined to recognize representative examples at any time. For $x_{i}$ , the $k$ that maximizes $a (i, k) + (i, k)$ indicates that $x_{k}$ serves as the representative example for $x_{i}$ .

2.2.

LDA Method

Suppose there is a set of $N$ $d$ -dimensional samples ${x_{1}, x_{2}, \dots, x_{N}}$ belonging to $c$ classes ${X_{1}, X_{2}, \dots, X_{N}}$ . LDA² aims to find $W_{o p t}$ that maximizes the ratio of the between-class scatter matrix to the within-class scatter matrix, i.e.,

Eq. 3

W_{o p t} = \arg \max_{W} \frac{∣ W^{T} S_{B} W ∣}{∣ W^{T} S_{W} W ∣} = (w_{1}, w_{2}, \dots, w_{m}),

where

S_{B}

and

S_{W}

are the between-class matrix and the within-class scatter matrix:

Eq. 4

S_{B} = \sum_{i = 1}^{c} N_{i} (μ_{i} - μ) {(μ_{i} - μ)}^{T},

Eq. 5

S_{W} = \sum_{i = 1}^{c} \sum_{x_{k} ∊ X_{i}} (x_{k} - μ_{i}) {(x_{k} - μ_{i})}^{T},

where

μ_{i}

is the mean of class

X_{i}

,

μ

is the total sample mean, and

N_{i}

is the size of the data points in

X_{i}

.

(w_{1}, w_{2}, \dots, w_{m})

are generalized eigenvectors of

S_{B}

and

S_{W}

corresponding to the

m

largest generalized eigenvalues

(λ_{i} ∣ i = 1, 2, \dots, m)

, i.e.,

Eq. 6

S_{B} w_{i} = λ_{i} S_{W} w_{i} i = 1, 2, \dots, m .

Note that there are at most $c - 1$ nonzero generalized eigenvalues; thus, the dimension of the reduced space is $c - 1$ . At the same time, to make the within-class scatter matrix $S_{W}$ nonsingular, PCA is first used to reduce the dimension of the feature to $N - c$ , and then LDA is applied to reduce the dimension to $c - 1$ .

3. Summary of AP-LDA

Suppose that there is a face data set of $N$ face images belonging to $c$ different subjects. First, $n$ face images per person (hence, $n \times c$ in total) are selected for training using LDA, obtaining $n \times c$ corresponding $c - 1$ dimensional features. Second, AP is used to cluster these features into different clusters and obtain a representative feature for each cluster. Note that the number of clusters obtained by using AP may not equal the number of subjects, as the former is influenced by the values of preference,⁸ while the latter is fixed. To solve this problem, we repeatedly vary the value of the preferences until two such numbers equal to each other. Last, each test image is converted to a low-dimensional feature, which is then compared to $c$ representative features and identified using a nearest-neighbor classifier. Since AP-LDA uses only $c$ representative features for recognition, the recognition time of AP-LDA depends mainly on the number of subjects. It increases linearly with the $c$ , while the recognition time of the Fisherface method increases with the training image size $(n \times c)$ . Obviously, our AP-LDA method is more computationally efficient than the Fisherface method.

4. Experiments and Discussion

In this section, experiments on three benchmark face data sets [Yale⁹; extended Yale¹⁰; and Pose, Illumination, and Expression (PIE)¹¹] are carried out to show the effectiveness of our AP-LDA method and also to compare it to Fisherface. All face images are cropped based on the centers of eyes such that facial areas contain only the face. All cropped images are then normalized to the size of $32 \times 32$ , with 256 gray levels per pixel. In the following experiments, different numbers of images per subject were randomly selected for training, and the rest were used for testing. To minimize the possible misleading results, the final recognition rates were obtained by averaging the results over five random splits.

The Yale database contains 165 images with 11 different images for each of the 15 distinct subjects. Comparative results are summarized in Table 1. It is clear that AP-LDA obviously outperforms Fisherface when the number of training images per person is more than 3.

Table 1

Comparative results on Yale database.

FR method	Number of training face images per person
FR method	3	4	5	6	7	8
Fisherface	60.17%	69.33%	72.00%	77.33%	79.33%	84.89%
AP-LDA	55.33%	69.90%	76.89%	78.13%	83.33%	87.56%

The Extended Yale database is extended from the Yale database. It contains approximately 64 near-frontal images for each of 38 distinct subjects. We randomly selected 20 images per person for our experiments. Table 2 presents the results. We can see that AP-LDA remarkably outperforms Fisherface in all cases.

Table 2

Comparative results on subset from Extended Yale database.

FR method	Number of training face images per person
FR method	4	5	6	7	8	9	10
Fisherface	59.61%	63.30%	70.08%	71.01%	74.08%	77.18%	80.32%
AP-LDA	65.03%	70.49%	74.40%	76.88%	80.53%	81.24%	84.21%

The CMU Pose, Illumination, and Expression (PIE) database contains 41,368 facial images of 68 individuals. We randomly chose 30 images per person for our experiments. The results are shown in Table 3. Again, AP-LDA consistently outperforms Fisherface.

Table 3

Comparative results on subset from CMU PIE database.

FR method	Number of training face images per person
FR method	4	5	6	7	8	9	10
Fisherface	50.84%	58.09%	63.39%	64.59%	68.69%	68.43%	70.66%
AP-LDA	59.24%	65.96%	70.92%	71.71%	73.68%	72.61%	74.15%

Comparative experimental results show that:

1. The recognition rates of AP-LDA increase with the increase in the number of training face images per person because the more training images per person, the better the representative face images that can be detected.
2. Although AP-LDA performs better than Fisherface on the Yale face data sets in most cases, the improvement is not significant. Moreover, it performs not as well as Fisherface when the number of training images per person is 3. This is because the AP method could not find good representative face images in these cases.
3. AP-LDA remarkably outperforms Fisherface on the Extended Yale and CMU PIE databases. In fact, for these two databases, we chose only a fraction of images for our experiments. We found that our AP-LDA method performed significantly better than Fisherface when all images were used. However, such improved performance was achieved at the expense of increase of computational cost.

5. Conclusions

We have introduced in this letter a novel AP-LDA method for face recognition. Unlike Fisherface, which uses all training face images for recognition, our AP-LDA method uses only representative face images. This makes AP-LDA more efficient. Experiments also indicate that AP-LDA outperforms Fisherface in terms of recognition rate.

Acknowledgments

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This research has been supported by the National Natural Science Foundation of China (Grant Nos. 60675023 and 60602012).

References

1.

M. A. Turk and A. P. Pentland, “Face recognition using eigenfaces,” 586 –591 (1991). Google Scholar

2.

P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces versus fisherfaces: recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., 19 (7), 711 –720 (1997). https://doi.org/10.1109/34.598228 0162-8828 Google Scholar

3.

M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, “Face recognition by independent component analysis,” IEEE Trans. Neural Netw., 13 (6), 1450 –1464 (2002). https://doi.org/10.1109/TNN.2002.804287 1045-9227 Google Scholar

4.

L. Wiskott, J. M. Fellous, N. Kuiger, and C. von der Malsburg, “Face recognition by elastic bunch graph matching,” IEEE Trans. Pattern Anal. Mach. Intell., 19 (7), 775 –779 (1997). https://doi.org/10.1109/34.598235 0162-8828 Google Scholar

5.

C. Liu, “Gabor-based kernel PCA with fractional power polynomial models for face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 26 (5), 572 –581 (2004). https://doi.org/10.1109/TPAMI.2004.1273927 0162-8828 Google Scholar

6.

C. Liu and H. Wechsler, “Independent component analysis of Gabor features for face recognition,” IEEE Trans. Neural Netw., 14 (4), 919 –928 (2003). https://doi.org/10.1109/TNN.2003.813829 1045-9227 Google Scholar

7.

C. Liu and H. Wechsler, “Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition,” IEEE Trans. Image Process., 11 (4), 467 –476 (2002). https://doi.org/10.1109/TIP.2002.999679 1057-7149 Google Scholar

8.

B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, 315 (5814), 972 –976 (2007). https://doi.org/10.1126/science.1136800 0036-8075 Google Scholar

9.

P. N. Belhumeur and D. J. Kriegman, “The Yale face database,” (1997) http://cvc.yale.edu/projects/yalefaces/yalefaces.html Google Scholar

10.

K. C. Lee, J. Ho, and D. J. Kriegman, “Acquiring linear subspaces for face recognition under variable lighting,” IEEE Trans. Pattern Anal. Mach. Intell., 27 (5), 684 –698 (2005). https://doi.org/10.1109/TPAMI.2005.92 0162-8828 Google Scholar

11.

T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression (PIE) database,” 46 –51 (2002). Google Scholar

Citation Download Citation

Chunhua Du, Jie Yang, Qiang Wu, and Feng Li "Integrating affinity propagation clustering method with linear discriminant analysis for face recognition," Optical Engineering 46(11), 110501 (1 November 2007). https://doi.org/10.1117/1.2801735

Published: 1 November 2007

Access the abstract

JOURNAL ARTICLE
3 PAGES

DOWNLOAD PAPER SAVE TO MY LIBRARY

GET CITATION

CITATIONS

Cited by 8 scholarly publications.

Explore citations on Lens.org

RIGHTS & PERMISSIONS

Get copyright permission Get copyright permission on Copyright Marketplace

KEYWORDS

Facial recognition systems

Databases

Image processing

Principal component analysis

Computing systems

Independent component analysis

Optical engineering

1.

Introduction

2.

Review of the AP Clustering Method and the LDA Method

2.1.

AP Clustering Method

Eq. 1

Eq. 2

2.2.

LDA Method

Eq. 3

Eq. 4

Eq. 5

Eq. 6

3.

Summary of AP-LDA

4.

Experiments and Discussion

Table 1

Table 2

Table 3

5.

Conclusions

Acknowledgments

References

Show All Keywords

Keywords/Phrases

Search In:

Publication Years