Mingxin Jin, Huifang Li
Journal of Electronic Imaging, Vol. 29, Issue 01, 013006 (January 2020). https://doi.org/10.1117/1.JEI.29.1.013006
TOPICS: Facial recognition systems, Sensors, Lithium, Network architectures, Data modeling, Detection and tracking algorithms, Convolutional neural networks, Zoom lenses, Convolution, Machine vision
Face detection has advanced considerably with the rapid development of convolutional neural networks. However, detecting faces across a wide range of scales, especially small faces, remains a challenging and crucial problem. We focus on designing a strong network architecture that generates high-quality feature representations to handle the multiscale problem. Specifically, we conduct an in-depth study of three aspects (feature fusion, attention mechanism, and receptive field) and propose an effective one-stage face detector. First, we design a feature fusion module that fuses multiscale features more effectively to build a discriminative feature pyramid representation. Then, we propose an attention enhancement module that introduces an attention mechanism, allowing the network to retain more valid information and emphasize features from facial regions. Finally, we construct a prediction module that ensures each detection layer contains rich receptive fields and thus captures more useful context information. Extensive experiments on widely used face detection benchmarks (i.e., WIDER FACE, FDDB, PASCAL FACE, and AFW) indicate that our method outperforms the overwhelming majority of existing methods and is comparable to the most recent state-of-the-art face detector, demonstrating that it can detect faces across a wide range of scales effectively.
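The abstract does not specify how the feature fusion module combines scales, so the following is only a minimal sketch of the generic top-down fusion used in feature pyramid designs, not the authors' module. It assumes single-channel feature maps stored as nested lists, each level exactly half the size of the previous one, nearest-neighbor upsampling, and element-wise addition; the names `upsample_nearest_2x` and `fuse_top_down` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one channel of a feature map, indexed [row][col]


def upsample_nearest_2x(fmap: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbor)."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_top_down(pyramid: List[Grid]) -> List[Grid]:
    """Fuse a pyramid ordered fine -> coarse (assumed 2x size steps).

    The coarsest level is kept as is; each finer level receives the
    upsampled fused map from the level above, added element-wise.
    """
    fused = [pyramid[-1]]
    for fine in reversed(pyramid[:-1]):
        up = upsample_nearest_2x(fused[0])
        if len(up) != len(fine) or len(up[0]) != len(fine[0]):
            raise ValueError("each level must be exactly 2x the next coarser one")
        merged = [[a + b for a, b in zip(r_f, r_u)] for r_f, r_u in zip(fine, up)]
        fused.insert(0, merged)
    return fused


# Toy single-channel pyramid: 4x4 (fine), 2x2, 1x1 (coarse).
p3 = [[1.0] * 4 for _ in range(4)]
p4 = [[10.0, 20.0], [30.0, 40.0]]
p5 = [[100.0]]
out3, out4, out5 = fuse_top_down([p3, p4, p5])
```

In this toy run the finest map ends up carrying information from both coarser levels, which is the intuition behind using a fused pyramid to help with small faces: fine-resolution layers gain context from deeper, coarser layers.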