Paper
12 April 2021 MSNet: multi-scale network for crowd counting
Abstract
Crowd counting has become an active research topic in computer vision because of challenges such as large variation in the scale of people, mutual occlusion, and perspective distortion. To address the large scale variation present in crowd images, we propose a novel multi-scale network called MSNet, which aims to handle continuous scale variation and count pedestrians accurately. Most state-of-the-art multi-scale and multi-column networks aim to integrate scale information from heads of different sizes, but much work remains before continuous scale variation is handled well. In MSNet, the first ten layers of the Visual Geometry Group (VGG) network serve as the backbone to extract coarse image features, and a multi-scale block containing kernels with several receptive field sizes preserves scale information and copes better with scale variation. Inspired by the observation that replacing a single kernel with a large receptive field by multiple kernels with small receptive fields gives better performance, we use two dilated convolutions, each with a receptive field of 5, in place of the large kernel. MSNet adds only a moderate amount of computation. We evaluate our method on three benchmark datasets to demonstrate its superior performance: ShanghaiTech (Part A: MAE=59.6, RMSE=96.1; Part B: MAE=7.5, RMSE=12.1), UCF-CC-50 (MAE=207.9, RMSE=273.8), and UCF-QNRF (MAE=93, RMSE=158).
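The receptive field arithmetic behind the dilated convolution substitution can be checked with a short sketch. The abstract does not give the kernel sizes or dilation rates, so the configuration below (3x3 kernels with dilation 2, stacked at stride 1, compared against a single 9x9 kernel) is an assumption chosen because it yields a per-layer receptive field of 5; the function names are illustrative and not taken from the paper.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied in sequence.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective kernel - 1).
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


def weights_per_channel_pair(layers) -> int:
    """Weights per input/output channel pair, ignoring biases."""
    return sum(kernel_size * kernel_size for kernel_size, _ in layers)


# Assumed configuration: two 3x3 convolutions with dilation 2.
two_dilated = [(3, 2), (3, 2)]
# Assumed baseline: one dense 9x9 convolution.
one_large = [(9, 1)]

print(effective_kernel(3, 2))                 # receptive field of one dilated layer
print(stacked_receptive_field(two_dilated))   # receptive field of the stack
print(weights_per_channel_pair(two_dilated))  # weights in the stack
print(weights_per_channel_pair(one_large))    # weights in the single large kernel
```

Under these assumptions the stacked pair covers the same 9x9 window as the single large kernel while using 18 weights per channel pair instead of 81, which is consistent with the abstract's claim of only a moderate increase in computation.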
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ying Shi, Jun Sang, Mohammad S. Alam, Xinyue Liu, and Shaoli Tian "MSNet: multi-scale network for crowd counting", Proc. SPIE 11735, Pattern Recognition and Tracking XXXII, 117350M (12 April 2021); https://doi.org/10.1117/12.2592677
KEYWORDS
Computer vision technology
Convolution
Distortion
Machine vision
Visualization
