A feature lightweight method in optimized acoustic encoder
1 June 2023
Wei Liu, Quanbing Liu, Jiacheng Liu, Yiming Sun
Proceedings Volume 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023); 127180L (2023) https://doi.org/10.1117/12.2681644
Event: International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 2023, Nanjing, China
Abstract
This paper studies end-to-end speech recognition built on convolutional neural networks (CNNs) and analyses why such networks find it difficult to balance accuracy against model size. A new acoustic encoder is proposed to improve the extraction of speech features, and its effectiveness is verified in an end-to-end speech recognition model. The new acoustic encoder fuses global context information with the local information obtained by convolution. It also increases the convolution depth while reducing the convolution kernel size, which lowers the number of parameters and the amount of computation. In addition, RNN-Transducer is used as the model architecture to jointly optimize acoustic and text features and thereby obtain more effective context information. On the LibriSpeech dataset, the improved acoustic encoder achieves a 9.83% word error rate and a 4.61% sentence error rate with a model of only 7M parameters. Compared with the CNN baseline model, the word error rate is reduced by a relative 0.52% and the sentence error rate by a relative 1.22%.
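The abstract states that the encoder fuses global context with the local features produced by convolution, but it does not say how. As an assumption, and not as the authors' design, the sketch below uses a squeeze-and-excitation (SE) gate of the kind used in ContextNet: the convolution output is averaged over all frames, passed through a small bottleneck, and the sigmoid result rescales every channel. The function names, sizes, and random weights are hypothetical and only illustrate the data flow.

```python
import numpy as np


def conv1d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Local features: 1D convolution over time with 'same' padding.

    x: (T, C_in) feature frames; w: (K, C_in, C_out) kernel with odd K.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    # Output frame t combines input frames t-pad .. t+pad.
    return np.stack([np.einsum("kc,kcd->d", xp[t:t + k], w) for t in range(x.shape[0])])


def se_global_context(h: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation gate: rescale channels of h (T, C) by a global summary."""
    context = h.mean(axis=0)                # squeeze: average over all frames
    z = np.maximum(context @ w1, 0.0)       # bottleneck C -> C/r with ReLU
    gate = 1.0 / (1.0 + np.exp(-(z @ w2)))  # expand back to C, sigmoid in (0, 1)
    return h * gate                         # same gate applied to every frame


rng = np.random.default_rng(0)
T, C, K, r = 50, 16, 3, 4                   # hypothetical sizes
x = rng.standard_normal((T, C))             # stand-in for acoustic feature frames
h = conv1d_same(x, rng.standard_normal((K, C, C)) * 0.1)
y = se_global_context(h, rng.standard_normal((C, C // r)), rng.standard_normal((C // r, C)))
print(h.shape, y.shape)  # (50, 16) (50, 16)
```

Because the gate is computed from the whole utterance, each output frame carries information from beyond the kernel's local window while the tensor shape is unchanged.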
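The abstract also claims that increasing convolution depth while shrinking the kernel reduces the parameter count. A minimal sketch of that arithmetic, assuming bias-free convolutions that keep a hypothetical channel width C = 144 (the abstract gives no layer sizes), compares one size-5 kernel with two stacked size-3 kernels, which cover the same receptive field.

```python
def conv2d_params(kernel: int, channels: int, bias: bool = False) -> int:
    """Parameters of one kernel x kernel 2D conv mapping `channels` -> `channels`."""
    return kernel * kernel * channels * channels + (channels if bias else 0)


def conv1d_params(kernel: int, channels: int, bias: bool = False) -> int:
    """Parameters of one 1D conv over time mapping `channels` -> `channels`."""
    return kernel * channels * channels + (channels if bias else 0)


def receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1, undilated convs of size `kernel`."""
    return layers * (kernel - 1) + 1


C = 144  # hypothetical channel width; not taken from the paper

# One size-5 layer vs. two stacked size-3 layers: same receptive field of 5.
assert receptive_field(5, 1) == receptive_field(3, 2) == 5
wide_2d, deep_2d = conv2d_params(5, C), 2 * conv2d_params(3, C)
wide_1d, deep_1d = conv1d_params(5, C), 2 * conv1d_params(3, C)

print(f"2D: {wide_2d:,} vs {deep_2d:,}")  # 2D: 518,400 vs 373,248
print(f"1D: {wide_1d:,} vs {deep_1d:,}")  # 1D: 103,680 vs 124,416
```

With 2D kernels the deeper stack needs 18/25 of the weights (28% fewer), but with plain 1D kernels over time it needs 6/5 (20% more). Whether the saving holds therefore depends on layer details, such as 2D or depthwise separable kernels or narrower channels, that the abstract does not specify.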
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wei Liu, Quanbing Liu, Jiacheng Liu, and Yiming Sun "A feature lightweight method in optimized acoustic encoder", Proc. SPIE 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 127180L (1 June 2023); https://doi.org/10.1117/12.2681644
KEYWORDS
Convolution
Acoustics
Batch normalization
Speech recognition
Convolutional neural networks
Data modeling
Attenuation