14 November 2023 An analysis of the acoustic features of English intonation for English professionals by integrating image and video analysis
Li Xu
Proceedings Volume 12934, Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023); 129340H (2023) https://doi.org/10.1117/12.3008406
Event: 2023 3rd International Conference on Computer Graphics, Image and Virtualization (ICCGIV 2023), 2023, Nanjing, China
Abstract
Speech recognition has made breakthrough progress and is now widely used, and its development continually raises new requirements. First, acoustic parameters are tied to the natural attributes of speakers. Second, computing acoustic parameters depends on large-scale corpus resources. In addition, language recognition, speaker recognition, speech visualization, and automatic speech annotation all need further research. English contains 48 phonemes, and recognizing them correctly is an important basis for analyzing the acoustic characteristics of continuous intonation. In this paper, a convolutional neural network is first used to extract visual features at different scales, and these multi-scale image features are then fused so that the fused feature vector retains more detailed image information, which alleviates the problem of image information loss. Next, an attention-based model for recognizing intonation acoustic features is constructed; it takes both early and late feature fusion into account and improves the effectiveness of information fusion. Experimental results show that the model's training error decreases gradually as the number of iterations increases and stabilizes after 1000 iterations, indicating that the model has essentially converged and is reliable and feasible. In the phoneme recognition experiment, for sentences with more phonemes as well as sentences with fewer phonemes, the model's recognition rate exceeds 60% and its loss rate is below 5%, and it recognizes about 60 phonemes per minute. The proposed model therefore improves English intonation acoustic feature recognition to a certain extent.
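The abstract's two-step idea (extract features at several scales, then fuse them with attention weights) can be illustrated with a minimal sketch. This is not the author's model: average pooling stands in for the convolutional layers, the attention score (mean activation per scale) is a hypothetical choice, and the function names `avg_pool`, `multi_scale_features`, and `attention_fuse` are assumptions introduced only for illustration.

```python
import math


def avg_pool(image, size):
    """Average-pool a 2D list with a square window (stride equals window size)."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row = []
        for c in range(0, cols - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def multi_scale_features(image, scales=(1, 2, 4)):
    """Return one flattened feature vector per scale.

    Assumption: pooling is a stand-in for CNN feature maps of different
    resolution; a real model would use learned convolutions.
    """
    return [[v for row in avg_pool(image, s) for v in row] for s in scales]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features):
    """Scale each feature vector by an attention weight, then concatenate.

    Assumption: the score of a scale is its mean activation; a trained
    attention module would learn this scoring function instead.
    """
    scores = [sum(f) / len(f) for f in features]
    weights = softmax(scores)
    fused = [w * v for w, f in zip(weights, features) for v in f]
    return fused, weights


if __name__ == "__main__":
    frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
    fused, weights = attention_fuse(multi_scale_features(frame))
    print(len(fused), weights)
```

For a 4x4 input and scales 1, 2, and 4, the three feature vectors have 16, 4, and 1 entries, so the fused vector has 21 entries. Concatenating after weighting keeps the fine-scale detail alongside the coarse summary, which is the property the abstract attributes to multi-scale fusion.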
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Li Xu "An analysis of the acoustic features of English intonation for English professionals by integrating image and video analysis", Proc. SPIE 12934, Third International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2023), 129340H (14 November 2023); https://doi.org/10.1117/12.3008406
KEYWORDS
Acoustics

Image fusion

Video

Image analysis

Education and training
