Paper
22 April 2022 D-MelGAN: speech synthesis with specific voiceprint features
Daigang Chen, Hua Jiang, Chengxi Pu, Shaowen Yao
Proceedings Volume 12163, International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021); 121631I (2022) https://doi.org/10.1117/12.2627502
Event: International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021), 2021, Nanjing, China
Abstract
In recent years, speech synthesis based on machine learning has become increasingly popular. Many neural network models can now generate synthetic audio that closely imitates the human voice, and the quality of the generated audio is usually evaluated by the mean opinion score (MOS). The voiceprint is an important characteristic for distinguishing a speaker's speech features, so generating speech with specific voiceprint features is of great significance for broadening the applications of speech synthesis. However, existing speech synthesis models seldom consider the preservation of specific voiceprint features. In this paper, we propose D-MelGAN, a speech synthesis model that targets high-quality speech carrying the voiceprint features of a specific speaker. The model is based on a non-autoregressive, feed-forward convolutional GAN. By embedding the d-vector technique used to identify specific voiceprints into the GAN, it generates raw audio waveforms with the voiceprint characteristics of a specific speaker. The experimental results show that the new model strengthens the voiceprint features of the generated audio while maintaining the quality of the synthesized speech, so the generated speech carries the specific style of a speaker and text-to-speech technology can be applied in more fields.
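As a rough illustration of the approach described above, the sketch below shows how a MelGAN-style, non-autoregressive feed-forward generator can be conditioned on a d-vector speaker embedding so that the synthesized waveform carries a specific speaker's voiceprint. The abstract does not give architectural details, so the layer sizes, the d-vector dimension, and the additive conditioning scheme used here are illustrative assumptions written in PyTorch, not the authors' exact design.

# Minimal sketch (assumption: layer sizes, d-vector dimension, and the
# conditioning-by-addition strategy are illustrative, not the paper's design).
import torch
import torch.nn as nn

class DVectorConditionedGenerator(nn.Module):
    """MelGAN-style non-autoregressive generator that upsamples a
    mel-spectrogram to a raw waveform, conditioned on a d-vector."""
    def __init__(self, n_mels=80, d_vector_dim=256, base_channels=512):
        super().__init__()
        # Project the d-vector so it can be broadcast over time and
        # fused with the mel-spectrogram channels.
        self.speaker_proj = nn.Linear(d_vector_dim, n_mels)
        layers = [nn.Conv1d(n_mels, base_channels, kernel_size=7, padding=3)]
        # Transposed convolutions upsample 256x (8 * 8 * 2 * 2), as in MelGAN.
        for rate in (8, 8, 2, 2):
            out_ch = max(base_channels // 2, 32)
            layers += [
                nn.LeakyReLU(0.2),
                nn.ConvTranspose1d(base_channels, out_ch,
                                   kernel_size=rate * 2, stride=rate,
                                   padding=rate // 2),
            ]
            base_channels = out_ch
        layers += [nn.LeakyReLU(0.2),
                   nn.Conv1d(base_channels, 1, kernel_size=7, padding=3),
                   nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel, d_vector):
        # mel: (batch, n_mels, frames); d_vector: (batch, d_vector_dim)
        cond = self.speaker_proj(d_vector).unsqueeze(-1)  # (batch, n_mels, 1)
        return self.net(mel + cond)                       # (batch, 1, samples)

# Usage: ~1 s of 22.05 kHz audio at a 256-sample hop -> 86 mel frames.
mel = torch.randn(1, 80, 86)
d_vec = torch.randn(1, 256)
wav = DVectorConditionedGenerator()(mel, d_vec)  # shape (1, 1, 86 * 256)

In this sketch the d-vector is projected to the mel-channel dimension and broadcast over time before upsampling; an actual implementation might instead concatenate the embedding with the mel channels or inject it into each residual block, since the abstract does not specify the fusion point.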
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daigang Chen, Hua Jiang, Chengxi Pu, and Shaowen Yao "D-MelGAN: speech synthesis with specific voiceprint features", Proc. SPIE 12163, International Conference on Statistics, Applied Mathematics, and Computing Science (CSAMCS 2021), 121631I (22 April 2022); https://doi.org/10.1117/12.2627502
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Autoregressive models
Convolution
Speaker recognition
Neural networks
Systems modeling
Performance modeling