Alexandre Trilla, PhD - Research Engineer

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome

TTS in the future

21-Nov-2010

At the FALA 2010 conference, Dr. Heiga Zen gave a talk entitled "Fundamentals and recent advances in HMM-based speech synthesis". He reviewed the growing adoption of Hidden Markov Models (HMMs) in the TTS research community over recent years. Indeed, this trend was also evident in the Speech Synthesis Albayzin 2010 Evaluation: of the 10 participating systems, 3 were purely concatenative, 6 were HMM-based, and one was a hybrid (HMM-based + concatenative). It was the hybrid system that won the competition.

Midway through his presentation, he cited Dr. Simon King's talk at Interspeech 2010, which stated that TTS synthesis is easy as long as a few recommendations are followed. In short, they advise avoiding non-professional speakers, small corpora, noisy recordings and labelling mistakes, and acquiring a good deal of knowledge about the language the system targets. This amounts to a redefinition of the core problem for research to tackle.

Finally, Dr. Zen encouraged the audience to join TTS synthesis research and suggested some ways to get involved, beginning with text processing, i.e., the first stage of a TTS synthesis system. So it seems there is an especially nice and promising framework for my Ph.D. :)