Alexandre Trilla, PhD - Research Engineer

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Tutorials Day at InterSpeech 2009

06-Sep-2009

The tutorials that precede the official start of the InterSpeech 2009 conference (tomorrow) were held today. In the morning I attended the "Language and Dialect Recognition" tutorial. This topic covers the first step in a speech processing pipeline, ahead of the speech recognition engine proper: it first identifies the speaker's language and then spots the dialect. Knowing both permits the use of more refined models for recognition.

The tutorial focused mostly on acoustics and phonotactics, following a probabilistic approach, and two aspects surprised me. The first concerns acoustic feature extraction: Nuisance Attribute Projection (NAP). Instead of modeling the most discriminant features, this approach models the nuisances that muddy the picture and then removes them from the original space. The second concerns phonotactics (capturing the constraints on sequences of phonemes). The tutorial showed how important this aspect is for a language: depending on the order (memory) of the model, phonotactics encode an incredible amount of information.
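
To make the phonotactic idea concrete, here is a minimal sketch of my own (not the system shown in the tutorial): one bigram model of phone sequences per language, with the language chosen by the highest log-likelihood. The two "languages", their training strings and the helper names (train_bigram, score, identify) are made-up toy assumptions, and each character stands in for one phone symbol.

```python
import math
from collections import Counter

def train_bigram(sequences, vocab):
    """Return a function giving add-one smoothed bigram log-probabilities."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of a sequence
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    vocab_size = len(vocab)

    def logprob(prev, cur):
        # Add-one smoothing keeps unseen phone pairs from scoring log(0).
        return math.log((pair_counts[(prev, cur)] + 1)
                        / (context_counts[prev] + vocab_size))

    return logprob

def score(logprob, seq):
    """Log-likelihood of a phone sequence under one bigram model."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))

def identify(models, seq):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda name: score(models[name], seq))

# Toy data (assumption): lang_a alternates consonants and vowels,
# lang_b allows consonant clusters.
vocab = ["p", "t", "k", "s", "a", "i", "u"]
models = {
    "lang_a": train_bigram(["pata", "kitu", "supi", "taku"], vocab),
    "lang_b": train_bigram(["stap", "skit", "ptak", "kspu"], vocab),
}

print(identify(models, "tipa"))
print(identify(models, "skap"))
```

With this toy data the consonant-vowel string "tipa" is assigned to lang_a and the cluster-initial "skap" to lang_b. A higher order (trigrams and beyond) gives the model a longer memory, at the cost of needing more training data.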

In the afternoon I attended the "Statistical Approaches to Dialogue Systems" tutorial. This field deals with modeling dialogue acts once they have been predicted by a speech recognizer and a semantic decoder. Because these components may yield errors, the model has to make decisions despite the uncertainty. Based on state tracking via Bayesian belief monitoring and policy optimization via reinforcement learning, the proposed POMDP (partially observable Markov decision process) framework has the potential to yield robust dialogue systems.
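
As a rough illustration of the belief monitoring part only (policy optimization is left out), the sketch below applies the standard POMDP belief update, b'(s') proportional to P(o | s') times the sum over s of P(s' | s) b(s). The two user goals, the probabilities and the function name belief_update are toy assumptions of mine, not figures from the tutorial.

```python
def belief_update(belief, transition, obs_likelihood):
    """One step of Bayesian belief monitoring over hidden dialogue states.

    belief:         dict state -> current probability
    transition:     dict (state, next_state) -> P(next_state | state)
    obs_likelihood: dict next_state -> P(observation | next_state)
    """
    states = list(belief)
    unnormalized = {}
    for nxt in states:
        # Prediction step: where could the user's goal have moved to?
        predicted = sum(transition[(s, nxt)] * belief[s] for s in states)
        # Correction step: weight by how well the state explains the observation.
        unnormalized[nxt] = obs_likelihood[nxt] * predicted
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every state")
    return {s: p / total for s, p in unnormalized.items()}

# Toy setup (assumption): two possible user goals, which rarely change.
states = ["wants_flight", "wants_hotel"]
transition = {(s, n): (0.9 if s == n else 0.1) for s in states for n in states}

# The noisy recognizer heard something that sounds like "flight".
heard_flight = {"wants_flight": 0.7, "wants_hotel": 0.3}

b0 = {"wants_flight": 0.5, "wants_hotel": 0.5}
b1 = belief_update(b0, transition, heard_flight)
b2 = belief_update(b1, transition, heard_flight)
print(b1)
print(b2)
```

Starting from a uniform belief, one noisy "flight" observation moves the belief in wants_flight to about 0.7 and a second one to about 0.82. The system never commits to a single recognition hypothesis, which is what lets it act sensibly when the recognizer makes mistakes.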

By the way, Brighton is a lovely city! ;)