Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 20
Tutorials Day at InterSpeech 2009
06-Sep-2009
The tutorials preceding the InterSpeech 2009
conference, which officially begins tomorrow, were held today.
In the morning I attended the "Language and Dialect Recognition" tutorial.
This topic covers the first step in a speech processing pipeline, before the
speech recognition engine proper. It first identifies the language of the
speaker and then their dialect. This step permits the use of more refined
models for recognition.
The tutorial mostly focused on acoustics and phonotactics.
Among the probabilistic approaches presented, two aspects surprised me.
The first one deals with acoustic feature extraction: Nuisance
Attribute Projection (NAP). Instead of modeling the most discriminant features,
this approach models the nuisances that muddy the picture and then removes them
from the original space. The second one deals with phonotactics (capturing
the constraints in sequences of phonemes). The tutorial showed how important
this aspect is for characterizing a language. Depending on their order
(memory), phonotactic models encode an incredible amount of information.
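To illustrate the NAP idea, here is a minimal NumPy sketch. It assumes fixed-length utterance-level feature vectors (for instance, GMM supervectors) and treats the variability within each language class across recordings as the nuisance. The nuisance subspace U is taken as the top-k eigenvectors of the within-class scatter, and features are projected with P = I - U U^T. The function name `nap_projection`, the synthetic data and the choice k=1 are illustrative assumptions, not the recipe presented in the tutorial.

```python
import numpy as np


def nap_projection(X, labels, k):
    """Return P = I - U U^T, with U spanning the top-k nuisance directions.

    X: (n_samples, dim) array of utterance-level feature vectors.
    labels: class label (e.g. language) of each row; spread within a class
    is treated as nuisance (channel, session, ...). This is an assumption
    of the sketch.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Subtract each class mean so that only within-class variability remains.
    W = np.empty_like(X)
    for c in np.unique(labels):
        idx = labels == c
        W[idx] = X[idx] - X[idx].mean(axis=0)
    # Leading eigenvectors of the within-class scatter span the nuisance subspace.
    _, eigvecs = np.linalg.eigh(W.T @ W)  # eigenvalues come in ascending order
    U = eigvecs[:, ::-1][:, :k]
    return np.eye(X.shape[1]) - U @ U.T


# Synthetic data: dimension 0 separates two classes, dimension 1 carries
# large session noise, dimension 2 is almost constant.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)
X = np.column_stack([
    np.where(labels == 0, -1.0, 1.0) + 0.01 * rng.standard_normal(n),
    5.0 * rng.standard_normal(n),
    0.01 * rng.standard_normal(n),
])
P = nap_projection(X, labels, k=1)
X_clean = X @ P.T  # project every row onto the nuisance-free subspace
print(np.round(P, 3))
```

After projection the noisy second dimension is essentially zeroed while the class-separating first dimension is left intact. NAP is never told which dimension is noisy; it finds that direction from the within-class scatter.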
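To make the phonotactic idea concrete, here is a toy Python sketch in the spirit of a phone-recognizer-plus-language-model back end: one add-one smoothed n-gram model per language is trained on phoneme strings, and a test string is assigned to the language whose model gives it the highest log-likelihood. The `order` parameter plays the role of the memory mentioned above. The class name `PhonotacticModel`, the helper `identify`, the single-letter "phonemes" and the two toy languages are hypothetical; real systems score the output of a phone recognizer and use far more data.

```python
import math
from collections import Counter


class PhonotacticModel:
    """Add-one (Laplace) smoothed n-gram model over phoneme symbols."""

    def __init__(self, order, sequences):
        self.order = order          # n-gram order, i.e. the model's memory
        self.ngrams = Counter()     # counts of (context..., phoneme)
        self.contexts = Counter()   # counts of each context
        self.vocab = {"</s>"}
        for seq in sequences:
            self.vocab.update(seq)
            for ngram in self._ngrams(seq):
                self.ngrams[ngram] += 1
                self.contexts[ngram[:-1]] += 1

    def _ngrams(self, seq):
        # Pad with start symbols so the first phonemes also have a context.
        padded = ["<s>"] * (self.order - 1) + list(seq) + ["</s>"]
        for i in range(self.order - 1, len(padded)):
            yield tuple(padded[i - self.order + 1:i + 1])

    def log_prob(self, seq):
        v = len(self.vocab)
        return sum(
            math.log((self.ngrams[g] + 1) / (self.contexts[g[:-1]] + v))
            for g in self._ngrams(seq)
        )


def identify(seq, models):
    """Return the label whose phonotactic model scores `seq` highest."""
    return max(models, key=lambda label: models[label].log_prob(seq))


# Toy training data: letters stand in for phoneme symbols.
train = {
    "lang_A": ["patapata", "tapapa", "papata"],
    "lang_B": ["kikiri", "rikiki", "kirikiri"],
}
models = {label: PhonotacticModel(2, seqs) for label, seqs in train.items()}
print(identify("patapa", models), identify("kiriki", models))  # lang_A lang_B
```

The same scoring could, under the same assumptions, be repeated with dialect-level models once the language is known, matching the language-then-dialect pipeline described above.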
In the afternoon, I attended the "Statistical Approaches to Dialogue
Systems" tutorial. This field deals with modeling dialogue acts after
they have been predicted by a speech recognizer and a semantic decoder.
Accounting for the errors that these components may yield, the model makes
decisions despite these uncertainties. Based on state tracking via Bayesian
belief monitoring and policy optimization via reinforcement learning, the
proposed POMDP (Partially Observable Markov Decision Process) framework
offers the potential to build robust dialogue systems.
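The belief-monitoring part can be sketched in a few lines. Below, the hidden state is the user's goal, the observation is the (possibly wrong) label emitted by the semantic decoder, and the belief is updated with the standard POMDP filter b'(s') ∝ P(o|s') Σ_s P(s'|s) b(s). The three goals, the transition and confusion probabilities, and the confirm-or-ask threshold policy are all hypothetical; for brevity, transitions and observations are assumed not to depend on the system action. In the framework described, the policy would be optimized with reinforcement learning, which this sketch does not attempt.

```python
import numpy as np

GOALS = ["flight", "hotel", "car"]  # hypothetical hidden user goals (states)

# T[s, s2] = P(s2 | s): assume the user's goal rarely changes between turns.
T = np.full((3, 3), 0.05) + np.eye(3) * 0.85
# O[s2, o] = P(o | s2): assume the semantic decoder reports the true goal with
# probability 0.7 and each wrong goal with probability 0.15.
O = np.full((3, 3), 0.15) + np.eye(3) * 0.55


def belief_update(b, obs):
    """Bayesian belief monitoring: predict with T, correct with O, normalize."""
    predicted = T.T @ b                # sum_s P(s2 | s) b(s)
    unnormalized = O[:, obs] * predicted
    return unnormalized / unnormalized.sum()


def policy(b, threshold=0.8):
    """Hand-written stand-in for an RL-optimized policy (hypothetical)."""
    best = int(np.argmax(b))
    if b[best] >= threshold:
        return f"confirm:{GOALS[best]}"
    return "ask_again"


b = np.full(len(GOALS), 1.0 / len(GOALS))  # uniform prior over goals
observations = [0, 1, 0, 0]  # decoder output: flight, hotel (an error), flight, flight
beliefs, actions = [], []
for obs in observations:
    b = belief_update(b, obs)
    beliefs.append(b)
    actions.append(policy(b))
print(actions)  # ['ask_again', 'ask_again', 'ask_again', 'confirm:flight']
```

The single misrecognized "hotel" turn lowers the system's confidence but does not derail the dialogue: consistent evidence afterwards pushes the belief in "flight" past the threshold.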
By the way, Brighton is a lovely city! ;)
All contents © Alexandre Trilla 2008-2025