Alexandre Trilla, PhD - Research Engineer | home publications
 

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome


Post 24

Spanish science doesn't need scissors

07-Oct-2009

Still with the generalized indignation feeling because of the R&D budget cut proposed by the Spanish Government (see Post 22), today lots of Spanish researchers have agreed to post about this misfortune following the initiative La ciencia espanola no necesita tijeras.

As it was stated in yesterday's evening news, in the past five years the R&D budget has been one of the main concerns of the Government. This policy has enabled great advances to be achieved in Spain, which in turn has increased the attractiveness to come here for research purposes. If now the policy changes, what's most sensible to believe is that the attractiveness will begin to decrease at alarming rate. It is ridiculous that all the effort this country has made may be crumbled by such a proposal without any great expectations.

"If you think that research and education are expensive, try with ignorance and mediocrity." (Joan Guinovart)

image



Post 23

Talking RSS Reader for Android

05-Oct-2009

First, Google is the gold sponsor of Interspeech'09. Then, it releases a speech synthesizer for Google's mobile platform, Android, the Talking RSS Reader. What will be next? It seems that as long as Google is interested in speech technologies, speech processing will be on the rise. Is it good news or not?



Post 22

Ridiculous Spanish Government anti-crisis measures threaten R&D budgets

30-Sep-2009

A few days ago this opinion column appeared in a Spanish newspaper. Its author, Dr. Rodriguez, criticizes the budget cut on R&D that the Spanish Government has proposed to fight the present crisis. Since the column is written in Spanish, I am just going to translate it in this post, please excuse my skewed Spanish constructions.

"R&Desolation
I am a professor with more than 20 years of research experience and at least with a decent career with international publications (over 60), projects and Ph.D. Theses advisories. For a long while I believed that this country would be advancing in the scientific aspect, but the figures are stubborn and I have to lend myself to the evidence. The latest absurd proposal of the Government has been a 37% R&D budget cut. But worst of all is that this proposal has barely had any consequence in our society. The so renown social agents (political parties, syndicates, employers, associations) have shown an absolute indifference about the affair, which means that this society considers R&D totally accessory and useless. In summary: we are sentenced to be a country with high unemployment rates, low qualified jobs and derisory salaries and which principal economic activity will be taking care of holidaymakers and retired people from all over Europe. One last note, dedicated to the youth that wish to follow a research career: finish your theses and get out of this country!

J. ENRIC RODRIGUEZ GIL
Artes"

Earlier in this blog (see Post 6) Dr. Goldberg already attributed the present scientific wealth of the U.S. to the big budgets that the U.S. Government invested in the universities for research purposes after WW2. Definitely not the way things look here in Spain.



Post 21

InterSpeech 2009 Opening Ceremony and Sentiment Classification from Text

07-Sep-2009

The InterSpeech 2009 conference Opening Ceremony has been held today. Some of the most remarkable scientists that have passed away last year have been rememebered at the beginning of the event. Among them, Dr. Gunnar Fant, who devoted his life to the study of the vocal tract and the measurement of formant values.

Afterwards, Dr. Stephen Hawking has given a recorded speech on the importance of speech technologies for the wealth of mankind. Thanks to them, he has been able to communicate his ideas to the world despite of his illness.

The ceremony has concluded with a beautiful "All you need is love" (by The Beatles) featured by The Paletine String Quartet.

Finally in the afternoon, our work on sentiment classification from text has seen the light! Now the paper is freely available at the Publications section.



Post 20

Tutorials Day at InterSpeech 2009

06-Sep-2009

The tutorials before the InterSpeech 2009 conference officially begins (tomorrow) have been held today. In the morning I have attended the "Language and Dialect Recognition" tutorial. This topic wraps the first step on a speech processing pipeline, before a sheer speech recognition engine. It first identifies the language of the speaker and then it spots its dialect. This step permits the use of more refined models for recognition.

The tutorial has mostly been focused on acoustics and phonotactics. Following a probabilistic approach, two aspects have surprised me. The first one deals with acoustic features extraction, the Nuisance Attribute Projection (NAP). Instead of modeling the most discriminant features, this approach models the nuisanes that muddy the picture and then removes them from the original space. The second one deals with phonotactics (capturing the constraints in sequences of phonemes). The tutorial showed the importance of this aspect for a language. According to its order (memory), the phonotactics encode an incrediable amount of information.

In the afternoon, I have attended the "Statistical Approaches to Dialogue Systems" tutorial. This field deals with modeling dialog acts after being predicted by a speech recognizer and a semantic decoder. Accounting for the errors that these systems may yield, this model makes decisions despite of these uncertainties. Based on tracking via Bayesian belief monitoring and a policy optimization via reinforcement learning, the POMDP framework proposed provides the potential for building robust dialogue systems.

By the way, Brighton is a lovely city! ;)



Post 19

Long live the robots

29-Jul-2009

One more year, the CampusBot gathers a huge amount of robotics enthusiasts in Valencia, as part of the CampusParty. I had also once made my first approach to this field, it was my final High School project: WaiterBot. My robot was remotely controlled with a home-made transmitter/receiver pair with a joystick and could carry a glass of e.g. water on a tray without spilling its content. Since it had a couple of orthogonal encoders parallel to its plane of movement it could detect the inclination of the surface it roved and correct this angle with a set of motors and gears. Take a look at the pictures (pic01 and pic02) to see what I mean.

Today I thought about that little bot I built eight years ago, it's been a quite a while. I searched my hard drive and recovered the manuscript where the schematics and source code were and put it in my publications space under a Creative Commons license. Eventually that creation is freely available on the Internet.

I must say that the waiter robot served me very well in the university. Apart from the knowledge I obtained from hacking with the PIC16F84 microcontroller, which had become very popular among the satellite television cracking community, on my third year at university I replaced the tray device with an ultrasound SONAR and built a RC car which stopped in case of collision danger. The ultrasound SONAR was built with the auto-focusing device of a Polaroid camera, as indicated in the Encoder e-magazine from the Seattle Robotics Society. Its precision allowed the robot to stop at 62cm. from an obstacle.

After that, looking forward to completing my Bachelor's degree I set to making a line follower. I had to rebuild the motion mechanism because the gears had worn out. Thus, I replaced the original toy device with a couple of hacked RC boat servo-motors, I dismantled the radio controller and finally attached an array of infrared reflective optical sensors to autonomously drive the robot. I have had a lot of fun with robots after all these experiences.

Nowadays there is Arduino, an open-source electronics prototyping platform that is being used not only for robotics, but for many other applications since it has been adopted by many different fields: from art to engineering. It merges free hardware with free software, the best of both worlds ;) It would be great to migrate all academic programs to such open-source frameworks so as to enable/motivate the thorough study of the systems that lay underneath, a goal that cannot be achieved, by definition, with proprietary platforms, be either software or hardware.



Post 18

R, Octave and Scilab

19-Jun-2009

Today the program for the next Jornades de Programari Lliure has been made available on the web portal of this meeting. Although the program is still provisional, the appointed activities schedule a speech on R and a tutorial on Octave and Scilab.

As is reported in the description of these events, the speech on R will be focused on the need of these sort of free statistical tools for the wealth of the scientific community. R is presented as a vehicular tool for the collaboration among different research groups, as well as a means of providing students with quality tools to exercise their technical abilities once they attain their grades and leave the university.

But I would now like to concentrate on Octave and Scilab since I am somewhat more accustomed to working with them, for convenience. It's wonderful that these tools eventually get to the people. With utilities like these, which are developed and supported by hordes of researchers around the world, I do find it hard to argue in favor of more "traditional" products like Matlab and the like. But the world is still imperfect. I have been using them both for about three years now, dealing with their peculiarities. I will put a clear example to show what I mean. Some days ago, developing a Magnitude Difference Function (MDF) pitch and sonority detector, Scilab resulted 3.63 times faster than Octave (using the same base code). At first sight, one would say that Scilab is a deal better choice: it has a nice GUI, a box diagram based dynamic system simulator and a powerful plotting engine. But when then I developed a LPC vocoder and needed a function to retrieve the LPC coefficients, only Octave provided such function, which means that either the previous code had to be recoded into Octave or that the function had to be recoded into Scilab. To my mind, this is not a big problem, because in any case one ends up learning a lot more than what one was supposed to learn at the beginning of the work, and at the end of it, one has the chance to contribute to the wealth of the software of choice and the scientific community as a whole.



Post 17

InterSpeech 2009 conference

16-Jun-2009

Good news. Our work on sentiment classification from sentence-level annotations of emotions regarding models of affect has been selected for poster presentation at the InterSpeech 2009 conference in a session "Prosody, Text Analysis, and Multilingual Models". The conference will be held on September at Brighton, U.K.



Post 16

HMMs considering multiple observations

06-Jun-2009

The original definition of the (discrete) Hidden Markov Models (HMM) in the speech recognition field [Rabiner, 1989], available here, deals with a single symbol sequence, which implies that the models are shown one single aspect of the speech signal (the spectral behavior features). Thus the models are limited to making predictions based on the features of a speech frame, which is previously framed in time in order to take it for a stationary signal. But speech is by definition a non-stationary time-varying signal (phonemes...), therefore the stationarity assumption is not actually true, but we assume it for simplification.

HMMs are meant to model dynamic systems, then it makes little sense not to consider these dynamic features inherent in the signal. The basic idea behind the "multiple observations" concept is that the HMMs in question take into account several observations (discrete symbol sequences) when dealing with data. These additional symbol sequences are extracted from the quantized derivatives and double-derivatives of the speech feature parameters (the velocities and accelerations of these feature parameters). Then enabling the HMMs to deal with this time-varying information they ought to be more accurate in their predictions, and therefore, the HMMs should be able to better predict the behavior of the dynamic system they intend to model.

In order to accomplish this improvement, the HMMs have to track the evolution of these three discrete symbolic spaces: the coefficients (C), the velocities/deltas (D) and the accelerations/double-deltas (DD). From now on, the HMMs should be defined with this consideration, as is shown below:

image

Then, each time the models shall provide the probability of observing the particular symbol triad

image

we assume statistical independence between these three spaces for simplicity and redefine the observation probability function for a given state m as:
image

This new observation probability function enables the HMMs to deal with multiple observations. One must bear in mind that in order to reestimate this function in the Baum- Welch procedure, the resulting probability must update all the matrices that define the feature spaces according to the observed symbols.

With the inclusion of multiple observations into the HMMs, the performance improvement (Word Error Rate) obtained with the datasets provided in the Speech Processing subject at UPC oscillates between 72% and 53% when testing with the training dataset or with another dataset respectively, reaching an absolute WER of 1.11% in the first case and 7.03% in the second one. Note that these figures have been computed using LPCCs. With MFCCs they even got down to 0.74% and 5.07% respectively. Bearing in mind that this technology is more than ten years old, it's simply fantastic.

--
[Rabiner, 1989] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proc. IEEE, Vol. 77, No. 2, pp. 257-286, February 1989



Post 15

Thank you, Guadalinex

19-May-2009

The Guadalinex GNU/Linux distribution is one of the most ambitious free software projects in Spain, promoted by the Government of Andalusia (Junta de Andalucia), aimed at spreading the FLOSS culture in the educational sector. This exemplar project deserves my admiration for the technical quality of the distro, the organization of the project and the nice community that supports it.

Last year on the 10th of December I could participate in the virtual meeting of collaborators of Guadalinex with my Master's Thesis, the Magnus project in Spanish, a speech recognition application coded in Java for controlling the mouse of the computer (especially important for physically impaired people). The meeting was held through Gobby (a useful tool I had not heard of before) and the experience was very gratifying. We could discuss some of the features the project still lacks, like speaker adaptation (the Andalusian accent is pretty different from the Catalan, which was the seminal implemented language), the flexibility of the application and the performance of a Java app compared to a natively compiled application.

And today I have received a sweet present from the Guadalinex team: a Guadalinex penguin mascot toy. They had no reason to do it, but they chose to do so, and I now choose to put it flat-out: the Gualinex project is a great deal more than a mere big free software project. Thank you, Guadalinex.



newer | older - RSS - Search


All contents © Alexandre Trilla 2008-2025