COMBINED PERCEPTION AND CONTROL FOR TIMING IN ROBOTIC MUSIC PERFORMANCES
Umut Simsekli, Orhan Sonmez, Baris Kurt, Ali Taylan Cemgil
Interaction with human musicians is a challenging task for robots as it involves online perception and precise
synchronization. In this paper, we present a consistent and theoretically sound framework for combining
perception and control for accurate musical timing. For perception, we develop a hierarchical hidden Markov
model that combines event detection and tempo tracking. The robot performance is formulated as a linear
quadratic control problem that can generate surprisingly complex timing behavior in adapting the tempo.
We provide results with both simulated and real data. In our experiments, a simple Lego robot percussionist
accompanied the music by detecting the tempo and position of clave patterns in the polyphonic music. The robot
successfully synchronized itself with the music by quickly adapting to the changes in the tempo...

MULTI-CANDIDATE MISSING DATA IMPUTATION FOR ROBUST SPEECH RECOGNITION
Yujun Wang, Hugo Van hamme
The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large
vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply
solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed
frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this
article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems
approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT
solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs as fast as the
uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The
experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of
accuracy when compared to frontend MDT....

SPEAKER DIARIZATION OF BROADCAST NEWS IN ALBAYZIN 2010 EVALUATION CAMPAIGN
Martin Zelenak, Henrik Schulz, Javier Hernando
In this article, we present the evaluation results for the task of speaker diarization of broadcast news, which was part of
the Albayzin 2010 evaluation campaign of language and speech technologies. The evaluation data consists of a
subset of the Catalan broadcast news database recorded from the 3/24 TV channel. We describe the five submitted
systems from five different research labs, noting both their common and their distinctive features. The
diarization performance is analyzed in terms of the diarization error rate, the number of detected speakers, and
the acoustic background conditions. We also attempt to relate the achieved results to the particular
system design features....