Inventi Rapid
Audio, Speech & Music Processing



Journal Scope
Inventi Rapid/Impact: Audio, Speech & Music Processing is a peer-reviewed journal in Engineering & Technology. It publishes experimental and theoretical papers on engineering technologies related to audio, speech, and music. Its scope covers sound engineering, recording, electronic production of speech and music, and digitization of sound.



RECOGNIZING EMOTION FROM TURKISH SPEECH USING ACOUSTIC FEATURES
Caglar Oflazoglu, Serdar Yildirim

Affective computing, especially from speech, is one of the key steps toward building more natural and effective human-machine interaction. In recent years, several emotional speech corpora in different languages have been collected; however, Turkish is not among the languages that have been investigated in the context of emotion recognition. For this purpose, a new Turkish emotional speech database, which includes 5,100 utterances extracted from 55 Turkish movies, was constructed. Each utterance in the database is labeled with emotion categories (happy, surprised, sad, angry, fearful, neutral, and others) and three-dimensional emotional space (valence, activation, and dominance). We performed classification of four basic emotion classes (neutral, sad, happy, and angry) and estimation of emotion primitives using acoustic features. The importance of acoustic features in estimating the emotion primitive values and in classifying emotions into categories was also investigated. An unweighted average recall of 45.5% was obtained for the classification. For emotion dimension estimation, we obtained promising results for activation and dominance dimensions. For valence, however, the correlation between the averaged ratings of the evaluators and the estimates was low. The cross-corpus training and testing also showed good results for activation and dominance dimensions....
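As an illustration of the metric quoted in this abstract, the following is a minimal sketch of unweighted average recall (UAR): the mean of the per-class recalls, so that each emotion class counts equally regardless of how many utterances it contains. The function name and the label lists are hypothetical and serve only as an example; they are not data from the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    total = defaultdict(int)    # number of reference items per class
    correct = defaultdict(int)  # number of correctly predicted items per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical labels for illustration only (not data from the paper).
y_true = ["neutral", "neutral", "neutral", "neutral", "sad", "sad", "happy", "angry"]
y_pred = ["neutral", "neutral", "neutral", "sad", "sad", "happy", "happy", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

With unbalanced classes the two numbers differ: here the majority class inflates plain accuracy, while UAR penalizes the class that is never recognized.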
N-DIMENSIONAL N-MICROPHONE SOUND SOURCE LOCALIZATION
Ali Pourmohammad, Seyed Mohammad Ahadi

This paper investigates real-time N-dimensional wideband sound source localization in outdoor (far-field) and low-degree reverberation cases, using a simple N-microphone arrangement. Outdoor sound source localization in different climates needs highly sensitive and high-performance microphones, which are very expensive. Reduction of the microphone count is our goal. Time delay estimation (TDE)-based methods are common for N-dimensional wideband sound source localization in outdoor cases using at least N + 1 microphones. These methods need numerical analysis to solve closed-form non-linear equations leading to large computational overheads and a good initial guess to avoid local minima. Combined TDE and intensity level difference or interaural level difference (ILD) methods can reduce microphone counts to two for indoor two-dimensional cases. However, ILD-based methods need only one dominant source for accurate localization. Also, using a linear array, two mirror points are produced simultaneously (half-plane localization). We apply this method to outdoor cases and propose a novel approach for N-dimensional entire-space outdoor far-field and low reverberation localization of a dominant wideband sound source using TDE, ILD, and head-related transfer function (HRTF) simultaneously and only N microphones. Our proposed TDE-ILD-HRTF method tries to solve the mentioned problems using source counting, noise reduction using spectral subtraction, and HRTF. A special reflector is designed to avoid mirror points, and source counting is used to make sure that only one dominant source is active in the localization area. The simple microphone arrangement used leads to linearization of the non-linear closed-form equations and removes the need for an initial guess.
Experimental results indicate that our implemented method features less than 0.2 degree error for angle of arrival and less than 10% error for three-dimensional location finding as well as less than 150-ms processing time for localization of a typical wideband sound source such as a flying object (helicopter)....
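To make the TDE building block mentioned in this abstract concrete, the sketch below estimates the delay between two microphone signals by maximizing their cross-correlation and converts it to a far-field angle of arrival. It is not the authors' TDE-ILD-HRTF method: the two-microphone geometry, the sampling rate, the microphone spacing, the speed of sound, and all function names are assumptions chosen for illustration, and the test signal is synthetic.

```python
import math
import random

def estimate_delay(x, y, max_lag):
    """Lag in samples (positive if y lags x) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def far_field_angle(lag, fs, spacing, c=343.0):
    """Angle from broadside in degrees, assuming a plane wave and two microphones."""
    ratio = c * (lag / fs) / spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))

# Assumed setup (hypothetical values): 48 kHz sampling, 0.2 m spacing.
fs, spacing = 48000, 0.2
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(512)]  # synthetic wideband signal
true_delay = 14                                 # samples
y = [0.0] * true_delay + x[:-true_delay]        # delayed copy at the second microphone

lag = estimate_delay(x, y, max_lag=28)          # 28 samples covers spacing / c at this fs
angle = far_field_angle(lag, fs, spacing)
print(lag, round(angle, 2))
```

The exhaustive search over lags keeps the example dependency-free; a practical implementation would more likely use an FFT-based correlation.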
INTRA-FRAME CEPSTRAL SUB-BAND WEIGHTING AND HISTOGRAM EQUALIZATION FOR NOISE-ROBUST SPEECH RECOGNITION
Jeih-weih Hung, Hao-teng Fan

In this paper, we propose a novel noise-robustness method known as weighted sub-band histogram equalization (WS-HEQ) to improve speech recognition accuracy in noise-corrupted environments. Considering the observations that high- and low-pass portions of the intra-frame cepstral features possess unequal importance for noise-corrupted speech recognition, WS-HEQ is intended to reduce the high-pass components of the cepstral features. Furthermore, we provide four types of WS-HEQ, which partially refer to the structure of spatial histogram equalization (S-HEQ). In the experiments conducted on the Aurora-2 noisy-digit database, the presented WS-HEQ yields significant recognition improvements relative to the Mel-scaled filter-bank cepstral coefficient (MFCC) baseline and to cepstral histogram normalization (CHN) in various noise-corrupted situations and exhibits a behavior superior to that of S-HEQ in most cases....
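For readers unfamiliar with histogram equalization of features, the sketch below shows the plain technique on a single feature stream: each value is passed through its empirical cumulative distribution and then through the inverse CDF of a reference distribution. It does not reproduce WS-HEQ or its sub-band weighting; the standard-normal reference, the mid-rank CDF estimate, the function name, and the sample values are all assumptions made for illustration.

```python
from statistics import NormalDist

def histogram_equalize(values):
    """Map values through their empirical CDF, then the inverse reference CDF."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ref = NormalDist()  # assumed reference: zero mean, unit variance
    out = [0.0] * n
    for rank, idx in enumerate(order):
        cdf = (rank + 0.5) / n  # mid-rank estimate stays strictly inside (0, 1)
        out[idx] = ref.inv_cdf(cdf)
    return out

# Hypothetical values of one cepstral coefficient over five frames.
coeff = [3.0, -1.0, 10.0, 0.5, 2.0]
equalized = histogram_equalize(coeff)
print([round(v, 3) for v in equalized])
```

The mapping is monotonic, so the ordering of frames is preserved while the distribution of the output is pulled toward the reference regardless of how noise has shifted or scaled the input.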

E-ISSN: Awaited





Frequency: Quarterly


RI Factor: 1.0
Abstracted/Indexed in: Ulrich’s International Periodical Directory, Google Scholar, SCIRUS