TY - GEN
T1 - Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition
AU - Planet, Santiago
AU - Iriondo, Ignasi
PY - 2012
Y1 - 2012
N2 - Detection of affective states in speech could improve the way users interact with electronic devices. However, analysis of speech at the acoustic level alone may not be enough to determine the emotion of a user speaking in a realistic scenario. In this paper we analysed the spontaneous speech recordings of the FAU Aibo Corpus at the acoustic and linguistic levels to extract two sets of acoustic and linguistic features. The acoustic set was reduced by a greedy procedure that selected the most relevant features to optimize the learning stage. We experimented with three classification approaches: Naïve Bayes, a support vector machine and a logistic model tree, and with two fusion schemes: decision-level fusion, merging the hard decisions of the acoustic and linguistic classifiers by means of a decision tree, and feature-level fusion, concatenating both sets of features before the learning stage. Despite the low performance achieved by the linguistic data on its own, combining it with the acoustic information yielded a dramatic improvement over the results achieved by the acoustic modality alone. The classifiers trained on the features merged at the feature level outperformed the decision-level fusion scheme, despite the simplicity of the feature-level approach.
KW - Emotion recognition
KW - acoustic features
KW - decision-level fusion
KW - feature-level fusion
KW - linguistic features
KW - spontaneous speech
UR - http://www.scopus.com/inward/record.url?scp=84869047385&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84869047385
SN - 9789899624771
T3 - Iberian Conference on Information Systems and Technologies, CISTI
BT - Proceedings of the 7th Iberian Conference on Information Systems and Technologies, CISTI 2012
T2 - 7th Iberian Conference on Information Systems and Technologies, CISTI 2012
Y2 - 20 June 2012 through 23 June 2012
ER -