Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification

Xavier Valero; Francesc Alias

doi:10.1109/TMM.2012.2199972

Research output: Article in indexed journal › Article › Peer-reviewed

222 Citations (Scopus)

Abstract

In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.
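The abstract describes GTCCs as the MFCC scheme with the mel filterbank swapped for Gammatone filters on equivalent rectangular bandwidth (ERB) bands. The following is a minimal sketch of the two pieces that differ from or are shared with MFCC: ERB-spaced center frequencies, and the log-plus-DCT step applied to band energies. It assumes the Glasberg and Moore ERB constants (24.7 and 0.00437) and an unnormalized DCT-II; the abstract states neither, and all function names here are hypothetical, not taken from the paper.

```python
import numpy as np

# Assumption: Glasberg & Moore ERB model; the record above gives no formula.
def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) at center frequency f_hz."""
    return 24.7 * (0.00437 * f_hz + 1.0)

def erb_center_frequencies(f_low, f_high, n_filters):
    """Center frequencies equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_filters)
    return erb_rate_to_hz(e)

def cepstral_coefficients(band_energies, n_coeffs, eps=1e-10):
    """Log-compress band energies, then apply an unnormalized DCT-II.

    This mirrors the final MFCC step; the paper's exact scaling may differ.
    """
    band_energies = np.asarray(band_energies, dtype=float)
    n_bands = band_energies.shape[0]
    n = np.arange(n_bands)
    m = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi / n_bands * (n + 0.5) * m)  # shape (n_coeffs, n_bands)
    return basis @ np.log(band_energies + eps)

# Illustrative values only: 32 bands between 50 Hz and 8 kHz, 13 coefficients.
centers = erb_center_frequencies(50.0, 8000.0, 32)
coeffs = cepstral_coefficients(np.ones(32), 13)
```

Because ERB-rate spacing packs filters more densely at low frequencies, the gaps between consecutive entries of `centers` grow with frequency. The Gammatone filtering itself, which would produce `band_energies` from an audio frame, is omitted from this sketch.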

Original language	English
Pages (from-to)	1684-1689
Number of pages	6
Journal	IEEE Transactions on Multimedia
Volume	14
Issue number	6
DOIs	https://doi.org/10.1109/TMM.2012.2199972
Publication status	Published - 2012

Access to document

10.1109/TMM.2012.2199972

Other files and links

Link to publication in Scopus

Cite this

@article{a4b2b8951f464545ba775d633b87b9c3,

title = "Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification",

abstract = "In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.",

keywords = "Audio classification, Gammatone cepstral coefficients, audio scene recognition, environmental sound, feature extraction",

author = "Xavier Valero and Francesc Alias",

year = "2012",

doi = "10.1109/TMM.2012.2199972",

language = "English",

volume = "14",

pages = "1684--1689",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "6",

}

TY - JOUR

T1 - Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification

AU - Valero, Xavier

AU - Alias, Francesc

PY - 2012

Y1 - 2012

AB - In the context of non-speech audio recognition and classification for multimedia applications, it becomes essential to have a set of features able to accurately represent and discriminate among audio signals. Mel frequency cepstral coefficients (MFCC) have become a de facto standard for audio parameterization. Taking as a basis the MFCC computation scheme, the Gammatone cepstral coefficients (GTCCs) are a biologically inspired modification employing Gammatone filters with equivalent rectangular bandwidth bands. In this letter, the GTCCs, which have been previously employed in the field of speech research, are adapted for non-speech audio classification purposes. Their performance is evaluated on two audio corpora of 4 h each (general sounds and audio scenes), following two cross-validation schemes and four machine learning methods. According to the results, classification accuracies are significantly higher when employing GTCC rather than other state-of-the-art audio features. As a detailed analysis shows, with a similar computational cost, the GTCC are more effective than MFCC in representing the spectral characteristics of non-speech audio signals, especially at low frequencies.

KW - Audio classification

KW - Gammatone cepstral coefficients

KW - audio scene recognition

KW - environmental sound

KW - feature extraction

UR - http://www.scopus.com/inward/record.url?scp=84870567358&partnerID=8YFLogxK

U2 - 10.1109/TMM.2012.2199972

DO - 10.1109/TMM.2012.2199972

M3 - Article

AN - SCOPUS:84870567358

SN - 1520-9210

VL - 14

SP - 1684

EP - 1689

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

IS - 6

ER -
