written 6.7 years ago by | • modified 6.6 years ago |
Subject: Speech Processing
Topic: Homomorphic Speech Processing
Difficulty: Medium
written 6.6 years ago by |
(i) The Mel Frequency Cepstrum (MFC) can be defined as a representation of the short-time power spectrum of a speech signal. It is calculated as the linear cosine transform of the log power spectrum on a non-linear Mel frequency scale.
(ii) In the case of the MFC, the frequency bands are equally spaced on the Mel scale.
(iii) The Mel scale approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
(iv) MFCCs can be calculated as follows:
a) Take the FFT of the windowed signal.
b) Compute its squared magnitude. This gives the power spectrum.
c) Pre-emphasise the spectrum to approximate the unequal sensitivity of human hearing at different frequencies.
d) Integrate the power spectrum within overlapping critical-band filter responses.
This integration is done using triangular overlapping windows called Mel filters. It effectively reduces the frequency resolution of the original spectral estimate, particularly at higher frequencies, where the filter bands are wider.
e) Compress the spectral amplitude by taking the logarithm. Optionally, integration of the log power spectrum may also be performed.
f) Take the inverse DFT (in practice, the cosine transform mentioned in (i)). This gives the cepstral coefficients.
g) Perform spectral smoothing to obtain the MFCCs.
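The steps above can be sketched for a single frame in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The answer does not give a Mel formula, so the mapping 2595 log10(1 + f/700) is assumed here as one common convention. The Hamming window, the number of filters, the number of coefficients kept, and all function names are likewise illustrative choices. The pre-emphasis of step c) is omitted for brevity, and step g) is approximated by keeping only the low-order coefficients.

```python
import cmath
import math

def hz_to_mel(f_hz):
    # Assumed Mel mapping (one common convention; not specified in the text).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of the assumed mapping above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    # Steps a) and b): Hamming window, naive DFT, squared magnitude.
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Step d): triangular filters whose edges are equally spaced in Mel.
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[j - 1], edges_hz[j], edges_hz[j + 1]
        weights = []
        for k in range(n_bins):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Step e): log compression; the small floor avoids log(0).
    log_energy = [math.log(sum(w * p for w, p in zip(weights, power)) + 1e-10)
                  for weights in bank]
    # Steps f) and g): unnormalised DCT-II as the cosine transform; keeping
    # only the first n_coeffs terms stands in for smoothing (an assumption).
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]
```

For example, calling `mfcc(frame, 8000)` on a 256-sample frame returns 12 coefficients. The naive DFT is written for readability; for real signals an FFT routine would be the usual choice.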