written 6.7 years ago by | • modified 6.6 years ago |
Subject: Speech Processing
Topic: Speech Analysis in Time Domain
Difficulty: Medium
written 6.6 years ago by |
(i) U(n) corresponds to short-time energy or short-time average magnitude if T in eq. is squaring or absolute magnitude, respectively.
$ E_n = \sum_{m=-\infty}^{\infty} [S(m)W(n-m)]^2 \\ E_n = S^2(n) * h(n), \quad h(n) = W^2(n) \\ M_n = |S(n)| * W(n) $
Here $ * $ denotes convolution.
(ii) Squaring the signal to calculate energy emphasizes high amplitudes.
(iii) Magnitude measurements do not emphasize high amplitudes and are simpler to calculate.
(iv) Such measurements help segment the speech into smaller phonetic units.
(v) Voiced and unvoiced speech can be told apart because of the large difference in amplitude between them.
(vi) The amplitude of unvoiced segments is not as high as that of voiced segments, and the short-time energy of the speech signal reflects these amplitude variations.
(vii) The nature of the short-time energy representation depends on the choice of the impulse response h(n).
(viii) To see how the choice of window affects the short-time energy, consider a very long window of constant amplitude. It is equivalent to a very narrow-band low-pass filter, so the short-time energy would change very little with time.
(ix) Two windows are worth discussing for observing the effect of the window on the time-dependent energy. The first is the rectangular window
h(n) = $ \begin{cases} 1 & 0 \leq n \leq (N-1) \\ 0 & \text{otherwise} \end{cases} $
and the second is the Hamming window
h(n) = $ \begin{cases} 0.54 - 0.46 \cos\left[\frac{2 \pi n}{(N-1)}\right] & 0 \leq n \leq (N-1) \\ 0 & \text{otherwise} \end{cases} $
The frequency response of the rectangular window is
$ H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)} \, e^{-j\omega (N-1)/2} $
(x) Its first zero occurs at the analog frequency $ F = \frac{F_s}{N} $, where $ F_s = \frac{1}{T} $ is the sampling frequency.
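
The points above can be tried out numerically. What follows is only a minimal sketch in plain Python, not a reference implementation. The function names, the synthetic test signal (a loud tone followed by a quiet one, standing in for voiced and unvoiced segments), and the values Fs = 8000 Hz and N = 81 are all assumptions chosen for illustration.

```python
import math

def rectangular_window(N):
    """h(n) = 1 for 0 <= n <= N-1, 0 otherwise."""
    return [1.0] * N

def hamming_window(N):
    """h(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def short_time_energy(s, w):
    """E_n = sum over m of [s(m) * w(n-m)]^2, for every n of the signal."""
    out = []
    for n in range(len(s)):
        total = 0.0
        for k in range(len(w)):      # k = n - m, so m = n - k
            m = n - k
            if 0 <= m < len(s):      # signal is taken as zero outside its range
                total += (s[m] * w[k]) ** 2
        out.append(total)
    return out

def short_time_magnitude(s, w):
    """M_n = sum over m of |s(m)| * w(n-m), for every n of the signal."""
    out = []
    for n in range(len(s)):
        total = 0.0
        for k in range(len(w)):
            m = n - k
            if 0 <= m < len(s):
                total += abs(s[m]) * w[k]
        out.append(total)
    return out

def rect_response_magnitude(omega, N):
    """|H(e^{jw})| = |sin(w*N/2) / sin(w/2)| for the rectangular window."""
    denom = math.sin(omega / 2)
    if abs(denom) < 1e-12:           # limit at w = 0 is N
        return float(N)
    return abs(math.sin(omega * N / 2) / denom)

# Assumed illustration values (not from the original answer).
Fs = 8000                            # sampling frequency in Hz
N = 81                               # window length in samples

# Synthetic signal: loud 200 Hz tone, then the same tone at one tenth the amplitude.
loud = [1.0 * math.sin(2 * math.pi * 200 * n / Fs) for n in range(200)]
quiet = [0.1 * math.sin(2 * math.pi * 200 * n / Fs) for n in range(200)]
s = loud + quiet

E_rect = short_time_energy(s, rectangular_window(N))
M_rect = short_time_magnitude(s, rectangular_window(N))
E_hamm = short_time_energy(s, hamming_window(N))

# Ratio of loud-segment to quiet-segment values: energy exaggerates the
# difference (squaring), magnitude does not.
energy_ratio = E_rect[150] / E_rect[350]
magnitude_ratio = M_rect[150] / M_rect[350]

# First zero of the rectangular window response, as an analog frequency.
first_zero_hz = Fs / N
print(energy_ratio, magnitude_ratio, first_zero_hz)
```

Comparing `energy_ratio` with `magnitude_ratio` illustrates points (ii) and (iii): the energy contrast between the two segments is larger than the magnitude contrast. Evaluating `rect_response_magnitude(2 * math.pi / N, N)` gives a value close to zero, consistent with the first zero at $ F = F_s/N $.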