Explain how short-time energy (STE) and short-time magnitude (STM) can be used to distinguish voiced, unvoiced and silence regions of speech signals

written 6.6 years ago by

teamques10 ★ 68k

(i) U(n) corresponds to the short-time energy or the short-time magnitude when the transformation T in the equation is squaring or absolute value, respectively.

$ E(n) = \sum_{m=-\infty}^{\infty} [S(m) W(n-m)]^2 \\ E(n) = S^2(n) * h(n), \quad h(n) = W^2(n) \\ M(n) = |S(n)| * W(n) $

Here $ * $ denotes convolution over the time index.

(ii) Squaring the signal to calculate energy emphasizes high-amplitude samples.

(iii) Magnitude measurements do not emphasize high amplitudes in this way and are simpler to calculate.

(iv) Such measurements therefore help segment speech into smaller phonetic units.

(v) Voiced and unvoiced speech can be told apart because their amplitudes differ greatly.

(vi) The amplitude of unvoiced segments is generally much lower than that of voiced segments, and silence is lower still. The short-time energy of the speech signal reflects these amplitude variations.

(vii) The nature of the short-time energy representation depends on the choice of the impulse response h(n).

(viii) To see how the choice of window affects the short-time energy, consider a very long window of constant amplitude. It is equivalent to a very narrow-band low-pass filter, so the short-time energy would change very little with time.

(ix) Two windows are worth discussing for observing the effect of the window on the time-dependent energy: the rectangular window

$ h(n) = \begin{cases} 1 & 0 \leq n \leq N-1 \\ 0 & \text{otherwise} \end{cases} $

and the Hamming window

$ h(n) = \begin{cases} 0.54 - 0.46 \cos\left(\frac{2 \pi n}{N-1}\right) & 0 \leq n \leq N-1 \\ 0 & \text{otherwise} \end{cases} $

The frequency response of the rectangular window is

$ H(e^{j\omega}) = \frac{\sin(\omega N/2)}{\sin(\omega/2)} \, e^{-j\omega (N-1)/2} $

(x) Its first zero occurs at the analog frequency $ F = \frac{F_s}{N} $, where $ F_s = \frac{1}{T} $ is the sampling frequency.
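
The points above can be sketched in code. This is a minimal illustration, not a tuned detector, and it uses only the Python standard library. The synthetic test signal, the 8 kHz sampling rate, the 30 ms window length, the 10 ms hop, and the dB thresholds in `label` are all assumptions chosen for the example; none of them come from the answer itself. The sketch evaluates E(n) and M(n) with a causal window, labels each frame by its energy relative to the peak, and computes the first zero F = Fs/N of the rectangular window.

```python
import math
import random

def window(kind, N):
    """Return h(n) for 0 <= n <= N-1 (rectangular or Hamming)."""
    if kind == "rectangular":
        return [1.0] * N
    if kind == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    raise ValueError(f"unknown window: {kind}")

def short_time_energy(s, w, n):
    """E(n) = sum_m [s(m) w(n-m)]^2 over the N samples ending at n."""
    return sum((s[n - k] * w[k]) ** 2 for k in range(len(w)) if n - k >= 0)

def short_time_magnitude(s, w, n):
    """M(n) = sum_m |s(m)| w(n-m) over the N samples ending at n."""
    return sum(abs(s[n - k]) * w[k] for k in range(len(w)) if n - k >= 0)

def label(energy, peak, silence_db=-40.0, voiced_db=-10.0):
    """Label a frame by its energy in dB relative to the peak.

    The two thresholds are hypothetical values picked for this example.
    """
    level = 10 * math.log10(energy / peak + 1e-12)
    if level < silence_db:
        return "silence"
    return "voiced" if level >= voiced_db else "unvoiced"

def first_zero_hz(fs, N):
    """First zero of the rectangular window's frequency response, F = Fs/N."""
    return fs / N

# Assumed setup: 8 kHz sampling, 30 ms window (N = 240 samples).
fs, N = 8000, 240
rng = random.Random(0)
seg = 800  # samples per synthetic segment (100 ms)

# Synthetic stand-ins: faint noise for silence, louder noise for unvoiced
# speech, and a 150 Hz sinusoid for voiced speech.
silence = [rng.gauss(0, 0.001) for _ in range(seg)]
unvoiced = [rng.gauss(0, 0.1) for _ in range(seg)]
voiced = [math.sin(2 * math.pi * 150 * n / fs) for n in range(seg)]
s = silence + unvoiced + voiced

w = window("hamming", N)
frames = range(N - 1, len(s), 80)  # evaluate every 10 ms
E = {n: short_time_energy(s, w, n) for n in frames}
M = {n: short_time_magnitude(s, w, n) for n in frames}
peak = max(E.values())
labels = {n: label(E[n], peak) for n in frames}

for n in frames:
    print(f"n={n:5d}  E={E[n]:10.4f}  M={M[n]:9.4f}  {labels[n]}")
print("first zero of rectangular window (Hz):", first_zero_hz(fs, N))
```

In this sketch the gap between the unvoiced and voiced frames is wider in E(n) than in M(n), which reflects the point that squaring emphasizes high amplitudes. On real recordings the thresholds would need to be tuned to the noise level and the speaker.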
