written 6.8 years ago by | • modified 3.0 years ago |
Subject: Speech Processing
Topic: Speech Analysis in Time Domain
Difficulty: Low
written 6.7 years ago by |
(i) Short-time processing techniques (in both time and frequency) produce parameter signals of the form
$ U(n) = \sum_{m=-\infty}^{\infty} T[S(m)] W(n-m) $ .....(1)
where, S(n): Speech signal
T[.]: Transformation applied to the speech samples (linear or non-linear)
U(n): Short-time parameter signal (output)
W(n): Window function
U(n) corresponds to the short-time energy or the short-time amplitude (magnitude) if T in eq.(1) is the squaring or the absolute-magnitude operation, respectively:
$ E(n) = \sum_{m=-\infty}^{\infty} [S(m)W(n-m)]^2 \\ = \sum_{m=-\infty}^{\infty} S^2(m)W^2(n-m) \\ = \sum_{m=-\infty}^{\infty} S^2(m)\,h(n-m) \\ = S^2(n) * h(n), \quad h(n) = W^2(n) $
$ M(n) = \sum_{m=-\infty}^{\infty} |S(m)|\,W(n-m) \\ = |S(n)| * W(n) $
(ii) Squaring of the signal to calculate energy would emphasize high amplitudes.
(iii) Magnitude (amplitude) measurements do not emphasize high amplitudes in this way and are simpler to calculate.
(iv) Such measurements help segment speech into smaller phonetic units.
(v) Voiced and unvoiced speech can be told apart because of the large difference in amplitude: the amplitude of unvoiced segments is not as high as the amplitude of voiced segments.
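The short-time energy and magnitude defined above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`short_time`, `short_time_energy`, `short_time_magnitude`), the `hop` parameter, the causal rectangular window, and the toy signal are not from the original answer.

```python
def short_time(s, w, T, hop=1):
    """U(n) = sum_m T[s(m)] * w(n - m) for a causal window w(0..N-1).

    Evaluated every `hop` samples (hypothetical parameter)."""
    N = len(w)
    out = []
    for n in range(0, len(s), hop):
        acc = 0.0
        for k in range(N):          # k = n - m, so m = n - k
            m = n - k
            if 0 <= m < len(s):     # samples outside the signal count as zero
                acc += T(s[m]) * w[k]
        out.append(acc)
    return out


def short_time_energy(s, w, hop=1):
    # T is squaring; the effective window is h(n) = W^2(n)
    h = [x * x for x in w]
    return short_time(s, h, lambda x: x * x, hop)


def short_time_magnitude(s, w, hop=1):
    # T is the absolute value; the window is used as is
    return short_time(s, w, abs, hop)


# Toy signal: a low-amplitude segment followed by a high-amplitude segment
s = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
w = [1.0, 1.0, 1.0, 1.0]            # rectangular window, N = 4

energy = short_time_energy(s, w)
magnitude = short_time_magnitude(s, w)

# Compare the window covering the quiet segment (n = 3)
# with the window covering the loud segment (n = 7).
print("energy ratio   :", energy[7] / energy[3])        # about 100
print("magnitude ratio:", magnitude[7] / magnitude[3])  # about 10
```

The tenfold difference in amplitude becomes roughly a hundredfold difference in energy, which is the emphasis of high amplitudes mentioned in point (ii).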
Short-Time Average Zero Crossing Rate (ZCR)
(i) The zero-crossing rate (ZCR) provides useful spectral information at low computational cost.
(ii) In a speech signal S(n), a zero crossing occurs when the waveform crosses the time reference axis (S(n) passes through 0), i.e. when successive samples change sign.
(iii) The ZCR (in zero crossings per second) is an accurate spectral measure for narrow-band signals (e.g. sinusoids): a sinusoid has two zero crossings per period, so $F_0 = \frac{ZCR}{2}$
(iv) For discrete-time signals with the ZCR in zero crossings per sample, $ F_0 = \frac{(ZCR)(F_s)}{2} $, where $F_s$ is the sampling frequency in samples per second.
(v) The ZCR can be defined as U(n) in eq.(1), with T[S(m)] = 0.5|sgn[S(m)] - sgn[S(m-1)]|, where sgn is the algebraic sign of the sample (defined below). A rectangular window W(n) scaled by $\frac{1}{N}$ yields zero crossings per sample; scaling by $\frac{F_s}{N}$ instead yields zero crossings per second.
(vi) An appropriate way of defining zero crossings is $ Z(n) = \sum_{m=-\infty}^{\infty} |sgn[S(m)] - sgn[S(m-1)]| \, w(n-m) $
sgn[S(m)] = $ \begin{cases} 1 & S(m) \geq 0 \\ -1 & S(m) \lt 0 \end{cases} $
$ w(m) = \begin{cases} \frac{1}{2N} & 0 \leq m \leq N-1 \\ 0 & \text{otherwise} \end{cases} $
(vii) The ZCR varies slowly, following the corresponding vocal tract movements; hence U(n) can be heavily decimated, i.e. computed at a much lower rate than the speech sampling rate.
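The definition of Z(n) and the relation $F_0 = \frac{(ZCR)(F_s)}{2}$ can be checked on a pure sinusoid with a small Python sketch. The function names, the `hop` parameter (used here for decimation), and the test values (8000 Hz sampling rate, 100 Hz tone, 800-sample window) are assumptions chosen for illustration, not part of the original answer.

```python
import math


def sgn(x):
    """Algebraic sign: +1 for x >= 0, -1 for x < 0."""
    return 1 if x >= 0 else -1


def short_time_zcr(s, N, hop=1):
    """Z(n) = sum_m |sgn[s(m)] - sgn[s(m-1)]| * w(n - m), w = 1/(2N) on 0..N-1.

    Returns zero crossings per sample, evaluated every `hop` samples
    (a hop larger than 1 decimates the ZCR contour)."""
    out = []
    for n in range(N, len(s), hop):          # start once the window is full
        total = 0
        for m in range(n - N + 1, n + 1):    # the N sample pairs inside the window
            total += abs(sgn(s[m]) - sgn(s[m - 1]))   # 2 per sign change
        out.append(total / (2.0 * N))
    return out


# Assumed test signal: 100 Hz sinusoid sampled at 8000 Hz.
# The small phase offset keeps every sample away from exactly zero.
fs = 8000
f0 = 100.0
s = [math.sin(2.0 * math.pi * f0 * k / fs + 0.1) for k in range(4000)]

N = 800                                      # 100 ms window (10 periods)
zcr = short_time_zcr(s, N, hop=400)          # one value every 50 ms
f0_estimates = [z * fs / 2.0 for z in zcr]   # F0 = ZCR * Fs / 2

print(f0_estimates)                          # each value should be close to 100 Hz
```

With `hop=400` the ZCR contour is computed only 20 times per second instead of 8000 times, which is the kind of heavy decimation that point (vii) allows.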