Explain pitch period estimation using short-time autocorrelation

Subject: Speech Processing

Topic: Speech Analysis in Time Domain

Difficulty: High


(i) The discrete-time auto-correlation function of a deterministic (finite-energy) signal is given by $ \phi(k) = \sum_{m=-\infty}^{\infty} x(m)x(m+k) $

(ii) For a random or periodic signal, the auto-correlation function is given by $$ \phi(k) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{m=-N}^{N} x(m)x(m+k) $$

(iii) If the signal is periodic with a period of P samples, i.e. x(m) = x(m+P), then the auto-correlation function is periodic with the same period: $ \phi(k) = \phi(k+P) $.
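For a periodic signal the limit in (ii) reduces to an average over one period, because the product x(m)x(m+k) is itself periodic in m. The Python sketch below (NumPy; the function names are illustrative, not from the original answer) computes φ(k) that way, compares it with the finite average for a large N, and shows that φ(k) = φ(k+P).

```python
import numpy as np

def periodic_autocorr(one_period, k):
    """phi(k) for a P-periodic signal x, given one period x(0..P-1).

    Since x(m)x(m+k) is P-periodic in m, the limit in (ii) equals
    the average of that product over a single period.
    """
    x = np.asarray(one_period, dtype=float)
    P = len(x)
    m = np.arange(P)
    return float(np.mean(x[m] * x[(m + k) % P]))

def finite_average(one_period, k, N):
    """(1/(2N+1)) * sum_{m=-N}^{N} x(m)x(m+k), the expression inside the limit."""
    x = np.asarray(one_period, dtype=float)
    P = len(x)
    m = np.arange(-N, N + 1)
    return float(np.sum(x[m % P] * x[(m + k) % P]) / (2 * N + 1))

period = [1.0, 2.0, 0.0, -1.0]   # arbitrary example with P = 4
for k in range(3):
    # phi(k), phi(k + P), and the finite-N approximation with N = 10000
    print(k, periodic_autocorr(period, k), periodic_autocorr(period, k + 4),
          round(finite_average(period, k, 10_000), 4))
```

As N grows, the finite average approaches the one-period average because the incomplete period at the edges contributes only O(1/N).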

(iv) Some important properties of the auto-correlation function are:

$ \hspace{0.5cm} $ a) It is an even function: $ \phi(k) = \phi(-k) $

$ \hspace{0.5cm} $ b) It attains its maximum value at k = 0, i.e. $ |\phi(k)| \leq \phi(0) $ for all k

$ \hspace{0.5cm} $ c) $ \phi(0) $ equals the energy of a deterministic signal (and the average power of a random or periodic signal).
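The definition in (i) and the properties in (iv) can be checked numerically. This is a minimal Python sketch, assuming a finite-length signal that is zero outside its stored samples (so it has finite energy); `autocorr` is a hypothetical helper name.

```python
import numpy as np

def autocorr(x, k):
    """phi(k) = sum_m x(m) x(m+k), with x(m) = 0 outside 0..len(x)-1."""
    x = np.asarray(x, dtype=float)
    L = len(x)
    total = 0.0
    for m in range(L):
        if 0 <= m + k < L:          # x(m+k) is zero outside the stored samples
            total += x[m] * x[m + k]
    return float(total)

x = np.array([1.0, -2.0, 3.0, 0.5])
lags = range(-3, 4)
phi = {k: autocorr(x, k) for k in lags}

print(phi)                                       # even: phi[k] == phi[-k]
print(max(lags, key=lambda k: abs(phi[k])))      # lag of the largest |phi(k)|: k = 0
print(phi[0], float(np.sum(x ** 2)))             # phi(0) equals the signal energy
```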

(v) The short-time auto-correlation function is $$ R_n(k) = \sum_{m=-\infty}^{\infty} x(m)w(n-m)x(m+k)w(n-k-m) $$


The equation can be interpreted as follows:

$ \hspace{0.5cm} $ a) A segment of speech is first selected by multiplying the signal by the window.

$ \hspace{0.5cm} $ b) The deterministic auto-correlation is then applied to the windowed segment.
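The two-step interpretation (window first, then deterministic auto-correlation) can be checked against the defining sum in (v). This Python sketch assumes a window w(m) that is nonzero only for 0 ≤ m ≤ N−1, so that w(n−m) selects the N samples ending at x(n), and treats x as zero outside its stored samples; all function names are illustrative.

```python
import numpy as np

def _sample(a, i):
    """a[i] inside the stored range, 0 outside (finite-length signal or window)."""
    return a[i] if 0 <= i < len(a) else 0.0

def short_time_autocorr(x, w, n, k):
    """R_n(k) = sum_m x(m) w(n-m) x(m+k) w(n-k-m), evaluated directly."""
    N = len(w)
    total = 0.0
    for m in range(n - N + 1, n + 1):      # w(n-m) is nonzero only for these m
        total += _sample(x, m) * w[n - m] * _sample(x, m + k) * _sample(w, n - k - m)
    return float(total)

def short_time_autocorr_two_step(x, w, n, k):
    """Step a: s(m) = x(m) w(n-m); step b: deterministic auto-correlation of s."""
    N = len(w)
    s = np.array([_sample(x, m) * w[n - m] for m in range(n - N + 1, n + 1)])
    k = abs(k)                              # the auto-correlation of s is even
    return float(np.dot(s[:N - k], s[k:])) if k < N else 0.0

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
w = np.hamming(10)
print(short_time_autocorr(x, w, 20, 3), short_time_autocorr_two_step(x, w, 20, 3))
```

Both functions should agree up to floating-point rounding for any n and k.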

Replacing k by −k gives $$ R_n(-k) = \sum_{m=-\infty}^{\infty} x(m)w(n-m)x(m-k)w(n+k-m) $$

First note that, since the auto-correlation is even, $$ R_n(k) = R_n(-k) = \sum_{m=-\infty}^{\infty} x(m)x(m-k)\,[w(n-m)w(n-m+k)] $$

Defining $ h_k(n) = w(n)w(n+k) $, the equation becomes $$ R_n(k) = \sum_{m=-\infty}^{\infty} x(m)x(m-k)\,h_k(n-m) $$

(vi) Thus, filtering the sequence x(n)x(n−k) with a filter whose impulse response is h$_k$(n) gives the value of the k$^{th}$ auto-correlation lag at time n:

$$ R_n(k) = R_n(-k) = y_k(n) * h_k(n) \hspace{1cm} [y_k(n) = x(n)x(n-k)] $$ where $*$ denotes convolution.
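The filtering view computes R_n(k) for every n at once: form y_k(n) = x(n)x(n−k), then convolve it with h_k(n) = w(n)w(n+k). This is a minimal Python sketch, assuming k ≥ 0 (negative lags follow from R_n(k) = R_n(−k)), a window w(m) nonzero only for 0 ≤ m ≤ N−1, and a signal that is zero outside its stored samples. It checks the result against the defining sum in (v); the function names are illustrative.

```python
import numpy as np

def r_filtered(x, w, k):
    """R_n(k) for n = 0..len(x)-1, computed as the convolution y_k(n) * h_k(n), k >= 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    L, N = len(x), len(w)
    if k >= min(L, N):
        return np.zeros(L)                 # h_k(n) or y_k(n) is identically zero
    y = np.zeros(L)
    y[k:] = x[k:] * x[:L - k]              # y_k(n) = x(n) x(n-k)
    h = w[:N - k] * w[k:]                  # h_k(n) = w(n) w(n+k), nonzero for 0 <= n <= N-1-k
    return np.convolve(y, h)[:L]           # keep the outputs for n = 0..L-1

def r_direct(x, w, n, k):
    """R_n(k) = sum_m x(m) w(n-m) x(m+k) w(n-k-m), for comparison."""
    L, N = len(x), len(w)
    total = 0.0
    for m in range(n - N + 1, n + 1):
        if 0 <= m < L and 0 <= m + k < L and 0 <= n - k - m < N:
            total += x[m] * w[n - m] * x[m + k] * w[n - k - m]
    return float(total)

rng = np.random.default_rng(1)
x = rng.standard_normal(40)
w = np.hamming(8)
print(np.allclose(r_filtered(x, w, 3), [r_direct(x, w, n, 3) for n in range(len(x))]))  # True
```

Because h_k(n) has only N−k nonzero taps, each lag is a short FIR filter applied to the product sequence y_k(n).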

[Figure: Block diagram for short-time auto-correlation]

(vii) In practice, the short-time auto-correlation function is usually computed using the following equivalent form (substituting m → n + m):

$$ R_n(k) = \sum_{m=-\infty}^{\infty} x(m)w[-(m-n)]x(m+k)w[-(m-n+k)] \\ = \sum_{m=-\infty}^{\infty} x(n+m)w'(m)x(n+m+k)w'(m+k) \hspace{1cm} [w'(m) = w(-m)] $$

(viii) If the window w'(m) has finite duration, being nonzero only for 0 ≤ m ≤ N−1, then:

$$ R_n(k) = \sum_{m=0}^{N-1-k} [x(n+m)w'(m)][x(n+m+k)w'(m+k)] $$
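The finite-window formula is what a frame-based implementation typically evaluates: window the N samples starting at x(n), then sum lagged products. A Python sketch, assuming the frame lies entirely inside the signal; the optional `window` argument stands for w'(m) and defaults to a rectangular window (an illustrative assumption, not something stated above).

```python
import numpy as np

def short_time_autocorr_frame(x, n, N, K, window=None):
    """R_n(k), k = 0..K, via sum_{m=0}^{N-1-k} [x(n+m) w'(m)] [x(n+m+k) w'(m+k)]."""
    x = np.asarray(x, dtype=float)
    if n < 0 or n + N > len(x):
        raise ValueError("frame x(n)..x(n+N-1) must lie inside the signal")
    wp = np.ones(N) if window is None else np.asarray(window, dtype=float)
    frame = x[n:n + N] * wp                 # x(n+m) w'(m) for m = 0..N-1
    R = np.zeros(K + 1)
    for k in range(min(K, N - 1) + 1):      # lags k >= N have no overlapping terms
        R[k] = np.dot(frame[:N - k], frame[k:])   # sum over m = 0..N-1-k
    return R

rng = np.random.default_rng(2)
x = rng.standard_normal(100)
R = short_time_autocorr_frame(x, n=10, N=30, K=12, window=np.hamming(30))
print(R[:4])
```

Note that only N−k products contribute to lag k, so with a finite window R_n(k) tends to shrink as k grows.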

[Figure: Auto-correlation for voiced speech with different window lengths N]

(ix) The choice of N is critical: the window must be long enough for the auto-correlation function to give a good indication of periodicity.

(x) This requirement conflicts with the time-varying properties of the speech signal, which call for N to be as small as possible.

(xi) To show any indication of periodicity in the auto-correlation function, the window must cover at least two periods of the waveform.
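Points (ix) to (xi) can be illustrated with a crude pitch estimator that picks the largest peak of R_n(k) within a plausible lag range. This Python sketch is an illustration under stated assumptions, not a complete pitch detector: the test signal is a hypothetical pulse train with an 80-sample period (100 Hz at an assumed 8 kHz sampling rate), the window is rectangular, and the lag range (20 to 200 samples) and the 0.3 voicing threshold are arbitrary choices.

```python
import numpy as np

def frame_autocorr(frame, K):
    """R(k) = sum_{m=0}^{N-1-k} s(m) s(m+k), k = 0..K, for an already-windowed frame s."""
    N = len(frame)
    return np.array([np.dot(frame[:N - k], frame[k:]) for k in range(K + 1)])

def estimate_pitch_period(frame, min_lag, max_lag, threshold=0.3):
    """Lag of the largest R(k) in [min_lag, max_lag], or None when no clear peak exists.

    A peak weaker than threshold * R(0) is treated as 'no periodicity'
    (the 0.3 default is an arbitrary illustrative choice).
    """
    frame = np.asarray(frame, dtype=float)
    max_lag = min(max_lag, len(frame) - 1)   # lags beyond N-1 have no terms
    if max_lag < min_lag:
        return None
    R = frame_autocorr(frame, max_lag)
    if R[0] <= 0:
        return None                          # silent frame
    k = min_lag + int(np.argmax(R[min_lag:max_lag + 1]))
    return k if R[k] >= threshold * R[0] else None

# Hypothetical "voiced" test signal: a short decaying pulse repeated every P samples.
P = 80                                   # 10 ms at an assumed 8 kHz sampling rate
pulse = 0.6 ** np.arange(10)
x = np.tile(np.concatenate([pulse, np.zeros(P - len(pulse))]), 10)

print(estimate_pitch_period(x[:240], 20, 200))  # window spans three periods -> 80
print(estimate_pitch_period(x[:60], 20, 200))   # window shorter than one period -> None
```

With N = 240 the peak at k = 80 stands out, while with N = 60 no lag in the search range has any overlapping pulses, so no periodicity can be detected.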

(xii) The modified short-time auto-correlation function is given by:

$$ \hat{R}_n(k) = \sum_{m=-\infty}^{\infty} [x(m)\hat{w}_1(n-m)][x(m+k)\hat{w}_2(n-m-k)] $$

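The above expression can also be written as

$$ \hat{R}_n(k) = \sum_{m=-\infty}^{\infty} [x(n+m)\hat{w}'_1(m)][x(n+m+k)\hat{w}'_2(m+k)] \hspace{1cm} [\hat{w}'_i(m) = \hat{w}_i(-m)] $$

where

$$ \hat{w}'_1(m) = \begin{cases} 1 & 0 \leq m \leq N-1 \\ 0 & \text{otherwise} \end{cases} \qquad \hat{w}'_2(m) = \begin{cases} 1 & 0 \leq m \leq N-1+k \\ 0 & \text{otherwise} \end{cases} $$

so that $ \hat{R}_n(k) = \sum_{m=0}^{N-1} x(n+m)x(n+m+k) $. Every lag k now sums N products, instead of N−k as in (viii), so $ \hat{R}_n(k) $ does not taper off as k increases.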
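The practical effect of the modified definition is that every lag uses the same N products. This Python sketch (illustrative names; rectangular windows of length N on the first factor and N+K on the second) compares it with the ordinary rectangular-window R_n(k) on an exactly periodic, randomly generated test signal: the modified version repeats its k = 0 value at multiples of the period, while the ordinary one decreases.

```python
import numpy as np

def modified_autocorr(x, n, N, K):
    """R^_n(k) = sum_{m=0}^{N-1} x(n+m) x(n+m+k) for k = 0..K.

    Rectangular windows: length N on the first factor, length N+K on the second.
    """
    x = np.asarray(x, dtype=float)
    if n < 0 or n + N + K > len(x):
        raise ValueError("need samples x(n)..x(n+N+K-1)")
    first = x[n:n + N]                                   # x(n+m), m = 0..N-1
    return np.array([np.dot(first, x[n + k:n + k + N]) for k in range(K + 1)])

def ordinary_autocorr(x, n, N, K):
    """R_n(k) with a rectangular length-N window: sum_{m=0}^{N-1-k} x(n+m) x(n+m+k)."""
    frame = np.asarray(x, dtype=float)[n:n + N]
    return np.array([np.dot(frame[:N - k], frame[k:]) if k < N else 0.0
                     for k in range(K + 1)])

rng = np.random.default_rng(3)
P = 50                                    # hypothetical period in samples
x = np.tile(rng.standard_normal(P), 8)    # exactly periodic test signal
R_mod = modified_autocorr(x, 0, 100, 100)
R_ord = ordinary_autocorr(x, 0, 100, 100)
print(R_mod[[0, 50, 100]])   # equal up to rounding
print(R_ord[[0, 50, 100]])   # about R(0), R(0)/2, 0
```

The price of the modification is that N+K samples are needed per frame instead of N.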
