written 8.1 years ago by | modified 2.9 years ago by |
Mumbai University > Computer Engineering > Sem 7 > Soft Computing
Marks: 10 Marks
Year: May 2016
written 8.1 years ago by | modified 2.9 years ago by |
Mumbai University > Computer Engineering > Sem 7 > Soft Computing
Marks: 10 Marks
Year: May 2016
written 8.1 years ago by | • modified 8.1 years ago |
Back propagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, inan attempt to minimize the loss function.
Back propagation requires a known, desired output for each input value in order to calculate the loss function gradient. It is therefore usually considered to be a supervised learning method, although it is also used in some unsupervised networks such as auto encoders. It is a generalization of the delta rule to multi-layered feed forward networks, made possible by using the chain rule to iteratively compute gradients for each layer. Back propagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
Given are P training pairs
$$\{z_1,d_1,z_2,d_2……z_p,d_p\}$$ Where zi is (I x 1). Di is (K x 1) and I = 1, 2, … , p. Note that the $I^{th}$ component of each zi is of value -1 since input vector have been augmented. Since J – 1 of the hidden layer having outputs y is selected. Note that the $J'th$ component of y is of value ...1, since hidden layer outputs have also been augmented; y is $(J\times 1)$ and o is $(K \times 1)$.
Step 1: $\eta \gt 0, F_{\max}$, chosen Weights W nd V are initialized at small random values ; W is $(K \times J),$ V is $(J \times 1)$
$$q \leftarrow 1, p \leftarrow, E \leftarrow0$$
Step 2 : Training step starts here (see note 1 at end of list.) Input is presented and the layers' outputs computed [f (net) as in (2.3a)is used]:
$$z\leftarrow z_p, d \leftarrow d_p \\ y_i \leftarrow f(v'_jz), for j=1,2,.....,J$$ where $v_j$ a column vector, is the J,th row of v, and $$o_k\leftarrow f(w'_ky), for k=1,2,....,K$$ where $w_k$ a column vector, is the k'th row of W.
Step 3 : Error value is computed:
$$E\leftarrow \dfrac12 (d_k-o_k)^2+e, for k=1,2,....,K$$
Step 4 : Error signal vectors $\delta_0$ and $\delta_y$ of both layers are computed. Vector $\delta_0$ is $(K \times 1)$. $\delta_y is (J \times 1)$. (see note 2 at end of list.)
The error signal terms of the output layer in this step are
$$\delta_{ok}=\dfrac12 (d_k-o_k)(1-o^2_k), for k=1,2,....,K$$
The error signal terms of the hidden layer in this step are
$$\delta_{yj}=\dfrac12 (1-y^2_j)\sum^K_k=1 \delta_{ok}w'_{kj}, for j=1,2,....,j$$
Step 5 : output layer weights are adjusted :
$$w_{kj}\leftarrow w_{kj}+\eta \delta_{ok}y_j , for k=1,2,....,K and j=1,2,....,J$$
Step 6 : Hidden layer weights are adjusted:
$$v_{ji}\leftarrow v_{ji}+ \eta \delta_{yj}z_i,for j=1,2,....,J and i=1,2,....,I$$
Step 7 : If p
Step 8 : The training cycle is completed.
For $E \lt E_{\max}$ terminate the training session, output weights W.V.q and E.
If $E \gt E_{\max}$, then $E\leftarrow 0, p\leftarrow1$ , and initiate the new training cycle by going to step2
$$\delta_{ok}=(d_k-o_k)(1-o_k)o_k, for k =1,2,...., K \\ \delta_{yj}=y_j(1-y_j)\sum^K_{k=1}\delta_{ok}w_{kj}, for j=1,2,....,J$$