The Steepest Descent Method

The Steepest Descent method, also called the Cauchy method, was put forward by Cauchy in 1847. In this method, we choose an initial point $X_1$ and iteratively move along the direction of steepest descent, i.e., the negative of the gradient vector, until we reach the optimum point. The steps in this method are as follows:

Step 1: Start with an arbitrary point $X_1$. Let the iteration number be denoted by $i$, and set $i = 1$.

Step 2: Find the search direction $S_i$ as $S_i = -\nabla f_i = -\nabla f(X_i)$.

Step 3: Find the optimal step length $\lambda_i^*$ in the direction $S_i$ and set $X_{i+1} = X_i + \lambda_i^* S_i = X_i - \lambda_i^* \nabla f_i.$

Step 4: Test the new point $X_{i+1}$ for optimality: the gradient must vanish there, i.e., $\nabla f(X_{i+1}) = 0$. If this condition holds, $X_{i+1}$ is the optimum point and the process stops; otherwise, go to Step 5.

Step 5: Set the new iteration number $i = i + 1$ and go to Step 2.
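The whole procedure can be written as a short program. Below is a minimal Python sketch of the loop (not from the original method statement; the name `steepest_descent` and its arguments are illustrative, and the exact line search of Step 3 is delegated to SciPy's `minimize_scalar`):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x1, tol=1e-6, max_iter=100):
    """Minimize f by the steepest descent method with an exact line search."""
    x = np.asarray(x1, dtype=float)                      # Step 1: starting point X_1
    for i in range(1, max_iter + 1):
        s = -grad(x)                                     # Step 2: S_i = -grad f(X_i)
        if np.linalg.norm(s) < tol:                      # Step 4: stop when grad f = 0
            break
        lam = minimize_scalar(lambda t: f(x + t * s)).x  # Step 3: optimal lambda_i*
        x = x + lam * s                                  # X_{i+1} = X_i + lambda_i* S_i
    return x                                             # Step 5 is the loop increment
```

Calling `steepest_descent(f, grad, [0, 0])` with the function and gradient from the question below drives the iterates toward the stationary point where $\nabla f = 0$.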

Q: Minimize $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$ starting from the point $X_1 =\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\}$

Solution: We have $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$

The gradient of $f$ is given by $\nabla f = \left\{ \begin{matrix} \dfrac { \partial f }{ \partial x_{ 1 } } \\ \dfrac { \partial f }{ \partial x_{ 2 } } \end{matrix} \right\} = \left\{ \begin{matrix} 4+4x_{ 1 }+2x_{ 2 } \\ -2+2x_1+2x_2 \end{matrix} \right\}$

Step 1: At $X_1 = \left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\}$, $\nabla f_1=\nabla f(X_1)= \left\{ \begin{matrix} 4 \\ -2 \end{matrix} \right\}$
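As a quick cross-check of this hand-computed gradient, one can differentiate symbolically (a sketch assuming SymPy is available; the variable names are arbitrary):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 4*x1 - 2*x2 + 2*x1**2 + 2*x1*x2 + x2**2

grad = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])
print(grad)                       # Matrix([[4*x1 + 2*x2 + 4], [2*x1 + 2*x2 - 2]])
print(grad.subs({x1: 0, x2: 0}))  # Matrix([[4], [-2]]), i.e. grad f_1
```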

Step 2: The search direction is $S_i=-\nabla f_i= -\nabla f(X_i)$

Hence $S_1 =-\nabla f_1= \left\{ \begin{matrix} -4 \\ 2 \end{matrix} \right\}$

Step 3: Determine the optimal step length $\lambda_i^*$ in the direction $S_i$. To find $X_2$, we need the optimal step length $\lambda_1^*$.

For this, we minimize $f(X_1+\lambda_1S_1)$ with respect to $\lambda_1$, where $X_1+\lambda_1S_1= \begin{bmatrix}0\\0\end{bmatrix}+\lambda_1 \begin{bmatrix}-4\\2\end{bmatrix}=\begin{bmatrix}-4\lambda_1\\2\lambda_1\end{bmatrix}$

We have $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$

Hence,

$f(X_1+\lambda_1S_1) = f(-4\lambda_1,\,2\lambda_1) = 4(-4\lambda_1)-2(2\lambda_1)+2(-4\lambda_1)^2+2(-4\lambda_1)(2\lambda_1)+(2\lambda_1)^2 = 20\lambda_1^2-20\lambda_1 \qquad (1)$

Setting $\dfrac{df}{d\lambda_1}=0$ in equation (1), we get $\dfrac{d( 20\lambda_1^2-20\lambda_1)}{d\lambda_1}=40\lambda_1-20=0$

$\lambda_1^*= \dfrac{1}{2}$
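This step length can also be cross-checked numerically (a sketch assuming SciPy; `phi` is an illustrative name for the one-dimensional function in equation (1)):

```python
from scipy.optimize import minimize_scalar

phi = lambda lam: 20*lam**2 - 20*lam   # f(X_1 + lam * S_1) from equation (1)
res = minimize_scalar(phi)
print(res.x)                           # ~0.5, matching lambda_1* = 1/2
```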

$X_2 = X_1+\lambda_1^*S_1=\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\}+\dfrac{1}{2}\left\{ \begin{matrix} -4 \\ 2 \end{matrix} \right\}=\left\{ \begin{matrix} -2 \\ 1 \end{matrix} \right\}$

Step 4: As $\nabla f_2=\nabla f(X_2)=\left\{ \begin{matrix} -2 \\ -4 \end{matrix} \right\}\neq\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\}$, $X_2$ is not the optimum point, so we set $i = 2$ and repeat the process from Step 2.
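The whole first iteration can be verified with a few lines of NumPy (a sketch; `f` and `grad` are illustrative names for the objective and its gradient):

```python
import numpy as np

f = lambda x: 4*x[0] - 2*x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2
grad = lambda x: np.array([4 + 4*x[0] + 2*x[1], -2 + 2*x[0] + 2*x[1]])

x1 = np.array([0.0, 0.0])
s1 = -grad(x1)            # search direction S_1 = [-4, 2]
x2 = x1 + 0.5 * s1        # step with lambda_1* = 1/2 gives X_2 = [-2, 1]

print(f(x1), f(x2))       # 0.0 -5.0 : the objective decreased
print(grad(x2))           # [-2. -4.] != 0, so X_2 is not yet optimal
```

Solving $\nabla f = 0$ directly gives $x_1 = -3$, $x_2 = 4$, so further iterations of Steps 2 to 5 move $X_i$ toward this point.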
