Gradient Descent

If you are familiar with calculus, you may know that the minimum or maximum value of a function is found by equating its derivative to zero. In other words, the point at which the slope (m) is zero is considered a minimum or maximum of that function.
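As a quick illustration (my own example, not from the original post), take f(x) = (x − 3)²: its derivative 2(x − 3) is zero at x = 3, which is exactly where the parabola reaches its minimum.

```python
# Illustrative example (not from the course): f(x) = (x - 3)^2
def f(x):
    return (x - 3) ** 2

def f_prime(x):          # derivative (slope) of f
    return 2 * (x - 3)

print(f_prime(3.0))  # 0.0 -> slope is zero at x = 3
print(f(3.0))        # 0.0 -> the minimum value of f
```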

Gradient descent is one approach to finding the minimum value of the cost function. The algorithm updates the θ values step by step, moving in the direction of the negative slope, and stops when the slope reaches zero.
The above image from Hackernoon explains this concept well. (Please note that the parameter here is represented as w, not θ.)

The gradient descent algorithm is given by:

θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)

α - Learning rate (the length of each step)
J(θ₀, θ₁) - Cost function
∂/∂θⱼ J(θ₀, θ₁) - Partial derivative (slope) of the cost function with respect to θⱼ
j - Parameter index (j = 0, 1)
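To make the update rule concrete, here is a minimal Python sketch of gradient descent for the single-variable hypothesis h(x) = θ₀ + θ₁x with the usual squared-error cost. The data, learning rate, and iteration count below are illustrative choices of mine, not values from the course:

```python
# Minimal gradient descent sketch for h(x) = theta0 + theta1 * x
# (example data and hyperparameters are assumptions, not from the post)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]          # roughly y = 2x, so theta1 should approach 2
m = len(x)

theta0, theta1 = 0.0, 0.0         # initial guesses
alpha = 0.05                      # learning rate
iterations = 1000

for _ in range(iterations):
    # Partial derivatives of J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)
    errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
    grad0 = sum(errors) / m
    grad1 = sum(e * xi for e, xi in zip(errors, x)) / m

    # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)             # theta1 converges toward 2, theta0 toward 0
```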

The value of α plays a key role in how gradient descent behaves. A smaller value of α slows down convergence, since each step is tiny. On the other hand, a larger value of α may skip past the point of convergence and fail to settle at the minimum (this situation is called overshooting).
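A small, hypothetical experiment on the one-dimensional cost J(θ) = θ² makes both failure modes visible; the specific α values below are my own picks for illustration:

```python
# Effect of the learning rate on J(theta) = theta^2 (slope = 2*theta).
# The alpha values are illustrative assumptions.
def run(alpha, steps=20, theta=5.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # theta := theta - alpha * slope
    return theta

print(run(alpha=0.01))   # ~3.34  -> too small: barely moves toward the minimum at 0
print(run(alpha=0.1))    # ~0.058 -> reasonable: steadily approaches 0
print(run(alpha=1.1))    # ~192   -> too large: overshoots and diverges
```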

The screenshot from Andrew Ng's course explains both issues perfectly.

In the equation

θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁)

the derivative term (the slope) keeps shrinking as we approach the minimum and becomes zero at the minimum itself. At that point the update reduces to

θⱼ := θⱼ − α · 0 = θⱼ

so θⱼ stops changing, which means we have reached the minimum value.
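In code, "stops at slope zero" usually becomes a tolerance check: once the slope is numerically close to zero, the update no longer changes θ, so iteration can stop. A minimal sketch on the same J(θ) = θ² example, with a tolerance value I chose for illustration:

```python
# Stopping when the slope is (close to) zero; the tolerance is an assumption.
theta = 5.0
alpha = 0.1
tolerance = 1e-8

while True:
    slope = 2 * theta                # derivative of J(theta) = theta^2
    if abs(slope) < tolerance:       # slope ~ 0 -> update would leave theta unchanged
        break
    theta = theta - alpha * slope

print(theta)                         # ~0.0, the minimum of J
```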










