Machine Learning Notes: Learning Rate

xiaoxiao · 2021-02-27

Gradient Descent in Practice II - Learning Rate

Note: [5:20 - the x-axis label in the right graph should be θ rather than No. of iterations]

Debugging gradient descent. Make a plot with the number of iterations on the x-axis. Now plot the cost function, J(θ), over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
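As a minimal sketch of this debugging technique, the gradient-descent loop below records J(θ) at every iteration so the cost curve can be plotted afterwards (the helper name `gradient_descent` and the toy linear-regression setup are illustrative, not from the course):

```python
import numpy as np

def gradient_descent(X, y, alpha, n_iters):
    """Batch gradient descent for linear regression.

    Records J(theta) after every update so that cost-vs-iteration
    can be plotted to debug the choice of alpha.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    cost_history = []
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m        # gradient of J(theta)
        theta -= alpha * grad                   # simultaneous update
        cost = ((X @ theta - y) ** 2).sum() / (2 * m)  # J(theta)
        cost_history.append(cost)
    return theta, cost_history
```

Plotting `cost_history` (e.g. with `matplotlib.pyplot.plot`) should show a curve that decreases and flattens out; if it ever rises, α is too large.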

Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10⁻³. However, in practice it is difficult to choose this threshold value.
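The automatic test described above can be sketched as a small helper that compares the last two cost values against the threshold E (the function name `has_converged` is hypothetical):

```python
def has_converged(cost_history, epsilon=1e-3):
    """Return True if the most recent decrease in J(theta) is below epsilon.

    cost_history is the list of J(theta) values, one per iteration.
    """
    if len(cost_history) < 2:
        return False  # not enough iterations to measure a decrease
    return cost_history[-2] - cost_history[-1] < epsilon
```

This would be called once per iteration inside the descent loop to decide when to stop; the difficulty mentioned above is that a good `epsilon` depends on the scale of J(θ) for the problem at hand.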

It has been proven that if the learning rate α is sufficiently small, then J(θ) will decrease on every iteration.

To summarize:

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
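Both failure modes can be seen on a toy one-dimensional cost J(θ) = θ², whose gradient is 2θ (this example is mine, not from the course). The update is θ ← θ(1 − 2α), so a small α shrinks the cost slowly but surely, while α > 1 makes the iterates overshoot and grow:

```python
def run_gd(alpha, n_iters=50):
    """Gradient descent on J(theta) = theta**2 starting from theta = 1.

    Returns the list of J(theta) values, one per iteration.
    """
    theta = 1.0
    costs = []
    for _ in range(n_iters):
        theta -= alpha * 2.0 * theta  # gradient of theta**2 is 2*theta
        costs.append(theta ** 2)
    return costs
```

With `alpha=0.1` the cost decays toward zero; with `alpha=1.1` each step multiplies θ by −1.2, so J(θ) grows without bound, matching the "may not converge" case above.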

 

Please credit the original source when reposting: https://www.6miu.com/read-12118.html
