[Machine Learning Basics] Hung-yi Lee Machine Learning Notes - 10 (Tips for Deep Learning)
Recipe of Deep Learning
Deeper usually does not imply better
Vanishing Gradient Problem
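Why gradients vanish with sigmoid: each layer multiplies the backpropagated signal by σ'(z) ≤ 0.25, so the gradient reaching the early layers shrinks roughly geometrically with depth while the layers near the output already converge. A minimal numeric sketch (my own illustration, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigma'(z) = sigma(z) * (1 - sigma(z)) is at most 0.25, so the gradient
# reaching an early layer, being a product of one such factor per layer,
# shrinks quickly as the network gets deeper.
rng = np.random.default_rng(0)
z = rng.normal(size=10)                    # hypothetical pre-activations, one per layer
factors = sigmoid(z) * (1.0 - sigmoid(z))  # per-layer attenuation factors
print(np.prod(factors))                    # tiny number: the gradient "vanishes"
```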
ReLU (Rectified Linear Unit)
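A minimal sketch of ReLU and its gradient (my own illustration): on the active side the derivative is exactly 1, so gradients pass through active paths without the repeated shrinking that sigmoid causes; inactive units output 0 and effectively leave the network, making it a thinner linear network for that input.

```python
import numpy as np

def relu(z):
    # ReLU: identity for z > 0, zero otherwise.
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative: 1 on the active side, 0 on the inactive side,
    # so there is no per-layer attenuation factor smaller than 1.
    return (z > 0).astype(float)
```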
ReLU - Variants
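The two variants discussed are Leaky ReLU and Parametric ReLU; a sketch of both, assuming their standard definitions:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Leaky ReLU: a small fixed slope on the negative side, so an inactive
    # unit still receives a (small) gradient instead of dying completely.
    return np.where(z > 0, z, alpha * z)

def prelu(z, alpha):
    # Parametric ReLU: same form, but alpha is learned from data
    # like any other network parameter.
    return np.where(z > 0, z, alpha * z)
```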
**Besides ReLU, are there other activation functions? Maxout lets the network learn its own activation function from the training data.**
Maxout
ReLU is a special case of Maxout
More than ReLU
Maxout - Training
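A sketch of one maxout layer (the shapes are my own choice for illustration): each output unit takes the max over a group of k learned linear pieces, so the activation function itself is learned from the training data. For a given input only the winning piece in each group receives gradient, but different inputs activate different pieces, so every parameter still gets trained over the dataset. ReLU is the special case k = 2 with one piece fixed to w·x + b and the other fixed to 0.

```python
import numpy as np

def maxout_layer(x, W, b):
    # x: (in_dim,), W: (k, out_dim, in_dim), b: (k, out_dim)
    z = np.einsum('koi,i->ko', W, x) + b  # k linear pieces per output unit
    return z.max(axis=0)                  # element-wise max over each group
```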
Adaptive Learning Rate
RMSProp
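The RMSProp update, written in the form I recall from the lecture (α is the decay factor, η the learning rate, g^t the gradient at step t): each parameter's effective learning rate is divided by a root mean square of its recent gradients, so it adapts even when the error surface changes curvature along the way.

$$
\sigma^{t} = \sqrt{\alpha\,(\sigma^{t-1})^{2} + (1-\alpha)\,(g^{t})^{2}},
\qquad
w^{t+1} = w^{t} - \frac{\eta}{\sigma^{t}}\,g^{t}
$$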
Hard to find optimal network parameters
Momentum (adding inertia to gradient descent)
So, after adding momentum:
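The update being referred to here is the standard momentum rule (reconstructed, since the equation itself is not in this excerpt): the movement v accumulates past gradients with decay λ, so the parameters keep some inertia from previous steps and can roll past plateaus and small local dips.

$$
v^{0} = 0,
\qquad
v^{t} = \lambda\,v^{t-1} - \eta\,\nabla L(\theta^{t-1}),
\qquad
\theta^{t} = \theta^{t-1} + v^{t}
$$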
Adam
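Adam is essentially RMSProp plus momentum: a first-moment estimate m (the momentum part) and a second-moment estimate v (the RMSProp-style scaling), both bias-corrected. Written in the usual textbook form (typical defaults are β₁ ≈ 0.9, β₂ ≈ 0.999):

$$
m^{t} = \beta_{1}\,m^{t-1} + (1-\beta_{1})\,g^{t},
\qquad
v^{t} = \beta_{2}\,v^{t-1} + (1-\beta_{2})\,(g^{t})^{2}
$$

$$
\hat{m}^{t} = \frac{m^{t}}{1-\beta_{1}^{t}},
\qquad
\hat{v}^{t} = \frac{v^{t}}{1-\beta_{2}^{t}},
\qquad
\theta^{t+1} = \theta^{t} - \frac{\eta\,\hat{m}^{t}}{\sqrt{\hat{v}^{t}}+\epsilon}
$$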
Early Stopping
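Early stopping in practice: keep training as long as the loss on a held-out validation set keeps improving, and stop (keeping the best epoch) once it has stalled for a few epochs. A minimal sketch with hypothetical callables `train_one_epoch` and `validation_loss`:

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    # train_one_epoch(): runs one epoch of training (hypothetical callable)
    # validation_loss(): returns the current loss on a held-out validation set
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        if loss < best:
            best, best_epoch = loss, epoch   # new best validation point
        elif epoch - best_epoch >= patience:
            break                            # no improvement for `patience` epochs: stop
    return best_epoch, best
```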
Regularization
Regularization - Weight Decay
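Why L2 regularization is called weight decay: adding λ/2‖θ‖² to the loss turns each gradient-descent step into one that first multiplies every weight by a factor slightly smaller than 1, so weights decay toward zero unless the data gradient keeps them large. In the usual notation:

$$
L'(\theta) = L(\theta) + \frac{\lambda}{2}\lVert\theta\rVert_{2}^{2}
\quad\Longrightarrow\quad
w^{t+1} = (1-\eta\lambda)\,w^{t} - \eta\,\frac{\partial L}{\partial w}
$$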
Dropout
Dropout - Intuitive Reason
Dropout is a kind of ensemble
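A minimal sketch of the train/test asymmetry (my own illustration, with dropout rate p): during training each neuron is zeroed independently with probability p, which is like sampling one of the 2^M possible thinned networks for each minibatch; at test time nothing is dropped, but the weights (equivalently, the activations) are multiplied by (1 − p), which approximates averaging the predictions of that whole ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, p=0.5):
    # Training: drop each neuron independently with probability p
    # (equivalent to sampling one "thinned" network from the ensemble).
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_test(a, p=0.5):
    # Testing: keep every neuron but scale by (1 - p), so the expected
    # activation matches what the thinned networks saw during training.
    return a * (1.0 - p)
```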