Using Dropout and Layer Normalization in Deep Learning


The papers for the two techniques:

Dropout:http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

Layer Normalization:  https://arxiv.org/abs/1607.06450

Recurrent Neural Network Regularization: https://arxiv.org/pdf/1409.2329.pdf

The implementations of both (using nematus as the example):

https://github.com/EdinburghNLP/nematus/blob/master/nematus/layers.py

Where dropout is applied in the GRU:

The operations in the readout layer:
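A rough numpy sketch (not the actual Nematus code) of where dropout masks typically enter a GRU step and the readout in this style; all weight names (Wz, Uz, W_out, ...) and the p_keep parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(x, p_keep):
    # inverted dropout: scale the kept units so the expected value is unchanged
    mask = rng.binomial(1, p_keep, size=x.shape).astype(x.dtype) / p_keep
    return x * mask

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One GRU step with dropout masks on the input and the previous hidden state,
# roughly where the Nematus-style code applies them.
def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh, p_keep=0.8):
    x_d, h_d = dropout(x_t, p_keep), dropout(h_prev, p_keep)
    z = sigmoid(x_d @ Wz + h_d @ Uz)              # update gate
    r = sigmoid(x_d @ Wr + h_d @ Ur)              # reset gate
    h_tilde = np.tanh(x_d @ Wh + (r * h_d) @ Uh)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde

# Readout: combine hidden state, context and previous embedding, apply the
# non-linearity, then dropout before the output projection (again schematic).
def readout(h_t, ctx_t, emb_prev, W_h, W_c, W_e, W_out, p_keep=0.8):
    logit = np.tanh(h_t @ W_h + ctx_t @ W_c + emb_prev @ W_e)
    return dropout(logit, p_keep) @ W_out
```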

Questions:

1. Why is dropout placed before layer normalization (LN)?

Other implementations do not use this order:

https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout-in-tensorflow

BatchNorm -> ReLU (or other activation) -> Dropout
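For comparison, a minimal numpy sketch of the two orderings; layer_norm, dropout and the weight W here are illustrative stand-ins rather than the Nematus functions.

```python
import numpy as np

rng = np.random.RandomState(0)

def layer_norm(x, g, b, eps=1e-5):
    # normalize each row to zero mean / unit variance, then rescale and shift
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return g * (x - mu) / (sigma + eps) + b

def dropout(x, p_keep):
    mask = rng.binomial(1, p_keep, size=x.shape).astype(x.dtype) / p_keep
    return x * mask

# Order seen in the Nematus code under discussion: dropout first, then LN.
def dropout_then_ln(x, W, g, b, p_keep=0.8):
    return np.tanh(layer_norm(dropout(x, p_keep) @ W, g, b))

# Order recommended in the Stack Overflow answer: norm -> activation -> dropout.
def ln_act_then_dropout(x, W, g, b, p_keep=0.8):
    return dropout(np.tanh(layer_norm(x @ W, g, b)), p_keep)
```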

2. Why is LN also applied to state_below_ and pctx_? (There is no activation function applied directly afterwards.)

In gru_layer, state_below_ goes through LN (the input is the source, src):

In gru_cond_layer, state_below_ does not go through LN (the input is the target, trg):
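One possible reading: state_below_ and pctx_ are pre-activation projections that are later summed with other terms inside the step, and the non-linearity is applied to that sum, so LN still sits before an activation, just not immediately. A rough numpy sketch under that assumption (W, U, Wc, g, b are illustrative names, not Nematus parameters):

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return g * (x - mu) / (sigma + eps) + b

def candidate_state(x_t, h_prev, ctx, W, U, Wc, g, b):
    state_below_ = layer_norm(x_t @ W, g, b)  # LN on the input projection
    pctx_ = layer_norm(ctx @ Wc, g, b)        # LN on the projected context
    # the activation is applied only after the normalized terms are combined
    return np.tanh(state_below_ + h_prev @ U + pctx_)
```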

3. Dropout masks apparently cannot be generated inside scan:

https://groups.google.com/forum/#!topic/lasagne-users/3eyaV3P0Y-E

https://groups.google.com/forum/#!topic/theano-users/KAN1j7iey68
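A minimal Theano sketch of the workaround discussed in those threads: sample the dropout mask once, outside scan, and pass it in as a sequence, so the step function itself contains no random ops (the dimensions and variable names are illustrative, not taken from Nematus).

```python
import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

dim = 4
trng = MRG_RandomStreams(seed=1234)
retain_prob = 0.8

x = T.tensor3('x')    # (n_timesteps, n_samples, dim)
h0 = T.matrix('h0')   # (n_samples, dim)
W = theano.shared(np.random.randn(dim, dim).astype('float32'), 'W')
U = theano.shared(np.random.randn(dim, dim).astype('float32'), 'U')

# one dropout mask per timestep, sampled outside scan
drop_mask = trng.binomial(x.shape, p=retain_prob, n=1,
                          dtype='float32') / retain_prob

def step(x_t, m_t, h_prev):
    # deterministic step: the mask arrives as a regular sequence input
    return T.tanh(T.dot(x_t * m_t, W) + T.dot(h_prev, U))

h, updates = theano.scan(step, sequences=[x, drop_mask], outputs_info=[h0])
```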

4. Dropout in RNN

Recurrent Neural Network Regularization (Figure 2) says that dropout should not be applied to the hidden state coming in from the previous timestep, yet Nematus does exactly that...
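Schematically, the two choices look like this (a numpy sketch with illustrative names, not the actual Nematus code):

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(x, p_keep):
    mask = rng.binomial(1, p_keep, size=x.shape).astype(x.dtype) / p_keep
    return x * mask

# Zaremba et al. (Figure 2): dropout only on the non-recurrent (input)
# connection; the recurrent path h_prev -> h_t is left untouched.
def step_non_recurrent_dropout(x_t, h_prev, W, U, b, p_keep=0.8):
    return np.tanh(dropout(x_t, p_keep) @ W + h_prev @ U + b)

# What the question observes in Nematus: the previous hidden state is also
# dropped before entering the recurrence.
def step_recurrent_dropout(x_t, h_prev, W, U, b, p_keep=0.8):
    return np.tanh(dropout(x_t, p_keep) @ W + dropout(h_prev, p_keep) @ U + b)
```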

5. residual connections

Regarding residual connections, https://github.com/harvardnlp/seq2seq-attn notes: "res_net: Use residual connections between LSTM stacks whereby the input to the l-th LSTM layer is the hidden state of the l-1-th LSTM layer summed with the hidden state of the l-2-th LSTM layer. We didn't find this to really help in our experiments."
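A minimal Python sketch of that res_net description, under the assumption that each layer is a callable mapping a sequence of hidden states to a sequence of hidden states (the layers list and names are hypothetical):

```python
def run_residual_stack(layers, x):
    prev_prev = None
    prev = x
    for layer in layers:
        # input to layer l = output of layer l-1, summed with the output of
        # layer l-2 when one exists
        inp = prev if prev_prev is None else prev + prev_prev
        out = layer(inp)
        prev_prev, prev = prev, out
    return prev
```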

