Papers for both:
Dropout: http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
Layer Normalization: https://arxiv.org/abs/1607.06450
Recurrent Neural Network Regularization: https://arxiv.org/pdf/1409.2329.pdf
Implementation of both (using nematus as an example):
https://github.com/EdinburghNLP/nematus/blob/master/nematus/layers.py
The places to look at in layers.py: where Dropout is applied in the GRU, and the operations in the readout layer:
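As a stand-in, here is a minimal numpy sketch of the pattern the questions below ask about; the names, shapes, and keep probability are illustrative assumptions, not the actual nematus code. Dropout is applied to the input first, the projection is layer-normalized, and the activation only comes afterwards.

```python
import numpy as np

def dropout(x, retain_p, rng):
    # inverted dropout: scale by 1/retain_p at training time,
    # so nothing needs to be rescaled at test time
    mask = rng.binomial(1, retain_p, size=x.shape) / retain_p
    return x * mask

def layer_norm(z, gain, bias, eps=1e-5):
    # normalize each row to zero mean / unit variance,
    # then apply a learned gain and bias
    mean = z.mean(axis=-1, keepdims=True)
    std = np.sqrt(z.var(axis=-1, keepdims=True) + eps)
    return gain * (z - mean) / std + bias

rng = np.random.RandomState(0)
n_samples, dim = 4, 8
h = rng.randn(n_samples, dim)          # decoder hidden state (illustrative)
W = rng.randn(dim, dim)
b = np.zeros(dim)
gain, bias = np.ones(dim), np.zeros(dim)

# readout-like step: dropout -> projection -> layer norm -> tanh
h_drop = dropout(h, retain_p=0.8, rng=rng)
readout = np.tanh(layer_norm(np.dot(h_drop, W) + b, gain, bias))
```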
Questions:
1. Why is Dropout placed before LN?
Other implementations don't use this order, e.g.
https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout-in-tensorflow
recommends BatchNorm -> ReLU (or another activation) -> Dropout; the two orderings are contrasted in the sketch below.
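A compact sketch of the contrast, using a plain layer norm in place of batch norm and illustrative values:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4, 8)
retain_p = 0.8
mask = rng.binomial(1, retain_p, size=x.shape) / retain_p

def norm(v, eps=1e-5):
    # plain layer norm (no learned gain/bias), enough to show the ordering
    return (v - v.mean(-1, keepdims=True)) / np.sqrt(v.var(-1, keepdims=True) + eps)

# conventional order from the StackOverflow answer: norm -> activation -> dropout
y_conventional = np.maximum(norm(x), 0.0) * mask

# order questioned here (as in the sketch above): dropout first, then norm
y_dropout_first = norm(x * mask)
```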
2. Why are state_below_ and pctx_ also layer-normalized? (There is no activation function applied directly afterwards; see the sketch below.)
In gru_layer, state_below_ is layer-normalized (the input is src), but in gru_cond_layer it is not (the input is trg).
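A hedged sketch of what such a normalized projection looks like, with illustrative names (W, b, gain, bias) rather than the nematus parameters: the layer norm sits on the pre-activation, and the nonlinearity only appears later, once this is combined with the (also normalized) recurrent projection.

```python
import numpy as np

def layer_norm(z, gain, bias, eps=1e-5):
    mean = z.mean(axis=-1, keepdims=True)
    std = np.sqrt(z.var(axis=-1, keepdims=True) + eps)
    return gain * (z - mean) / std + bias

rng = np.random.RandomState(0)
dim = 8
x = rng.randn(4, dim)                      # token embeddings (illustrative)
W = rng.randn(dim, 2 * dim)                # projection to the gate pre-activations
b = np.zeros(2 * dim)
gain, bias = np.ones(2 * dim), np.zeros(2 * dim)

# the projection itself is normalized; the sigmoid is only applied later,
# inside the recurrence, after summing with the recurrent projection
state_below_ = layer_norm(np.dot(x, W) + b, gain, bias)
```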
3. Dropout masks can't be generated inside scan (workaround sketched below): https://groups.google.com/forum/#!topic/lasagne-users/3eyaV3P0Y-E
https://groups.google.com/forum/#!topic/theano-users/KAN1j7iey68
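The usual workaround discussed in those threads is to sample the masks outside scan and hand them in as non_sequences, so the same masks are reused at every time step. A minimal Theano sketch (not the actual nematus code; the names and the keep probability are assumptions):

```python
import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(1234)
retain_p = 0.8                           # keep probability (assumed value)

x = tensor.tensor3('x')                  # (n_steps, n_samples, dim)
h0 = tensor.zeros((x.shape[1], x.shape[2]))

# sample the masks once, outside scan
mask_x = trng.binomial(size=(x.shape[1], x.shape[2]), p=retain_p, n=1,
                       dtype='float32') / retain_p
mask_h = trng.binomial(size=(x.shape[1], x.shape[2]), p=retain_p, n=1,
                       dtype='float32') / retain_p

def step(x_t, h_tm1, m_x, m_h):
    # the masks arrive as plain non_sequences; nothing is sampled in here
    return tensor.tanh(x_t * m_x + h_tm1 * m_h)

h, _ = theano.scan(step,
                   sequences=x,
                   outputs_info=h0,
                   non_sequences=[mask_x, mask_h])
```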
4. Dropout in RNN
Recurrent Neural Network Regularization says that dropout should not be applied to the hidden state passed in from the previous time step (Figure 2), but nematus does apply it... (the two schemes are contrasted in the sketch below).
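A small numpy sketch of the difference, with illustrative weights and a single mask reused across all time steps: the Zaremba-style step only masks the input (non-recurrent) connection, while the nematus-like step also masks h_{t-1}.

```python
import numpy as np

def rnn_step(x_t, h_tm1, Wx, Wh, mask_x=None, mask_h=None):
    if mask_x is not None:            # dropout on the input (non-recurrent) connection
        x_t = x_t * mask_x
    if mask_h is not None:            # dropout on the recurrent connection
        h_tm1 = h_tm1 * mask_h
    return np.tanh(np.dot(x_t, Wx) + np.dot(h_tm1, Wh))

rng = np.random.RandomState(0)
dim = 8
Wx, Wh = rng.randn(dim, dim), rng.randn(dim, dim)
x = rng.randn(5, dim)                 # 5 time steps (illustrative)
retain_p = 0.8
mask_x = rng.binomial(1, retain_p, size=dim) / retain_p
mask_h = rng.binomial(1, retain_p, size=dim) / retain_p

h_zaremba = np.zeros(dim)             # Zaremba et al.: input connections only
h_nematus = np.zeros(dim)             # nematus-like: recurrent connection too
for x_t in x:
    h_zaremba = rnn_step(x_t, h_zaremba, Wx, Wh, mask_x=mask_x)
    h_nematus = rnn_step(x_t, h_nematus, Wx, Wh, mask_x=mask_x, mask_h=mask_h)
```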
5. Residual connections
Regarding residual connections, the README of https://github.com/harvardnlp/seq2seq-attn says: "res_net: Use residual connections between LSTM stacks whereby the input to the l-th LSTM layer is the hidden state of the l-1-th LSTM layer summed with the hidden state of the l-2th LSTM layer. We didn't find this to really help in our experiments." (Sketched below.)
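A minimal numpy sketch of that scheme, with a plain tanh RNN standing in for an LSTM stack and illustrative weights: the input to the third layer is the second layer's hidden states summed with the first layer's.

```python
import numpy as np

def rnn_layer(x_seq, Wx, Wh):
    # a plain tanh RNN over a sequence; stands in for one LSTM stack
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(np.dot(x_t, Wx) + np.dot(h, Wh))
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.RandomState(0)
dim, steps = 8, 5
x = rng.randn(steps, dim)
Ws = [(rng.randn(dim, dim), rng.randn(dim, dim)) for _ in range(3)]

h1 = rnn_layer(x, *Ws[0])
h2 = rnn_layer(h1, *Ws[1])
# residual connection as described in the README: the input to layer 3
# is the hidden states of layer 2 summed with those of layer 1
h3 = rnn_layer(h2 + h1, *Ws[2])
```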