Papers for both:
Dropout: http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
Layer Normalization: https://arxiv.org/abs/1607.06450
Recurrent Neural Network Regularization: https://arxiv.org/pdf/1409.2329.pdf
Implementation of both (using nematus as an example):
https://github.com/EdinburghNLP/nematus/blob/master/nematus/layers.py
The places to look at in layers.py: where Dropout is applied in the GRU, and the operations in the readout layer:
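As a stand-in, here is a minimal numpy sketch of the pattern the questions below ask about; the names, shapes, and keep probability are illustrative assumptions, not the actual nematus code. Dropout is applied to the input first, the projection is layer-normalized, and the activation only comes afterwards.

```python
import numpy as np

def dropout(x, retain_p, rng):
    # inverted dropout: scale by 1/retain_p at training time,
    # so nothing needs to be rescaled at test time
    mask = rng.binomial(1, retain_p, size=x.shape) / retain_p
    return x * mask

def layer_norm(z, gain, bias, eps=1e-5):
    # normalize each row to zero mean / unit variance,
    # then apply a learned gain and bias
    mean = z.mean(axis=-1, keepdims=True)
    std = np.sqrt(z.var(axis=-1, keepdims=True) + eps)
    return gain * (z - mean) / std + bias

rng = np.random.RandomState(0)
n_samples, dim = 4, 8
h = rng.randn(n_samples, dim)          # decoder hidden state (illustrative)
W = rng.randn(dim, dim)
b = np.zeros(dim)
gain, bias = np.ones(dim), np.zeros(dim)

# readout-like step: dropout -> projection -> layer norm -> tanh
h_drop = dropout(h, retain_p=0.8, rng=rng)
readout = np.tanh(layer_norm(np.dot(h_drop, W) + b, gain, bias))
```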
Questions:
1. Why is Dropout placed before LN?
Other implementations don't use this order, e.g.
https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout-in-tensorflow
recommends BatchNorm -> ReLU (or another activation) -> Dropout; the two orderings are contrasted in the sketch below.
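A compact sketch of the contrast, using a plain layer norm in place of batch norm and illustrative values:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(4, 8)
retain_p = 0.8
mask = rng.binomial(1, retain_p, size=x.shape) / retain_p

def norm(v, eps=1e-5):
    # plain layer norm (no learned gain/bias), enough to show the ordering
    return (v - v.mean(-1, keepdims=True)) / np.sqrt(v.var(-1, keepdims=True) + eps)

# conventional order from the StackOverflow answer: norm -> activation -> dropout
y_conventional = np.maximum(norm(x), 0.0) * mask

# order questioned here (as in the sketch above): dropout first, then norm
y_dropout_first = norm(x * mask)
```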
2. Why are state_below_ and pctx_ also layer-normalized? (There is no activation function applied directly afterwards; see the sketch below.)
In gru_layer, state_below_ is layer-normalized (the input is src), but in gru_cond_layer it is not (the input is trg).
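A hedged sketch of what such a normalized projection looks like, with illustrative names (W, b, gain, bias) rather than the nematus parameters: the layer norm sits on the pre-activation, and the nonlinearity only appears later, once this is combined with the (also normalized) recurrent projection.

```python
import numpy as np

def layer_norm(z, gain, bias, eps=1e-5):
    mean = z.mean(axis=-1, keepdims=True)
    std = np.sqrt(z.var(axis=-1, keepdims=True) + eps)
    return gain * (z - mean) / std + bias

rng = np.random.RandomState(0)
dim = 8
x = rng.randn(4, dim)                      # token embeddings (illustrative)
W = rng.randn(dim, 2 * dim)                # projection to the gate pre-activations
b = np.zeros(2 * dim)
gain, bias = np.ones(2 * dim), np.zeros(2 * dim)

# the projection itself is normalized; the sigmoid is only applied later,
# inside the recurrence, after summing with the recurrent projection
state_below_ = layer_norm(np.dot(x, W) + b, gain, bias)
```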
3. Dropout masks can't be generated inside scan (workaround sketched below): https://groups.google.com/forum/#!topic/lasagne-users/3eyaV3P0Y-E
https://groups.google.com/forum/#!topic/theano-users/KAN1j7iey68
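The usual workaround discussed in those threads is to sample the masks outside scan and hand them in as non_sequences, so the same masks are reused at every time step. A minimal Theano sketch (not the actual nematus code; the names and the keep probability are assumptions):

```python
import theano
import theano.tensor as tensor
from theano.sandbox.rng_mrg import MRG_RandomStreams

trng = MRG_RandomStreams(1234)
retain_p = 0.8                           # keep probability (assumed value)

x = tensor.tensor3('x')                  # (n_steps, n_samples, dim)
h0 = tensor.zeros((x.shape[1], x.shape[2]))

# sample the masks once, outside scan
mask_x = trng.binomial(size=(x.shape[1], x.shape[2]), p=retain_p, n=1,
                       dtype='float32') / retain_p
mask_h = trng.binomial(size=(x.shape[1], x.shape[2]), p=retain_p, n=1,
                       dtype='float32') / retain_p

def step(x_t, h_tm1, m_x, m_h):
    # the masks arrive as plain non_sequences; nothing is sampled in here
    return tensor.tanh(x_t * m_x + h_tm1 * m_h)

h, _ = theano.scan(step,
                   sequences=x,
                   outputs_info=h0,
                   non_sequences=[mask_x, mask_h])
```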
4. Dropout in RNN
Recurrent Neural Network Regularization says that dropout should not be applied to the hidden state passed in from the previous time step (Figure 2), but nematus does apply it... (the two schemes are contrasted in the sketch below).
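A small numpy sketch of the difference, with illustrative weights and a single mask reused across all time steps: the Zaremba-style step only masks the input (non-recurrent) connection, while the nematus-like step also masks h_{t-1}.

```python
import numpy as np

def rnn_step(x_t, h_tm1, Wx, Wh, mask_x=None, mask_h=None):
    if mask_x is not None:            # dropout on the input (non-recurrent) connection
        x_t = x_t * mask_x
    if mask_h is not None:            # dropout on the recurrent connection
        h_tm1 = h_tm1 * mask_h
    return np.tanh(np.dot(x_t, Wx) + np.dot(h_tm1, Wh))

rng = np.random.RandomState(0)
dim = 8
Wx, Wh = rng.randn(dim, dim), rng.randn(dim, dim)
x = rng.randn(5, dim)                 # 5 time steps (illustrative)
retain_p = 0.8
mask_x = rng.binomial(1, retain_p, size=dim) / retain_p
mask_h = rng.binomial(1, retain_p, size=dim) / retain_p

h_zaremba = np.zeros(dim)             # Zaremba et al.: input connections only
h_nematus = np.zeros(dim)             # nematus-like: recurrent connection too
for x_t in x:
    h_zaremba = rnn_step(x_t, h_zaremba, Wx, Wh, mask_x=mask_x)
    h_nematus = rnn_step(x_t, h_nematus, Wx, Wh, mask_x=mask_x, mask_h=mask_h)
```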
5. Residual connections
Regarding residual connections, the README of https://github.com/harvardnlp/seq2seq-attn says: "res_net: Use residual connections between LSTM stacks whereby the input to the l-th LSTM layer is the hidden state of the l-1-th LSTM layer summed with the hidden state of the l-2th LSTM layer. We didn't find this to really help in our experiments." (Sketched below.)
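A minimal numpy sketch of that scheme, with a plain tanh RNN standing in for an LSTM stack and illustrative weights: the input to the third layer is the second layer's hidden states summed with the first layer's.

```python
import numpy as np

def rnn_layer(x_seq, Wx, Wh):
    # a plain tanh RNN over a sequence; stands in for one LSTM stack
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in x_seq:
        h = np.tanh(np.dot(x_t, Wx) + np.dot(h, Wh))
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.RandomState(0)
dim, steps = 8, 5
x = rng.randn(steps, dim)
Ws = [(rng.randn(dim, dim), rng.randn(dim, dim)) for _ in range(3)]

h1 = rnn_layer(x, *Ws[0])
h2 = rnn_layer(h1, *Ws[1])
# residual connection as described in the README: the input to layer 3
# is the hidden states of layer 2 summed with those of layer 1
h3 = rnn_layer(h2 + h1, *Ws[2])
```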