When an RNN repeatedly applies the input-to-hidden state-transition operation to compress a sequence of arbitrary length into a fixed-length representation, it runs into the problem of being overly sensitive to perturbations of the hidden state.
Mathematical formalization of dropout:
y = f(W · d(x)), where

d(x) = mask ⊙ x, if in the training phase
d(x) = (1 − p) · x, otherwise

Here p is the dropout rate and mask is a binary vector generated from a Bernoulli distribution with probability 1 − p.

RNNDROP changes the conventional practice of "sampling a different mask at every time step to drop hidden units" and proposes a new strategy (shown in the figure below) with two characteristics: 1) it generates the dropout mask only at the beginning of each training sequence and fixes it through the sequence; 2) it drops both the non-recurrent and recurrent connections.

Reference: Moon T, Choi H, Lee H, et al. RNNDROP: A novel dropout for RNNs in ASR. IEEE Automatic Speech Recognition and Understanding (ASRU), 2016: 65-70.
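As a reference point, here is a minimal NumPy sketch of the d(x) above together with the per-sequence masking idea of RNNDROP; the function and variable names (dropout, seq_mask, the toy recurrence) are illustrative assumptions, not taken from the paper or any released code.

```python
import numpy as np

def dropout(x, p, train=True, mask=None):
    """d(x) from the formula above: mask the units during training,
    rescale by (1 - p) at test time. If `mask` is given it is reused,
    which gives the per-sequence masking used by RNNDROP."""
    if not train:
        return (1.0 - p) * x
    if mask is None:
        mask = (np.random.rand(*x.shape) > p).astype(x.dtype)
    return mask * x

# Per-sequence masking: sample the mask once at the start of the sequence
# and keep it fixed for every time step of that sequence.
p = 0.5
hidden_dim = 8
seq_mask = (np.random.rand(hidden_dim) > p).astype(np.float32)

h = np.zeros(hidden_dim, dtype=np.float32)
for x_t in np.random.randn(10, hidden_dim).astype(np.float32):
    # Toy recurrence: the same seq_mask drops the recurrent connection at every step.
    h = np.tanh(x_t + dropout(h, p, train=True, mask=seq_mask))
```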
A simple RNN and its dropout:

RNN: h_t = f(W_h · [x_t, h_{t-1}] + b_h)
With dropout: h_t = f(W_h · [x_t, d(h_{t-1})] + b_h), where d(·) is the dropout function.

LSTM: c_t = f_t ⊙ c_{t-1} + i_t ⊙ d(g_t)
GRU: h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ d(g_t)

In principle, masks can be applied to any subset of the gates, cells, and states. A sketch of the LSTM variant follows below.

Reference: Semeniuta S, Severyn A, Barth E. Recurrent Dropout without Memory Loss. 2016.
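Below is a minimal sketch of the LSTM variant (dropout applied only to the candidate update g_t), assuming a single stacked weight matrix W that produces all four gate pre-activations; the function name and parameter layout are assumptions for illustration, not Semeniuta et al.'s implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_recurrent_dropout(x_t, h_prev, c_prev, W, b, p=0.25, train=True):
    """One LSTM step with dropout applied only to the candidate update g_t,
    i.e. c_t = f_t * c_{t-1} + i_t * d(g_t): the cell memory itself is never dropped."""
    pre = W @ np.concatenate([x_t, h_prev]) + b   # all four gate pre-activations, stacked
    i, f, o, g = np.split(pre, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    if train:
        g = (np.random.rand(*g.shape) > p) * g    # d(g_t): Bernoulli mask during training
    else:
        g = (1.0 - p) * g                         # test-time rescaling, as in d(x) above
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```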
For a multi-layer LSTM network, dropout is applied stochastically to the vertical connections only, that is, it decides whether the hidden state of an LSTM unit in layer L is allowed to flow into the corresponding unit in layer L+1. The dashed lines in the figure mark the connections subject to dropout (a sketch follows below).

(Figure: information flow after the dropout operation.)

Reference: Zaremba W, Sutskever I, Vinyals O. Recurrent Neural Network Regularization. ICLR 2015.
Code: https://github.com/wojzaremba/lstm
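To make the "vertical connections only" idea concrete, here is a sketch in which plain tanh cells stand in for the LSTM units of the paper; the params layout and function names are assumptions, and the released Lua/Torch code linked above is the authoritative version.

```python
import numpy as np

def dropout(x, p, train=True):
    """Same d(x) as above: Bernoulli mask during training, rescale by (1 - p) otherwise."""
    if not train:
        return (1.0 - p) * x
    return (np.random.rand(*x.shape) > p) * x

def rnn_cell(x_t, h_prev, W, U, b):
    return np.tanh(W @ x_t + U @ h_prev + b)

def stacked_rnn_forward(xs, params, p=0.5, train=True):
    """Zaremba-style regularization on a stack of recurrent cells: dropout is applied
    only to the vertical connection a layer passes up to the next layer, while every
    recurrent h_{t-1} -> h_t connection inside a layer is left untouched."""
    hs = [np.zeros(pr["U"].shape[0]) for pr in params]   # one hidden state per layer
    outputs = []
    for x_t in xs:
        inp = x_t
        for l, pr in enumerate(params):
            hs[l] = rnn_cell(inp, hs[l], pr["W"], pr["U"], pr["b"])
            inp = dropout(hs[l], p, train)   # the dashed (vertical) connection
        outputs.append(inp)
    return outputs
```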
In the figure, dashed lines denote connections without dropout, while solid lines of different colours denote different dropout masks.

Conventional dropout for RNNs: use different masks at different time steps.
Variational-inference-based dropout: uses the same dropout mask at each time step, including the recurrent layers.

Concretely (as the solid-line colours in panel (b) of the figure show): for each connection matrix, a Bernoulli mask is sampled once, and that same mask is then reused at every subsequent time step (see the sketch below).

Reference: Gal Y. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. 2015.
Code: http://yarin.co/BRNN
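A minimal sketch of this per-sequence masking for a simple tanh RNN (the paper works with LSTMs; this simplified version shows only the fixed input and recurrent masks); the function name and the test-time rescaling convention follow the d(x) above and are assumptions, not Gal's released code.

```python
import numpy as np

def variational_rnn_forward(xs, W, U, b, p=0.5, train=True):
    """Variational (Gal-style) dropout for a simple RNN: one Bernoulli mask for the
    input connection and one for the recurrent connection are sampled per sequence
    and reused at every time step; at test time the inputs are rescaled by (1 - p)."""
    hidden = U.shape[0]
    if train:
        mask_x = (np.random.rand(xs.shape[1]) > p).astype(float)   # fixed input mask
        mask_h = (np.random.rand(hidden) > p).astype(float)        # fixed recurrent mask
    else:
        mask_x = np.full(xs.shape[1], 1.0 - p)
        mask_h = np.full(hidden, 1.0 - p)
    h = np.zeros(hidden)
    hs = []
    for x_t in xs:
        h = np.tanh(W @ (mask_x * x_t) + U @ (mask_h * h) + b)
        hs.append(h)
    return np.stack(hs)
```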
Zoneout (applied to an LSTM): at each time step, a random subset of units copies its activations from the previous time step instead of being updated:

c_t = d_t^c ⊙ c_{t-1} + (1 − d_t^c) ⊙ (f_t ⊙ c_{t-1} + i_t ⊙ g_t)
h_t = d_t^h ⊙ h_{t-1} + (1 − d_t^h) ⊙ (o_t ⊙ tanh(f_t ⊙ c_{t-1} + i_t ⊙ g_t))

where d_t^c and d_t^h are binary random vectors of 0s and 1s.
Reference: Krueger D, Maharaj T, Kramár J, et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. ICLR 2017.
Code: http://github.com/teganmaharaj/zoneout
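For illustration, a sketch of one zoneout LSTM step under the equations above; the function name, weight layout, and zoneout rates z_c / z_h are assumptions, and the test-time branch uses the expected (convex-mix) update, which mirrors the paper's described inference behaviour but should be checked against the released code linked above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def zoneout_lstm_step(x_t, h_prev, c_prev, W, b, z_c=0.15, z_h=0.05, train=True):
    """One LSTM step with zoneout: d_t^c and d_t^h are Bernoulli vectors, and a unit
    with d = 1 copies its previous value instead of taking the new update."""
    pre = W @ np.concatenate([x_t, h_prev]) + b
    i, f, o, g = np.split(pre, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c_prev + i * g          # f_t * c_{t-1} + i_t * g_t
    h_new = o * np.tanh(c_new)          # o_t * tanh(c_new)
    if train:
        d_c = (np.random.rand(*c_prev.shape) < z_c).astype(float)  # 1 -> keep old cell value
        d_h = (np.random.rand(*h_prev.shape) < z_h).astype(float)  # 1 -> keep old hidden value
        c_t = d_c * c_prev + (1.0 - d_c) * c_new
        h_t = d_h * h_prev + (1.0 - d_h) * h_new
    else:
        # Assumed test-time behaviour: use the expectation of the stochastic update.
        c_t = z_c * c_prev + (1.0 - z_c) * c_new
        h_t = z_h * h_prev + (1.0 - z_h) * h_new
    return h_t, c_t
```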
