TensorFlow RNN Example

xiaoxiao 2021-02-28


This example is based on the RNN tutorial (Basic LSTM) from Google's TensorFlow website. It focuses on code analysis, including data preprocessing.

## reader.py

### _read_words function

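Reads the PTB file as UTF-8, replaces every newline with `<eos>`, and splits the text into a list of words. It can be tested in an interactive Python session as follows.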

with tf.gfile.GFile("/home/gsc/envtensorflow/deep_learn/models/tutorials/simple-examples/data/ptb.train.txt", "r") as f:
    data = f.read().decode("utf-8").replace("\n", "<eos>").split()

Figure ptb_1
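The same transformation can be tried without TensorFlow or the PTB file. The sketch below is a minimal illustration; the function name `read_words` and the sample text are hypothetical, not part of the tutorial code.

```python
def read_words(text):
    """Split raw corpus text into tokens, marking each line end with <eos>."""
    # Same string operations as the one-liner above, minus the file I/O.
    return text.replace("\n", "<eos>").split()


# PTB lines start with a space and end with " \n", which is why the plain
# replace() leaves <eos> as a separate token after split().
sample = " aer banknote berlitz \n pierre <unk> N years old \n"
words = read_words(sample)
print(words)
```

Note that this relies on the whitespace around each newline: for a text such as `"a\nb"` the replacement would glue `<eos>` to its neighbours.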

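### _build_vocab function

Counts how many times each word occurs. counter is a dict-like Counter: each key is a word and each value is the number of times that word occurs.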

counter = collections.Counter(data)

This yields a list whose elements are (word, count) tuples, sorted by count in descending order (ties are broken alphabetically). See Figure ptb_2.

count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

Figure ptb_2

The words and their counts are then separated into words and _. The name _ conventionally marks a value that will not be used.

words, _ = list(zip(*count_pairs))

Each word is paired with its rank and stored in the dict word_to_id. The ids start at 0. See ptb_3.

word_to_id = dict(zip(words, range(len(words))))

Figure ptb_3

Finally, the function returns the mapping from each word to its id, for example {'the': 0, '<unk>': 1, ..., 'federal': 100}.
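The three vocabulary steps can be checked on a toy word list. This is a sketch using the same `Counter`, `sorted`, and `zip` calls as the snippets above; the wrapper name `build_vocab` and the toy data are assumptions for illustration.

```python
import collections


def build_vocab(words):
    """Map each word to an id, most frequent word first."""
    counter = collections.Counter(words)
    # Sort by descending count, then alphabetically to break ties.
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    vocab, _ = zip(*count_pairs)
    return dict(zip(vocab, range(len(vocab))))


toy = ["the", "cat", "<eos>", "the", "dog", "<eos>", "the", "end", "<eos>"]
word_to_id = build_vocab(toy)
print(word_to_id)
```

Here `<eos>` and `the` both occur three times; `<eos>` gets id 0 because `<` sorts before `t`.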

train_data = _file_to_word_ids(train_path, word_to_id)

Expanded, this function is equivalent to:

train_data = []
for word in data:
    # Words missing from the vocabulary are skipped
    if word in word_to_id:
        train_data.append(word_to_id[word])

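train_data holds, for every word in the text, its index in word_to_id. For example, aer has index 9970 in word_to_id, so the first element is train_data[0] = 9970, and so on. Next, get the total number of distinct words: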

vocabulary = len(word_to_id)
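A small sketch of the id conversion and the vocabulary size, on a hand-written mapping. The name `words_to_ids` and the three-word vocabulary are hypothetical.

```python
def words_to_ids(words, word_to_id):
    """Replace each known word by its id; unknown words are dropped."""
    return [word_to_id[w] for w in words if w in word_to_id]


word_to_id = {"<eos>": 0, "the": 1, "cat": 2}
ids = words_to_ids(["the", "cat", "sat", "<eos>"], word_to_id)
vocabulary = len(word_to_id)
print(ids, vocabulary)
```

In this toy input the word `sat` is silently skipped because it is not in the mapping.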

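### ptb_producer function

Arguments: raw_data: train_data, batch_size: 20, num_steps: 20. The analysis uses the small configuration. The first step converts raw_data (here a Python list of word ids) into the tensor that TensorFlow needs. See ptb_5.

Figure ptb_5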

raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

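The original one-dimensional data is reshaped into 20 rows and batch_len columns, where batch_len is the total data length divided by batch_size (integer division).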

data = tf.reshape(raw_data[0 : batch_size * batch_len], [batch_size, batch_len])

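The i created here plays the same role as the loop variable i in a C for loop. shuffle=False means the order is not shuffled, so i runs from 0 to epoch_size - 1.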

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

The data is sliced along both dimensions into a block of shape [batch_size - 0, (i + 1) * num_steps - i * num_steps] = [20, 20]. In effect, data is cut into consecutive chunks of 20 columns.

x = tf.strided_slice(data, [0, i * num_steps], [batch_size, (i + 1) * num_steps])

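y is almost the same as x but shifted by one position, that is, y[n] = x[n+1]: each target is the word that follows the corresponding input word. This makes sense, because x is the training input and y is the label, and the label is the next word the model should output.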
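The batching arithmetic can be reproduced with plain Python lists. The sketch below assumes the tutorial's convention that targets start one column to the right of the inputs and that `epoch_size = (batch_len - 1) // num_steps`; the generator name `ptb_batches` is hypothetical and there is no queue involved.

```python
def ptb_batches(raw_data, batch_size, num_steps):
    """Yield (x, y) pairs of shape [batch_size][num_steps] from a 1-D id list."""
    batch_len = len(raw_data) // batch_size
    # Reshape the 1-D data into batch_size rows of batch_len columns.
    rows = [raw_data[r * batch_len:(r + 1) * batch_len]
            for r in range(batch_size)]
    # One column is reserved so that the last target still exists.
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        lo, hi = i * num_steps, (i + 1) * num_steps
        x = [row[lo:hi] for row in rows]
        y = [row[lo + 1:hi + 1] for row in rows]  # inputs shifted by one
        yield x, y


batches = list(ptb_batches(list(range(20)), batch_size=2, num_steps=3))
for x, y in batches:
    print(x, y)
```

With 20 ids, 2 rows and 3 steps, each row has 10 columns and the loop produces 3 batches; every `y[r][n]` equals `x[r][n] + 1` because the toy data is just a counter.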

## ptb_word_lm.py

### Getting the configuration

config = get_config()

The configuration parameters are chosen according to the configured mode. Small mode is assumed here.

PTBInput class: self.input_data and self.targets are both tensors of shape [20, 20]. See ptb_6.

Figure ptb_6

train_input = PTBInput(config=config, data=train_data, name="TrainInput")

train_input is simply an instance of the class. Later code accesses its fields like this:

train_input.targets
train_input.input_data

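### class PTBModel(object): the core of the training model

#### Basic cell

After data preprocessing, this is the other core module. As of March 2017, the TensorFlow source code gives tf.contrib.rnn.BasicLSTMCell a reuse argument, which TensorFlow 1.0 does not have. The check below keeps the code compatible with both the old and the new version.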

def lstm_cell():
    # Pass reuse only when this TensorFlow version accepts it
    if 'reuse' in inspect.getargspec(
            tf.contrib.rnn.BasicLSTMCell.__init__).args:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True,
            reuse=tf.get_variable_scope().reuse)
    else:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True)

attn_cell is bound to the lstm_cell function, so each call to attn_cell() creates a new BasicLSTMCell instance.

attn_cell = lstm_cell

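The LSTM implemented here is the most basic LSTM. Its paper is at http://arxiv.org/abs/1409.2329, and the formulas are pasted here directly:

Figure ptb_8

Figure ptb_9 (with dropout)

The structure:

Figure ptb_10

To make the relation between tensors and dimensions in ptb_10 easier to see, the formulas are listed again following that figure (this partly repeats the previous post, see http://blog.csdn.net/shichaog/article/details/72853665 ):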

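$h_t^j=o_t^j\odot \tanh(c_t^j)$

$c_t^j=f_t^j\odot c_{t-1}^j+i_t^j\odot j_t^j$

$o_t^j=\sigma(W_{xo}x_t+W_{ho}h_{t-1}+b_o)^j$

$f_t^j=\sigma(W_{xf}x_t+W_{hf}h_{t-1}+b_f)^j$

$j_t^j=\tanh(W_{xj}x_t+W_{hj}h_{t-1}+b_j)^j$

$i_t^j=\sigma(W_{xi}x_t+W_{hi}h_{t-1}+b_i)^j$

These formulas can be regrouped by combining the weights into one large matrix and the inputs into one row block:

$\begin{bmatrix} x_t & h_{t-1} \end{bmatrix}\begin{bmatrix} W_{xi} & W_{xj} & W_{xf} & W_{xo} \\ W_{hi} & W_{hj} & W_{hf} & W_{ho} \end{bmatrix}$

With the dimensions of the small configuration (batch size 20, hidden size 200), the regrouped computation is a single matrix product, with the biases added afterwards:

$\begin{bmatrix} i_t^{'} & j_t^{'} & f_t^{'} & o_t^{'} \end{bmatrix}_{(20\times 800)}=\begin{bmatrix} x_{t\,(20\times 200)} & h_{t-1\,(20\times 200)} \end{bmatrix}\begin{bmatrix} W_{xi} & W_{xj} & W_{xf} & W_{xo} \\ W_{hi} & W_{hj} & W_{hf} & W_{ho} \end{bmatrix}_{(400\times 800)}$

Each of the eight weight blocks is $200\times 200$. This derivation gives the dimension relations of the data in basic_lstm_cell_1. After the multiplication, the result is split into four matrices of dimension $20\times 200$, and basic_lstm_cell applies $\sigma$ or $\tanh$ to each of them. These are exactly the operations the basic LSTM prescribes. Opening TensorBoard shows the following details:

Figure ptb_11

The edges leaving the layer above split correspond, from left to right, to $f_t, i_t, j_t, o_t$. In Figure ptb_10, the product of $i_t$ and $j_t$ gives the $mul_1$ node, and $mul$ is the product of $f_t$ and $c_{t-1}$. The two tensors $c_t$ and $h_t$ are then passed on to multi_rnn_cell_1, and $h_t$ together with the input $x_t$ is passed to basic_lstm_cell_1.

In summary, the $c_t$ and $h_t$ of cell_0 in multi_rnn_cell are passed to the $c_t$ and $h_t$ of cell_0 in multi_rnn_cell_1, and the $c_t$ and $h_t$ of cell_1 in multi_rnn_cell are passed to the $c_t$ and $h_t$ of cell_1 in multi_rnn_cell_1. Because config.num_layers equals 2, the loop below creates two cells.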
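To make the fused product and the four-way split concrete, here is a pure-Python sketch of one cell step on lists of rows. It follows the gate order i, j, f, o and the update rule of the BasicLSTMCell source quoted in the appendix; the helper names (`sigmoid`, `matmul`, `basic_lstm_step`) and the tiny sizes are assumptions for illustration only.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    return [[sum(row[t] * b[t][col] for t in range(len(b)))
             for col in range(len(b[0]))] for row in a]


def basic_lstm_step(x, c_prev, h_prev, weights, bias, forget_bias=0.0):
    """x: [batch][input], c_prev/h_prev: [batch][n], weights: [input+n][4n]."""
    n = len(h_prev[0])
    # Concatenate [x_t, h_{t-1}] per row, then do ONE product for all gates.
    xh = [x_row + h_row for x_row, h_row in zip(x, h_prev)]
    fused = [[v + b for v, b in zip(row, bias)]
             for row in matmul(xh, weights)]
    new_c, new_h = [], []
    for row, c_row in zip(fused, c_prev):
        # Split the 4n columns into the four pre-activations.
        i, j, f, o = (row[k * n:(k + 1) * n] for k in range(4))
        c = [c_row[u] * sigmoid(f[u] + forget_bias)
             + sigmoid(i[u]) * math.tanh(j[u]) for u in range(n)]
        new_c.append(c)
        new_h.append([math.tanh(c[u]) * sigmoid(o[u]) for u in range(n)])
    return new_c, new_h


batch, input_size, n = 2, 3, 2
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
c_prev = [[1.0, -2.0], [0.0, 4.0]]
h_prev = [[0.0] * n for _ in range(batch)]
weights = [[0.0] * (4 * n) for _ in range(input_size + n)]  # all-zero weights
bias = [0.0] * (4 * n)
new_c, new_h = basic_lstm_step(x, c_prev, h_prev, weights, bias)
print(new_c, new_h)
```

With all-zero weights every gate pre-activation is 0, so the forget gate is 0.5 and the new cell state is simply half of the old one, which makes the result easy to verify by hand.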

cell = tf.contrib.rnn.MultiRNNCell(
    [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)
self._initial_state = cell.zero_state(batch_size, data_type())

The resulting structure is shown in ptb_12.

Figure ptb_12

with tf.device("/cpu:0"):
    embedding = tf.get_variable(
        "embedding", [vocab_size, size], dtype=data_type())
    inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

See ptb_13 and ptb_14.

Figure ptb_13

Figure ptb_14

#### Stacking the RNN

Next, a variable scope for the RNN is created.

outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
    for time_step in range(num_steps):
        if time_step > 0:
            # Share the same LSTM weights across all time steps
            tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)

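Figure ptb_15

The loop runs 20 times, once per time_step. The ptb_15 screenshot captures only a few of the iterations so that the details stay legible. Between every two multi_rnn_cell nodes there are four tensors: the $c_t$ and $h_t$ of cell0 and of cell1. In total 20 tensors are output, each of dimension 20×200.

Figure ptb_16

The stack and reshape operations then produce a 400×200 matrix.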

output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
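The row order produced by stacking along axis 1 and then reshaping matters, because it must match the order of the flattened targets. A plain-Python sketch of that ordering, with the hypothetical helper `stack_and_flatten` and tiny sizes in place of 20 and 200:

```python
def stack_and_flatten(outputs):
    """outputs: num_steps items, each [batch_size][size].

    Returns [batch_size * num_steps][size] in batch-major order, the same
    row order as stacking on axis 1 and reshaping to [-1, size].
    """
    batch_size = len(outputs[0])
    return [outputs[t][b]
            for b in range(batch_size)
            for t in range(len(outputs))]


num_steps, batch_size = 3, 2
# Each "output vector" records which (batch row, time step) it came from.
outputs = [[[b, t] for b in range(batch_size)] for t in range(num_steps)]
flat = stack_and_flatten(outputs)

# Targets of shape [batch_size][num_steps], flattened row by row.
targets = [[(b, t) for t in range(num_steps)] for b in range(batch_size)]
flat_targets = [pair for row in targets for pair in row]
print(flat)
print(flat_targets)
```

Both lists run through all time steps of batch row 0 before moving to row 1, so row k of the flattened output lines up with target k.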

Next, the softmax weights and bias are defined.

softmax_w = tf.get_variable(
    "softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())

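Figure ptb_17

Their dimensions are shown in the figure above. At this point it is worth looking at how embedding, the RNN, softmax_w and softmax_b relate to each other.

Figure ptb_18

### Loss function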

logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    [logits],
    [tf.reshape(input_.targets, [-1])],
    [tf.ones([batch_size * num_steps], dtype=data_type())])
self._cost = cost = tf.reduce_sum(loss) / batch_size
self._final_state = state
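With all weights equal to one, as in the call above, the loss reduces to a softmax cross-entropy per (batch row, time step) position, summed and divided by batch_size. The following pure-Python sketch illustrates that arithmetic under this assumption; `cross_entropy` and `ptb_cost` are hypothetical names, not TensorFlow functions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def ptb_cost(logits_rows, targets, batch_size):
    """Sum the per-position losses and divide by batch_size only."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_rows, targets)]
    return sum(losses) / batch_size


vocab_size, batch_size, num_steps = 4, 2, 3
# An untrained model with identical logits predicts a uniform distribution.
logits_rows = [[0.0] * vocab_size for _ in range(batch_size * num_steps)]
targets = [0, 1, 2, 3, 0, 1]
cost = ptb_cost(logits_rows, targets, batch_size)
print(cost, num_steps * math.log(vocab_size))
```

Because the sum is divided by batch_size but not by num_steps, the cost of one step is the per-word loss accumulated over num_steps positions; the perplexity computation later divides by the number of steps.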

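sequence_loss_by_example comes from the seq2seq model code. It is not explained here and is left for the seq2seq article.

#### Learning rate update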

self._lr = tf.Variable(0.0, trainable=False)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                  config.max_grad_norm)

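To deal with exploding gradients, the gradients are clipped by their global norm, which keeps them within a reasonable range. Clipping only bounds large gradients; it does not help with vanishing gradients.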
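Global-norm clipping rescales all gradients by one common factor when their joint norm exceeds the limit. Below is a minimal pure-Python sketch of that rule on lists of floats; it mirrors the documented behaviour of tf.clip_by_global_norm but is not the TensorFlow implementation, and the limit of 5.0 is only an example value.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale every gradient so that the joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The factor is 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


clipped, norm = clip_by_global_norm([[3.0, 4.0], [12.0]], clip_norm=5.0)
new_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm, new_norm)

unchanged, _ = clip_by_global_norm([[0.3, 0.4]], clip_norm=5.0)
print(unchanged)
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved; only its length is bounded.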

## Training process

for i in range(config.max_max_epoch):

The loop passes over the whole training set 13 (max_max_epoch) times. Each pass first sets the model's learning rate.

# Body of the epoch loop
lr_decay = config.lr_decay ** max(i + 1 - config.max_epoch, 0.0)
m.assign_lr(session, config.learning_rate * lr_decay)
print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
train_perplexity = run_epoch(session, m, eval_op=m.train_op, verbose=True)
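The decay formula keeps the learning rate constant for the first max_epoch epochs and then multiplies it by lr_decay once per epoch. The sketch below tabulates the schedule; the values learning_rate=1.0, lr_decay=0.5 and max_epoch=4 are assumed here as the small configuration, and the function name `learning_rates` is hypothetical.

```python
def learning_rates(learning_rate, lr_decay, max_epoch, max_max_epoch):
    """Learning rate used in each epoch i = 0 .. max_max_epoch - 1."""
    return [learning_rate * lr_decay ** max(i + 1 - max_epoch, 0.0)
            for i in range(max_max_epoch)]


rates = learning_rates(learning_rate=1.0, lr_decay=0.5,
                       max_epoch=4, max_max_epoch=13)
for epoch, rate in enumerate(rates, start=1):
    print("Epoch: %d Learning rate: %.3f" % (epoch, rate))
```

Under these assumed values, epochs 1 to 4 train at the full rate and epoch 5 is the first one with a halved rate.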

### run_epoch

This function first evaluates the model's initial state.

state = session.run(model.initial_state)

Then the model's cost and final state are stored in the fetches dict.

fetches = {
    "cost": model.cost,
    "final_state": model.final_state,
}
if eval_op is not None:
    # Also run the training op when one is supplied
    fetches["eval_op"] = eval_op

The training loop:

# Runs model.input.epoch_size iterations, one per batch
for step in range(model.input.epoch_size):
    feed_dict = {}
    # Feed the LSTM state (c_t and h_t) from the previous batch back in.
    # enumerate yields i = 0, 1, ... and (c, h) is the state tuple of layer i.
    for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
    # Run the graph and fetch the values of model.cost and model.final_state
    vals = session.run(fetches, feed_dict)
    # cost and state are taken out to compute perplexity. Perplexity can be
    # read as the effective number of candidate words, so lower is better.
    cost = vals["cost"]
    state = vals["final_state"]
    costs += cost
    iters += model.input.num_steps
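The accumulated costs and iters turn into a perplexity via the exponential of the average per-word loss. A small sketch, assuming the final value is computed as exp(costs / iters) and that each step's cost is the per-word loss summed over num_steps positions; the function name `perplexity` and the numbers are illustrative.

```python
import math


def perplexity(step_costs, num_steps):
    """Accumulate costs and iters as in the loop above, then exponentiate."""
    costs, iters = 0.0, 0
    for cost in step_costs:
        costs += cost
        iters += num_steps
    return math.exp(costs / iters)


# A model that spreads probability uniformly over 10000 words pays
# log(10000) per word, i.e. 20 * log(10000) per step of 20 words.
uniform_cost = 20 * math.log(10000)
ppl = perplexity([uniform_cost] * 5, num_steps=20)
print(ppl)
```

A uniform guess over the vocabulary therefore has a perplexity equal to the vocabulary size, which matches the reading of perplexity as the number of candidate words.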

## Appendix: BasicLSTMCell source code

This is not hard. Figures ptb_10 and ptb_11 make it clear, so the code is not analyzed in detail here.

class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on: http://arxiv.org/abs/1409.2329.

  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.

  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.

  For advanced models, please use the full LSTMCell that follows.
  """

  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh, reuse=None):
    """Initialize the basic LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`.  If False, they are concatenated
        along the column axis.  The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope.  If not `True`, and the existing scope already has
        the given variables, an error is raised.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated.  Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation
    self._reuse = reuse

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
      concat = _linear([inputs, h], 4 * self._num_units, True)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)

      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat([new_c, new_h], 1)
      return new_h, new_state

When reposting, please cite the original address: https://www.6miu.com/read-69693.html
