Training a neural network involves quite a few steps: specifying the training data inputs, initializing the model parameters, running the forward and backward passes, updating the parameters via gradient descent, saving and restoring the model, and so on. Many of these steps have to be repeated at prediction time as well. All of this can be a headache for beginners and experienced developers alike.

Fortunately, MXNet packages these common operations in the Module package. Module provides both high-level and intermediate-level APIs for working with a predefined network, and we can switch freely between the two. This article walks through how to use them.

We will need the following tools:
- MXNet
- Jupyter Notebook and the Python Requests package: `pip install jupyter requests`

In this tutorial we will train a multilayer perceptron on the UCI letter recognition dataset.

First, download the dataset and split it into training and test sets with an 80:20 ratio. Then create two data iterators with a batch size of 32, one for the training set and one for the test set.
```python
import logging
logging.getLogger().setLevel(logging.INFO)
import mxnet as mx
import numpy as np

fname = mx.test_utils.download('http://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.data')
data = np.genfromtxt(fname, delimiter=',')[:, 1:]
label = np.array([ord(l.split(',')[0]) - ord('A') for l in open(fname, 'r')])

batch_size = 32
ntrain = int(data.shape[0] * 0.8)
train_iter = mx.io.NDArrayIter(data[:ntrain, :], label[:ntrain], batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(data[ntrain:, :], label[ntrain:], batch_size)
```

Next, define the network structure.
```python
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(net, name='fc1', num_hidden=64)
net = mx.sym.Activation(net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(net, name='fc2', num_hidden=26)
net = mx.sym.SoftmaxOutput(net, name='softmax')
mx.viz.plot_network(net)
```

Now we are ready to introduce Module. A Module can be constructed with the following parameters:
- symbol: the network definition
- context: the device (or list of devices) on which to run the computation (CPU, GPU)
- data_names: the list of input data variable names
- label_names: the list of input label variable names

For the net defined above, there is only one data input, data, and one label, softmax_label. The name softmax_label was generated automatically because we used the SoftmaxOutput operator.
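One quick way to verify these names is to ask the symbol itself; for example:

```python
# The symbol's arguments: 'data' and 'softmax_label' are the inputs we must
# feed; the rest are weights and biases learned during training.
print(net.list_arguments())
# ['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'softmax_label']
```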
```python
mod = mx.mod.Module(symbol=net,
                    context=mx.cpu(),
                    data_names=['data'],
                    label_names=['softmax_label'])
```

With the Module created, let's see how to use the intermediate-level API to train the model and make predictions. These APIs let developers run the forward and backward passes step by step, which is also very helpful for debugging.
To train a module this way, we perform the following steps:

- bind: allocates memory in preparation for computation
- init_params: assigns and initializes the parameters
- init_optimizer: initializes the optimizer; it defaults to sgd
- metric.create: creates an evaluation metric
- forward: runs the forward pass
- update_metric: evaluates and accumulates the metric on the outputs of the forward pass
- backward: runs the backward pass
- update: updates the parameters according to the optimizer and the gradients computed in the forward and backward passes

The code:
```python
# allocate memory given the input data and label shapes
mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
# initialize parameters by uniform random numbers
mod.init_params(initializer=mx.init.Uniform(scale=.1))
# use SGD with learning rate 0.1 to train
mod.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.1), ))
# use accuracy as the metric
metric = mx.metric.create('acc')
# train 5 epochs, i.e. going over the data iter one pass
for epoch in range(5):
    train_iter.reset()
    metric.reset()
    for batch in train_iter:
        mod.forward(batch, is_train=True)       # compute predictions
        mod.update_metric(metric, batch.label)  # accumulate prediction accuracy
        mod.backward()                          # compute gradients
        mod.update()                            # update parameters
    print('Epoch %d, Training %s' % (epoch, metric.get()))
```

```
Epoch 0, Training ('accuracy', 0.4554375)
Epoch 1, Training ('accuracy', 0.6485625)
Epoch 2, Training ('accuracy', 0.7055625)
Epoch 3, Training ('accuracy', 0.7396875)
Epoch 4, Training ('accuracy', 0.764375)
```
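Because these intermediate-level calls run one step at a time, they are also convenient for inspecting intermediate results. As a minimal sketch (not part of the training loop above), we can run a single forward pass in inference mode and peek at the raw softmax outputs via get_outputs():

```python
# Sketch: run one forward pass without computing gradients and inspect
# the softmax outputs for a single batch.
train_iter.reset()
batch = train_iter.next()
mod.forward(batch, is_train=False)        # inference mode, no gradients
probs = mod.get_outputs()[0].asnumpy()    # shape: (batch_size, 26)
print(probs.argmax(axis=1)[:10])          # predicted classes, first 10 samples
```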
Instead of calling these intermediate-level interfaces one by one, we can simply call fit, which performs the same steps in a single call.

```python
# reset train_iter to the beginning
train_iter.reset()

# create a module
mod = mx.mod.Module(symbol=net,
                    context=mx.cpu(),
                    data_names=['data'],
                    label_names=['softmax_label'])

# fit the module
mod.fit(train_iter,
        eval_data=val_iter,
        optimizer='sgd',
        optimizer_params={'learning_rate': 0.1},
        eval_metric='acc',
        num_epoch=8)
```

```
INFO:root:Epoch[0] Train-accuracy=0.364625
INFO:root:Epoch[0] Time cost=0.388
INFO:root:Epoch[0] Validation-accuracy=0.557250
INFO:root:Epoch[1] Train-accuracy=0.633625
INFO:root:Epoch[1] Time cost=0.470
INFO:root:Epoch[1] Validation-accuracy=0.634750
INFO:root:Epoch[2] Train-accuracy=0.697187
INFO:root:Epoch[2] Time cost=0.402
INFO:root:Epoch[2] Validation-accuracy=0.665500
INFO:root:Epoch[3] Train-accuracy=0.735062
INFO:root:Epoch[3] Time cost=0.402
INFO:root:Epoch[3] Validation-accuracy=0.713000
INFO:root:Epoch[4] Train-accuracy=0.762563
INFO:root:Epoch[4] Time cost=0.408
INFO:root:Epoch[4] Validation-accuracy=0.742000
INFO:root:Epoch[5] Train-accuracy=0.782312
INFO:root:Epoch[5] Time cost=0.400
INFO:root:Epoch[5] Validation-accuracy=0.778500
INFO:root:Epoch[6] Train-accuracy=0.797188
INFO:root:Epoch[6] Time cost=0.392
INFO:root:Epoch[6] Validation-accuracy=0.798250
INFO:root:Epoch[7] Train-accuracy=0.807750
INFO:root:Epoch[7] Time cost=0.401
INFO:root:Epoch[7] Validation-accuracy=0.789250
```

In fit, eval_metric defaults to accuracy, optimizer defaults to sgd, and optimizer_params defaults to (('learning_rate', 0.01),).
To predict with the module, we can call predict(), which collects and returns the prediction values:

```python
y = mod.predict(val_iter)
assert y.shape == (4000, 26)
```
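Since val_iter was created without shuffling and 4000 divides evenly by the batch size, the rows of y line up with label[ntrain:] from the data-loading step. As a sketch, the accuracy can therefore also be computed by hand:

```python
# Sketch: recover class predictions from the softmax outputs and
# compare them against the held-out labels.
pred = y.asnumpy().argmax(axis=1)
manual_acc = (pred == label[ntrain:]).mean()
print('manual accuracy: %f' % manual_acc)
```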
If we do not need the prediction values themselves, but only want to evaluate performance on the test set, we can call score(). It runs prediction on the validation data and evaluates the model against the given metric:

```python
score = mod.score(val_iter, ['acc'])
print("Accuracy score is %f" % (score[0][1]))
assert score[0][1] > 0.77, "Achieved accuracy (%f) is less than expected (0.77)" % score[0][1]
```

Other available metrics include top_k_acc (top-k accuracy), F1, RMSE, MSE, MAE, and ce (cross-entropy). See Evaluation metric.
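Several metrics can also be evaluated in a single pass over the data by combining them into a CompositeEvalMetric. A brief sketch (the choice of top_k=3 is illustrative):

```python
# Sketch: evaluate top-3 accuracy and cross-entropy in one pass over val_iter.
composite = mx.metric.CompositeEvalMetric()
composite.add(mx.metric.create('top_k_accuracy', top_k=3))
composite.add(mx.metric.create('ce'))
print(mod.score(val_iter, composite))
```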
We can tune parameters such as the number of epochs, the learning rate, and the choice of optimizer to get better scores.
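For example, here is a sketch of one such variation that switches to the Adam optimizer; the hyperparameters are illustrative, not tuned:

```python
# Sketch: same network, trained with Adam for more epochs.
train_iter.reset()
mod = mx.mod.Module(symbol=net, context=mx.cpu(),
                    data_names=['data'], label_names=['softmax_label'])
mod.fit(train_iter,
        eval_data=val_iter,
        optimizer='adam',
        optimizer_params={'learning_rate': 0.005},
        eval_metric='acc',
        num_epoch=20)
```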
After each training epoch, we can save the module parameters through the checkpoint callback.
```python
# construct a callback function to save checkpoints
model_prefix = 'mx_mlp'
checkpoint = mx.callback.do_checkpoint(model_prefix)

mod = mx.mod.Module(symbol=net)
mod.fit(train_iter, num_epoch=5, epoch_end_callback=checkpoint)
```

```
INFO:root:Epoch[0] Train-accuracy=0.101062
INFO:root:Epoch[0] Time cost=0.422
INFO:root:Saved checkpoint to "mx_mlp-0001.params"
INFO:root:Epoch[1] Train-accuracy=0.263313
INFO:root:Epoch[1] Time cost=0.785
INFO:root:Saved checkpoint to "mx_mlp-0002.params"
INFO:root:Epoch[2] Train-accuracy=0.452188
INFO:root:Epoch[2] Time cost=0.624
INFO:root:Saved checkpoint to "mx_mlp-0003.params"
INFO:root:Epoch[3] Train-accuracy=0.544125
INFO:root:Epoch[3] Time cost=0.427
INFO:root:Saved checkpoint to "mx_mlp-0004.params"
INFO:root:Epoch[4] Train-accuracy=0.605250
INFO:root:Epoch[4] Time cost=0.399
INFO:root:Saved checkpoint to "mx_mlp-0005.params"
```
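do_checkpoint saves the symbol and the parameters. If the optimizer state (for example, momentum) should survive a restart as well, mx.callback.module_checkpoint can save it too; a sketch, assuming the default checkpoint period of one epoch:

```python
# Sketch: save symbol, parameters, and optimizer states after every epoch.
mod = mx.mod.Module(symbol=net)
checkpoint = mx.callback.module_checkpoint(mod, model_prefix,
                                           period=1,
                                           save_optimizer_states=True)
mod.fit(train_iter, num_epoch=5, epoch_end_callback=checkpoint)
```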
To load a saved model, call load_checkpoint, which returns both the network definition (the Symbol) and the parameters. We can then assign the loaded parameters to the module:

```python
sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 3)
assert sym.tojson() == net.tojson()

# assign the loaded parameters to the module
mod.set_params(arg_params, aux_params)
```
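If the goal is simply to rebuild a module from a checkpoint in one step, Module.load combines the two calls above; a sketch (the module still has to be bound before it can be used):

```python
# Sketch: build a module directly from the epoch-3 checkpoint.
mod = mx.mod.Module.load(model_prefix, 3)
# binding copies the loaded parameters into the executors
mod.bind(data_shapes=val_iter.provide_data,
         label_shapes=val_iter.provide_label)
print(mod.score(val_iter, 'acc'))
```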
To resume training from a saved checkpoint, instead of calling set_params() we can call fit() and pass in the loaded parameters, so that fit() starts from them rather than from a random initialization. We also set begin_epoch so that fit() knows which epoch we are resuming from. (Note that we re-run score() before the assert so it reflects the resumed model rather than the earlier one.)

```python
mod = mx.mod.Module(symbol=sym)
mod.fit(train_iter,
        num_epoch=21,
        arg_params=arg_params,
        aux_params=aux_params,
        begin_epoch=3)
score = mod.score(val_iter, ['acc'])
assert score[0][1] > 0.77, "Achieved accuracy (%f) is less than expected (0.77)" % score[0][1]
```

```
INFO:root:Epoch[3] Train-accuracy=0.544125
INFO:root:Epoch[3] Time cost=0.398
INFO:root:Epoch[4] Train-accuracy=0.605250
INFO:root:Epoch[4] Time cost=0.545
INFO:root:Epoch[5] Train-accuracy=0.644312
INFO:root:Epoch[5] Time cost=0.592
INFO:root:Epoch[6] Train-accuracy=0.675000
INFO:root:Epoch[6] Time cost=0.491
INFO:root:Epoch[7] Train-accuracy=0.695812
INFO:root:Epoch[7] Time cost=0.363
```

Original article: Module - Neural network training and inference
