Training a Simple Neural Network (based on CS231n)



This post is a study note based on Stanford CS231n: building a simple neural network classifier. It continues from the previous post.

Training a Neural Network

Clearly, a linear classifier is not good enough for the spiral dataset, so we switch to a neural network. A single additional hidden layer is enough for this simple dataset, which means we now need two sets of weights and biases (one for the first layer and one for the second):

# initialize parameters randomly
h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))

The forward pass that computes the class scores then becomes:

# evaluate class scores with a 2-layer Neural Network
hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
scores = np.dot(hidden_layer, W2) + b2

Note that the only change is the extra line of code: we first compute the hidden-layer representation, and then compute the scores from that representation. Crucially, we have also added a non-linearity, in this case the ReLU activation function.
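Why the non-linearity matters: without it, the two matrix multiplications would collapse into a single linear transform, so the stacked model would be no more expressive than the original linear classifier. A minimal sketch of this collapse (the shapes follow the initialization above; the small batch of random points is made up for illustration and is not part of the original post):

import numpy as np

D, h, K, N = 2, 100, 3, 5                     # dimensions as in the post; N is an arbitrary batch size
X = np.random.randn(N, D)
W, b = 0.01 * np.random.randn(D, h), np.zeros((1, h))
W2, b2 = 0.01 * np.random.randn(h, K), np.zeros((1, K))

# two layers WITHOUT an activation in between...
scores_no_relu = np.dot(np.dot(X, W) + b, W2) + b2

# ...are exactly one linear layer with folded parameters
W_eff = np.dot(W, W2)
b_eff = np.dot(b, W2) + b2
assert np.allclose(scores_no_relu, np.dot(X, W_eff) + b_eff)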

Everything else stays the same. We compute the loss from the scores exactly as before, and obtain the gradient dscores on the scores exactly as before. However, backpropagating that gradient into the model parameters now changes. First we backpropagate into the second layer of the network; this looks almost identical to the earlier code, except that X is replaced by the variable hidden_layer:

# backpropagate the gradient to the parameters
# first backprop into parameters W2 and b2
dW2 = np.dot(hidden_layer.T, dscores)
db2 = np.sum(dscores, axis=0, keepdims=True)
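For reference, the dscores used above is the softmax/cross-entropy gradient from the previous post; it is computed exactly as in the full listing below (probs, num_examples and y are defined there):

# gradient on the scores from softmax + cross-entropy loss
dscores = probs
dscores[range(num_examples), y] -= 1
dscores /= num_examples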

However, since hidden_layer is itself a function of other parameters, we also need to continue backpropagating through it:

# next backprop into hidden layer
dhidden = np.dot(dscores, W2.T)

We now have the gradient on the outputs of the hidden layer; the next step is to backpropagate through the ReLU. The ReLU is simple: r = max(0, x), so dr/dx = 1(x > 0). Combined with the chain rule, this means the ReLU unit lets the gradient pass through unchanged if its input was greater than 0, but kills the gradient if its input was less than or equal to 0. We can therefore backpropagate through the ReLU as follows:

# backprop the ReLU non-linearity
dhidden[hidden_layer <= 0] = 0

Finally, we reach the first layer's weights and biases:

# finally into W,b
dW = np.dot(X.T, dhidden)
db = np.sum(dhidden, axis=0, keepdims=True)

What remains is to update the parameters.
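The update is plain gradient descent: the regularization gradient is first added to the weight gradients, and then every parameter takes a step against its gradient. This mirrors the corresponding lines in the full code below:

# add regularization gradient contribution
dW2 += reg * W2
dW += reg * W

# perform a parameter update
W += -step_size * dW
b += -step_size * db
W2 += -step_size * dW2
b2 += -step_size * db2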

Complete Code

The complete code is shown below:

import numpy as np
import matplotlib.pyplot as plt

# generate the spiral dataset
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
  ix = range(N*j,N*(j+1))
  r = np.linspace(0.0,1,N) # radius
  t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
  X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
  y[ix] = j
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)

# initialize parameters randomly
h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in range(10000):

  # evaluate class scores, [N x K]
  hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
  scores = np.dot(hidden_layer, W2) + b2

  # compute the class probabilities
  exp_scores = np.exp(scores)
  probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

  # compute the loss: average cross-entropy loss and regularization
  corect_logprobs = -np.log(probs[range(num_examples),y])
  data_loss = np.sum(corect_logprobs)/num_examples
  reg_loss = 0.5*reg*np.sum(W*W) + 0.5*reg*np.sum(W2*W2)
  loss = data_loss + reg_loss
  if i % 1000 == 0:
    print("iteration %d: loss %f" % (i, loss))

  # compute the gradient on scores
  dscores = probs
  dscores[range(num_examples),y] -= 1
  dscores /= num_examples

  # backpropagate the gradient to the parameters
  # first backprop into parameters W2 and b2
  dW2 = np.dot(hidden_layer.T, dscores)
  db2 = np.sum(dscores, axis=0, keepdims=True)
  # next backprop into hidden layer
  dhidden = np.dot(dscores, W2.T)
  # backprop the ReLU non-linearity
  dhidden[hidden_layer <= 0] = 0
  # finally into W,b
  dW = np.dot(X.T, dhidden)
  db = np.sum(dhidden, axis=0, keepdims=True)

  # add regularization gradient contribution
  dW2 += reg * W2
  dW += reg * W

  # perform a parameter update
  W += -step_size * dW
  b += -step_size * db
  W2 += -step_size * dW2
  b2 += -step_size * db2

# evaluate training set accuracy
hidden_layer = np.maximum(0, np.dot(X, W) + b)
scores = np.dot(hidden_layer, W2) + b2
predicted_class = np.argmax(scores, axis=1)
print('training accuracy: %.2f' % (np.mean(predicted_class == y)))

Decision boundary:
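A minimal sketch of how such a decision-boundary plot can be produced, assuming the trained W, b, W2, b2 and the data X, y from the code above are in scope; the grid step and plot margins are arbitrary choices for illustration, not values from the original post:

# decision-boundary plot (sketch; grid step and margins are arbitrary)
step = 0.02
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))

# run the trained 2-layer network on every grid point
grid = np.c_[xx.ravel(), yy.ravel()]
Z = np.dot(np.maximum(0, np.dot(grid, W) + b), W2) + b2
Z = np.argmax(Z, axis=1).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()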

