Stanford CS231n Computer Vision Training Camp ---- Softmax Exercise

xiaoxiao  2025-04-26

Part 1: Assignment Tasks

- implement a fully-vectorized loss function for the Softmax classifier
- implement the fully-vectorized expression for its analytic gradient
- check your implementation with numerical gradient
- use a validation set to tune the learning rate and regularization strength
- optimize the loss function with SGD
- visualize the final learned weights

Here, SGD stands for stochastic gradient descent.
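
As a reference point for the "optimize the loss function with SGD" task, here is a minimal sketch of a minibatch SGD loop. The names (sgd_train, loss_fn) and the default hyperparameter values are illustrative assumptions, not the assignment's LinearClassifier.train, but the update rule is the same idea:

import numpy as np

def sgd_train(W, X, y, loss_fn, learning_rate=1e-7, reg=2.5e4,
              num_iters=1500, batch_size=128):
    """Plain minibatch SGD; loss_fn(W, X_batch, y_batch, reg) returns (loss, grad)."""
    num_train = X.shape[0]
    for it in range(num_iters):
        # sample a random minibatch of training examples
        idx = np.random.choice(num_train, batch_size, replace=True)
        X_batch, y_batch = X[idx], y[idx]
        # evaluate loss and gradient on the minibatch
        loss, grad = loss_fn(W, X_batch, y_batch, reg)
        # step in the direction of the negative gradient
        W -= learning_rate * grad
    return W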

Part 2: Main Code with Comments

#############################################################################
# TODO: Compute the softmax loss and its gradient using explicit loops.     #
# Store the loss in loss and the gradient in dW. If you are not careful     #
# here, it is easy to run into numeric instability. Don't forget the        #
# regularization!                                                           #
#############################################################################
for i in range(X.shape[0]):
    score = np.dot(X[i], W)
    score -= max(score)                      # shift scores for numerical stability
    score = np.exp(score)                    # exponentiate
    softmax_sum = np.sum(score)              # softmax denominator
    score /= softmax_sum                     # divide by the denominator to get softmax probabilities
    # accumulate the gradient for this example
    for j in range(W.shape[1]):
        if j != y[i]:
            dW[:, j] += score[j] * X[i]
        else:
            dW[:, j] -= (1 - score[j]) * X[i]
    loss -= np.log(score[y[i]])              # cross-entropy for this example
loss /= X.shape[0]                           # average over the training set
dW /= X.shape[0]                             # average over the training set
loss += reg * np.sum(W * W)                  # regularization term
dW += 2 * reg * W
#############################################################################
#                          END OF YOUR CODE                                 #
#############################################################################
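
One of the tasks listed above is to check this analytic gradient against a numerical estimate. Below is a rough sketch of such a check using centered differences; the starter code ships a similar helper (grad_check_sparse), so treat the function name and usage here as illustrative:

import numpy as np

def numerical_gradient_check(f, W, analytic_grad, num_checks=10, h=1e-5):
    # compare the analytic gradient against centered differences at a few random coordinates
    for _ in range(num_checks):
        ix = tuple(np.random.randint(d) for d in W.shape)
        old = W[ix]
        W[ix] = old + h
        fxph = f(W)            # f(W + h)
        W[ix] = old - h
        fxmh = f(W)            # f(W - h)
        W[ix] = old            # restore the original value
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / \
                    (abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic, rel_error))

# usage (assuming softmax_loss_naive and a dev set X_dev, y_dev exist):
# loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# numerical_gradient_check(lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0], W, grad)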

Inline Question 1:

Why do we expect our loss to be close to -log(0.1)? Explain briefly.

Your answer: Because W is initialized with small random values, the scores computed for every class are roughly the same, so after the softmax each class gets the same probability. Since this is a 10-class problem, each class's probability is about 0.1, and the resulting cross-entropy loss is -log(0.1).
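
As a quick sanity check, -log(0.1) is about 2.3026, which is the initial loss the notebook should report with a small random W:

import numpy as np
print(-np.log(0.1))   # ~2.302585: expected loss when all 10 classes are equally likely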

#############################################################################
# TODO: Compute the softmax loss and its gradient using no explicit loops.  #
# Store the loss in loss and the gradient in dW. If you are not careful     #
# here, it is easy to run into numeric instability. Don't forget the        #
# regularization!                                                           #
#############################################################################
scores = np.dot(X, W)
scores -= np.max(scores, axis=1, keepdims=True)      # shift scores for numerical stability
scores = np.exp(scores)                              # exponentiate
scores /= np.sum(scores, axis=1, keepdims=True)      # divide by the row sums to get softmax probabilities
ds = np.copy(scores)
ds[np.arange(X.shape[0]), y] -= 1
dW = np.dot(X.T, ds)                                 # chain rule through S = X * W
loss = scores[np.arange(X.shape[0]), y]
loss = -np.log(loss).sum()
loss /= X.shape[0]                                   # average over the training set
dW /= X.shape[0]                                     # average over the training set
loss += reg * np.sum(W * W)                          # regularization term
dW += 2 * reg * W
#############################################################################
#                          END OF YOUR CODE                                 #
#############################################################################

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e3, 5e3, 7e3]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
from copy import deepcopy
for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        softmax.train(X_train, y_train, lr, reg, 1500, 128)
        train_pred = softmax.predict(X_train)
        train_acc = np.mean(train_pred == y_train)
        val_pred = softmax.predict(X_val)
        val_acc = np.mean(val_pred == y_val)
        results[(lr, reg)] = [train_acc, val_acc]
        if val_acc > best_val:
            best_val = val_acc
            best_softmax = deepcopy(softmax)
################################################################################
#                               END OF YOUR CODE                               #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy))

print('best validation accuracy achieved during cross-validation: %f' % best_val)
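
The remaining tasks from the list, evaluating the tuned classifier and visualizing the learned weights, can be sketched as below. This assumes best_softmax, X_test, and y_test from the cell above, and that the data was preprocessed as in the assignment (a bias dimension appended, so best_softmax.W has shape (3073, 10)); the actual notebook cell may differ in details:

import numpy as np
import matplotlib.pyplot as plt

# evaluate the best classifier found during validation on the test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('softmax on raw pixels final test set accuracy: %f' % test_accuracy)

# visualize the learned weights for each class as a 32x32x3 template
w = best_softmax.W[:-1, :]                 # strip the bias row
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale the weights to the 0..255 range for display
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()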

Inline Question - True or False

It's possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.

Your answer: True

Your explanation: According to the SVM formula, the added data point may be easy for the SVM to classify, so every margin term is clipped to 0 by the max and the SVM loss does not change. The softmax classifier, however, always produces a probability distribution and a cross-entropy value from it, so adding a data point always adds some amount to the softmax loss, even if only a very small one.
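
A small numeric illustration of this answer, using made-up scores for a single 3-class example whose correct class (index 0) wins by a wide margin:

import numpy as np

scores = np.array([10.0, 1.0, 2.0])    # correct class scores far above the others
correct = 0

# SVM (hinge) loss: every margin term is clipped at zero, so the point adds nothing
margins = np.maximum(0, scores - scores[correct] + 1)
margins[correct] = 0
print('SVM loss:', margins.sum())      # 0.0

# Softmax cross-entropy: the correct-class probability is < 1,
# so the loss is strictly positive, however small
p = np.exp(scores - scores.max())
p /= p.sum()
print('softmax loss:', -np.log(p[correct]))   # small but > 0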
