Computing the Euclidean Distance Between Two Matrices

xiaoxiao, 2021-02-28

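A k-NN model needs the Euclidean distance from every point in the test set to every point in the training set. In other words, it needs the Euclidean distances between the rows of two matrices. There are three common ways to implement this in k-NN: with two loops, with one loop, or with no loops at all.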

Using two loops

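Loop over the test set and the training set, compute the Euclidean distance between each pair of points, and store it in the dists matrix. This version applies no optimization at all.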

    import numpy as np

    def compute_distances_two_loops(self, X):
        """
        Compute the L2 distance between each test point in X and each training
        point in self.X_train using a nested loop over both.

        X: array of shape (num_test, D).
        Returns dists of shape (num_test, num_train), where dists[i, j] is the
        Euclidean distance between X[i] and self.X_train[j].
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                # L2 distance between the i-th test point and the j-th training
                # point, computed without a loop over the feature dimension
                dists[i, j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
        return dists
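The code above reads self.X_train, so it only runs inside its class. The sketch below writes the same double loop as a plain function so it can be tried in isolation. The name two_loop_distances and the toy points are illustrative only. The points are chosen so the distances are easy 3-4-5 multiples.

```python
import numpy as np

def two_loop_distances(X_test, X_train):
    """Pairwise L2 distances with two explicit loops (standalone sketch)."""
    num_test, num_train = X_test.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            # Difference vector between test point i and training point j
            diff = X_test[i] - X_train[j]
            dists[i, j] = np.sqrt(np.sum(diff ** 2))
    return dists

X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
X_test = np.array([[0.0, 0.0], [6.0, 8.0]])
result = two_loop_distances(X_test, X_train)
print(result)  # values: [[0, 5], [10, 5]]
```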

Using one loop

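If the training data is kept as a single matrix, the distance from one test point to every training point can be computed in one vectorized step. This reduces the algorithm to a single loop over the test points.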

    def compute_distances_one_loop(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using a single loop over the test data.

        Input / Output: Same as compute_distances_two_loops
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            # Broadcasting subtracts X[i] from every training row at once;
            # summing along axis=1 leaves one distance per training point
            dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i]), axis=1))
        return dists
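The single loop works because NumPy broadcasting subtracts one test point from every row of the training matrix at once. The small sketch below uses made-up points to show the shapes involved in one iteration of that loop.

```python
import numpy as np

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])  # shape (N, D) = (3, 2)
x = np.array([3.0, 4.0])                                   # one test point, shape (D,)

# Broadcasting stretches x across every row of X_train: (3, 2) - (2,) -> (3, 2)
diff = X_train - x
print(diff.shape)  # (3, 2)

# Summing squares along axis=1 collapses D, leaving one distance per training point
row = np.sqrt(np.sum(np.square(diff), axis=1))
print(row)  # distances: 5, 0, sqrt(13) ≈ 3.606
```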

Using no loops

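The most efficient approach represents both the training set and the test set as matrices and replaces the loops with matrix operations. This requires being comfortable with the rules of matrix algebra. The rest of this post focuses on deriving the Euclidean distance between two matrices in matrix form.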

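Let the test-set matrix $P$ have size $M \times D$ and the training-set matrix $C$ have size $N \times D$. The test set contains $M$ points and the training set contains $N$ points, each a $D$-dimensional feature vector. Let $P_i$ denote the $i$-th row of $P$ and $C_j$ the $j$-th row of $C$:

$$P_i = \begin{bmatrix} P_{i1} & P_{i2} & \cdots & P_{iD} \end{bmatrix}, \qquad C_j = \begin{bmatrix} C_{j1} & C_{j2} & \cdots & C_{jD} \end{bmatrix}$$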

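First, compute the distance $d(P_i, C_j)$ between $P_i$ and $C_j$, which is entry dist(i, j). Each square is expanded with $(a-b)^2 = a^2 + b^2 - 2ab$, and the terms are then regrouped:

$$\begin{aligned}
d(P_i, C_j) &= \sqrt{(P_{i1}-C_{j1})^2 + (P_{i2}-C_{j2})^2 + \cdots + (P_{iD}-C_{jD})^2} \\
&= \sqrt{(P_{i1}^2 + P_{i2}^2 + \cdots + P_{iD}^2) + (C_{j1}^2 + C_{j2}^2 + \cdots + C_{jD}^2) - 2(P_{i1}C_{j1} + P_{i2}C_{j2} + \cdots + P_{iD}C_{jD})} \\
&= \sqrt{\|P_i\|^2 + \|C_j\|^2 - 2 P_i C_j^T}
\end{aligned}$$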
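The expansion can be checked numerically by comparing both sides for a single pair of random vectors. The seed and the dimension D = 5 below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(size=5)  # one test point P_i (D = 5)
c = rng.normal(size=5)  # one training point C_j

# Left-hand side: distance from the definition
direct = np.sqrt(np.sum((p - c) ** 2))
# Right-hand side: ||P_i||^2 + ||C_j||^2 - 2 * P_i C_j^T
expanded = np.sqrt(p @ p + c @ c - 2 * (p @ c))
print(direct, expanded)
```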

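This generalizes to the $i$-th row of the distance matrix, with the square root applied element-wise:

$$\begin{aligned}
\text{dist}[i] &= \sqrt{\begin{pmatrix} \|P_i\|^2 & \|P_i\|^2 & \cdots & \|P_i\|^2 \end{pmatrix} + \begin{pmatrix} \|C_1\|^2 & \|C_2\|^2 & \cdots & \|C_N\|^2 \end{pmatrix} - 2 P_i \begin{pmatrix} C_1^T & C_2^T & \cdots & C_N^T \end{pmatrix}} \\
&= \sqrt{\begin{pmatrix} \|P_i\|^2 & \|P_i\|^2 & \cdots & \|P_i\|^2 \end{pmatrix} + \begin{pmatrix} \|C_1\|^2 & \|C_2\|^2 & \cdots & \|C_N\|^2 \end{pmatrix} - 2 P_i C^T}
\end{aligned}$$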
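The row formula maps directly onto three NumPy terms. The sketch below uses random data with arbitrary sizes M = 4, N = 6, D = 3. It builds row i from the formula and compares it with the one-loop style computation.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(4, 3))  # M = 4 test points, D = 3
C = rng.normal(size=(6, 3))  # N = 6 training points
i = 2

# (||P_i||^2, ..., ||P_i||^2): a scalar that broadcasts to length N
p_sq = np.sum(P[i] ** 2)
# (||C_1||^2, ..., ||C_N||^2): one squared norm per training row
c_sq = np.sum(C ** 2, axis=1)
# P_i C^T: a length-N vector of dot products
cross = P[i] @ C.T

row_formula = np.sqrt(p_sq + c_sq - 2 * cross)
# Reference: subtract P_i from every row of C, as in the one-loop version
row_direct = np.sqrt(np.sum((C - P[i]) ** 2, axis=1))
```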

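Extending this to the whole distance matrix gives:

$$\text{dist} = \sqrt{\begin{pmatrix}
\|P_1\|^2 & \|P_1\|^2 & \cdots & \|P_1\|^2 \\
\|P_2\|^2 & \|P_2\|^2 & \cdots & \|P_2\|^2 \\
\vdots & \vdots & \ddots & \vdots \\
\|P_M\|^2 & \|P_M\|^2 & \cdots & \|P_M\|^2
\end{pmatrix} + \begin{pmatrix}
\|C_1\|^2 & \|C_2\|^2 & \cdots & \|C_N\|^2 \\
\|C_1\|^2 & \|C_2\|^2 & \cdots & \|C_N\|^2 \\
\vdots & \vdots & \ddots & \vdots \\
\|C_1\|^2 & \|C_2\|^2 & \cdots & \|C_N\|^2
\end{pmatrix} - 2 P C^T}$$

Both added matrices are $M \times N$. Row $i$ of the first matrix repeats $\|P_i\|^2$, and column $j$ of the second matrix repeats $\|C_j\|^2$. In NumPy, these two terms are produced by broadcasting a column vector of shape (M, 1) and a row vector of shape (N,).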
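The two repeated-norm matrices in this formula never need to be built explicitly. The sketch below uses random data with arbitrary sizes M = 4, N = 6, D = 3. It first forms the matrices with np.tile to mirror the equation term by term. It then shows that broadcasting gives the same result without materializing them.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.normal(size=(4, 3))  # M x D
C = rng.normal(size=(6, 3))  # N x D
M, N = P.shape[0], C.shape[0]

# First matrix: row i is ||P_i||^2 repeated N times -> (M, N)
P_sq = np.tile(np.sum(P ** 2, axis=1)[:, np.newaxis], (1, N))
# Second matrix: column j is ||C_j||^2 repeated M times -> (M, N)
C_sq = np.tile(np.sum(C ** 2, axis=1)[np.newaxis, :], (M, 1))

dist_tiled = np.sqrt(P_sq + C_sq - 2 * P @ C.T)

# Broadcasting a (M, 1) column and an (N,) row yields the same (M, N) sum
dist_broadcast = np.sqrt(
    np.sum(P ** 2, axis=1)[:, np.newaxis] + np.sum(C ** 2, axis=1) - 2 * P @ C.T
)

# Brute-force reference: all pairwise differences, shape (M, N, D)
dist_ref = np.sqrt(np.sum((P[:, np.newaxis, :] - C[np.newaxis, :, :]) ** 2, axis=2))
```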

Expressed in Python:

    def compute_distances_no_loops(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using no explicit loops.

        Input / Output: Same as compute_distances_two_loops
        """
        # -2 * P C^T, shape (num_test, num_train)
        cross = -2 * np.dot(X, self.X_train.T)
        # ||C_j||^2 as a row vector of shape (num_train,); broadcasts down the rows
        train_sq = np.sum(np.square(self.X_train), axis=1)
        # ||P_i||^2 as a column vector of shape (num_test, 1); broadcasts across columns
        test_sq = np.sum(np.square(X), axis=1)[:, np.newaxis]
        # Round-off can make entries slightly negative (e.g. when a test point
        # equals a training point), so clamp at 0 before the square root
        dists = np.sqrt(np.maximum(cross + train_sq + test_sq, 0))
        return dists
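One quick way to check that the three versions agree is to run standalone copies of them on the same random data. The function names below are illustrative rewrites of the class methods. They take the training matrix as an argument instead of reading self.X_train. The no-loop copy clamps negative round-off to zero before the square root, so self-distances stay finite.

```python
import numpy as np

def dists_two_loops(X, X_train):
    out = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        for j in range(X_train.shape[0]):
            out[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return out

def dists_one_loop(X, X_train):
    out = np.zeros((X.shape[0], X_train.shape[0]))
    for i in range(X.shape[0]):
        # Broadcast X[i] against every training row
        out[i] = np.sqrt(np.sum((X_train - X[i]) ** 2, axis=1))
    return out

def dists_no_loops(X, X_train):
    sq = (np.sum(X ** 2, axis=1)[:, np.newaxis]   # ||P_i||^2 column
          + np.sum(X_train ** 2, axis=1)          # ||C_j||^2 row
          - 2 * X @ X_train.T)                    # -2 P C^T
    return np.sqrt(np.maximum(sq, 0.0))  # clamp tiny negative round-off

rng = np.random.default_rng(3)
X_train = rng.normal(size=(50, 8))
X_test = rng.normal(size=(10, 8))

d2 = dists_two_loops(X_test, X_train)
d1 = dists_one_loop(X_test, X_train)
d0 = dists_no_loops(X_test, X_train)

# Distances from the training set to itself: the diagonal should be ~0
d_self = dists_no_loops(X_train, X_train)
```

To compare speed, each function can be wrapped in Python's timeit. The no-loop version avoids Python-level iteration entirely, which is why it is usually the fastest of the three.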

When reposting, please cite the original URL: https://www.6miu.com/read-1650000.html
