Reimplementing YOLOv3 in TensorFlow (based on keras-yolo3)


Preface

    I looked at several Python reimplementations online, and keras-yolo3 is the best of them. I wanted to experience for myself the process behind the weight file the YOLOv3 author released, so I decided to walk through it on my own (my C++ is too weak to follow the original). The matrix manipulation in the loss computation made my head spin, and the structure of the data fed to the model wasn't clear to me either, so I'm writing this article to record the process and organize my thoughts. If you spot mistakes, please point them out, thanks! (I'm working on Windows.)

1. Reproducing the network model

    First, download and run the corresponding C++ code. The author only provides code for Linux, but someone has kindly ported it to Windows: AlexeyAB/darknet. For setup, see https://blog.csdn.net/baidu_36669549/article/details/79798587. Once configured, build darknet.exe and run the following command to print the network model:

darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -i 0 -thresh 0.25 dog.jpg

layer   filters   size            input                output
  0 conv    32  3 x 3 / 1   416 x 416 x   3  ->  416 x 416 x  32  0.299 BF
  1 conv    64  3 x 3 / 2   416 x 416 x  32  ->  208 x 208 x  64  1.595 BF
  2 conv    32  1 x 1 / 1   208 x 208 x  64  ->  208 x 208 x  32  0.177 BF
  3 conv    64  3 x 3 / 1   208 x 208 x  32  ->  208 x 208 x  64  1.595 BF
  4 Shortcut Layer: 1
  5 conv   128  3 x 3 / 2   208 x 208 x  64  ->  104 x 104 x 128  1.595 BF
  6 conv    64  1 x 1 / 1   104 x 104 x 128  ->  104 x 104 x  64  0.177 BF
  7 conv   128  3 x 3 / 1   104 x 104 x  64  ->  104 x 104 x 128  1.595 BF
  8 Shortcut Layer: 5
  9 conv    64  1 x 1 / 1   104 x 104 x 128  ->  104 x 104 x  64  0.177 BF
 10 conv   128  3 x 3 / 1   104 x 104 x  64  ->  104 x 104 x 128  1.595 BF
 11 Shortcut Layer: 8
 12 conv   256  3 x 3 / 2   104 x 104 x 128  ->   52 x  52 x 256  1.595 BF
 13 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 14 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 15 Shortcut Layer: 12
 16 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 17 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 18 Shortcut Layer: 15
 19 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 20 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 21 Shortcut Layer: 18
 22 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 23 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 24 Shortcut Layer: 21
 25 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 26 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 27 Shortcut Layer: 24
 28 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 29 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 30 Shortcut Layer: 27
 31 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 32 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 33 Shortcut Layer: 30
 34 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
 35 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
 36 Shortcut Layer: 33
 37 conv   512  3 x 3 / 2    52 x  52 x 256  ->   26 x  26 x 512  1.595 BF
 38 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 39 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 40 Shortcut Layer: 37
 41 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 42 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 43 Shortcut Layer: 40
 44 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 45 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 46 Shortcut Layer: 43
 47 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 48 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 49 Shortcut Layer: 46
 50 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 51 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 52 Shortcut Layer: 49
 53 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 54 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 55 Shortcut Layer: 52
 56 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 57 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 58 Shortcut Layer: 55
 59 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 60 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 61 Shortcut Layer: 58
 62 conv  1024  3 x 3 / 2    26 x  26 x 512  ->   13 x  13 x1024  1.595 BF
 63 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 64 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 65 Shortcut Layer: 62
 66 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 67 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 68 Shortcut Layer: 65
 69 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 70 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 71 Shortcut Layer: 68
 72 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 73 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 74 Shortcut Layer: 71
 75 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 76 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 77 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 78 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 79 conv   512  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 512  0.177 BF
 80 conv  1024  3 x 3 / 1    13 x  13 x 512  ->   13 x  13 x1024  1.595 BF
 81 conv   255  1 x 1 / 1    13 x  13 x1024  ->   13 x  13 x 255  0.088 BF
 82 yolo
 83 route  79
 84 conv   256  1 x 1 / 1    13 x  13 x 512  ->   13 x  13 x 256  0.044 BF
 85 upsample           2x    13 x  13 x 256  ->   26 x  26 x 256
 86 route  85 61
 87 conv   256  1 x 1 / 1    26 x  26 x 768  ->   26 x  26 x 256  0.266 BF
 88 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 89 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 90 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 91 conv   256  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 256  0.177 BF
 92 conv   512  3 x 3 / 1    26 x  26 x 256  ->   26 x  26 x 512  1.595 BF
 93 conv   255  1 x 1 / 1    26 x  26 x 512  ->   26 x  26 x 255  0.177 BF
 94 yolo
 95 route  91
 96 conv   128  1 x 1 / 1    26 x  26 x 256  ->   26 x  26 x 128  0.044 BF
 97 upsample           2x    26 x  26 x 128  ->   52 x  52 x 128
 98 route  97 36
 99 conv   128  1 x 1 / 1    52 x  52 x 384  ->   52 x  52 x 128  0.266 BF
100 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
101 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
102 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
103 conv   128  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 128  0.177 BF
104 conv   256  3 x 3 / 1    52 x  52 x 128  ->   52 x  52 x 256  1.595 BF
105 conv   255  1 x 1 / 1    52 x  52 x 256  ->   52 x  52 x 255  0.353 BF
106 yolo

For the network structure and the ideas behind it, see https://www.cnblogs.com/makefile/p/YOLOv3.html and https://blog.csdn.net/chandanyan8568/article/details/81089083. In the table above, a Shortcut Layer is a residual connection and route reuses earlier layer outputs; the number(s) following each are row indices in the table. upsample is an upsampling layer.
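To make the mapping to TensorFlow concrete, here is a minimal sketch of how these three darknet layer types translate into ops (the tensors are illustrative stand-ins, not code from the original project; shapes are taken from rows 84 and 61 of the table):

import tensorflow as tf

# stand-ins for two darknet layer outputs
net = tf.placeholder(tf.float32, [None, 13, 13, 256])
layer_61 = tf.placeholder(tf.float32, [None, 26, 26, 512])

# "Shortcut Layer: N" is an element-wise residual add with layer N's output
shortcut = tf.add(net, net)

# "upsample 2x" doubles the spatial size (darknet uses nearest neighbor)
upsampled = tf.image.resize_nearest_neighbor(net, [26, 26])

# "route 85 61" concatenates the listed layers' outputs along the channel axis
routed = tf.concat([upsampled, layer_61], axis=3)  # -> (None, 26, 26, 768)

With that mapping in mind, the whole model translates to the code below.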

import tensorflow as tf
import tensorflow.contrib.slim as slim


def yolo_body(images, num_classes=80):
    with tf.variable_scope('yolo'):
        with slim.arg_scope([slim.conv2d, slim.conv2d_transpose, slim.fully_connected],
                            activation_fn=tf.nn.leaky_relu,
                            weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                            weights_regularizer=slim.l2_regularizer(0.0005)):
            net = slim.conv2d(images, 32, 3, scope='conv_1')
            first_layer = slim.conv2d(net, 64, 3, 2, scope='conv_2')
            net = slim.conv2d(first_layer, 32, 1, scope='conv_3')
            net = slim.conv2d(net, 64, 3, scope='conv_4')
            net = tf.nn.leaky_relu(tf.add(net, first_layer), alpha=0.2)
            # 2 residual blocks at 104x104
            second_layer = slim.conv2d(net, 128, 3, 2, scope='conv_5')
            for i in range(2):
                net = slim.conv2d(second_layer, 64, 1, scope='conv_%s' % (str(6 + i * 2)))
                net = slim.conv2d(net, 128, 3, scope='conv_%s' % (str(7 + i * 2)))
                second_layer = tf.nn.leaky_relu(tf.add(net, second_layer), alpha=0.2)
            # 8 residual blocks at 52x52
            third_layer = slim.conv2d(second_layer, 256, 3, 2, scope='conv_10')
            for i in range(8):
                net = slim.conv2d(third_layer, 128, 1, scope='conv_%s' % (str(11 + i * 2)))
                net = slim.conv2d(net, 256, 3, scope='conv_%s' % (str(12 + i * 2)))
                third_layer = tf.nn.leaky_relu(tf.add(net, third_layer), alpha=0.2)
            # 8 residual blocks at 26x26
            fourth_layer = slim.conv2d(third_layer, 512, 3, 2, scope='conv_27')
            for i in range(8):
                net = slim.conv2d(fourth_layer, 256, 1, scope='conv_%s' % (str(28 + i * 2)))
                net = slim.conv2d(net, 512, 3, scope='conv_%s' % (str(29 + i * 2)))
                fourth_layer = tf.nn.leaky_relu(tf.add(net, fourth_layer), alpha=0.2)
            # 4 residual blocks at 13x13
            fifth_layer = slim.conv2d(fourth_layer, 1024, 3, 2, scope='conv_44')
            for i in range(4):
                net = slim.conv2d(fifth_layer, 512, 1, scope='conv_%s' % (str(45 + i * 2)))
                net = slim.conv2d(net, 1024, 3, scope='conv_%s' % (str(46 + i * 2)))
                fifth_layer = tf.nn.leaky_relu(tf.add(net, fifth_layer), alpha=0.2)
            # detection head 1: 13x13
            net = slim.conv2d(fifth_layer, 512, 1, scope='conv_53')
            net = slim.conv2d(net, 1024, 3, scope='conv_54')
            net = slim.conv2d(net, 512, 1, scope='conv_55')
            net = slim.conv2d(net, 1024, 3, scope='conv_56')
            scale_one = slim.conv2d(net, 512, 1, scope='conv_57')
            net = slim.conv2d(scale_one, 1024, 3, scope='conv_58')
            # fix: detection convs are 1x1 and linear (raw logits) to match the
            # darknet table above and the from_logits=True loss below
            detection_one = slim.conv2d(net, 3 * (5 + num_classes), 1,
                                        activation_fn=None, scope='conv_59')
            # detection head 2: 26x26 (conv2d_transpose stands in for darknet's upsample)
            scale_two = slim.conv2d(scale_one, 256, 1, scope='conv_60')  # 1x1, matching table row 84
            scale_two = slim.conv2d_transpose(scale_two, 256, 3, 2, scope='conv2d_transpose1')
            net = tf.concat([scale_two, fourth_layer], axis=3)
            net = slim.conv2d(net, 256, 1, scope='conv_61')
            net = slim.conv2d(net, 512, 3, scope='conv_62')
            net = slim.conv2d(net, 256, 1, scope='conv_63')
            net = slim.conv2d(net, 512, 3, scope='conv_64')
            scale_two = slim.conv2d(net, 256, 1, scope='conv_65')
            net = slim.conv2d(scale_two, 512, 3, scope='conv_66')
            detection_two = slim.conv2d(net, 3 * (5 + num_classes), 1,
                                        activation_fn=None, scope='conv_67')
            # detection head 3: 52x52
            scale_three = slim.conv2d(scale_two, 128, 1, scope='conv_68')
            scale_three = slim.conv2d_transpose(scale_three, 128, 3, 2, scope='conv2d_transpose2')
            net = tf.concat([scale_three, third_layer], axis=3)
            net = slim.conv2d(net, 128, 1, scope='conv_69')
            net = slim.conv2d(net, 256, 3, scope='conv_70')
            net = slim.conv2d(net, 128, 1, scope='conv_71')
            net = slim.conv2d(net, 256, 3, scope='conv_72')
            net = slim.conv2d(net, 128, 1, scope='conv_73')
            net = slim.conv2d(net, 256, 3, scope='conv_74')
            detection_three = slim.conv2d(net, 3 * (5 + num_classes), 1,
                                          activation_fn=None, scope='conv_75')
    return detection_one, detection_two, detection_three
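As a quick sanity check (my own snippet, not part of the original code), you can feed a dummy 416x416 input and confirm the three heads come out at the expected scales:

import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 416, 416, 3])
d1, d2, d3 = yolo_body(images, num_classes=80)
print(d1.shape)  # expect (?, 13, 13, 255)
print(d2.shape)  # expect (?, 26, 26, 255)
print(d3.shape)  # expect (?, 52, 52, 255)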

With the network model in place, we still need to compute the loss. The code below is still mostly the keras-yolo3 project's code:

import tensorflow as tf
from keras import backend as K


def yolo_loss(feats, num_classes, y_true, ignore_thresh=.5):
    # y_true = [Input(shape=(416 // {0: 32, 1: 16, 2: 8}[l], 416 // {0: 32, 1: 16, 2: 8}[l],
    #                        9 // 3, num_classes + 5)) for l in range(3)]
    loss = 0
    m = K.shape(feats[0])[0]  # batch size, tensor
    mf = K.cast(m, K.dtype(feats[0]))
    grid_shapes = [K.cast(K.shape(feats[l])[1:3], K.dtype(y_true[0])) for l in range(3)]
    input_shape = K.cast(K.shape(feats[0])[1:3] * 32, K.dtype(y_true[0]))
    # The nine clustered anchor sizes (10,13, 16,30, 33,23, 30,61, 62,45, 59,119,
    # 116,90, 156,198, 373,326), grouped per output scale. feats[0] is the coarse
    # 13x13 map, so it must get the largest anchors (the original post paired them
    # the other way around, which mismatches preprocess_true_boxes' anchor_mask).
    anchors = [[[116, 90], [156, 198], [373, 326]],
               [[30, 61], [62, 45], [59, 119]],
               [[10, 13], [16, 30], [33, 23]]]
    for i in range(3):
        object_mask = y_true[i][..., 4:5]
        true_class_probs = y_true[i][..., 5:]
        # decode the 13x13 / 26x26 / 52x52 predictions into box positions and sizes
        grid, raw_pred, pred_xy, pred_wh = yolo_head(feats[i], anchors[i], num_classes,
                                                     input_shape, calc_loss=True)
        pred_box = tf.concat([pred_xy, pred_wh], axis=-1)

        # Darknet raw box to calculate loss.
        raw_true_xy = y_true[i][..., :2] * grid_shapes[i][::-1] - grid
        raw_true_wh = K.log(y_true[i][..., 2:4] / anchors[i] * input_shape[::-1])
        raw_true_wh = K.switch(object_mask, raw_true_wh,
                               K.zeros_like(raw_true_wh))  # avoid log(0)=-inf
        box_loss_scale = 2 - y_true[i][..., 2:3] * y_true[i][..., 3:4]

        # Find ignore mask, iterating over each batch element.
        ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True)
        object_mask_bool = K.cast(object_mask, 'bool')

        def loop_body(b, ignore_mask):
            true_box = tf.boolean_mask(y_true[i][b, ..., 0:4], object_mask_bool[b, ..., 0])
            iou = box_iou(pred_box[b], true_box)
            best_iou = tf.reduce_max(iou, axis=-1, keepdims=False)
            ignore_mask = ignore_mask.write(b, tf.cast(best_iou < ignore_thresh, true_box.dtype))
            return b + 1, ignore_mask

        # tf.while_loop replaces K.control_flow_ops.while_loop, which newer
        # Keras versions no longer expose
        _, ignore_mask = tf.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask])
        ignore_mask = ignore_mask.stack()
        ignore_mask = K.expand_dims(ignore_mask, -1)

        # K.binary_crossentropy is helpful to avoid exp overflow.
        xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(
            raw_true_xy, raw_pred[..., 0:2], from_logits=True)
        wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
        confidence_loss = object_mask * K.binary_crossentropy(
            object_mask, raw_pred[..., 4:5], from_logits=True) + \
            (1 - object_mask) * K.binary_crossentropy(
                object_mask, raw_pred[..., 4:5], from_logits=True) * ignore_mask
        class_loss = object_mask * K.binary_crossentropy(
            true_class_probs, raw_pred[..., 5:], from_logits=True)

        xy_loss = K.sum(xy_loss) / mf
        wh_loss = K.sum(wh_loss) / mf
        confidence_loss = K.sum(confidence_loss) / mf
        class_loss = K.sum(class_loss) / mf
        loss += xy_loss + wh_loss + confidence_loss + class_loss
        loss = tf.Print(loss, [loss, xy_loss, wh_loss, confidence_loss, class_loss,
                               K.sum(ignore_mask)], message='loss: ')
    return loss
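Note that yolo_loss calls yolo_head and box_iou, which this post never defines. For completeness, here is a sketch of both helpers as they appear in keras-yolo3 (copied essentially verbatim; the call above passes input_shape to match this signature, and if your copy of keras-yolo3 differs, prefer it):

import tensorflow as tf
from keras import backend as K


def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Decode a conv-layer output into box parameters (from keras-yolo3)."""
    num_anchors = len(anchors)
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
    grid_shape = K.shape(feats)[1:3]  # height, width
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
                    [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
                    [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))
    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
    # xy: sigmoid offset within the cell; wh: exp scaling of the anchor
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])
    if calc_loss:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs


def box_iou(b1, b2):
    """IoU between two sets of xywh boxes (from keras-yolo3)."""
    b1 = K.expand_dims(b1, -2)
    b1_xy, b1_wh = b1[..., :2], b1[..., 2:4]
    b1_mins, b1_maxes = b1_xy - b1_wh / 2., b1_xy + b1_wh / 2.
    b2 = K.expand_dims(b2, 0)
    b2_xy, b2_wh = b2[..., :2], b2[..., 2:4]
    b2_mins, b2_maxes = b2_xy - b2_wh / 2., b2_xy + b2_wh / 2.
    intersect_mins = K.maximum(b1_mins, b2_mins)
    intersect_maxes = K.minimum(b1_maxes, b2_maxes)
    intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    b1_area = b1_wh[..., 0] * b1_wh[..., 1]
    b2_area = b2_wh[..., 0] * b2_wh[..., 1]
    return intersect_area / (b1_area + b2_area - intersect_area)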

Once the loss is defined, you can define an optimizer and start training.
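A minimal sketch of that wiring, assuming the yolo_body and yolo_loss functions above (the learning rate and batch handling are placeholders of mine, not the original author's settings):

import tensorflow as tf

num_classes = 1
images = tf.placeholder(tf.float32, [None, 416, 416, 3])
# one target tensor per scale: (batch, grid, grid, 3 anchors, 5 + classes)
y_true = [tf.placeholder(tf.float32, [None, s, s, 3, 5 + num_classes])
          for s in (13, 26, 52)]

feats = yolo_body(images, num_classes)        # the three detection maps
loss = yolo_loss(feats, num_classes, y_true)
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # each batch from the generator in section 2 feeds images and y_true:
    # sess.run(train_op, feed_dict={images: batch_images,
    #                               y_true[0]: b0, y_true[1]: b1, y_true[2]: b2})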

2. Building the input data

    Install the COCO dataset (see https://blog.csdn.net/oYouHuo/article/details/81114875 for a walkthrough). Once it installs and passes its tests, use the code below to preprocess the dataset. Since I only detect people, only one category is listed:

from pycocotools.coco import COCO

dataType = 'train2017'
annFile = './annotations/instances_{}.json'.format(dataType)


def deal_data():
    coco = COCO(annFile)
    # only the 'person' category is kept; add names here to detect more classes
    cat_ids = coco.getCatIds(catNms=['person'])
    img_ids = coco.getImgIds(catIds=cat_ids)
    with open('./deal_data.txt', 'w') as f:
        for img_id in img_ids:
            img = coco.loadImgs(img_id)[0]
            f.write('./images/{}/{}\t'.format(dataType, img['file_name']))
            annIds = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
            anns = coco.loadAnns(annIds)
            for ann in anns:
                # COCO bbox is [x, y, width, height]; convert to left, top, right, bottom
                f.write('{},{},{},{}'.format(ann['bbox'][0], ann['bbox'][1],
                                             ann['bbox'][0] + ann['bbox'][2],
                                             ann['bbox'][1] + ann['bbox'][3]))
                for index in range(len(cat_ids)):
                    if ann['category_id'] == cat_ids[index]:
                        f.write(',{}'.format(index))
                        break
                f.write('\t')
            f.write('\n')


def main():
    deal_data()


if __name__ == '__main__':
    main()
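For reference, a line of the generated deal_data.txt looks like this (the path and numbers below are invented for illustration; fields are tab-separated, one box per annotation):

./images/train2017/000000391895.jpg	339.88,22.16,491.76,322.89,0	471.64,172.82,507.56,220.92,0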

Note: the four values of each box are left, top, right, bottom. I initially processed them incorrectly as x, y, width, height (and was too lazy to redo the annotated screenshot in the original post), so if you followed that old image your output will differ.
    After this first pass, the data still has to be packed into YOLO's input structure. There are three scales, 13, 26, and 52, each with 3 outputs per cell, so the corresponding targets have shapes (batch_size, 13, 13, 3, 5+classes_size), (batch_size, 26, 26, 3, 5+classes_size), and (batch_size, 52, 52, 3, 5+classes_size). The following code implements this (the keras-yolo3 author's code is excellent, so I took it as-is):

import numpy as np
from PIL import Image
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


def rand(a=0, b=1):
    return np.random.rand()*(b-a) + a


def get_random_data(annotation_line, input_shape, random=True, max_boxes=20,
                    jitter=.3, hue=.1, sat=1.5, val=1.5, proc_img=True):
    '''random preprocessing for real-time data augmentation'''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    h, w = input_shape
    box = np.array([np.array(list(map(float, box.split(',')))) for box in line[1:]])
    box = np.floor(box)
    box = box.astype(np.int16)

    if not random:
        # resize image
        scale = min(w/iw, h/ih)
        nw = int(iw*scale)
        nh = int(ih*scale)
        dx = (w-nw)//2
        dy = (h-nh)//2
        image_data = 0
        if proc_img:
            image = image.resize((nw, nh), Image.BICUBIC)
            new_image = Image.new('RGB', (w, h), (128, 128, 128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image)/255.

        # correct boxes
        box_data = np.zeros((max_boxes, 5))
        if len(box) > 0:
            np.random.shuffle(box)
            if len(box) > max_boxes:
                box = box[:max_boxes]
            box[:, [0, 2]] = box[:, [0, 2]]*scale + dx
            box[:, [1, 3]] = box[:, [1, 3]]*scale + dy
            box_data[:len(box)] = box
        return image_data, box_data

    # resize image
    new_ar = w/h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw, nh), Image.BICUBIC)

    # place image
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    flip = rand() < .5
    if flip:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand() < .5 else 1/rand(1, sat)
    val = rand(1, val) if rand() < .5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0] > 1] -= 1
    x[..., 0][x[..., 0] < 0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x > 1] = 1
    x[x < 0] = 0
    image_data = hsv_to_rgb(x)  # numpy array, 0 to 1

    # correct boxes
    box_data = np.zeros((max_boxes, 5))
    if len(box) > 0:
        np.random.shuffle(box)
        box[:, [0, 2]] = box[:, [0, 2]]*nw/iw + dx
        box[:, [1, 3]] = box[:, [1, 3]]*nh/ih + dy
        if flip:
            box[:, [0, 2]] = w - box[:, [2, 0]]
        box[:, 0:2][box[:, 0:2] < 0] = 0
        box[:, 2][box[:, 2] > w] = w
        box[:, 3][box[:, 3] > h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w > 1, box_h > 1)]  # discard invalid box
        if len(box) > max_boxes:
            box = box[:max_boxes]
        box_data[:len(box)] = box
    return image_data, box_data


def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes):
    '''Preprocess true boxes to training input format

    Parameters
    ----------
    true_boxes: array, shape=(m, T, 5)
        Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape.
    input_shape: array-like, hw, multiples of 32
    anchors: array, shape=(N, 2), wh
    num_classes: integer

    Returns
    -------
    y_true: list of array, shape like yolo_outputs, xywh are relative values

    '''
    assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
    num_layers = len(anchors)//3  # default setting
    anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]

    true_boxes = np.array(true_boxes, dtype='float32')
    input_shape = np.array(input_shape, dtype='int32')
    boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
    boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
    true_boxes[..., 0:2] = boxes_xy/input_shape[::-1]
    true_boxes[..., 2:4] = boxes_wh/input_shape[::-1]

    m = true_boxes.shape[0]
    grid_shapes = [input_shape//{0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
    y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]),
                        5+num_classes), dtype='float32') for l in range(num_layers)]

    # Expand dim to apply broadcasting.
    anchors = np.expand_dims(anchors, 0)
    anchor_maxes = anchors / 2.
    anchor_mins = -anchor_maxes
    valid_mask = boxes_wh[..., 0] > 0

    for b in range(m):
        # Discard zero rows.
        wh = boxes_wh[b, valid_mask[b]]
        if len(wh) == 0:
            continue
        # Expand dim to apply broadcasting.
        wh = np.expand_dims(wh, -2)
        box_maxes = wh / 2.
        box_mins = -box_maxes

        intersect_mins = np.maximum(box_mins, anchor_mins)
        intersect_maxes = np.minimum(box_maxes, anchor_maxes)
        intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
        intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
        box_area = wh[..., 0] * wh[..., 1]
        anchor_area = anchors[..., 0] * anchors[..., 1]
        iou = intersect_area / (box_area + anchor_area - intersect_area)

        # Find best anchor for each true box
        best_anchor = np.argmax(iou, axis=-1)

        for t, n in enumerate(best_anchor):
            for l in range(num_layers):
                if n in anchor_mask[l]:
                    i = np.floor(true_boxes[b, t, 0]*grid_shapes[l][1]).astype('int32')
                    j = np.floor(true_boxes[b, t, 1]*grid_shapes[l][0]).astype('int32')
                    k = anchor_mask[l].index(n)
                    c = true_boxes[b, t, 4].astype('int32')
                    y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
                    y_true[l][b, j, i, k, 4] = 1
                    y_true[l][b, j, i, k, 5+c] = 1

    return y_true


def data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes):
    '''data generator for fit_generator'''
    n = len(annotation_lines)
    i = 0
    while True:
        image_data = []
        box_data = []
        for b in range(batch_size):
            if i == 0:
                np.random.shuffle(annotation_lines)  # reshuffled at each epoch
            image, box = get_random_data(annotation_lines[i], input_shape, random=False)
            image_data.append(image)
            box_data.append(box)
            i = (i+1) % n
        image_data = np.array(image_data)
        box_data = np.array(box_data)
        y_true = preprocess_true_boxes(box_data, input_shape, anchors, num_classes)
        yield [image_data, *y_true], np.zeros(batch_size)


def get_anchors():
    with open('./yolo_anchors.txt', 'r') as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    return np.array(anchors).reshape(-1, 2)


def main():
    is_training = True
    with open('./deal_data.txt', 'r') as f:
        lines = f.readlines()
    np.random.seed(10101)
    np.random.shuffle(lines)
    val_split = 0.1
    num_val = int(len(lines)*val_split)
    num_train = len(lines) - num_val
    if is_training:
        lines = lines[:num_train]
    else:
        lines = lines[num_train:]
    anchors = get_anchors()
    for data in data_generator(lines, 10, (416, 416), anchors=anchors, num_classes=1):
        print('aaa')


if __name__ == '__main__':
    main()

    Each data yielded is one batch. It's worth studying get_random_data and preprocess_true_boxes closely; the handling there is elegant. The anchors are the nine box sizes obtained earlier by clustering, and they are closely tied to the sizes of the objects you want to detect (the expected file format is shown below).
    Also note max_boxes=20 in get_random_data: if a single image can contain more objects than that, raise it. How the data is preprocessed also affects training, so feel free to write your own pipeline as needed. I haven't read the C++ source; take a look if you're interested.
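For get_anchors to work, yolo_anchors.txt uses the same single-line, comma-separated format as keras-yolo3; with the default COCO anchors it would contain:

10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326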

3. Training
Please credit the original when reposting: https://www.6miu.com/read-4200154.html
