YOLO Study Notes 1: Converting JSON Label Files to YOLO Format

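1. COCO dataset: content and format of the object-instance JSON file

This part follows the article below, with some additions and corrections:

"COCO dataset (summary of the JSON file contents for object detection)" https://zhuanlan.zhihu.com/p/309549190?utm_id=0

COCO provides several annotation types, including object instances, object keypoints, and image captions, all stored as JSON files. The dataset I use is annotated with an object-instance JSON file, so the content and format of that file are described below.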

1.1 Overview

The object-instance JSON file of the COCO dataset stores everything in a single dictionary with 5 top-level keys: info, licenses, images, annotations, and categories.

{
    "info": info,                  # information about the dataset file
    "licenses": [license],         # license information
    "images": [image],             # image information
    "annotations": [annotation],   # bounding box annotations
    "categories": [category]       # category information
}
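As a quick orientation, the top-level layout can be checked by loading a file and listing its keys. The sketch below uses a tiny in-memory JSON string as a stand-in for a real annotation file; all sample values are made up for illustration.

```python
import json

# Minimal stand-in for a COCO object-instance file (values are illustrative only)
sample = '''
{
  "info": {"year": 2020, "version": "1.0"},
  "licenses": [{"id": 1, "name": null, "url": null}],
  "images": [{"id": 0, "file_name": "0.jpg", "width": 512, "height": 512}],
  "annotations": [{"id": 0, "image_id": 0, "category_id": 2,
                   "iscrowd": 0, "area": 4096.0,
                   "bbox": [10.0, 20.0, 64.0, 64.0]}],
  "categories": [{"id": 1, "name": "rectangle"}, {"id": 2, "name": "circle"}]
}
'''

coco = json.loads(sample)

# The five top-level keys described above
top_level_keys = sorted(coco.keys())

# Number of entries per key (the "info" dict counts as one entry)
summary = {key: (len(value) if isinstance(value, list) else 1)
           for key, value in coco.items()}

print(top_level_keys)
print(summary)
```

For a real file, `json.loads(sample)` would be replaced by `json.load` on the opened annotation file.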

1.2 Details of each key

1.2.1 info

"info":
{
    "year": int,                # year
    "version": str,             # version
    "description": str,         # detailed description
    "contributor": str,         # author / contributor
    "url": str,                 # URL of the dataset
    "date_created": datetime    # creation date
}

1.2.2 licenses

"licenses":
[{
    "id": 1,        # int, license id; the "license" field in images refers to this id (1 here)
    "name": null,   # str, license name
    "url": null     # str, license URL
}]

1.2.3 images

"images":
[{
    "id": 0,                                          # int, image id, may start from 0
    "file_name": "0.jpg",                             # str, file name
    "width": 512,                                     # int, image width
    "height": 512,                                    # int, image height
    "date_captured": "2020-04-14 01:45:07.508146",    # datetime, capture date
    "license": 1,                                     # int, id of the license that applies
    "coco_url": "",                                   # str, COCO URL of the image
    "flickr_url": ""                                  # str, Flickr URL of the image
}]

1.2.4 annotations

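"annotations":
[{
    "id": 0,                                # int, id of each annotated object
    "image_id": 0,                          # int, id of the image that contains the object
    "category_id": 2,                       # int, category id of the annotated object
    "iscrowd": 0,                           # 0 or 1; 1 marks a region covering a crowd of objects rather than a single instance, default 0
    "area": 4095.9999999999986,             # float, area of the annotated object in pixels (e.g. 64 * 64 = 4096)
    "bbox": [542.0, 698.0, 220.0, 271.0],   # [x, y, width, height], coordinates of the bounding box
    "segmentation": [[621, 703, 573, 744, 542, 885, 580, 945, 650, 969, 711, 883, 762, 807, 748, 741, 649, 698]]   # polygon coordinates (polygon format)
}]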

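In "bbox", [x, y, width, height], x and y are the coordinates of the top-left corner of the box. They equal the minimum x and the minimum y among the segmentation points, and the width (height) is the maximum x (y) minus the minimum x (y).

In "segmentation", [x1, y1, x2, y2, x3, y3, ..., xn, yn] lists the polygon vertices one after another. According to the referenced article, the points start at the top-left and are taken in clockwise order.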
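The relation between "segmentation" and "bbox" can be reproduced in a few lines. This is a minimal sketch; `bbox_from_polygon` is a hypothetical helper name, and the polygon is the one from the annotation example above.

```python
def bbox_from_polygon(polygon):
    """Return [x, y, width, height] enclosing a flat [x1, y1, ..., xn, yn] polygon."""
    xs = polygon[0::2]  # every even position is an x coordinate
    ys = polygon[1::2]  # every odd position is a y coordinate
    x_min, y_min = min(xs), min(ys)
    # Top-left corner is (min x, min y); size is max minus min on each axis
    return [float(x_min), float(y_min),
            float(max(xs) - x_min), float(max(ys) - y_min)]


polygon = [621, 703, 573, 744, 542, 885, 580, 945, 650, 969,
           711, 883, 762, 807, 748, 741, 649, 698]
bbox = bbox_from_polygon(polygon)
print(bbox)  # [542.0, 698.0, 220.0, 271.0]
```

The result matches the "bbox" value in the annotation example.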

1.2.5 categories

"categories":
[{
    "id": 1,                   # int, category id
    "name": "rectangle",       # str, category name
    "supercategory": "None"    # str, parent category
},
{
    "id": 2,
    "name": "circle",
    "supercategory": "None"
}]
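YOLO label files normally use zero-based class indices, while the category ids here start at 1. One possible way to map between the two is sketched below; ordering the classes by ascending id is an assumption of this sketch, not something the source prescribes.

```python
categories = [
    {"id": 1, "name": "rectangle", "supercategory": "None"},
    {"id": 2, "name": "circle", "supercategory": "None"},
]

# Sort by COCO id, then number the classes 0, 1, 2, ...
ordered = sorted(categories, key=lambda c: c["id"])
id_to_index = {c["id"]: index for index, c in enumerate(ordered)}

# Class names in index order, e.g. for the "names" list of a dataset config
index_to_name = [c["name"] for c in ordered]

print(id_to_index)
print(index_to_name)
```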

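2. YOLO label format

A YOLO label file usually holds one line per object, with five space-separated fields:

category center_x center_y box_w box_h

category: object class

center_x: x coordinate of the box center

center_y: y coordinate of the box center

box_w: width of the box

box_h: height of the box

Note: apart from the class, every value is expressed as a fraction of the image size. x values and widths are divided by the image width, y values and heights by the image height.

The model I use here expects a different layout, the normalized top-left and bottom-right corners of the box:

category top_left_x top_left_y bottom_right_x bottom_right_y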
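The conversion from a COCO box [x, y, width, height] in pixels to normalized values can be sketched as follows. Both function names are hypothetical, and the 1000 x 1000 image size is an assumption chosen only to make the arithmetic easy to follow; the box is the one from the annotation example in section 1.

```python
def coco_bbox_to_yolo(bbox, image_w, image_h):
    """COCO [x, y, w, h] in pixels -> normalized (center_x, center_y, box_w, box_h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / image_w, (y + h / 2) / image_h,
            w / image_w, h / image_h)


def coco_bbox_to_corners(bbox, image_w, image_h):
    """COCO [x, y, w, h] in pixels -> normalized (x1, y1, x2, y2) corners."""
    x, y, w, h = bbox
    return (x / image_w, y / image_h,
            (x + w) / image_w, (y + h) / image_h)


bbox = [542.0, 698.0, 220.0, 271.0]
image_w, image_h = 1000, 1000  # assumed image size, for illustration only

center_form = coco_bbox_to_yolo(bbox, image_w, image_h)
corner_form = coco_bbox_to_corners(bbox, image_w, image_h)
print(center_form)  # center (652, 833.5) and size (220, 271), each divided by 1000
print(corner_form)  # corners (542, 698) and (762, 969), each divided by 1000
```

All results lie between 0 and 1 as long as the box stays inside the image.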

3. Code

Run in: PyCharm

# Convert a JSON annotation file to YOLO-format label files
import re
import os
import json
from collections import defaultdict

# Path of the JSON file
path_json = r'R:\Python\yolov5_finished\train_quadrant_enumeration_disease.json'
# Directory where the txt files are stored
path_label = r'R:\Python\yolov5_finished\teeth_data\labels\a/'


def format_label(ann, image_w, image_h):
    """Build one label line from a single annotation."""
    # Class, read from the 'category_id_3' key of this dataset's annotations
    category = str(ann['category_id_3'])
    # Top-left x, y and box size in pixels
    pre_top_left_x, pre_top_left_y, pre_box_w, pre_box_h = (float(v) for v in ann['bbox'])
    # Bottom-right x, y
    pre_bottom_right_x = pre_top_left_x + pre_box_w
    pre_bottom_right_y = pre_top_left_y + pre_box_h
    # Normalize by the image size: x1, y1, x2, y2
    values = [pre_top_left_x / image_w, pre_top_left_y / image_h,
              pre_bottom_right_x / image_w, pre_bottom_right_y / image_h]
    # For the c_x, c_y, w, h layout use instead:
    # values = [(pre_top_left_x + pre_box_w / 2) / image_w,
    #           (pre_top_left_y + pre_box_h / 2) / image_h,
    #           pre_box_w / image_w, pre_box_h / image_h]
    return ' '.join([category] + [str(v) for v in values]) + '\n'


def json2txt(path_json, path_label):
    # Read the JSON file and convert it to a Python dict
    with open(path_json, 'r', encoding='utf-8') as json_doc:
        js = json.load(json_doc)

    # Group annotations by image id, so the result does not depend on
    # the order of the annotations or on images without annotations
    anns_by_image = defaultdict(list)
    for ann in js['annotations']:
        anns_by_image[ann['image_id']].append(ann)

    os.makedirs(path_label, exist_ok=True)
    for image in js['images']:
        # txt file name "train_i.txt", where i is the first number in the image file name
        match = re.search(r'\d+', image['file_name'])
        label_num = match.group() if match else ''
        full_path = os.path.join(path_label, 'train_' + label_num + '.txt')

        # All labels of one image are written to the same txt file
        with open(full_path, 'w') as f:
            for ann in anns_by_image[image['id']]:
                f.write(format_label(ann, image['width'], image['height']))


# Main entry point
if __name__ == '__main__':
    json2txt(path_json, path_label)
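One way to sanity-check the generated files is to read a line back and convert it to pixel coordinates. The helper name and the sample line below are hypothetical; the layout assumed is the category, x1, y1, x2, y2 form written by the script.

```python
def label_line_to_pixels(line, image_w, image_h):
    """Parse 'category x1 y1 x2 y2' (normalized) into a class string and pixel corners."""
    category, *values = line.split()
    x1, y1, x2, y2 = (float(v) for v in values)
    # Undo the normalization by multiplying with the image size
    return category, (x1 * image_w, y1 * image_h, x2 * image_w, y2 * image_h)


# Hypothetical label line for a 512 x 512 image
category, box = label_line_to_pixels("3 0.25 0.5 0.75 1.0\n", 512, 512)
print(category, box)  # 3 (128.0, 256.0, 384.0, 512.0)
```

If the recovered corners fall outside the image or x2 is smaller than x1, the corresponding annotation is worth a closer look.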