# References

[yolov5-github](https://github.com/ultralytics/yolov5)

[yolov5-github training docs](https://docs.ultralytics.com/zh/modes/train/#_3)

[CSDN training blog post](https://blog.csdn.net/a_cheng_/article/details/111401500)

# 1. Environment setup

## 1.1 Install dependency packages

Go to the [Tsinghua mirror help page](https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/), pick the entry that matches your Ubuntu release, and replace your apt sources with it.

```bash
# Back up the sources file
sudo cp /etc/apt/sources.list /etc/apt/sources.list_bak
# Edit /etc/apt/sources.list with the mirror entries
# Update
sudo apt update && sudo apt upgrade -y
```

Install the required packages:

```bash
sudo apt-get install -y build-essential ubuntu-drivers-common net-tools python3 python-is-python3 python3-pip
# Point pip at the Tsinghua mirror
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```

Alternatively, set the pip mirror in the pip configuration file:

```bash
mkdir -p ~/.pip/
cd ~/.pip/
# No sudo here, so the file stays owned by the current user
vi pip.conf
```

Enter the following:

```ini
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host=pypi.tuna.tsinghua.edu.cn
```

Verify:

```bash
pip config list
```

## 1.2 Install Docker

For the full procedure see the [official Docker installation guide for Ubuntu](https://docs.docker.com/engine/install/ubuntu/). A quick way to install it:

```bash
sudo apt install -y docker.io
# Add the current user to the docker group (takes effect after logging in again)
sudo usermod -aG docker ${USER}
```

## 1.3 Pull the PyTorch Docker image

Browse the [official PyTorch Docker images](https://hub.docker.com/r/pytorch/pytorch/tags) for a suitable version. YOLOv5 requires PyTorch 1.8 or later; I pulled 1.13:

```bash
sudo docker pull pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime
```

## 1.4 Install the NVIDIA driver

[Desktop reference](https://blog.csdn.net/Perfect886/article/details/119109380)

[Server reference](https://www.wangliguang.org/nvidia-installer/)

Because training runs inside the PyTorch Docker container, CUDA does not need to be installed on the host. A simple way to install the NVIDIA driver:

1. Disable the nouveau driver

Edit `/etc/modprobe.d/blacklist-nouveau.conf` and add the following:

```bash
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
```

2. Turn off nouveau kernel mode setting

```bash
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
```

3. Regenerate the initramfs and reboot

```bash
sudo update-initramfs -u
sudo reboot
```

4. Verify after the reboot

After the reboot, run `lsmod | grep nouveau`. **If nothing is printed, nouveau has been disabled successfully.**

5. Find the recommended driver

```bash
ubuntu-drivers devices
# Example output:
# modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
# vendor   : NVIDIA Corporation
# model    : TU104GL [Tesla T4]
# driver   : nvidia-driver-450-server - distro non-free
# driver   : nvidia-driver-525-server - distro non-free
# driver   : nvidia-driver-535-server - distro non-free
# driver   : nvidia-driver-418-server - distro non-free
# driver   : nvidia-driver-525 - distro non-free
# driver   : nvidia-driver-470 - distro non-free
# driver   : nvidia-driver-470-server - distro non-free
# driver   : nvidia-driver-535 - distro non-free recommended
# driver   : xserver-xorg-video-nouveau - distro free builtin
```

6. Install the recommended driver

Choose the package that fits your system (I used the server variant), then reboot once the installation finishes.

```bash
sudo apt install nvidia-driver-535-server
```

7. Verify after the reboot

If `nvidia-smi` prints the GPU information, the driver is installed correctly.

# 2. Training on a dataset

## 2.1 Download the yolov5 code

Download the code from GitHub, or prepare your own yolov5 training code. If you copied the code from someone else, **delete the `.git` directory**; otherwise the git check at training time reports an error.

```bash
git clone git@github.com:ultralytics/yolov5.git
```

## 2.2 Start the `pytorch-docker` container

```bash
# Mount the host directory into the container; adjust --gpus to the GPUs you actually have ("all" exposes every GPU)
docker run -v /home/lishi/object-detect/:/workspace --gpus all --ipc=host -p 6006:6006 -it pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime /bin/bash
```

All later steps run inside the container.

## 2.3 Install dependencies

Inside the container, go to the `yolov5` code directory, comment out the `opencv-python` line in `requirements.txt`, and then install the dependencies:

```bash
pip3 install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

Then install the `opencv-python-headless` build of OpenCV:

```bash
pip3 install opencv-python-headless -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```

## 2.4 Prepare the training dataset

### 2.4.1 VOC dataset

A VOC dataset must be converted to YOLO format before training. Use the conversion script below after **changing `CLASSES` and `PATH`**:

```python
import os
import random
import xml.etree.ElementTree as ET
from shutil import copyfile

from PIL import Image

# Only CLASSES and PATH need to be changed. The script splits the dataset
# automatically and writes YOLO-format label files.

# Class names: change these to the classes of your dataset (required).
# See the txt file in the dataset directory.
CLASSES = ["belt", "nobelt"]
# Dataset root: change this to your dataset root, which must contain the
# Annotations and JPEGImages folders (required).
PATH = r'/home/lishi/object-detect/Helmet/data-images'
# Percentage of images used for training (train:val = 8:2). No need to change.
TRAIN_RATIO = 80


def clear_hidden_files(path):
    """Recursively delete hidden "._*" files under path."""
    for name in os.listdir(path):
        abspath = os.path.join(os.path.abspath(path), name)
        if os.path.isfile(abspath):
            if name.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)


def ensure_dir(*parts):
    """Create the directory if it is missing, clean it, and return its path."""
    path = os.path.join(*parts)
    os.makedirs(path, exist_ok=True)
    clear_hidden_files(path)
    return path


def convert(size, box):
    """Convert a VOC box (xmin, xmax, ymin, ymax) in pixels into a normalized
    YOLO box (x_center, y_center, width, height)."""
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 * dw
    y = (box[2] + box[3]) / 2.0 * dh
    w = (box[1] - box[0]) * dw
    h = (box[3] - box[2]) * dh
    return (x, y, w, h)


def convert_annotation(image_path, annotation_path, label_path):
    """Write the YOLO label file for one image from its VOC XML annotation."""
    # The size is read from the image itself rather than from the XML <size> tag.
    with Image.open(image_path) as img:
        w, h = img.size
    root = ET.parse(annotation_path).getroot()
    with open(label_path, 'w', encoding='utf-8') as out_file:
        for obj in root.iter('object'):
            # Compare with None: an Element without children evaluates as False.
            difficult_node = obj.find('difficult')
            difficult = int(difficult_node.text) if difficult_node is not None else 0
            cls = obj.find('name').text
            if cls not in CLASSES or difficult == 1:
                continue
            cls_id = CLASSES.index(cls)
            xmlbox = obj.find('bndbox')
            b = tuple(float(xmlbox.find(tag).text)
                      for tag in ('xmin', 'xmax', 'ymin', 'ymax'))
            bb = convert((w, h), b)
            out_file.write(str(cls_id) + " " + " ".join(str(a) for a in bb) + '\n')


wd = os.getcwd()
annotation_dir = ensure_dir(PATH, "Annotations")
image_dir = ensure_dir(PATH, "JPEGImages")
yolo_labels_dir = ensure_dir(PATH, "YOLOLabels")
images_train_dir = ensure_dir(PATH, "train", "images")
labels_train_dir = ensure_dir(PATH, "train", "labels")
images_val_dir = ensure_dir(PATH, "val", "images")
labels_val_dir = ensure_dir(PATH, "val", "labels")

list_imgs = os.listdir(image_dir)  # list image files
print("Dataset: %d images" % len(list_imgs))

# The two list files are written to the current working directory.
with open(os.path.join(wd, "yolov5_train.txt"), 'w', encoding='utf-8') as train_file, \
        open(os.path.join(wd, "yolov5_valid.txt"), 'w', encoding='utf-8') as val_file:
    for i, image_name in enumerate(list_imgs):
        image_path = os.path.join(image_dir, image_name)
        if not os.path.isfile(image_path):
            continue
        stem = os.path.splitext(image_name)[0]
        annotation_path = os.path.join(annotation_dir, stem + '.xml')
        label_name = stem + '.txt'
        label_path = os.path.join(yolo_labels_dir, label_name)
        prob = random.randint(1, 100)
        print("Probability: %d" % prob, i, image_name)
        # Images without an annotation file are skipped.
        if not os.path.exists(annotation_path):
            continue
        # prob is uniform on 1..100, so "<=" gives a TRAIN_RATIO percent chance.
        if prob <= TRAIN_RATIO:  # train dataset
            list_file, images_dir, labels_dir = train_file, images_train_dir, labels_train_dir
        else:  # validation dataset
            list_file, images_dir, labels_dir = val_file, images_val_dir, labels_val_dir
        list_file.write(image_path + '\n')
        convert_annotation(image_path, annotation_path, label_path)  # convert label
        copyfile(image_path, os.path.join(images_dir, image_name))
        copyfile(label_path, os.path.join(labels_dir, label_name))
```

### 2.4.2 YOLO-format dataset

A dataset that already follows the YOLO layout needs no conversion. Just change the paths in its `dataset.yaml` to absolute paths inside the `pytorch-docker` container, for example:

```yaml
train: /workspace/Helmet/image-data/train/images
val: /workspace/Helmet/image-data/valid/images
nc: 2
names:
- belt
- nobelt
```

`train`: absolute path of the training images inside the container;

`val`: absolute path of the validation images inside the container;

`nc`: number of classes in the dataset;

`names`: class names of the dataset.

After editing, copy `dataset.yaml` into the `data` directory of the source tree. Later, set the `--data` parameter of `train.py` to `data/dataset.yaml`.

## 2.5 Modify the model file

The `models` directory contains five model definitions (n, s, m, l, x) whose training time increases in that order. Pick the one that fits your needs; I chose **yolov5s.yaml**. The only change required is setting `nc` to the actual number of classes.

## 2.6 Modify train.py

Edit the argument defaults in `train.py`: set `weights`, `cfg` and `data` to the paths of your own files. For `weights`, if you use the file from the reference blog post, download `yolov5s.pt` and place it in the code root directory; if you use the official weights, nothing needs to change because they are downloaded automatically. See the official documentation for the meaning of each parameter. I mainly changed the following:

**cfg**: model configuration file (the YAML edited in 2.5);

**data**: dataset configuration file;

**epochs**: number of training epochs;

**batch-size**: number of images fed to the network per iteration. It is usually chosen as a multiple of 16 and sized to fit GPU memory: a larger value uses more memory and generally trains faster;

**imgsz**: input image size for training. Every image is resized to this size before it enters the model. A larger size detects small objects better but trains more slowly; a smaller size trains faster at lower accuracy;

**patience**: early stopping. Training stops early when the validation metrics (such as mAP) have not improved for this many epochs, which limits overfitting and saves resources.

## 2.7 Start training
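The settings from section 2.6 do not have to be edited into `train.py`; they can also be passed as command-line flags. The following is only a sketch under stated assumptions: `build_train_command` is a hypothetical helper, and the default values (100 epochs, batch size 16, image size 640, patience 50) are illustrative, not the settings used in this post. The flag names themselves are the ones `train.py` defines.

```python
import shlex


def build_train_command(weights="yolov5s.pt", cfg="models/yolov5s.yaml",
                        data="data/dataset.yaml", epochs=100, batch_size=16,
                        imgsz=640, patience=50, device="0"):
    """Return the argument list for a YOLOv5 train.py run.

    All default values are illustrative assumptions; adjust them to your setup.
    """
    return [
        "python", "train.py",
        "--weights", weights,
        "--cfg", cfg,
        "--data", data,
        "--epochs", str(epochs),
        "--batch-size", str(batch_size),
        "--imgsz", str(imgsz),
        "--patience", str(patience),
        "--device", device,
    ]


if __name__ == "__main__":
    cmd = build_train_command()
    # Print the command so it can be reviewed or pasted into the container shell.
    print(shlex.join(cmd))
    # To launch it from Python instead, run this from the yolov5 directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the values on the command line leaves `train.py` unmodified, which makes it easier to update the yolov5 code later.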
Run `python train.py`. The following errors may occur:

1. `All git commands will error until this is rectified`

As the message suggests, run `export GIT_PYTHON_REFRESH=quiet` and rerun the training command; training then starts.

2. Stuck at `not pretrained!!!!!!!!!!!!!!!!!!!!`

If the two sample images shipped in `data/images/` were deleted by mistake, training hangs after `not pretrained!!!!!!!!!!!!!!!!!!!!` and before `AMP: checks passed`. Either download the images and put them back, or edit `utils/general.py` as follows so that the check falls back to a blank array instead of fetching the image online:

```python
# Modified line: never try to fetch the sample image online
im = f if f.exists() else 'https://ultralytics.com/images/bus.jpg' if False else np.ones((640, 640, 3))
# Original line:
# im = f if f.exists() else 'https://ultralytics.com/images/bus.jpg' if check_online() else np.ones((640, 640, 3))
```

3. Error downloading the `Arial.ttf` font

Download `Arial.ttf` manually and place it in the `/root/.config/Ultralytics/` folder inside the container.

4. Error reading git information

Delete the `.git` folder.

## 2.8 Verify the training result

After training finishes, run the detection command from the code root directory. The images to detect can be placed in `data/samples`:

```bash
python detect.py --weights runs/train/exp/weights/best.pt --source data/samples/ --device 0 --data data/dataset.yaml
```

**Note:** every training run creates a new `exp` folder with a numeric suffix under `runs/train/`, so use the latest one when testing. Detection results are likewise saved in the latest `exp` folder under `runs/detect`.

## 2.9 Export the model

Taking ONNX export as an example, run the following commands:

```bash
# Install ONNX support
pip3 install onnx onnxsim -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
# Export a model with a 640x480 (width x height) input
# parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640, 640], help='image (h, w)')
# The size is given as height first, then width
python export.py --weights runs/train/exp8/weights/best.pt --img-size 480 640 --batch 1 --opset 12 --optimize --dynamic --include onnx --device cpu
```

# 3. Troubleshooting

## 3.1 docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

```bash
could not select device driver "" with capabilities: [[gpu]]
```

1. Check that the host GPU and NVIDIA driver work

```bash
nvidia-smi
```

2. Check whether the NVIDIA Container Toolkit is installed

```bash
dpkg -l | grep nvidia-container-toolkit
```

If nothing is printed, install it with the following commands:

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker
```

3. Add the NVIDIA runtime to the Docker configuration

```bash
sudo vim /etc/docker/daemon.json
```

Add the following, merging it with any settings already in the file:

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Save the file and restart Docker:

```bash
sudo systemctl restart docker
```

## 3.2 RuntimeError: Numpy is not available

This happens because the installed NumPy version is too new. Uninstall the current NumPy:

```bash
pip uninstall numpy
```

Then install numpy 1.26.4, which resolves the problem:

```bash
pip install numpy==1.26.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
```

Last modified: March 17, 2026. © Reposting is permitted with proper attribution.