DETR Source Code (DETR Code), 2025

Tech Frontier • 2025-04-16 13:23 • 59 views


Official source code: https://github.com/facebookresearch/detr


Annotation file (.json) format:
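For object detection, the fields that matter are "image_id" (the image the object belongs to), "bbox" (the object's bounding box as (left_x, left_y, w, h)), and "category_id" (the object's class).
For a detailed description of every field, see:

沐枫8023: https://blog.csdn.net/weixin_/article/details/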
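As a rough illustration of the three fields above, here is a minimal sketch of one annotation entry and of converting its bbox to corner format. All values are made up, and `xywh_to_xyxy` is a hypothetical helper name, not something taken from the DETR repository.

```python
# A hypothetical COCO-style annotation entry; every value is illustrative only.
annotation = {
    "image_id": 285,                      # which image the object belongs to
    "category_id": 23,                    # class of the object
    "bbox": [10.0, 20.0, 200.0, 150.0],   # (left_x, left_y, w, h) in pixels
}


def xywh_to_xyxy(bbox):
    """Convert (left_x, left_y, w, h) to corner format (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


corners = xywh_to_xyxy(annotation["bbox"])
```

The corner format (x1, y1, x2, y2) is the starting point of the box-processing steps described later.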

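1. Creating the dataset
Load the data and preprocess it (normalize, resize). For the training set, the resize target is one of scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]. The shorter side, min(w, h), is scaled to the chosen value and the other side is scaled by the same ratio, which gives multi-scale training.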
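The shorter-side rule can be sketched as follows. This is a simplified illustration, not the repository's transform code: the function names are hypothetical, and any cap on the longer side is left out.

```python
import random

SCALES = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]


def resized_size(w, h, target):
    """Return (new_w, new_h) with the shorter side scaled to `target`.

    The other side is scaled by the same ratio, so the aspect ratio is kept
    (up to integer truncation).
    """
    if w <= h:
        return target, int(target * h / w)
    return int(target * w / h), target


def random_resized_size(w, h, rng=random):
    """Pick one training scale at random and apply the shorter-side rule."""
    return resized_size(w, h, rng.choice(SCALES))
```

For example, `resized_size(640, 480, 800)` scales the 480-pixel side to 800 and the 640-pixel side by the same factor.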


2. DataLoader

3. Fetching the data
Note: training data is read directly from data_loader_train and does not go through this step.


Box processing:

Start from orig_box (x1, y1, x2, y2), with ratio_w = resize_w / orig_w and ratio_h = resize_h / orig_h.
Rescale the box: orig_box * (ratio_w, ratio_h, ratio_w, ratio_h)
Convert to center format: (x1, y1, x2, y2) to (center_x, center_y, w, h)
Normalize by (resize_w, resize_h): (center_x, center_y, w, h) / (resize_w, resize_h, resize_w, resize_h)
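The three steps above can be sketched in plain Python. `process_box` is a hypothetical name and the numbers in the usage line are made up; the arithmetic follows the steps as listed.

```python
def process_box(orig_box, orig_size, resized_size):
    """Rescale, center, and normalize one box.

    orig_box: (x1, y1, x2, y2) in original-image pixels.
    orig_size, resized_size: (w, h) before and after resizing.
    Returns (center_x, center_y, w, h), each in [0, 1].
    """
    orig_w, orig_h = orig_size
    resize_w, resize_h = resized_size
    ratio_w, ratio_h = resize_w / orig_w, resize_h / orig_h

    # Step 1: rescale the corners to the resized image.
    x1, y1, x2, y2 = orig_box
    x1, x2 = x1 * ratio_w, x2 * ratio_w
    y1, y2 = y1 * ratio_h, y2 * ratio_h

    # Step 2: corner format to center format.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1

    # Step 3: normalize by the resized image size.
    return (cx / resize_w, cy / resize_h, w / resize_w, h / resize_h)


box = process_box((100, 50, 300, 250), orig_size=(400, 500), resized_size=(800, 1000))
```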

Batch size after rescaling: within a batch, the width is the maximum of the resized widths, max(w1, w2, w3, …), and the height is the maximum of the resized heights, max(h1, h2, h3, …).
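A minimal sketch of that rule, together with a padding mask for each image. The helper names are hypothetical, and the convention that True marks padded pixels is an assumption chosen to match the usual NestedTensor mask.

```python
def batch_shape(sizes):
    """sizes: list of (w, h) after resizing. Returns the padded (w, h)."""
    return max(w for w, _ in sizes), max(h for _, h in sizes)


def padding_mask(size, padded):
    """Build a mask with `padded` height rows and width columns.

    Assumption: True marks padded pixels, False marks real image pixels.
    """
    w, h = size
    padded_w, padded_h = padded
    return [[(x >= w or y >= h) for x in range(padded_w)] for y in range(padded_h)]


sizes = [(4, 3), (2, 5)]
padded = batch_shape(sizes)
masks = [padding_mask(s, padded) for s in sizes]
```

Every image is placed in the top-left corner of the padded canvas, so the mask is False only inside the image's own width and height.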

Restoring the processed boxes, using 000000000285.jpg from the val set as an example.
Apply the box-processing steps in reverse:
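A sketch of the reverse direction, assuming the box was normalized by the resized image size and is stored as (center_x, center_y, w, h). `restore_box` is a hypothetical name and the example numbers are made up, not the values for 000000000285.jpg.

```python
def restore_box(norm_box, orig_size, resized_size):
    """Undo normalize, center, and rescale, in that order.

    norm_box: (center_x, center_y, w, h) normalized by the resized size.
    Returns (x1, y1, x2, y2) in original-image pixels.
    """
    orig_w, orig_h = orig_size
    resize_w, resize_h = resized_size

    # Undo normalization.
    cx, cy, w, h = norm_box
    cx, w = cx * resize_w, w * resize_w
    cy, h = cy * resize_h, h * resize_h

    # Center format back to corner format.
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2

    # Undo the rescale.
    ratio_w, ratio_h = resize_w / orig_w, resize_h / orig_h
    return (x1 / ratio_w, y1 / ratio_h, x2 / ratio_w, y2 / ratio_h)


restored = restore_box((0.5, 0.3, 0.5, 0.4), orig_size=(400, 500), resized_size=(800, 1000))
```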

transformer: encoder + decoder

Build the model, the criterion, and the postprocessors


Class DETR

Hungarian matching


Criterion: loss


3.1 Model:

backbone:

CNN features (a list): NestedTensor with tensors (torch.Size([2, 2048, 28, 38])) and mask
Positional encoding pos (a list): torch.Size([2, 256, 28, 38])

transformer:


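Prediction:

Class prediction: nn.Linear(hs)
Box prediction: MLP(hs).sigmoid()
Result:

The final model output is out, a dict with these keys and values:
pred_logits: the class logits for each query;
pred_boxes: the predicted boxes, normalized (center_x, center_y, w, h);
aux_outputs: (optional) used for the auxiliary losses; the results of every decoder layer except the last one, which is already covered by pred_logits and pred_boxes.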
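How the three keys relate to the per-layer decoder outputs can be sketched as below. `build_output` is a hypothetical helper, and plain Python lists stand in for the stacked tensors.

```python
def build_output(per_layer_logits, per_layer_boxes):
    """Assemble the output dict from per-decoder-layer predictions.

    Both arguments are sequences ordered from the first to the last decoder layer.
    """
    out = {
        "pred_logits": per_layer_logits[-1],  # last decoder layer
        "pred_boxes": per_layer_boxes[-1],
    }
    # Every layer except the last goes into aux_outputs.
    out["aux_outputs"] = [
        {"pred_logits": logits, "pred_boxes": boxes}
        for logits, boxes in zip(per_layer_logits[:-1], per_layer_boxes[:-1])
    ]
    return out


# Placeholder strings stand in for tensors from a 6-layer decoder.
layers_logits = [f"logits{i}" for i in range(6)]
layers_boxes = [f"boxes{i}" for i in range(6)]
out = build_output(layers_logits, layers_boxes)
```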

3.2 Feature evaluation:

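matcher: GIoU plus Hungarian assignment between the predictions (the last decoder layer's output, i.e. the first two keys of out) and the targets.

First apply softmax() to pred_logits to get class probabilities, then pick the probability at each target's class index to compute the classification cost cost_class.
Compute the L1 cost between boxes, cost_bbox
Compute the GIoU cost between boxes, cost_giou

Cost matrix: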
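A minimal pure-Python sketch of how the three costs could be combined into one matrix for a single image. The weights `w_class`, `w_bbox`, and `w_giou` and all helper names are assumptions for illustration, and taking the negative probability and negative GIoU as costs is the usual convention rather than something stated above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box):
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def giou(a, b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest box enclosing both a and b.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def cost_matrix(pred_logits, pred_boxes, tgt_labels, tgt_boxes,
                w_class=1.0, w_bbox=5.0, w_giou=2.0):
    """Return C with C[i][j] = cost of matching prediction i to target j."""
    rows = []
    for logits, pbox in zip(pred_logits, pred_boxes):
        prob = softmax(logits)
        row = []
        for label, tbox in zip(tgt_labels, tgt_boxes):
            cost_class = -prob[label]                                # high probability, low cost
            cost_bbox = sum(abs(p - t) for p, t in zip(pbox, tbox))  # L1 distance
            cost_giou = -giou(cxcywh_to_xyxy(pbox), cxcywh_to_xyxy(tbox))
            row.append(w_class * cost_class + w_bbox * cost_bbox + w_giou * cost_giou)
        rows.append(row)
    return rows


C = cost_matrix(
    pred_logits=[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    pred_boxes=[(0.5, 0.5, 0.2, 0.2), (0.2, 0.2, 0.1, 0.1)],
    tgt_labels=[0],
    tgt_boxes=[(0.5, 0.5, 0.2, 0.2)],
)
```

In this toy case the first prediction has the right class and the same box as the target, so its cost is much lower than the second prediction's.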


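Assignment with the modified Hungarian algorithm:

For each image in the batch, the assignment is computed on the cost matrix between that image's predictions and targets, and the indices of the matched pairs are returned. linear_sum_assignment therefore runs once per image, batch_size times in total. c[i] has shape [len(predicts), len(targets)].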
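To make the per-image assignment concrete, here is a small sketch that finds the minimum-cost matching by brute force over permutations. This is a stand-in for linear_sum_assignment, usable only for tiny matrices; the function names are hypothetical, and it assumes there are at least as many predictions as targets.

```python
from itertools import permutations


def assign(cost):
    """cost[i][j]: cost of matching prediction i to target j.

    Returns a list of (prediction_index, target_index) pairs with minimum
    total cost, sorted by prediction index.
    """
    n_pred = len(cost)
    n_tgt = len(cost[0]) if cost else 0
    best, best_total = (), float("inf")
    # Try every way of giving each target a distinct prediction.
    for pred_for_target in permutations(range(n_pred), n_tgt):
        total = sum(cost[i][j] for j, i in enumerate(pred_for_target))
        if total < best_total:
            best, best_total = pred_for_target, total
    return sorted((i, j) for j, i in enumerate(best))


def match_batch(costs):
    """Run the assignment once per image, as the matcher does per batch."""
    return [assign(c) for c in costs]


c0 = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # 3 predictions, 2 targets
c1 = [[0.3], [0.1]]                          # 2 predictions, 1 target
indices = match_batch([c0, c1])
```

Predictions that do not appear in any pair are the unmatched ones.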

losses:
postprocessors: convert pred_boxes back to original-image coordinates, in xyxy format.
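A minimal sketch of such a postprocessing step for one image, in plain Python. The function name is hypothetical, and treating the last logit as a "no object" class that is excluded from the score is an assumption.

```python
import math


def postprocess(pred_logits, pred_boxes, orig_size):
    """Turn raw outputs for one image into scores, labels, and pixel boxes.

    pred_logits: one list of logits per query (last entry assumed "no object").
    pred_boxes: one normalized (center_x, center_y, w, h) per query.
    orig_size: (w, h) of the original image.
    """
    img_w, img_h = orig_size
    results = []
    for logits, (cx, cy, w, h) in zip(pred_logits, pred_boxes):
        # Softmax over all classes, then ignore the assumed "no object" entry.
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        foreground = [e / total for e in exps][:-1]
        score = max(foreground)
        label = foreground.index(score)
        # Center format to xyxy, then scale to original-image pixels.
        box = ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
               (cx + w / 2) * img_w, (cy + h / 2) * img_h)
        results.append({"score": score, "label": label, "box": box})
    return results


detections = postprocess([[0.0, 3.0, 0.0]], [(0.5, 0.5, 0.5, 0.5)], orig_size=(400, 200))
```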

[Figure: detection results]

Further reading:

https://blog.csdn.net/baidu_/article/details/
