Co-DETR – SOTA object detection with transformers
This video talks about Co-DETR, the current state-of-the-art transformer-based detection model. It builds on previous generations such as DETR, Deformable DETR, and DINO by adding extra “auxiliary heads” during training, while introducing no overhead during inference.
This video focuses on explaining the model itself. My next video will be a “source code read” with a more in-depth look at all the little details of the model.
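To illustrate that training-only design, here is a minimal PyTorch-style sketch (not the paper’s actual implementation; the class and argument names are hypothetical): the auxiliary heads produce losses only when the module is in training mode, so inference runs the plain DETR-style path at no extra cost.

```python
import torch.nn as nn

# Minimal sketch, NOT the official Co-DETR code: auxiliary heads
# contribute losses only during training; at inference time only the
# primary decoder head runs, so there is no added overhead.
class CoDetrSketch(nn.Module):
    def __init__(self, backbone, decoder_head, aux_heads):
        super().__init__()
        self.backbone = backbone          # e.g. a ResNet/Swin feature extractor
        self.decoder_head = decoder_head  # primary one-to-one matched DETR head
        self.aux_heads = nn.ModuleList(aux_heads)  # e.g. RPN/R-CNN-style heads

    def forward(self, images, targets=None):
        feats = self.backbone(images)
        out = self.decoder_head(feats)
        if self.training:
            # One-to-many auxiliary losses supervise the shared encoder
            # features; these heads are simply skipped at inference time.
            aux_losses = [head(feats, targets) for head in self.aux_heads]
            return out, aux_losses
        return out
```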
Important links:
– CoDETR paper: https://arxiv.org/pdf/2211.12860
– Papers With Code COCO benchmark: https://paperswithcode.com/sota/object-detection-on-coco
00:00 – Intro
03:48 – DETR, One-to-One label matching (see the matching sketch below)
11:55 – Problems with DETR
18:00 – One-to-Many label assignment, ConvNets (RPN, R-CNN)
22:27 – Aux heads for encoder training
28:04 – Aux heads for decoder training
35:57 – Results
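To make the one-to-one vs. one-to-many distinction from the chapters above concrete, here is a toy sketch (made-up cost values, not from the paper) of Hungarian matching, which DETR uses to assign each ground-truth box to exactly one predicted query. One-to-many assigners like the RPN instead mark every anchor above an IoU threshold as positive, giving many positives per ground-truth box.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative one-to-one matching (DETR, 03:48): the Hungarian algorithm
# picks the assignment that minimizes total cost, with each ground-truth
# box matched to exactly one query. Real DETR costs combine class
# probability and box (L1 + GIoU) terms; these numbers are made up.
cost = np.array([
    [0.2, 0.9, 0.7],   # cost of matching query 0 to each of 3 GT boxes
    [0.8, 0.1, 0.6],
    [0.5, 0.7, 0.3],
    [0.9, 0.8, 0.4],   # 4 queries, 3 GT boxes -> one query stays unmatched
])
rows, cols = linear_sum_assignment(cost)
for q, g in zip(rows, cols):
    print(f"query {q} -> ground-truth box {g} (cost {cost[q, g]:.1f})")
```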