YOLO v4 object detector test / YOLO version 4 example detections
YOLO v4 test by Prof. Dr. Jürgen Brauer, University of Applied Sciences Kempten
www.juergenbrauer.org
Introduction:
==========
In this video I test the performance of the new YOLO v4 object detector (“You Only Look Once”), which sets the new state of the art (SotA) in object detection regarding the trade-off between speed and detection performance, measured by the mAP metric.
Input data:
=========
The test frames are from this video:
It shows a 24-minute walk by Manfred Auer through the nice city of Kempten, located in southern Germany near the Alps.
This video provides us with very interesting test objects: persons (mainly pedestrians), cars, bikes, motorbikes, buses, chairs, etc.
The 24-minute video was split into 86473 individual frames using ffmpeg with this command:
“ffmpeg -i input_video.mp4 img_%06d.png”
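As a quick sanity check on these numbers (my own back-of-the-envelope calculation, not from the original description; the 60 fps frame rate of the source video is an assumption): a 24-minute video at 60 frames per second should yield about 24 × 60 × 60 = 86400 frames, which matches the reported 86473 almost exactly.

```python
# Sanity check: does the reported frame count match a 24-minute video?
# Assumption: the source video runs at roughly 60 fps.
duration_s = 24 * 60          # 24 minutes in seconds
fps = 60                      # assumed frame rate of the source video
expected = duration_s * fps   # expected number of extracted frames
reported = 86473              # frame count actually produced by ffmpeg

print(expected)                             # 86400
print(abs(reported - expected) / reported)  # relative deviation, well under 1%
```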
YOLO v4 implementation used:
=========================
For object detection I used the new YOLO v4 object detector, in its original implementation:
https://github.com/AlexeyAB/darknet
It is NOT based on PyTorch or TensorFlow/Keras, but uses its own deep learning framework, called “Darknet”. Darknet is written in C and the code is easy to understand.
I compiled the code with GPU support for faster single-frame processing. You can read here a little bit about how I got the code running:
http://www.juergenbrauer.org/teaching/mss/ex10_yolo.pdf
Compared to TensorFlow’s object detection API (which does not support the new TF 2.x framework but is still based on the old TF 1.x framework), getting this code to work is a piece of cake.
How YOLO was called to produce the output:
====================================
After I got the code to compile, I downloaded the pre-trained YOLO v4 weights from here:
https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights (about 255MB of weight data)
This model was trained on the MS COCO dataset:
https://cocodataset.org/#home
which contains training images for 80 different object categories: https://cocodataset.org/#explore
I also made a small modification to the code in order to save each prediction image as an individual file.
In /src/detector.c I therefore changed the test_detector() function such that each prediction image is saved (as I said, the code is easy to understand and thus easy to modify…)
Then I used ffmpeg again to compile a video from the 86473 prediction images.
This is how I called YOLO v4:
./darknet detector test ./cfg/coco.data ./cfg/yolov4.cfg weights/yolov4.weights list_of_frames.txt
where list_of_frames.txt is a simple text file with the absolute paths of all the 86473 raw images of the original video.
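Such a frame list can be generated in many ways; here is a minimal Python sketch (my own, not part of the original workflow — the directory location and the demo setup with dummy files are assumptions) that collects all img_*.png frames in sorted order and writes one absolute path per line:

```python
from pathlib import Path
import tempfile

# Demo setup: create a few dummy frame files. In the real run these would be
# the 86473 PNGs that ffmpeg extracted (file names follow img_%06d.png).
frame_dir = Path(tempfile.mkdtemp())
for i in range(1, 4):
    (frame_dir / f"img_{i:06d}.png").touch()

# Collect the frames in sorted order and write one absolute path per line,
# which is the format the darknet batch call above expects.
frames = sorted(frame_dir.glob("img_*.png"))
list_file = frame_dir / "list_of_frames.txt"
with open(list_file, "w") as f:
    for frame in frames:
        f.write(str(frame.resolve()) + "\n")

print(len(frames))  # 3 dummy frames in this demo
```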
Results:
=======
I have to say that I am really impressed by YOLO v4.
It is extremely fast: a single prediction took less than 50ms on my notebook GPU:
Which GPU?
“hwinfo --gfxcard --short” (on the command line)
gave as result “nVidia GP104GLM Quadro P5200 Mobile”
So I had a frame rate of about 20 frames per second, and the complete prediction run for the 24-minute video took about:
86473 / 20 seconds ≈ 72 minutes
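The throughput arithmetic above can be verified in a few lines:

```python
# Verify the processing-time estimate: 86473 frames at ~20 predictions/second
# (one prediction taking a bit less than 50 ms each).
n_frames = 86473            # total frames of the 24-minute video
fps = 20                    # ~50 ms per prediction => ~20 predictions per second

total_s = n_frames / fps    # total processing time in seconds
total_min = total_s / 60    # ... converted to minutes
print(round(total_min))     # 72
```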
… which was roughly my lunch time yesterday 😉
On the other hand, the predictions are incredibly good! I did my PhD in computer vision in an era when the best detectors were the Viola-Jones detector, the HOG detector, and the Implicit Shape Model (ISM), and later the Deformable Parts Model (DPM). To be honest, object detection at that time simply did not work. Now things have changed, as you can see in this video!
Link to YOLO v4 original paper:
=========================
“YOLOv4: Optimal Speed and Accuracy of Object Detection”
by Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao
first published on April 23, 2020 on arxiv.org (preprint server)
Keywords: YOLOv4, YOLO v4, YOLO version4, “You Only Look Once” v4, object detector, object detection, object recognition, object localization, Deep Convolutional Neural Network for object detection, YOLO v4 performance test, YOLO v4 object detection examples, Deep Learning
Another similar video I have produced:
===============================
“Canny Edge World”
(edge detection results for the same input video)