Computer Vision, Machine Learning, and AI

Published by Maiara Araújo

In this article, I show my first tests with Machine Learning and Computer Vision. Since starting classes as a special student in the Master's program in Computer Science, I have learned a lot about these topics, and I share my findings here.

Article under construction.

Mediapipe and Unity

Under construction.

Computer Vision System for Box Counting with YOLO and Roboflow

I developed a Computer Vision system that uses YOLO (You Only Look Once) to detect and track boxes of different colors in a sample video. To create the dataset, I used Roboflow, a platform I recently discovered that made the object labeling process much easier; before that, I used Make Sense.

First, I converted the video into a sequence of JPG images and imported them into Roboflow, where I labeled the boxes using Roboflow Annotate. Since this was for educational purposes, I labeled only 87 of the 300 images.
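The video-to-JPG step can be sketched with OpenCV as follows. This is a minimal sketch, not the exact script used here: the function names, the file naming scheme, and the `every_n` sampling parameter are all assumptions made for illustration.

```python
from pathlib import Path


def frame_name(index: int, prefix: str = "frame") -> str:
    """Build a zero-padded JPG file name, e.g. frame_00007.jpg (hypothetical naming scheme)."""
    return f"{prefix}_{index:05d}.jpg"


def extract_frames(video_path: str, out_dir: str, every_n: int = 1) -> int:
    """Save every `every_n`-th frame of the video as a JPG and return how many were saved."""
    import cv2  # imported lazily so frame_name() also works without OpenCV installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(video_path)
    saved = 0
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of the video (or a read error)
                break
            if index % every_n == 0:
                cv2.imwrite(str(out / frame_name(saved)), frame)
                saved += 1
            index += 1
    finally:
        cap.release()
    return saved
```

Calling `extract_frames("boxes.mp4", "frames")` (both paths are placeholders) would write one JPG per frame into the `frames` folder, ready to be uploaded for labeling.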

I downloaded the labeled dataset to my machine and trained YOLO locally. In Python, using libraries like OpenCV, I processed the video and tracked the objects with BoT-SORT, so that the same box keeps the same ID from frame to frame and is counted only once.
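A training and tracking loop along these lines could look like the sketch below. It assumes the Ultralytics Python package, which the article does not name; the weights file `yolov8n.pt`, the dataset file `data.yaml`, the epoch count, and the helper names are likewise assumptions, not the settings actually used.

```python
def box_center(x1, y1, x2, y2):
    """Center point of a bounding box given in (x1, y1, x2, y2) pixel coordinates."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def train(data_yaml="data.yaml", epochs=50):
    """Fine-tune a small pretrained YOLO model on a labeled dataset (assumed Ultralytics API)."""
    from ultralytics import YOLO  # lazy import: assumed dependency

    model = YOLO("yolov8n.pt")  # assumed starting weights
    model.train(data=data_yaml, epochs=epochs, imgsz=640)
    return model


def track_video(weights, video_path):
    """Yield (frame, {track_id: center}) for each frame, using the BoT-SORT tracker."""
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights)
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # persist=True keeps track IDs alive between consecutive calls
            results = model.track(frame, persist=True, tracker="botsort.yaml", verbose=False)
            boxes = results[0].boxes
            centers = {}
            if boxes.id is not None:  # id is None when nothing is being tracked
                for track_id, xyxy in zip(boxes.id.int().tolist(), boxes.xyxy.tolist()):
                    centers[track_id] = box_center(*xyxy)
            yield frame, centers
    finally:
        cap.release()
```

Reducing each box to its center point keeps the later counting logic simple: one point per track ID per frame.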

The box count is done through a line defined in the video, where each box that crosses it is counted.
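One common way to decide whether a box crossed the line is to check on which side of the line its center lies in two consecutive frames. The sketch below shows that idea under my own assumptions (the function names are hypothetical, and the line is treated as infinite rather than as a segment); it is not necessarily the exact test used in this project.

```python
def side_of_line(point, line_start, line_end):
    """Signed value: positive on one side of the line, negative on the other, 0 on it."""
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    # z-component of the cross product (B - A) x (P - A)
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def crossed(prev_point, curr_point, line_start, line_end):
    """True if the point moved from strictly one side to the line or the other side."""
    before = side_of_line(prev_point, line_start, line_end)
    after = side_of_line(curr_point, line_start, line_end)
    return (before < 0 <= after) or (before > 0 >= after)


# Example: a horizontal counting line at y = 5
line = ((0, 5), (10, 5))
print(crossed((5, 3), (5, 7), *line))  # moved from above the line to below it
print(crossed((5, 3), (6, 4), *line))  # stayed on the same side
```

A point that lands exactly on the line is treated as having crossed, so a box is not missed when its center falls on the line in one frame.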

And that's it! The result is in the small GIF I made from the output. This is my first EVER test!

Below is a second test that needed no training, since I used a model pretrained on the COCO (Common Objects in Context) dataset. I used the "cross-line counting" method to work around the occlusion problem, in which IDs are lost due to movement or object overlap. The detections are followed through the video with the BoT-SORT tracker, focusing mainly on people and means of transportation, such as cars. The system identifies and counts objects that cross imaginary lines in the frame, so it can count entries and exits in specific regions, using a vehicle entry and exit line and a crossing line for people.
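Restricting a COCO-pretrained detector to people and vehicles comes down to filtering by class index. The sketch below assumes the 80-class COCO label order used by Ultralytics' pretrained YOLO weights and a hypothetical detection format (a dict with a `class_id` key); neither is confirmed by the article.

```python
# Class indices in the 80-class COCO label set (assumed: Ultralytics ordering)
COCO_IDS = {"person": 0, "bicycle": 1, "car": 2, "motorcycle": 3, "bus": 5, "truck": 7}


def keep_classes(detections, wanted):
    """Keep only the detections whose class is in `wanted` (a list of class names)."""
    wanted_ids = {COCO_IDS[name] for name in wanted}
    return [d for d in detections if d["class_id"] in wanted_ids]


# Hypothetical detections: a person, a car, and a COCO class we do not care about
detections = [
    {"class_id": 0, "track_id": 1},
    {"class_id": 2, "track_id": 2},
    {"class_id": 16, "track_id": 3},
]
print(keep_classes(detections, ["person", "car"]))
```

With Ultralytics, the same filtering can usually be pushed into the model call by passing `classes=[0, 2]` to `predict` or `track`, which avoids handling unwanted detections at all.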

If an object crosses the line, a counter is incremented, and the detection is marked to avoid repeated counts.
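The counting and marking logic can be sketched as a small class that remembers the last position of each track ID and the set of IDs already counted. This is an illustrative sketch: the class name, the horizontal line at `line_y`, and the convention that moving downward is an entry are all assumptions.

```python
class LineCounter:
    """Count each track ID at most once when it crosses a horizontal line."""

    def __init__(self, line_y):
        self.line_y = line_y
        self.last_y = {}      # track_id -> y of the center in the previous frame
        self.counted = set()  # track IDs that were already counted
        self.entries = 0      # crossings from above the line to below it
        self.exits = 0        # crossings from below the line to above it

    def update(self, centers):
        """`centers` maps track_id -> (x, y) for the current frame."""
        for track_id, (_, y) in centers.items():
            prev_y = self.last_y.get(track_id)
            self.last_y[track_id] = y
            if prev_y is None or track_id in self.counted:
                continue  # first sighting, or already counted
            if prev_y < self.line_y <= y:
                self.entries += 1
                self.counted.add(track_id)
            elif prev_y > self.line_y >= y:
                self.exits += 1
                self.counted.add(track_id)


counter = LineCounter(line_y=100)
counter.update({1: (50, 90), 2: (60, 120)})
counter.update({1: (50, 105), 2: (60, 95)})  # ID 1 enters, ID 2 exits
counter.update({1: (50, 95)})                # ID 1 moves back: not counted again
print(counter.entries, counter.exits)
```

Keeping the counted IDs in a set means an object that hovers around the line is counted once. If the tracker assigns a new ID after an occlusion, however, the object looks new to the counter, which is one reason ID switches still cause errors.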

There are still errors: because of the low resolution of the video, people are constantly occluded, mainly by umbrellas. I would also need more lines to count the people on the sidewalk.