Welcome to TexoBot Learning! Today, we’re diving into the world of computer vision. We’ll build a real-time object detector using the power of OpenCV, Python, and the YOLOv3 model. In this tutorial, you’ll learn how to identify objects directly from your webcam feed.
Before we get started, let's make sure you have the right tools for the job.
Computer with a decent CPU: A modern multi-core processor will help with faster processing.
8GB of RAM: Recommended, especially if you’re working with high-resolution video streams.
Webcam: Any basic webcam should work, but a higher quality one will give you better results. If possible, connect your webcam to a USB 3.0 port for faster performance.
Operating System: This demonstration uses Windows 11, but it should also work on Windows 10 or older versions like Windows 7 with some adjustments.
Python 3.7.0: It’s important to stick with this version because different Python versions can sometimes lead to compatibility issues with libraries like OpenCV. You can download Python 3.7.0 from the official Python website.
After installing Python 3.7.0, open your command prompt or terminal and type python --version. Ensure the output shows Python 3.7.0. If Python is correctly installed, you can install the necessary libraries using the following pip commands:
pip install numpy==1.21.6
pip install opencv-contrib-python==4.0.1.24
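If you'd like to double-check the setup from inside Python as well, something along these lines may help. It's a minimal sketch that uses only the standard library; the helper names python_matches and missing_modules are made up for this example and are not part of NumPy or OpenCV.

```python
import importlib.util
import sys


def python_matches(version_info, required=(3, 7)):
    """Return True when the major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    if not python_matches(sys.version_info):
        print("Note: this tutorial was written against Python 3.7")
    missing = missing_modules(["numpy", "cv2"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("numpy and cv2 can be imported")
```

Note that the OpenCV package is installed as opencv-contrib-python but imported as cv2, which is why the check looks for cv2.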
You’ll need the following files, which can be downloaded from the YOLO website or other reputable sources:
yolov3.cfg: This configuration file defines the architecture of the YOLOv3 neural network.
yolov3.weights: This file contains the pre-trained weights of the YOLOv3 model.
coco.names: This file contains the names of the object classes that YOLOv3 is trained to detect.
Make sure to place these files in a convenient location and update the paths in your Python code accordingly.
Download YOLOv3 config files from Google Drive Link
Now that we have our tools ready, let’s dive into the core concepts of object detection.
OpenCV is our toolbox for image and video processing. It provides functions for everything from reading images to complex tasks like object detection.
YOLOv3 (You Only Look Once, version 3) is a neural network model that detects every object in an image in a single pass. That design makes it fast enough for real-time video while staying accurate enough for everyday objects.
Imagine you have a picture of a busy street. Object detection is like having a super-smart detective who can instantly pinpoint and label all the cars in that image.
Under the hood, object detectors use neural networks to analyze the image. They look for patterns, shapes, and features that are characteristic of cars. The model divides the image into a grid and examines each section. If it finds something that looks like a car, it draws a box around it (a bounding box) and labels it as a car.
The better the training data the model has seen, the more accurate it becomes at recognizing cars in different situations.
YOLOv3 is particularly good at being fast and efficient. It divides the image into a grid and makes predictions about the objects in each cell. It then refines those predictions to give us precise bounding boxes and labels.
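To make the grid idea a little more concrete, here is a rough sketch in plain Python. It assumes the standard YOLOv3 configuration, which predicts at three scales with strides of 32, 16, and 8 pixels and three anchor boxes per cell; the function names are invented for this illustration.

```python
# Strides of the three detection scales in the standard yolov3.cfg (assumption).
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3


def grid_sizes(input_size, strides=STRIDES):
    """Return the number of cells per side for each detection scale."""
    return [input_size // stride for stride in strides]


def cell_for_point(x, y, input_size, grid):
    """Return (column, row) of the grid cell that contains pixel (x, y)."""
    cell = input_size / grid
    col = min(int(x // cell), grid - 1)
    row = min(int(y // cell), grid - 1)
    return col, row


def total_predictions(input_size):
    """Count the candidate boxes the network outputs for one image."""
    return sum(g * g * ANCHORS_PER_CELL for g in grid_sizes(input_size))


if __name__ == "__main__":
    print(grid_sizes(608))                    # cells per side at each scale
    print(cell_for_point(304, 304, 608, 19))  # cell holding the image center
    print(total_predictions(608))             # candidate boxes before filtering
```

With the 608 x 608 input used in this tutorial, that works out to grids of 19, 38, and 76 cells per side. Most of the resulting candidate boxes are discarded by the confidence threshold and non-maximum suppression.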
I’ll be using PyCharm as my Integrated Development Environment (IDE), but you can use any code editor you’re comfortable with.
Let’s break down the code step by step. Each part of the script contributes to our object detector:
============================================================================================
import numpy as np #Imports the NumPy library for numerical operations
import cv2 #Imports the OpenCV library for computer vision tasks.
whT = 608 #Defines the width and height of the input image for the YOLO model (width=608 and height=608).
confThreshold = 0.5 #Sets the confidence threshold for bounding box detection. Only boxes with confidence scores above this value are considered
nmsThreshold = 0.4 #Sets the Non-Maximum Suppression (NMS) threshold. NMS helps remove redundant bounding boxes for the same object.
# Loads the class names (like "person", "car") from the coco.names file.
classes = open('../YOLOv3/coco.names').read().strip().split('\n')
# Load YOLO model
modelConfig = '../YOLOv3/yolov3.cfg' #Path to the YOLOv3 configuration file (yolov3.cfg).
modelWeights = '../YOLOv3/yolov3.weights' #Path to the YOLOv3 weights file (yolov3.weights)
net = cv2.dnn.readNetFromDarknet(modelConfig, modelWeights) # Loads the YOLO model using cv2.dnn.readNetFromDarknet
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV) #Sets the preferred backend for the model to OpenCV.
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU) #Runs inference on the CPU, which works on any machine without extra GPU setup.
# Get the names of the output layers
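def getOutputsNames(net): #Retrieves the output layer names from the YOLO model.
    layer_names = net.getLayerNames()
    # getUnconnectedOutLayers() returns 1-based indices. flatten() copes with both the
    # N x 1 array of older OpenCV releases and the flat array of newer ones.
    out_indices = np.array(net.getUnconnectedOutLayers()).flatten()
    return [layer_names[i - 1] for i in out_indices]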
# Draw the predicted bounding box
def drawPred(classId, conf, left, top, right, bottom): # Draws the predicted bounding box with class label and confidence score on the frame.
    cv2.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)
    label = '%.2f' % conf
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)
    labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv2.rectangle(frame, (left, top - round(1.5 * labelSize[1])), (left + round(1.5 * labelSize[0]), top + baseLine), (255, 255, 255), cv2.FILLED)
    cv2.putText(frame, label, (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 0), 1)
# Remove the bounding boxes with low confidence using non-maxima suppression
def postprocess(frame, outp): #Extracts relevant information from the model's output.
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]
    classIds = []
    confidences = []
    boxes = []
    for out in outp:
        for detection in out:
            scores = detection[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                # The network reports box centers and sizes relative to the frame.
                center_x = int(detection[0] * frameWidth)
                center_y = int(detection[1] * frameHeight)
                width = int(detection[2] * frameWidth)
                height = int(detection[3] * frameHeight)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                classIds.append(classId)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    # flatten() handles both the N x 1 result of older OpenCV releases and the flat result of newer ones.
    for i in np.array(indices).flatten():
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)
# Open webcam
cap = cv2.VideoCapture(0) # Use 0 for the default webcam
while True: #loop continuously captures frames from the webcam:
    success, frame = cap.read() #Reads a frame from the webcam using cap.read().
    if not success:
        print("Failed to capture frame from webcam")
        break
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (whT, whT), [0, 0, 0], 1, crop=False) #Prepares the frame for the YOLO model using cv2.dnn.blobFromImage
    net.setInput(blob) #Sets the pre-processed image as input to the model using net.setInput(blob)
    outputs = net.forward(getOutputsNames(net)) #Performs forward pass using net.forward(getOutputsNames(net)) to get detections.
    postprocess(frame, outputs) #Applies post-processing using postprocess(frame, outputs) to filter and visualize detections.
    cv2.imshow('Webcam Object Detection', frame) #Displays the resulting frame with bounding boxes using cv2.imshow
    if cv2.waitKey(1) & 0xFF == ord('q'): # Exits the loop if 'q' key is pressed
        break
cap.release() #Releases the webcam resource.
cv2.destroyAllWindows() #Closes all OpenCV windows.
==========================================================================================
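If the postprocess function feels dense, it may help to look at what happens to a single row of the network output in isolation. The sketch below is plain Python and assumes the usual YOLOv3 row layout of [center_x, center_y, width, height, objectness, class scores...], with coordinates given as fractions of the frame size. decode_detection is an illustrative helper rather than an OpenCV function, and the sample row is invented.

```python
def decode_detection(detection, frame_width, frame_height, conf_threshold=0.5):
    """Turn one output row into (class_id, confidence, [left, top, width, height]).

    Returns None when the best class score does not exceed conf_threshold.
    """
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # Scale the normalized center and size up to pixel units.
    center_x = int(detection[0] * frame_width)
    center_y = int(detection[1] * frame_height)
    width = int(detection[2] * frame_width)
    height = int(detection[3] * frame_height)
    # Convert from a center point to the top-left corner used for drawing.
    left = int(center_x - width / 2)
    top = int(center_y - height / 2)
    return class_id, float(confidence), [left, top, width, height]


if __name__ == "__main__":
    # Invented row: a box centered in the frame with three class scores.
    row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
    print(decode_detection(row, 640, 480))
```

For the sample row on a 640 x 480 frame, the second class wins with a score of 0.8, and the box comes out as [240, 120, 160, 240] in [left, top, width, height] form.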
Run your code live to see the object detection in action. The script will identify objects and draw bounding boxes around them. You can also see the confidence score displayed, indicating how certain the script is about the detection.
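You'll usually see just one box per object, even though the network proposes many overlapping candidates. The script hands that cleanup to cv2.dnn.NMSBoxes. To show the idea behind it, here is a simplified greedy version in plain Python. Treat it as a teaching sketch under my own assumptions, not OpenCV's actual implementation; iou and greedy_nms are names chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two [left, top, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_threshold=0.5, nms_threshold=0.4):
    """Return indices of boxes to keep, strongest first."""
    # Drop weak boxes, then visit the rest from most to least confident.
    order = sorted(
        (i for i, conf in enumerate(confidences) if conf > score_threshold),
        key=lambda i: confidences[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger one.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
    confidences = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, confidences))
```

In the sample data, the first two boxes overlap heavily, so only the more confident one survives alongside the separate third box. Lowering nmsThreshold in the main script makes this pruning more aggressive.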
This is just a basic demonstration, but it shows the power of object detection with Python. With some modifications, you can customize this script for your specific needs.
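One easy customization is to show only the kinds of objects you care about. Here's a minimal sketch of that, assuming you filter the classIds, confidences, and boxes lists before they are passed to non-maximum suppression. filter_by_class is a hypothetical helper, and the short class list stands in for the full one loaded from coco.names.

```python
def filter_by_class(class_ids, confidences, boxes, class_names, wanted):
    """Keep only the detections whose class name is in `wanted`."""
    wanted = set(wanted)
    kept = [i for i, cid in enumerate(class_ids) if class_names[cid] in wanted]
    return (
        [class_ids[i] for i in kept],
        [confidences[i] for i in kept],
        [boxes[i] for i in kept],
    )


if __name__ == "__main__":
    # Shortened stand-in for the list read from coco.names.
    class_names = ["person", "bicycle", "car"]
    class_ids = [0, 2, 1, 0]
    confidences = [0.91, 0.77, 0.64, 0.58]
    boxes = [[10, 10, 50, 120], [200, 80, 90, 60], [150, 90, 40, 40], [300, 20, 45, 110]]
    print(filter_by_class(class_ids, confidences, boxes, class_names, ["person"]))
```

In the main script, a call like this would sit inside postprocess, just before the cv2.dnn.NMSBoxes line.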
If you want to detect specific objects not in the standard YOLOv3 list, you can train your own custom models. This is a more advanced topic but definitely worth exploring.
I hope you found this tutorial helpful. Before we wrap up, I’d love to hear your thoughts! What other creative applications can you imagine for this object detection project? Let me know in the comments below, and maybe your idea will inspire my next article. Thanks for reading! If you enjoyed this tutorial, please like, share, and follow for more computer vision projects.