OpenCV Object Detection on Windows 11

Introduction

Demo

Hardware and Software Requirements

Hardware

Software

Installation

YOLOv3 Configuration and Weights Files

Core Concepts

What is OpenCV?

What is YOLOv3?

How Does Object Detection Work?

How Does YOLOv3 Work?

Project Demonstration

Code Walkthrough

Advanced Topics (Optional)

Conclusion and Audience Engagement

Introduction

Welcome to TexoBot Learning! Today, we’re diving into the world of computer vision. We’ll build a real-time object detector using the power of OpenCV, Python, and the YOLOv3 model. In this tutorial, you’ll learn how to identify objects directly from your webcam feed.

Demo

Before we get into the details, let’s see what we’re building. Here’s a quick demo of our object detection project in action:

Hardware and Software Requirements

Before we get started, let's make sure you have the right tools for the job.

Hardware

Computer with a decent CPU: A modern multi-core processor will help with faster processing.
8GB of RAM: Recommended, especially if you’re working with high-resolution video streams.
Webcam: Any basic webcam should work, but a higher quality one will give you better results. If possible, connect your webcam to a USB 3.0 port for faster performance.

Software

Operating System: This demonstration uses Windows 11, but it should also work on Windows 10 or older versions like Windows 7 with some adjustments.
Python 3.7.0: Stick with this version if you can. The library versions pinned in the Installation section below are meant to be used with it, and other Python versions can run into compatibility issues with libraries like OpenCV. You can download Python 3.7.0 from the official Python website.

Installation

After installing Python 3.7.0, open your command prompt or terminal and type python --version. Ensure the output shows Python 3.7.0. If Python is correctly installed, you can install the necessary libraries using the following pip commands:

pip install numpy==1.21.6

pip install opencv-contrib-python==4.0.1.24
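Once both pip commands finish, a quick check like the one below can confirm that the libraries are visible to the interpreter you plan to use. This is a minimal sketch: the helper name missing_packages is made up for this example, and it only reports whether the modules can be found, not whether their versions match the ones pinned above.

```python
import importlib.util
import sys


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.7.0 (...)"
    print("Python", sys.version.split()[0])
    # "cv2" is the import name of the opencv-contrib-python package
    missing = missing_packages(["numpy", "cv2"])
    print("Missing modules:", missing if missing else "none")
```

If a module is listed as missing, the pip command was most likely run for a different Python installation than the one running this check.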

YOLOv3 Configuration and Weights Files

You’ll need the following files, which can be downloaded from the YOLO website or other reputable sources:

yolov3.cfg: This configuration file defines the architecture of the YOLOv3 neural network.
yolov3.weights: This file contains the pre-trained weights of the YOLOv3 model.
coco.names: This file contains the names of the object classes that YOLOv3 is trained to detect.

Make sure to place these files in a convenient location and update the paths in your Python code accordingly.

Download YOLOv3 config files from Google Drive Link
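Before running the detector, it can help to confirm that all three files are where the script expects them. The sketch below assumes the same relative paths that the script later in this tutorial uses (a YOLOv3 folder one level above the script); the helper name missing_files is hypothetical, and you should adjust the paths to your own layout.

```python
import os

# Paths as used by the script in the Code Walkthrough; adjust to your layout.
REQUIRED_FILES = [
    "../YOLOv3/yolov3.cfg",
    "../YOLOv3/yolov3.weights",
    "../YOLOv3/coco.names",
]


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print("Missing files:")
        for path in missing:
            print("  ", os.path.abspath(path))
    else:
        print("All YOLOv3 files found.")
```

Printing the absolute path makes it easier to spot a wrong working directory, which is a common reason relative paths fail.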

Core Concepts

Now that we have our tools ready, let’s dive into the core concepts of object detection.

What is OpenCV?

OpenCV is our toolbox for image and video processing. It provides functions for everything from reading images to complex tasks like object detection.

What is YOLOv3?

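YOLOv3 (You Only Look Once, version 3) is a neural network model for object detection. It predicts bounding boxes and class labels in a single pass over the image, which makes it fast enough for real-time use while still being accurate at recognizing objects.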

How Does Object Detection Work?

Imagine you have a picture of a busy street. Object detection is like having a super-smart detective who can instantly pinpoint and label all the cars in that image.

Under the hood, object detectors use neural networks to analyze the image. They look for patterns, shapes, and features that are characteristic of cars. The model divides the image into a grid and examines each section. If it finds something that looks like a car, it draws a box around it (a bounding box) and labels it as a car.

The better the training data the model has seen, the more accurate it becomes at recognizing cars in different situations.
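To make the grid idea concrete, here is a small sketch that works out which grid cell contains the center of an object. The function name grid_cell and the 19x19 grid size are assumptions chosen for illustration (19 is what you get when a 608-pixel input is divided into 32-pixel cells); real detectors may use several grid sizes at once.

```python
def grid_cell(center_x, center_y, image_width, image_height, grid_size):
    """Return (row, col) of the grid cell that contains a pixel position."""
    # Scale the pixel position to grid units, then clamp to the last cell
    # so a point exactly on the right or bottom edge still maps to a cell.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


# A car centered at pixel (320, 240) in a 640x480 frame, on a 19x19 grid
print(grid_cell(320, 240, 640, 480, 19))  # → (9, 9)
```

In this picture, the cell that contains the center of the car is the one responsible for reporting it.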

How Does YOLOv3 Work?

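YOLOv3 is designed to be fast and efficient: it processes the whole image in a single forward pass. It divides the image into a grid and, for each cell, predicts bounding boxes, confidence scores, and class probabilities. Those raw predictions are then refined: low-confidence boxes are discarded, and when several boxes overlap on the same object, only the strongest one is kept. This technique is called non-maximum suppression (NMS), and it leaves us with precise bounding boxes and labels.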
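One common way detectors clean up overlapping predictions is non-maximum suppression (NMS), which the script later in this tutorial applies through cv2.dnn.NMSBoxes. The pure-Python sketch below illustrates the idea only; it is not OpenCV's implementation, and the names iou and simple_nms are made up for this example. Boxes use the same [left, top, width, height] layout as the script.

```python
def iou(box_a, box_b):
    """Intersection over union of two [left, top, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Width and height of the overlapping region (0 if the boxes are apart)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS: visit boxes from best to worst score and keep a box
    only if it does not overlap an already kept box too much."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two near-duplicate boxes on one object, plus one separate object
boxes = [[100, 100, 50, 50], [105, 102, 50, 50], [300, 200, 40, 80]]
scores = [0.9, 0.75, 0.6]
print(simple_nms(boxes, scores, 0.5, 0.4))  # → [0, 2]
```

The second box overlaps the first with an IoU of about 0.76, which is above the 0.4 threshold, so it is dropped in favor of the higher-scoring first box.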

Here’s a simple diagram to illustrate this process:

Project Demonstration

I’ll be using PyCharm as my Integrated Development Environment (IDE), but you can use any code editor you’re comfortable with.

Code Walkthrough

Let’s break down the code step by step. Each part of the script contributes to our object detector:

============================================================================================

import numpy as np  # NumPy for numerical operations
import cv2  # OpenCV for computer vision tasks

whT = 608  # Width and height of the square input image for the YOLO model (608 x 608)
confThreshold = 0.5  # Confidence threshold: only detections scoring above this value are kept
nmsThreshold = 0.4  # Non-Maximum Suppression (NMS) threshold: NMS removes redundant boxes for the same object

# Load the class names (like "person", "car") from the coco.names file
with open('../YOLOv3/coco.names') as f:
    classes = f.read().strip().split('\n')

# Load YOLO model
modelConfig = '../YOLOv3/yolov3.cfg'  # Path to the YOLOv3 configuration file
modelWeights = '../YOLOv3/yolov3.weights'  # Path to the YOLOv3 weights file
net = cv2.dnn.readNetFromDarknet(modelConfig, modelWeights)  # Build the network from the Darknet files
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # Use OpenCV's own DNN backend
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)  # Run inference on the CPU, which works without any GPU setup

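# Get the names of the output layers
def getOutputsNames(net):
    layer_names = net.getLayerNames()
    # getUnconnectedOutLayers() returns 1-based indices. Depending on the OpenCV
    # version they arrive as an N x 1 array or a flat array, so flatten first.
    out_indices = np.array(net.getUnconnectedOutLayers()).flatten()
    output_layers = [layer_names[i - 1] for i in out_indices]
    return output_layers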

# Draw the predicted bounding box with its class label and confidence score on the frame
def drawPred(classId, conf, left, top, right, bottom):
    cv2.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    label = '%.2f' % conf
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv2.rectangle(frame, (left, top - round(1.5 * labelSize[1])), (left + round(1.5 * labelSize[0]), top + baseLine), (255, 255, 255), cv2.FILLED)
    cv2.putText(frame, label, (left, top), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0, 0, 0), 1)

# Remove the bounding boxes with low confidence using non-maxima suppression
def postprocess(frame, outp):
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    classIds = []
    confidences = []
    boxes = []

    # Each detection row is [center_x, center_y, width, height, objectness, class scores...],
    # with the box values given relative to the frame size
    for out in outp:
        for detection in out:
            scores = detection[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                center_x = int(detection[0] * frameWidth)
                center_y = int(detection[1] * frameHeight)
                width = int(detection[2] * frameWidth)
                height = int(detection[3] * frameHeight)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                classIds.append(classId)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Keep only the best box among heavily overlapping ones
    indices = cv2.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)

    # Depending on the OpenCV version, the indices arrive as an N x 1 array or
    # a flat array, so flatten them before use
    for i in np.array(indices).flatten():
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)

# Open webcam
cap = cv2.VideoCapture(0)  # Use 0 for the default webcam

while True:  # Continuously capture frames from the webcam
    success, frame = cap.read()  # Read one frame from the webcam
    if not success:
        print("Failed to capture frame from webcam")
        break

    # Prepare the frame for the YOLO model: scale pixels to 0..1, resize to whT x whT, swap BGR to RGB
    blob = cv2.dnn.blobFromImage(frame, 1 / 255, (whT, whT), [0, 0, 0], swapRB=True, crop=False)
    net.setInput(blob)  # Set the pre-processed image as the model input
    outputs = net.forward(getOutputsNames(net))  # Forward pass to get the raw detections
    postprocess(frame, outputs)  # Filter the detections and draw them on the frame

    cv2.imshow('Webcam Object Detection', frame)  # Display the frame with bounding boxes
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Exit the loop when the 'q' key is pressed
        break

cap.release()  # Release the webcam resource
cv2.destroyAllWindows()  # Close all OpenCV windows

==========================================================================================
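The post-processing step in the script above packs a lot into a few lines, so here is a standalone sketch of how a single detection row becomes a pixel-space box. It mirrors the arithmetic in postprocess but uses a plain Python list so it runs without OpenCV or a webcam. The function name decode_detection and the sample row are made up for illustration; real YOLOv3 rows trained on COCO carry 80 class scores rather than three.

```python
def decode_detection(detection, frame_width, frame_height, conf_threshold=0.5):
    """Convert one output row to (class_id, confidence, [left, top, width, height]).

    Returns None when the best class score is not above conf_threshold.
    """
    scores = detection[5:]  # entries 0-3 are the box, entry 4 is objectness
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= conf_threshold:
        return None
    # Box values are relative to the frame size, so scale them to pixels
    center_x = int(detection[0] * frame_width)
    center_y = int(detection[1] * frame_height)
    width = int(detection[2] * frame_width)
    height = int(detection[3] * frame_height)
    # Convert from center coordinates to the top-left corner
    left = int(center_x - width / 2)
    top = int(center_y - height / 2)
    return class_id, float(confidence), [left, top, width, height]


# Hypothetical row: box centered in the frame, best score 0.8 for class 1
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, 640, 480))  # → (1, 0.8, [240, 120, 160, 240])
```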

Run the script to see the object detection in action. It identifies objects in the webcam feed, draws a bounding box around each one, and shows the class label together with a confidence score that indicates how certain the model is about the detection. Press q to quit.

This is just a basic demonstration, but it shows the power of object detection with Python. With some modifications, you can customize this script for your specific needs.
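As one example of such a modification, you might want to show only certain kinds of objects, such as people and cars. The sketch below filters the three parallel lists that the script builds (class IDs, confidences, boxes) down to a chosen set of class names. The function name filter_by_class is hypothetical, and the short class_names list is a stand-in for the full contents of coco.names.

```python
def filter_by_class(class_ids, confidences, boxes, class_names, wanted):
    """Keep only the detections whose class name is in the `wanted` set."""
    kept = [i for i, cid in enumerate(class_ids) if class_names[cid] in wanted]
    return ([class_ids[i] for i in kept],
            [confidences[i] for i in kept],
            [boxes[i] for i in kept])


class_names = ["person", "bicycle", "car"]  # stand-in for coco.names
class_ids = [0, 1, 2, 0]
confidences = [0.91, 0.66, 0.80, 0.55]
boxes = [[10, 10, 50, 120], [200, 80, 90, 60], [300, 150, 140, 70], [400, 20, 45, 110]]

ids, confs, kept_boxes = filter_by_class(
    class_ids, confidences, boxes, class_names, wanted={"person", "car"})
print(ids)  # → [0, 2, 0]
```

In the script, a filter like this would fit just before the NMS call, so that unwanted classes never reach the drawing step.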

Advanced Topics (Optional)

If you want to detect specific objects not in the standard YOLOv3 list, you can train your own custom models. This is a more advanced topic but definitely worth exploring.
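If you do go down that road with Darknet-style config files, one detail to watch is the size of the convolutional layer that feeds each detection layer: it depends on how many classes you train. As commonly described in Darknet training guides, that layer needs (classes + 5) * 3 filters, because each of the 3 boxes predicted per grid cell carries 4 box values, 1 objectness score, and one score per class. The helper below is only a sketch of that arithmetic; the name yolo_filters is made up for this example.

```python
def yolo_filters(num_classes, boxes_per_cell=3):
    """Filters needed in the layer before a YOLOv3 detection layer."""
    # 4 box coordinates + 1 objectness score + one score per class, per box
    return (num_classes + 5) * boxes_per_cell


print(yolo_filters(80))  # → 255 (the 80 COCO classes)
print(yolo_filters(2))   # → 21 (a custom two-class model)
```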

Conclusion and Audience Engagement

I hope you found this tutorial helpful. Before we wrap up, I’d love to hear your thoughts! What other creative applications can you imagine for this object detection project? Let me know in the comments below, and maybe your idea will inspire my next article. Thanks for reading! If you enjoyed this tutorial, please like, share, and follow for more computer vision projects.
