Flask-Powered Object Detection for Real-Time Analysis
Doyin Elugbadebo

Publish Date: Mar 15

Computer vision is revolutionizing industries, from autonomous driving to real-time surveillance and healthcare diagnostics. One of the most widely used techniques in this field is YOLO (You Only Look Once), a family of object detection models known for its speed and accuracy in detecting multiple objects in a single image.

In this article, we'll walk through the process of building a Flask-based object detection app using the YOLO framework.

By the end of this tutorial, you'll have a fully functional Flask application that not only runs YOLO-based object detection on a single image, but also lets you upload multiple images, process them in one request, and download the results as a ZIP file.

Prerequisites

Before getting started, ensure you have the following:

The YOLO Framework

YOLO (You Only Look Once) is a state-of-the-art object detection algorithm known for its speed and accuracy. Unlike traditional object detection methods that rely on region proposals and multiple passes over an image (e.g. R-CNN, Faster R-CNN), YOLO treats detection as a single regression problem, predicting bounding boxes and class probabilities in one forward pass of the neural network.

We'll be using both YOLOv3 and YOLOv12 to gain a broader understanding of the YOLO framework and its evolution over time. Here's why:

  • YOLOv3 has been a time-tested model known for its balance between computational efficiency and detection accuracy, thanks to its Darknet-53 backbone. It remains widely used for real-time object detection tasks.

  • YOLOv12 is the latest iteration in the YOLO family at the time of writing, released on February 18th, 2025. It introduces modern enhancements, improved accuracy, and better optimization for various hardware. According to its authors, the model achieves both lower latency and higher mAP than earlier YOLO versions when benchmarked on the Microsoft COCO dataset.

By working with both versions, you'll not only master object detection but also learn how to transition between models efficiently for different use cases.

Why use the Flask Framework

If you're new to Flask, it's a lightweight and flexible micro web framework for Python, designed to make web development quick and easy. According to the Flask website, it provides the essentials for building web applications without enforcing too many restrictions.

That being said, being a minimalistic framework doesn't mean Flask lacks power. On the contrary, it offers extensive flexibility. You can extend it with various extensions and third-party libraries to add features like authentication, database integration, and more, making it suitable for production-ready applications.

Let's dive in!

STEP 1: Setting Up Your Virtual Environment

When working on machine learning projects, it's always a good practice to use a virtual environment to isolate dependencies and avoid conflicts.

Create and activate the virtual environment using:

# Create virtual environment
python -m venv flask_env

# Activate it on Windows:
flask_env\Scripts\activate

# On Linux/macOS:
source flask_env/bin/activate

Once the virtual environment is activated, install Flask and the other necessary dependencies, NumPy and OpenCV (essential for image processing).

pip install flask
# Install only one OpenCV build; the headless build omits GUI support, which a web server does not need
pip install opencv-python-headless numpy

STEP 2: Set Up the Flask Application

Flask prioritizes simplicity. Its minimalist design allows you to run an entire Flask app with just a single app.py file.

That's exactly what we'll do.

To proceed, make sure your virtual environment is activated. Next, create a directory for your project:

mkdir flask_object_detection  
cd flask_object_detection  

Next, create a new app.py file with the following code:

from flask import Flask

# Create an instance of the Flask class
app = Flask(__name__)

# Define a route and a view function
@app.route('/')
def home():
    return "Hello, Flask!"

# Run the app
if __name__ == '__main__':
    app.run(debug=True)

This script initializes a Flask web application by importing the Flask class from the flask module. It creates an instance of the Flask application, named app, and defines a route ('/') that maps to the home function.

When accessed, this route returns a simple "Hello, Flask!" message. The final condition ensures the development server starts only when the script is run directly, and debug=True enables auto-reloading and detailed error pages during development.

Now start the Flask server:

python app.py

By default, Flask runs on http://127.0.0.1:5000/.

Open this URL in your browser, and you should see "Hello, Flask!" displayed.

STEP 3: Set Up Image Upload Functionality in Flask

Now that our basic Flask app is up and running, the next step is to implement image upload functionality.

To do this, we'll configure an upload folder to store incoming images and create a dedicated route to handle the upload process. We'll also import the necessary libraries that will support image processing and detection as we move forward.

Go ahead and update your app.py file with the following code:

from flask import Flask, request, render_template, send_file, flash, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import cv2
import numpy as np
import random
import os
import uuid
import zipfile
from io import BytesIO

app = Flask(__name__)
app.secret_key = os.urandom(24)

# Configure upload folders
app.config['UPLOAD_FOLDER'] = 'static/uploads'  # must be a quoted string
app.config['ALLOWED_EXTENSIONS'] = {'png', 'jpg', 'jpeg', 'bmp', 'tiff'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


# ---------- other code here -----------


if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

Code Explanation:

  • UPLOAD_FOLDER specifies the directory where processed images will be saved. It's set to 'static/uploads', and the directory is automatically created at runtime using os.makedirs() if it doesn't already exist.
  • ALLOWED_EXTENSIONS defines a set of accepted image file types. Only files with these extensions will be allowed during upload (e.g., .png, .jpg, .jpeg, etc.).
  • The allowed_file() function is a helper that checks whether an uploaded file has a valid extension. It ensures that only image files (based on the defined set) are processed by the app.
  • app.secret_key is required by Flask to securely sign session cookies and enable features like flashing messages. It's generated here using os.urandom(24), which provides a cryptographically secure random value.
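
To see how the extension check behaves on a few filenames, here is a minimal standalone sketch. It mirrors the logic of allowed_file() but takes the allowed set as a parameter so it can run without a Flask app. The name has_allowed_extension and the sample filenames are assumptions made for illustration, not part of the app.

```python
# Standalone sketch of the extension check used by allowed_file().
# ALLOWED_EXTENSIONS mirrors the set configured above; the function name
# and the sample filenames are hypothetical, chosen only for illustration.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "tiff"}


def has_allowed_extension(filename, allowed=ALLOWED_EXTENSIONS):
    # Require a dot, then compare the text after the LAST dot, case-insensitively.
    return "." in filename and filename.rsplit(".", 1)[1].lower() in allowed


if __name__ == "__main__":
    for name in ["cat.JPG", "archive.tar.gz", "photo.png.exe", "README"]:
        print(name, has_allowed_extension(name))
```

Note that this check looks only at the filename, not at the file's contents, so a renamed non-image file would pass here and fail later when the image is decoded.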

STEP 4: Download and Set Up YOLO Pre-trained Files

To begin with, download the YOLO pre-trained weights and configuration files from this link.

Once downloaded, create a new folder named "models" at the root folder of the application and extract the downloaded files there.

Ensure that your folder contains the following essential files:

  1. yolov3.weights: This file contains the pre-trained weights for the YOLOv3 model. These weights enable the model to perform object detection based on the features it has learned during training.
  2. yolov3.cfg: The configuration file that defines the model's architecture, including layer configurations, filter sizes, and other parameters necessary for building the YOLOv3 network.
  3. coco.names: A text file that lists the class names from the COCO dataset. The YOLOv3 model was pre-trained on the COCO dataset, which covers 80 common object categories such as people, animals, and everyday objects.
  4. yolov12n.pt: The Nano variant of YOLOv12, designed for fast inference with lower computational requirements. This is the file loaded in STEP 9.

The first three files are required by YOLOv3 while YOLOv12 makes use of the last file.

STEP 5: Modelling Using YOLOv3

YOLOv3 is a significant iteration in the YOLO series, known for its strong object detection capabilities. Developed by Joseph Redmon and Ali Farhadi, YOLOv3 improves upon its predecessors (YOLOv1 and YOLOv2) by using a deeper neural network that makes predictions at three detection scales.
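
To make the idea of multiple detection scales concrete, the arithmetic below counts how many candidate boxes YOLOv3 produces for a 416x416 input. It assumes the standard configuration (strides of 32, 16 and 8, three anchor boxes per grid cell, 80 COCO classes); a customized yolov3.cfg could differ.

```python
# Counting YOLOv3 predictions for a 416x416 input.
# Assumed values (standard yolov3.cfg): strides 32/16/8, 3 anchors per cell, 80 classes.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3
NUM_CLASSES = 80


def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    # Each detection scale divides the image into an (input_size / stride)^2 grid.
    return [input_size // s for s in strides]


def total_predictions(input_size=INPUT_SIZE, strides=STRIDES, anchors=ANCHORS_PER_CELL):
    # Every grid cell at every scale predicts one box per anchor.
    return sum(g * g * anchors for g in grid_sizes(input_size, strides))


# Each prediction holds 4 box values + 1 objectness score + one score per class.
VALUES_PER_PREDICTION = 4 + 1 + NUM_CLASSES

if __name__ == "__main__":
    print(grid_sizes())            # [13, 26, 52]
    print(total_predictions())     # 10647
    print(VALUES_PER_PREDICTION)   # 85
```

This layout is why the detection code later in this guide reads the class scores from index 5 onward in each detection vector.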

To work with YOLOv3, we'll be using OpenCV's DNN module.

Now, make sure you have the extracted files in the models folder. Once that's settled, paste the following code after the configuration code in app.py:

# Load the YOLOv3 model configuration and weights
model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')

# Get all the layer names from the YOLO model
layer_names = model.getLayerNames()

# Identify the output layers (layers with no connections going forward)
unconnected_layers = model.getUnconnectedOutLayers()
output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
                 for i in unconnected_layers]

# Load COCO class names from file
with open('models/coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]

This code sets up the YOLOv3 object detection model by first loading its pre-trained weights and configuration using cv2.dnn.readNet(), then retrieving all the layer names from the model with getLayerNames(). It identifies the output layers, which are the layers responsible for generating detection results, using getUnconnectedOutLayers(), and handles both return formats (an array per index or a plain integer). Finally, it loads the list of object class names from the COCO dataset by reading each line of the coco.names file and storing the stripped strings in the classes list.

With this setup, your YOLO model is now ready to detect the COCO object classes. You can move on to building the detection logic that takes an image input, runs inference, and draws bounding boxes.
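
The index handling in the output-layer lookup can be confusing, so here is a small sketch that isolates it. getUnconnectedOutLayers() returns 1-based indices, and depending on the OpenCV version they arrive either flat or wrapped one per row. The helper name resolve_output_layers and the sample layer names are assumptions for illustration, and plain lists stand in for the NumPy arrays OpenCV returns.

```python
def resolve_output_layers(layer_names, unconnected):
    """Map 1-based layer indices (flat or nested one per row) to layer names."""
    names = []
    for idx in unconnected:
        # Older OpenCV builds wrap each index in its own row, e.g. [[3], [5]].
        if isinstance(idx, (list, tuple)):
            idx = idx[0]
        names.append(layer_names[int(idx) - 1])  # convert 1-based to 0-based
    return names


if __name__ == "__main__":
    # Hypothetical layer names, not the real YOLOv3 layer list.
    layers = ["conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4"]
    print(resolve_output_layers(layers, [3, 5]))      # flat indices
    print(resolve_output_layers(layers, [[3], [5]]))  # nested indices
```

Both calls resolve to the same two layer names, which is the behavior the isinstance() check in app.py is there to guarantee.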

STEP 6: Define Routes and Templates for Image Processing

To handle image uploads and display the results, you'll need to define the necessary Flask routes and create corresponding HTML templates.

Start by creating a templates/ directory at the root of your project.

Inside this folder, create a file named index.html and paste the following HTML code into it:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Object Detection</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body class="bg-light">
    <nav class="navbar navbar-dark bg-dark">
        <div class="container">
            <a class="navbar-brand" href="/">Object Detection</a>
        </div>
    </nav>

    <div class="container py-5">
        <div class="text-center mb-4">
            <h1 class="display-4 text-primary">Object Detection</h1>
            <p class="lead">Upload images to detect objects using the YOLOv3 or YOLOv12 models</p>
        </div>

        <div class="row justify-content-center">
            <div class="col-md-8">
                <div class="card shadow">
                    <div class="card-body">
                        <form action="/process" method="post" enctype="multipart/form-data" id="uploadForm">
                            <div class="mb-3">
                                <input type="file" name="files" 
                                       class="form-control" accept="image/*" multiple>
                                <div class="form-text">
                                    Select one or multiple images (PNG, JPG, JPEG, BMP, TIFF)
                                </div>
                            </div>
                            <button type="submit" class="btn btn-primary w-100">
                                Process Image
                            </button>
                        </form>
                    </div>
                </div>
            </div>
        </div> 

        <!-- Loading Spinner -->
        <div id="loadingSpinner" class="d-none mt-4 text-center">
            <div class="spinner-border text-primary" role="status" style="width: 3rem; height: 3rem;">
                <span class="visually-hidden">Loading...</span>
            </div>
            <p class="mt-3 text-muted">Processing images, please wait...</p>
        </div>  
    </div>

    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/js/bootstrap.bundle.min.js"></script>
    <script>
        document.getElementById('uploadForm').addEventListener('submit', function() {
            document.getElementById('loadingSpinner').classList.remove('d-none');
            this.querySelector('button').disabled = true;
        });
    </script>
</body>
</html>

Also create a result.html file inside the templates folder. This page will display the image after processing:

result.html – Displaying the Processed Image

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Processing Result</title>
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body class="bg-light">
    <nav class="navbar navbar-dark bg-dark">
        <div class="container">
            <a class="navbar-brand" href="/">Object Detection</a>
        </div>
    </nav>

    <div class="container py-5">
        <div class="card shadow">
            <div class="card-body text-center">
                <h3 class="mb-4">✅ Processed Successfully</h3>
                <div class="alert alert-success">
                    Image processed with ID: {{ process_id }}
                </div>
                <img src="{{ image_url }}" class="img-fluid rounded" alt="Processed Result">
                <div class="mt-4">
                    <a href="/" class="btn btn-primary">Process Another Image</a>
                </div>
            </div>
        </div>
    </div>
</body>
</html>

Your project folder should look like this:

flask_object_detection/
├── app.py
├── models/            # YOLO weights, config and class names from STEP 4
├── templates/
│   ├── index.html
│   └── result.html
└── static/
    └── uploads/       # This folder will store uploaded images after processing

After this, replace the home route from STEP 2 in app.py with this index route:

@app.route('/')
def index():
    return render_template('index.html')

Next, define routes to handle file uploads, image processing, and post-processing:

@app.route('/uploads/<filename>')
def serve_processed_image(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)

@app.route('/process', methods=['POST'])
def process_files():
    if 'files' not in request.files:
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    files = request.files.getlist('files')
    if len(files) == 0 or files[0].filename == '':
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    process_id = uuid.uuid4().hex  # Unique ID for this processing session
    processed_files = []

    for file in files:
        if file and allowed_file(file.filename):
            try:
                # Process image
                img = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)

                #YOLOv3 detection and drawing code:
                boxes, confidences, class_ids, indexes = detect_objects(img)
                if len(indexes) > 0:
                    for i in indexes.flatten():
                        x, y, w, h = boxes[i]
                        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                        color = [random.randint(0, 255) for _ in range(3)]
                        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                        cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                # Save processed image with unique name
                filename = f"{process_id}_{secure_filename(file.filename)}"
                save_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
                cv2.imwrite(save_path, img) #YOLOv3 Implementation
                processed_files.append(filename)

            except Exception as e:
                app.logger.error(f"Error processing {file.filename}: {str(e)}")
                flash(f'Error processing {file.filename}', 'error')

    if len(processed_files) == 0:
        flash('No files processed successfully', 'error')
        return redirect(url_for('index'))

    # Handle single file response
    if len(processed_files) == 1:
        return render_template('result.html', 
                             image_url=url_for('serve_processed_image', 
                             filename=processed_files[0]),
                             process_id=process_id)

To make sure we're on the same page, here is the updated app.py at this point:

from flask import Flask, request, render_template, send_file, flash, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import cv2
import numpy as np
import random
import os
import uuid
import zipfile
from io import BytesIO

app = Flask(__name__)
app.secret_key = os.urandom(24)


# Configure upload folders
#-------------------------------------
app.config['UPLOAD_FOLDER'] = 'static/uploads'
app.config['ALLOWED_EXTENSIONS'] = {'png', 'jpg', 'jpeg', 'bmp', 'tiff'}
os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

def allowed_file(filename):
    return '.' in filename and \
           filename.rsplit('.', 1)[1].lower() in app.config['ALLOWED_EXTENSIONS']


#Yolov3 Implementation
#--------------------------------------
# Load the YOLO model configuration and weights
model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')
# Get all the layer names from the YOLO model
layer_names = model.getLayerNames()
# Identify the output layers (layers with no connections going forward)
unconnected_layers = model.getUnconnectedOutLayers()
output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
                 for i in unconnected_layers]

# Load COCO class names from file
with open('models/coco.names', 'r') as f:
    classes = [line.strip() for line in f.readlines()]


# Routes
#----------------------------------------
@app.route('/')
def index():
    return render_template('index.html')

@app.route('/uploads/<filename>')
def serve_processed_image(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)

@app.route('/process', methods=['POST'])
def process_files():
    if 'files' not in request.files:
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    files = request.files.getlist('files')
    if len(files) == 0 or files[0].filename == '':
        flash('No files selected', 'error')
        return redirect(url_for('index'))

    process_id = uuid.uuid4().hex  # Unique ID for this processing session
    processed_files = []

    for file in files:
        if file and allowed_file(file.filename):
            try:
                # Process image
                img = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)

                #YOLOv3 detection and drawing code:
                boxes, confidences, class_ids, indexes = detect_objects(img)
                if len(indexes) > 0:
                    for i in indexes.flatten():
                        x, y, w, h = boxes[i]
                        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                        color = [random.randint(0, 255) for _ in range(3)]
                        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                        cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                # Save processed image with unique name
                filename = f"{process_id}_{secure_filename(file.filename)}"
                save_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
                cv2.imwrite(save_path, img) #YOLOv3 Implementation
                processed_files.append(filename)

            except Exception as e:
                app.logger.error(f"Error processing {file.filename}: {str(e)}")
                flash(f'Error processing {file.filename}', 'error')

    if len(processed_files) == 0:
        flash('No files processed successfully', 'error')
        return redirect(url_for('index'))

    # Handle single file response
    if len(processed_files) == 1:
        return render_template('result.html', 
                             image_url=url_for('serve_processed_image', 
                             filename=processed_files[0]),
                             process_id=process_id)

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)

STEP 7: Start the Application

Start your Flask application by running:

python app.py

Then, open your web browser and navigate to http://localhost:5000/.

You should see the index page with the upload form. When we upload an image, it should be processed, and the result displayed on the result page.

However, the app won't process any images at this point because we've not defined the function that will handle the actual object detection process.

Here is the terminal output:

[2025-03-12 05:41:51,484] ERROR in app: Error processing 0_lCB37mwYtKFKJcrI.jpg: name 'detect_objects' is not defined

The error occurs because the detect_objects function is used in the '/process' route but hasn't been defined yet. Specifically, the issue arises from this line: boxes, confidences, class_ids, indexes = detect_objects(img), where the function is called but not implemented beforehand.
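
The reason the server starts without complaint and only fails on upload is that Python looks up a function name when the call executes, not when the enclosing function is defined. Here is a minimal sketch of that behavior, using hypothetical names (handle_upload, missing_helper, try_upload) that are not part of the app:

```python
def handle_upload(data):
    # missing_helper is intentionally undefined; defining handle_upload still works.
    return missing_helper(data)


def try_upload(data):
    # Mirrors the try/except in the /process route: the error is caught and reported.
    try:
        return handle_upload(data)
    except NameError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(try_upload(b"image-bytes"))
```

Because the route wraps each file in a try/except block, the NameError is logged and flashed instead of crashing the server.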

To handle the image processing logic, go back to app.py and paste the following code immediately after the YOLOv3 model-loading code.

def detect_objects(img):
    height, width = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
    model.setInput(blob)
    outputs = model.forward(output_layers)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                box = detection[0:4] * np.array([width, height, width, height])
                (center_x, center_y, w, h) = box.astype("int")
                x = int(center_x - (w / 2))
                y = int(center_y - (h / 2))
                boxes.append([x, y, int(w), int(h)])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    return boxes, confidences, class_ids, indexes

The detect_objects(img) function processes an input image by extracting its dimensions and converting it into a blob for a deep learning model, which then performs object detection. It filters detections with confidence above 0.5, calculates bounding box coordinates, and applies Non-Maximum Suppression (NMS) to remove redundant boxes before returning the final detections, confidence scores, class IDs, and retained indexes.
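
For intuition about what cv2.dnn.NMSBoxes does with the boxes and scores, here is a simplified pure-Python sketch of greedy Non-Maximum Suppression. It is not OpenCV's implementation; the function names and sample boxes are illustrative assumptions, and boxes use the same [x, y, w, h] format as detect_objects.

```python
def iou(box_a, box_b):
    """Intersection over Union for two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2, bx2, by2 = ax1 + aw, ay1 + ah, bx1 + bw, by1 + bh
    # Overlap width/height are clamped at zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold=0.5, iou_threshold=0.4):
    """Return indices of boxes kept after score filtering and greedy NMS."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    # Visit the highest-scoring boxes first.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sample the first two boxes overlap heavily, so only the higher-scoring one survives, while the third box is far away and is kept.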

Restart the server and test again:

python app.py

Terminal output:

0: 448x640 1 person, 7506.0ms
Speed: 934.0ms preprocess, 7506.0ms inference, 1403.0ms postprocess per image at shape (1, 3, 448, 640)

0: 448x640 1 person, 831.0ms
Speed: 11.0ms preprocess, 831.0ms inference, 12.0ms postprocess per image at shape (1, 3, 448, 640)
127.0.0.1 - - [12/Mar/2025 05:33:36] "POST /process HTTP/1.1" 200 -
127.0.0.1 - - [12/Mar/2025 05:33:39] "GET /uploads/cb53a67b6db04b82910a1895d1a1887d_0_lCB37mwYtKFKJcrI_-_Copy_-_Copy.jpg HTTP/1.1" 200 -

STEP 8: Processing Multiple Images

Now, let's process multiple image uploads. Go back to app.py and paste this after the single-file code snippet, at the end of process_files():

    # Handle multiple files as zip
    zip_filename = f"processed_images_{process_id}.zip"
    zip_buffer = BytesIO()

    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        for filename in processed_files:
            file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
            zip_file.write(file_path, filename)

    zip_buffer.seek(0)

    response = send_file(zip_buffer,
                        mimetype='application/zip',
                        as_attachment=True,
                        download_name=zip_filename)

    # Add custom header to identify zip response
    response.headers['X-Content-Type'] = 'application/zip'
    return response

Ensure the server is running, then upload multiple image files through the application. The app will begin processing the images and automatically zip all the processed files.

Once processing is complete, the zipped file will be downloaded to your default Downloads directory.
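
The in-memory archive pattern can be tried on its own without Flask. The sketch below builds a ZIP in a BytesIO buffer from in-memory bytes using writestr(), rather than reading files from disk as the route does. The helper name build_zip and the sample contents are assumptions for illustration.

```python
import zipfile
from io import BytesIO


def build_zip(files):
    """Pack a {archive_name: bytes} mapping into an in-memory ZIP buffer."""
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # Rewind so the next reader (e.g. send_file) starts at the first byte.
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    buf = build_zip({"a.jpg": b"fake-image-a", "b.jpg": b"fake-image-b"})
    with zipfile.ZipFile(buf) as archive:
        print(archive.namelist())
```

The seek(0) call matters: after writing, the buffer's position sits at the end of the data, so a reader that starts from the current position would find nothing to send.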

With that, we can now switch from the YOLOv3 model to YOLOv12.

STEP 9: Switching to YOLOv12

YOLOv12 is an advanced object detection model that emphasizes attention mechanisms to enhance detection accuracy while maintaining real-time processing speeds. It was released on February 18, 2025 and was developed by Yunjie Tian, Qixiang Ye, and David Doermann (read the paper here). According to its authors, this model surpasses previous iterations, such as YOLOv10 and YOLOv11, by achieving higher mean Average Precision (mAP) scores with comparable or faster inference times.

Its architecture supports various tasks, including object detection, segmentation, classification, keypoint detection, and oriented bounding box detection.

First things first, note that YOLOv3 uses OpenCV's DNN module, while YOLOv12 is accessed via the ultralytics library, which has a different API.

So, go ahead and install the ultralytics library:

pip install ultralytics

One more thing: the original code for YOLOv3 involves loading the model with cv2.dnn.readNet, processing blobs, and handling outputs through specific layers. For YOLOv12, the Ultralytics model is more straightforward. The model is loaded directly with YOLO('model.pt'), and predictions are made with model.predict(), which returns a list of Results objects, one per input image.

Essentially, the key changes when transitioning to YOLOv12 lie in how the model is loaded and how the detection function is implemented. The detect_objects function used for YOLOv3, where outputs are manually parsed and bounding boxes are manually calculated, will be replaced with a much simpler approach.

The Ultralytics library handles most of the processing internally, allowing you to extract bounding boxes, confidence scores, and class IDs directly from the Results object. Additionally, the Results object provides a built-in plot() method that returns an image array annotated with the detected objects. These improvements make the new detect_objects function significantly shorter (three lines instead of more than twenty) and much less error-prone, as you no longer need to manually compute box coordinates or apply post-processing logic.
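
If you want the raw detections rather than an annotated image, the values can be read from the Results object. The sketch below assumes the Ultralytics attributes boxes.xyxy, boxes.conf, boxes.cls and names; the helper name summarize_detections and the output dictionary layout are hypothetical choices made for illustration, and the function has not been run against a real model here.

```python
def summarize_detections(result):
    """Turn one Ultralytics Results object into a list of plain dicts."""
    boxes = result.boxes
    detections = []
    # xyxy holds corner coordinates, conf the confidence, cls the class index.
    for xyxy, conf, cls in zip(boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()):
        detections.append({
            "label": result.names[int(cls)],        # names maps class index to label
            "confidence": round(float(conf), 2),
            "box": [int(v) for v in xyxy],          # [x1, y1, x2, y2] in pixels
        })
    return detections


# Usage with the model loaded in this guide (requires ultralytics and the weights file):
#   results = model.predict(img)
#   print(summarize_detections(results[0]))
```

Unlike the YOLOv3 code, which works with [x, y, w, h] boxes, these boxes are given as two corners, so width and height would need to be computed as x2 - x1 and y2 - y1 if you need them.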

Now let's upgrade from the legacy YOLOv3 model to the more modern and efficient Ultralytics YOLO implementation.

In your app.py file, locate and replace the existing YOLOv3 model loading and the detect_objects() function with the Ultralytics-based approach.

Specifically, replace this section:

# ========== YOLOv3 Implementation (Original) ==========
model = cv2.dnn.readNet('models/yolov3.weights', 'models/yolov3.cfg')
layer_names = model.getLayerNames()
unconnected_layers = model.getUnconnectedOutLayers()
output_layers = [layer_names[i[0] - 1] if isinstance(i, np.ndarray) else layer_names[i - 1] 
                  for i in unconnected_layers]

def detect_objects(img):
    height, width = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
    model.setInput(blob)
    outputs = model.forward(output_layers)
    # ... rest of detection logic ...

With this:

# ========== YOLOv12 Implementation (New) ==========
from ultralytics import YOLO

# Load YOLOv12 model
model = YOLO('models/yolov12n.pt')  

def detect_objects(img):
    results = model.predict(img)
    return results[0].plot()  # Returns the annotated image directly

Also, replace this section under @app.route('/process', methods=['POST']):

# ======== REPLACE THIS SECTION ========
                # Old YOLOv3 detection and drawing code:
                boxes, confidences, class_ids, indexes = detect_objects(img)
                if len(indexes) > 0:
                    for i in indexes.flatten():
                        x, y, w, h = boxes[i]
                        label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
                        color = [random.randint(0, 255) for _ in range(3)]
                        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
                        cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

With this:

                # New YOLOv12 detection and auto-annotation
                # (detect_objects() already calls model.predict(), so run it only once)
                annotated_img = detect_objects(img)

Then immediately after that, replace:

cv2.imwrite(save_path, img) 

with this:

cv2.imwrite(save_path, annotated_img)

Now, go ahead and test both single and multiple file uploads again. The images will be processed and outputted as expected.

There's More!

Congrats on making it this far! Now that we've reached the end of the guide, there are still several ways to enhance the app:

  1. Improved File Handling: Currently, the app processes multiple images only as a zipped file and automatically downloads them. You can enhance this by displaying all processed images in a grid layout with an option to download them individually.
  2. Video Processing Support: Right now, object detection works only for images. Extend the app to support video processing, allowing users to upload videos directly, analyze YouTube videos, or even process surveillance feeds in real time.
  3. Authenticated API & User Dashboard: You can implement an authentication system where only registered users can access the app. Provide users with a dashboard where they can choose different object detection and segmentation methods for their needs.

If you need further customizations or guidance, feel free to reach out.

You can download the full code from the GitHub repo

Conclusion

In conclusion, this guide has walked through how to build an object detection web app using Flask with YOLOv3 and YOLOv12. By examining the strengths and differences between these two versions of the YOLO framework, you now have a deeper understanding of how to switch between object detection models depending on your speed and accuracy needs.

References

  1. https://pjreddie.com/darknet/yolo/
  2. https://blog.roboflow.com/train-yolov12-model/
  3. https://learnopencv.com/yolov12/
  4. https://github.com/sunsmarterjie/yolov12
  5. https://roboflow.com/model/yolov12
  6. https://www.arxiv.org/pdf/2502.12524
