Custom Video Classification Using YOLOv8

Surya Remanan · Published in Heartbeat · Aug 16, 2023

Introduction

With the explosion of visual data, sorting and classifying videos has become hard, which in turn makes it difficult for Search Engine Optimization (SEO) algorithms to organize video data. YouTube hosts a vast number of videos, Instagram Reels and TikToks are trending, and OTT platforms have emerged and added to the video data lake. The internet is flooded with videos, making them tedious to search if there is no mechanism to classify and organize such data.

While various machine learning and deep learning algorithms have been developed for this task, I found YOLOv8, released in January 2023, to be the fastest and most accurate model I tried. In fact, in my experiments it reached 100 percent accuracy within a short training time on a large amount of data.

In this article, I’ll walk you through a practical implementation, from data collection to deployment, so grab your popcorn and let’s start streaming.

We will collect data for four classes: entertainment, news, sports, and music.

Steps followed

1) Data Collection

  • Creating the Google credentials and generating the YouTube Data API key
  • Scraping YouTube links using a Python script and the generated API key
  • Downloading the videos from the saved links

2) Setup and Installations

  • Setting up the virtual Python 3.9 environment
  • Activating the virtual environment

3) Data Pre-processing

  • Extracting a fixed number of frames (16) from each video
  • Resizing the frames to 640x640

4) Training the Model for 500 Epochs

5) Prediction on a Sample Video

6) Deployment Using the Gradio App

Data Collection

While there are many open-source datasets and video repositories for computer vision applications, in my opinion they still need work. Most of the open data I found online related to daily activities or sports. Such videos are too random, and such datasets are not suitable when you need data for a specific task.

As a DIY hacker, I try to find unique ways to collect data. Doing this helps produce a better model, and the predictions are accurate with high probability.

So for this project, I found a quick hack.

  • Go to the Google Cloud console and create a new project.
  • After naming your project, it will be created in seconds, and you can access your project dashboard.
  • Now, we have to generate a YouTube API key.
  • Click on the burger menu, hover over APIs and Services, and click Enabled APIs and Services.
  • You will be directed to the various API services that Google offers. Scroll down to the YouTube section.
  • Select the YouTube Data API v3, which is the first option.

And then enable it.

Next, we have to add credentials, and your API Key will be generated. Copy the API Key, as it will be used to collect the data.

So what we need right now is a list of 1,000 YouTube video links for each class, saved to one text file per class.

The Python script to save the YouTube links is:

# Import the necessary modules
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

# Prompt the user to enter their API key
api_key = input("Enter your YouTube Data API key: ")

# Initialize the YouTube Data API client
youtube = build('youtube', 'v3', developerKey=api_key)

# Search for videos of one class (here, entertainment)
video_ids = []
next_page_token = None

while len(video_ids) < 1000:
    try:
        search_response = youtube.search().list(
            q='entertainment',
            type='video',
            videoDefinition='high',
            videoDuration='short',
            part='id',
            maxResults=50,
            pageToken=next_page_token
        ).execute()

        # Extract the video IDs from the search results
        video_ids.extend([item['id']['videoId'] for item in search_response['items']])
        next_page_token = search_response.get('nextPageToken')

        if next_page_token is None:
            break

    except HttpError as e:
        print('An error occurred: %s' % e)
        break

# Build the video links and save them to a text file
with open('entertainment_videos.txt', 'w') as f:
    for video_id in video_ids[:1000]:
        video_url = f'https://www.youtube.com/watch?v={video_id}'
        f.write(video_url + '\n')

The above code searches for one class (entertainment in this case) and saves up to 1,000 video links for it to a text file. Run it once per class, changing the query and the output filename each time.
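If you’d rather not edit the script by hand for each class, a small wrapper like this sketch automates all four runs (it reuses the youtube client built above):

# Sketch: collect links for all four classes in one run.
# Reuses the `youtube` client from the script above.
classes = ['entertainment', 'news', 'sports', 'music']

def collect_links(query, limit=1000):
    video_ids, next_page_token = [], None
    while len(video_ids) < limit:
        response = youtube.search().list(
            q=query, type='video', videoDefinition='high',
            videoDuration='short', part='id',
            maxResults=50, pageToken=next_page_token,
        ).execute()
        video_ids.extend(item['id']['videoId'] for item in response['items'])
        next_page_token = response.get('nextPageToken')
        if next_page_token is None:
            break
    return video_ids[:limit]

for label in classes:
    with open(f'{label}_videos.txt', 'w') as f:
        for video_id in collect_links(label):
            f.write(f'https://www.youtube.com/watch?v={video_id}\n')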

Now, your folder structure should be something like this:

videoClassification
|__ scrape.py
|__ entertainment_videos.txt
|__ news_videos.txt
|__ sports_videos.txt
|__ music_videos.txt

Create four empty folders: entertainment_videos, news_videos, sports_videos, and music_videos.
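A couple of lines of Python can create them (a trivial sketch):

import os

for name in ['entertainment_videos', 'news_videos', 'sports_videos', 'music_videos']:
    os.makedirs(name, exist_ok=True)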

I wrote a Python script to download videos from links contained in the text files:

from pytube import YouTube

# Define the path to the text file containing the links
path_to_file = "entertainment_videos.txt"

# Define the path to the folder where you want to save the videos
path_to_folder = "entertainment_videos/"

# Read the links from the text file
with open(path_to_file, "r") as f:
    links = f.readlines()

# Loop through the links and download the videos
for link in links:
    try:
        # Create a YouTube object for the video (strip the trailing newline)
        yt = YouTube(link.strip())

        # Get the highest resolution video stream
        stream = yt.streams.get_highest_resolution()

        # Download the video to the specified folder
        stream.download(output_path=path_to_folder)

        print(f"Successfully downloaded {yt.title}")
    except Exception as e:
        print(f"Failed to download {link.strip()}: {e}")

Now that we have the video data, we can sort it into the train, test, and validate folders inside the data folder. The folder structure should now be:

videoClassification/
├── video_data/
│   ├── train/
│   │   ├── entertainment/
│   │   │   ├── video1.mp4
│   │   │   ├── video2.mp4
│   │   │   └── ...
│   │   ├── music/
│   │   │   ├── video3.mp4
│   │   │   ├── video4.mp4
│   │   │   └── ...
│   │   ├── news/
│   │   │   ├── video5.mp4
│   │   │   ├── video6.mp4
│   │   │   └── ...
│   │   └── sports/
│   │       ├── video7.mp4
│   │       ├── video8.mp4
│   │       └── ...
│   │
│   ├── test/
│   │   ├── video9.mp4
│   │   ├── video10.mp4
│   │   └── ...
│   │
│   └── validate/
│       ├── video11.mp4
│       ├── video12.mp4
│       └── ...
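There is no single right way to do the split; the sketch below moves the downloaded files into an 80/10/10 train/test/validate split (the proportions are my assumption, not prescribed anywhere above):

import os
import random
import shutil

# Sketch: move each class's downloaded videos into an 80/10/10
# train/test/validate split, matching the structure shown above.
random.seed(0)
classes = ['entertainment', 'news', 'sports', 'music']

for c in classes:
    files = sorted(os.listdir(f'{c}_videos'))
    random.shuffle(files)
    n = len(files)
    splits = {'train': files[:int(0.8 * n)],
              'test': files[int(0.8 * n):int(0.9 * n)],
              'validate': files[int(0.9 * n):]}
    for split, names in splits.items():
        # train keeps per-class subfolders; test/validate are flat, as above
        dest = os.path.join('video_data', split, c) if split == 'train' \
            else os.path.join('video_data', split)
        os.makedirs(dest, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(f'{c}_videos', name), os.path.join(dest, name))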

Setting Up The Environment

Go to your working directory and set up a Python 3.9 virtual environment:

conda create -n ultralytics pip python=3.9

Activate it using the command:

conda activate ultralytics

Then install the dependencies using:

pip install -r requirements.txt

Below is the requirements.txt file.

absl-py==1.4.0
aiofiles==23.1.0
aiohttp==3.8.4
aioice==0.9.0
aiortc==1.5.0
aiosignal==1.3.1
altair==4.2.2
anyio==3.6.2
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.2.0
av==10.0.0
blinker==1.5
Brotli==1.0.9
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
click==8.1.3
cmake==3.26.1
coloredlogs==15.0.1
contourpy==1.0.7
cryptography==40.0.1
cycler==0.11.0
decorator==5.1.1
dnspython==2.3.0
entrypoints==0.4
fastapi==0.95.0
ffmpy==0.3.0
filelock==3.10.7
flatbuffers==23.3.3
fonttools==4.39.3
frozenlist==1.3.3
fsspec==2023.3.0
gast==0.4.0
gitdb==4.0.10
GitPython==3.1.31
google-api-core==2.11.0
google-api-python-client==2.83.0
google-auth==2.17.1
google-auth-httplib2==0.1.0
google-auth-oauthlib==1.0.0
google-crc32c==1.5.0
google-pasta==0.2.0
googleapis-common-protos==1.59.0
gradio==3.24.1
gradio_client==0.0.5
grpcio==1.53.0
h11==0.14.0
h5py==3.8.0
httpcore==0.16.3
httplib2==0.22.0
httpx==0.23.3
huggingface-hub==0.13.3
humanfriendly==10.0
idna==3.4
ifaddr==0.2.0
importlib-metadata==6.1.0
importlib-resources==5.12.0
jax==0.4.8
Jinja2==3.1.2
jsonschema==4.17.3
keras==2.12.0
kiwisolver==1.4.4
libclang==16.0.0
linkify-it-py==2.0.0
lit==16.0.0
Markdown==3.4.3
markdown-it-py==2.2.0
MarkupSafe==2.1.2
matplotlib==3.7.1
mdit-py-plugins==0.3.3
mdurl==0.1.2
ml-dtypes==0.0.4
mpmath==1.3.0
multidict==6.0.4
mutagen==1.46.0
networkx==3.0
numpy==1.23.5
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
oauthlib==3.2.2
onnx==1.13.1
onnx-graphsurgeon==0.3.26
onnx2tf==1.8.8
onnxruntime==1.14.1
onnxsim==0.4.19
opencv-python==4.7.0.72
opt-einsum==3.3.0
orjson==3.8.9
packaging==23.0
pafy==0.5.5
pandas==1.5.3
Pillow==9.4.0
protobuf==3.20.3
psutil==5.9.4
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.10.4
pycparser==2.21
pycryptodomex==3.17
pydantic==1.10.7
pydeck==0.8.0
pydub==0.25.1
pyee==9.0.4
Pygments==2.14.0
pylibsrtp==0.8.0
Pympler==1.0.1
pyOpenSSL==23.1.1
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-multipart==0.0.6
pytube==12.1.3
pytz==2023.3
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
requests==2.28.2
requests-oauthlib==1.3.1
rfc3986==1.5.0
rich==13.3.3
rsa==4.9
scipy==1.10.1
seaborn==0.12.2
semantic-version==2.10.0
semver==2.13.0
sentry-sdk==1.18.0
six==1.16.0
smmap==5.0.0
sng4onnx==1.0.1
sniffio==1.3.0
sounddevice==0.4.6
starlette==0.26.1
streamlit==1.20.0
streamlit-webrtc==0.45.0
sympy==1.11.1
tensorboard==2.12.1
tensorboard-data-server==0.7.0
tensorboard-plugin-wit==1.8.1
tensorflow-cpu==2.12.0
tensorflow-estimator==2.12.0
tensorflow-io-gcs-filesystem==0.32.0
termcolor==2.2.0
tflite-support==0.4.3
thop==0.1.1.post2209072238
toml==0.10.2
toolz==0.12.0
torch==2.0.0
torchvision==0.15.1
tornado==6.2
tqdm==4.65.0
triton==2.0.0
typing_extensions==4.5.0
tzdata==2023.3
tzlocal==4.3
uc-micro-py==1.0.1
ultralytics==8.0.58
uritemplate==4.1.1
urllib3==1.26.15
uvicorn==0.21.1
validators==0.20.0
watchdog==3.0.0
websockets==10.4
Werkzeug==2.2.3
wrapt==1.14.1
yarl==1.8.2
youtube-dl==2021.12.17
yt-dlp @ https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz
zipp==3.15.0

NOTE: Installing Ultralytics alone pulls in all the core dependencies; the additional libraries above can be installed as you work through the project.
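A quick sanity check that the environment is ready (checks() is a helper shipped with the ultralytics package):

import ultralytics

ultralytics.checks()  # prints the Ultralytics, Python, and torch versions plus GPU status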

Data Preprocessing

Once we have downloaded the videos, we have to divide them into frames. In this case, I extracted a fixed number of frames (16) from each video so that every video contributes equally, which helps reduce class imbalance.

import cv2
import os

def extract_frames(video_path, output_dir, num_frames=16):
    # Open the video file
    cap = cv2.VideoCapture(video_path)

    # Get the total number of frames in the video
    num_frames_total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # Calculate the step size for the sliding window based on the desired number of frames
    step_size = max(num_frames_total // num_frames, 1)

    # Initialize the current frame number and the frame counter
    frame_num = 0
    count = 0

    # Loop through the video frames and extract a frame every step_size frames
    while True:
        # Read the current frame
        ret, frame = cap.read()

        # If we've reached the end of the video, break out of the loop
        if not ret:
            break

        # If the current frame number is a multiple of the step size, save the frame
        if frame_num % step_size == 0:
            # Construct the output filename
            output_path = os.path.join(output_dir, f'frame_{count:04d}.jpg')

            # Save the frame to disk
            cv2.imwrite(output_path, frame)

            # Increment the frame counter
            count += 1

        # Increment the current frame number
        frame_num += 1

    # Release the video file
    cap.release()

    # Return the total number of frames extracted
    return count

# Define the input and output directories
input_dir = '/home/surya/PycharmProjects/video_classification/news'
output_dir = '/home/surya/PycharmProjects/video_classification/news_frames'

# Number of frames to extract from each video
num_frames = 16

# Loop through all video files in the input directory
for filename in os.listdir(input_dir):
    if filename.endswith('.mp4') or filename.endswith('.avi'):
        # Paths to the input file and its output folder
        input_file = os.path.join(input_dir, filename)
        output_file = os.path.join(output_dir, os.path.splitext(filename)[0])

        # Create the output directory if it doesn't exist
        if not os.path.exists(output_file):
            os.makedirs(output_file)

        # Extract frames from the video file
        extracted = extract_frames(input_file, output_file, num_frames)

        print(f'{extracted} frames extracted from {input_file} and saved to {output_file}')

Once the videos are divided into frames, the problem becomes an image classification problem, which is the simplest computer vision task compared to object detection and image segmentation. Given that we are using YOLOv8, the problem becomes easier still, as it is faster and more robust than most algorithms while giving high accuracy.

Extract and Resize

Now we have to resize all the frames, which takes only a few minutes. After that, we are ready to train.

from PIL import Image
import os
import uuid

# Set the paths for the input and output directories
input_dir = "news_frames"
output_dir = "news"

# Create the output directory if it doesn't exist
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Loop through all the subdirectories of the input directory
for subdir in os.listdir(input_dir):
    subdir_path = os.path.join(input_dir, subdir)
    if os.path.isdir(subdir_path):
        # Loop through all the image files in the subdirectory
        for filename in os.listdir(subdir_path):
            if filename.endswith(".jpg") or filename.endswith(".png"):
                # Open the image and resize it to 640x640
                image_path = os.path.join(subdir_path, filename)
                image = Image.open(image_path)
                resized_image = image.resize((640, 640))

                # Generate a unique filename and save the resized image to the output directory
                unique_filename = str(uuid.uuid4()) + ".jpg"
                output_path = os.path.join(output_dir, unique_filename)
                resized_image.save(output_path)
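One caveat: resize((640, 640)) stretches non-square frames. If preserving the aspect ratio matters for your data, Pillow's ImageOps.pad letterboxes instead:

from PIL import ImageOps

# Drop-in alternative to image.resize((640, 640)): pads instead of stretching
resized_image = ImageOps.pad(image, (640, 640))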

Organize the folder structure to look like this:

videoClassification/
├── img_data/
│   ├── train/
│   │   ├── entertainment/
│   │   │   ├── img1.jpg
│   │   │   ├── img2.jpg
│   │   │   └── ...
│   │   ├── music/
│   │   │   ├── img3.jpg
│   │   │   ├── img4.jpg
│   │   │   └── ...
│   │   ├── news/
│   │   │   ├── img5.jpg
│   │   │   ├── img6.jpg
│   │   │   └── ...
│   │   └── sports/
│   │       ├── img7.jpg
│   │       ├── img8.jpg
│   │       └── ...
│   │
│   ├── test/
│   │   ├── img9.jpg
│   │   ├── img10.jpg
│   │   └── ...
│   │
│   └── validate/
│       ├── img11.jpg
│       ├── img12.jpg
│       └── ...

Training

To make my life easier, I went with the CLI approach. Training needs only one line of code specifying the path to the data directory; I set the number of epochs to 500. Taking the official Ultralytics documentation as a reference, I ran the following command.

yolo classify train model=yolov8n-cls.pt data=/home/surya/PycharmProjects/videoClassification/data/ epochs=500 batch=128 imgsz=128 workers=8 patience=10 cache=True device=0
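If you prefer the Python API, the equivalent call looks like this (same arguments as the CLI command above):

from ultralytics import YOLO

model = YOLO('yolov8n-cls.pt')
model.train(data='/home/surya/PycharmProjects/videoClassification/data/',
            epochs=500, batch=128, imgsz=128, workers=8,
            patience=10, cache=True, device=0)

Note that imgsz=128 rescales the 640x640 frames down to 128x128 at training time, which speeds things up at some possible cost in accuracy.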

Predict

After training, the best weights are saved as best.pt under the runs/classify folder inside the project directory (runs/classify/train/weights/best.pt by default). I then ran the model on a video belonging to the class entertainment.

yolo task=classify mode=predict model=best.pt conf=0.25 source=1.mp4
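The same prediction can be run from Python (the weights path below assumes the default run directory):

from ultralytics import YOLO

model = YOLO('runs/classify/train/weights/best.pt')
results = model.predict(source='1.mp4', conf=0.25)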

In my run, entertainment received the highest score, at 100 percent. The model was trained well!

Deployment

The next step is to deploy the model. I chose Gradio, as it can deploy ML models with a few lines of Python code.

import gradio as gr
from ultralytics import YOLO

# Categories
categories = {0: 'entertainment',
              1: 'music',
              2: 'news',
              3: 'sports'}

# Return the classifier's output for an uploaded video
def video_classifier(inp):
    model = YOLO("best.pt")

    result = model.predict(source=inp)
    probs = result[0].probs
    max_tensor = max(probs)
    tensor_pos = ((probs == max_tensor).nonzero(as_tuple=True)[0])

    return categories.get(int(tensor_pos))

# Gradio code block for input and output
with gr.Blocks() as app:
    gr.Markdown("## Video classification using YOLOv8")
    with gr.Row():
        inp_video = gr.Video()
        out_txt = gr.Textbox()
    btn = gr.Button(value="Submit")
    btn.click(video_classifier, inputs=inp_video, outputs=out_txt)

app.launch()
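Running the script starts a local Gradio server (by default at http://127.0.0.1:7860); passing share=True to app.launch() generates a temporary public link you can share with others.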

Demo

I tested the demo on a video belonging to the class entertainment, and it worked perfectly, correctly predicting the label. I also tested the other classes, and it worked as expected.

Some Additional Things to Take Into Consideration

  • Follow the directory structure exactly as shown in the trees above; otherwise, you might run into errors.
  • In this particular case, I examined the video data to be trained on and noticed that any YouTube video with the word “sports”, “news”, “music”, or “entertainment” in its title was scraped, even when it was not actually about that topic. So, you might also have to consider other ways of scraping video data.
  • If there is a class imbalance, perform data augmentation on the frames (see the sketch after this list) or apply higher weights to the less represented classes. Sampling techniques are another option.
  • You might also want to refer to the official documentation of Ultralytics to tune the hyperparameters while training.
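As a sketch of the augmentation idea mentioned above (torchvision is already in the requirements; the folder path is illustrative):

import os
from PIL import Image
from torchvision import transforms

# Simple augmentation pass for an underrepresented class: each source
# frame yields one flipped, colour-jittered copy alongside the original.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

src = 'img_data/train/news'  # illustrative path
for name in os.listdir(src):
    if name.endswith('.jpg'):
        image = Image.open(os.path.join(src, name))
        augment(image).save(os.path.join(src, 'aug_' + name))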

Conclusion

In this tutorial, we went through a detailed overview of how a classification pipeline can be built for videos and deployed successfully, end to end.

I would like to thank Comet and their team for giving me this wonderful opportunity to showcase my projects. This enables me to teach the community about the latest state-of-the-art architectures and learn from them by coming out of my comfort zone. Stay tuned for more posts, and happy reading!

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
