Person Classification on CCTV Video

Jun 22, 2021

CCTV cameras are used to monitor a place and help keep it safe. They can be installed in public places, convenience stores, and even our own homes. One limitation of CCTV video is that it is mostly useful for identifying suspects after an incident has already happened. If you want a system that can prevent crime from happening, you might consider adding trip wires, motion sensors with an alarm, or other types of sensors.

One way to enhance a security system built on CCTV video is to add a feature that detects when there is an unidentified person in a room and then notifies the house's owner or rings an alarm, which helps reduce risk and losses.

Image Classification

This can be achieved by implementing an image classification algorithm in the system. Image classification is one of the applications of machine learning in the field of computer vision. Given a frame or an image that contains some object, the algorithm outputs the class that the object belongs to.

In this project, I create a person classification algorithm using a Keras deep learning model and test it on a video. I use the Google Colaboratory platform to run the project. Here is the link to the repository.

There are 4 steps that I took to implement this algorithm:

1. Build the neural network model using transfer learning.

2. Collect the dataset.

3. Train the model.

4. Test the model.

Build a model

I use transfer learning from InceptionV3 in this project. InceptionV3 is a convolutional neural network that is commonly used for image analysis. I chose transfer learning to reduce training time and improve the model's accuracy, since the pretrained convolutional layers already know how to extract image features.

Here is a snippet of code to import InceptionV3 into the project.

I set the input shape to (256, 256, 3), meaning each image is 256x256 pixels with three RGB color channels. I only use InceptionV3 up to the 'mixed7' layer. I then connect its output to a flatten layer and a hidden layer with 1024 units.

I use a dropout rate of 20% to reduce overfitting. The output is a single sigmoid unit, since this is a binary classification problem. The model is then compiled with RMSprop as the optimizer, binary cross-entropy as the loss, and accuracy as the metric.
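The architecture described above could be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code from the repository: the helper name `build_person_classifier`, the ReLU activation on the hidden layer, freezing the pretrained layers, and the default RMSprop learning rate are all my assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3


def build_person_classifier(weights="imagenet", hidden_units=1024, dropout=0.2):
    """Hypothetical helper: InceptionV3 up to 'mixed7' plus a small binary head."""
    base = InceptionV3(input_shape=(256, 256, 3), include_top=False, weights=weights)

    # Assumption: the pretrained layers are frozen so only the new head is trained.
    for layer in base.layers:
        layer.trainable = False

    # Cut the network at 'mixed7' and use its feature map as the head's input.
    features = base.get_layer("mixed7").output

    x = layers.Flatten()(features)
    x = layers.Dense(hidden_units, activation="relu")(x)  # ReLU is an assumption
    x = layers.Dropout(dropout)(x)                         # 20% dropout by default
    output = layers.Dense(1, activation="sigmoid")(x)      # single sigmoid unit

    model = Model(inputs=base.input, outputs=output)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(),  # default learning rate (assumption)
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_person_classifier()` downloads the ImageNet weights on first use. Passing `weights=None` skips the download, which is handy for checking the layer shapes but gives up the benefit of transfer learning.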

Dataset

I use this dataset from Kaggle, which contains images of various places recorded by CCTV. It has 507 images, each labeled as human or no human. I split the dataset into 80% training and 20% validation. Below is the directory structure of the dataset.

The dataset must be uploaded into the repository. For easier use, I uploaded it to my Google Drive and mounted the drive in my notebook. Both the training and validation sets are then wrapped in data generators. With a data generator, we can add augmentations, and the images are loaded in batches during training.
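An 80/20 split like the one above could be done with a small helper such as this one. It is only a sketch: the function name, the fixed seed, and the placeholder file names are assumptions, and in practice you would probably split each class folder separately so that both sets keep a similar class balance.

```python
import random


def split_train_val(file_names, val_fraction=0.2, seed=42):
    """Shuffle file names and split them into (train, validation) lists."""
    names = sorted(file_names)           # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(names)   # local RNG, does not touch global state
    n_val = round(len(names) * val_fraction)
    return names[n_val:], names[:n_val]


# Placeholder names standing in for the 507 dataset images.
images = [f"img_{i:03d}.jpg" for i in range(507)]
train_files, val_files = split_train_val(images)
print(len(train_files), len(val_files))  # prints: 406 101
```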
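In Colab, Google Drive is usually mounted with `from google.colab import drive` followed by `drive.mount('/content/drive')`. The generators themselves could then look something like the sketch below. The helper name, the batch size, and the specific augmentations are assumptions of mine, and it expects each directory to contain one sub-folder per class.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def make_generators(train_dir, val_dir, batch_size=32):
    """Hypothetical helper: build training and validation generators from class folders."""
    # Augment only the training images; the chosen augmentations are assumptions.
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255,
        rotation_range=20,
        zoom_range=0.2,
        horizontal_flip=True,
    )
    # Validation images are only rescaled so the metric reflects unmodified data.
    val_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_generator = train_datagen.flow_from_directory(
        train_dir, target_size=(256, 256), batch_size=batch_size, class_mode="binary"
    )
    validation_generator = val_datagen.flow_from_directory(
        val_dir, target_size=(256, 256), batch_size=batch_size, class_mode="binary"
    )
    return train_generator, validation_generator
```

With `class_mode="binary"`, the class folders are numbered alphabetically, so it is worth checking `train_generator.class_indices` to see which class maps to 1.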

Training

I train the model for 20 epochs with this command.

history = model.fit(train_generator, validation_data = validation_generator, epochs = 20, verbose = 2)

Here are the results.

The model tends to overfit: at some point its validation accuracy falls below its training accuracy. This is expected because I only have a small dataset. It could be improved by collecting more CCTV footage or by tuning some of the training parameters.
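One simple way to look at this is to compute the gap between training and validation accuracy for each epoch from `history.history`. The helper below is a small sketch with a name I made up, and the numbers in the example are invented purely to show the call. They are not the results of this project.

```python
def accuracy_gap(history_dict):
    """Return the per-epoch gap (training accuracy minus validation accuracy)."""
    train_acc = history_dict["accuracy"]
    val_acc = history_dict["val_accuracy"]
    return [t - v for t, v in zip(train_acc, val_acc)]


# Made-up numbers purely to show the call; these are NOT this project's results.
example = {"accuracy": [0.70, 0.85, 0.95], "val_accuracy": [0.68, 0.80, 0.82]}
for epoch, gap in enumerate(accuracy_gap(example), start=1):
    print(f"epoch {epoch}: gap = {gap:.2f}")
```

After training, the same helper can be called as `accuracy_gap(history.history)`. A gap that keeps growing across epochs is a typical sign of overfitting.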

Run on video file

A video is a sequence of frames shown over a period of time. We can apply the person classification model by running it on each frame. This can be done with the OpenCV library (opencv-python) in Python. We also need to pre-process each frame so that it matches our input size of 256x256 pixels.

I tested my algorithm on CCTV footage I found on YouTube at this link. I downloaded the video and uploaded it to my repository. Below is a snippet of the code that I used to process the video, along with the result.