Neural Style Transfer (NST)

Using Deep Learning Algorithms to Perform Image Style Transfer

Boluwatife Victor O.
Heartbeat


source: Chelsea Troy

Artificial Intelligence (AI) has revolutionized the way computer vision and deep learning algorithms create stunning visuals. One of its most fascinating applications is Neural Style Transfer (NST), which lets users transfer the style of one image onto another, generating a new artwork that combines the content of the first image with the style of the second.

It may seem magical, but this deep learning technique is effective for various applications such as artistic rendering, product design, image enhancement, and image synthesis, to mention a few. With so much potential, many developers and AI professionals are interested in incorporating NST-based style transfer into their projects.

This comprehensive article will explain the fundamentals of neural style transfer (NST), walk through the techniques used to perform it, and discuss some of its best use cases. Let’s dive in!

Fundamentals of Neural Style Transfer

Neural style transfer (NST) is a deep learning technique that combines the content of one image with the style of another to generate unique artwork. It uses a convolutional neural network (CNN), trained on massive datasets of images, to build intermediate representations of both inputs: one capturing the content image’s structure and the other capturing the style image’s textures and colors. It then combines these two representations to create a new image that exhibits the content of the first image and the style of the second.

Essentially, NST uses art as inspiration to generate novel artistic expressions combining content and style. It is based on the idea that, given two input images, one containing content (e.g., a photo of a person) and the other containing an aesthetic style (e.g., an abstract painting), it can generate a new image that preserves the content of the first image while expressing the style of the second.

Background

Image style transfer has a long history dating back to the early 20th century with the development of cubism and futurism art movements. However, it wasn’t until the emergence of deep learning that the process became more accessible and efficient. Many people trace the modern development of NST to the detailed research work by Gatys et al.

In their paper titled “Image Style Transfer Using Convolutional Neural Networks,” the scholars detailed how to use convolutional neural networks to combine the content of one image with the style of another. Notably, the network’s weights stay fixed: the research shows that an entirely new image can be created by iteratively optimizing its pixels until its CNN feature responses match the content representation of one image and the style representation of another.

Since the paper’s publication, various artists and developers have sought ways to perfect the NST process. Today, there are multiple algorithms and packages available for performing style transfer on images, regardless of your level of experience.


How to Perform Image Style Transfer Using NST

Performing image style transfer with NST is a relatively simple process, but it involves several steps. You will also need the following resources at hand:

  1. A content image: This is the image that we want to stylize.
  2. A style image: This is the image whose style we want to transfer to the content image.
  3. A pre-trained convolutional neural network (CNN): This is used to extract features from the content and style images.
  4. An optimization algorithm: This is used to optimize the loss function and generate the stylized image.
  5. Code editor: You will need a code editor, like VS Code, Atom, or Sublime Text, to write and run your Python code.

Step-by-Step Application

After gathering all the necessary resources, you can proceed with the NST application process. Here’s a brief overview of what you need to do:

Load the pre-trained CNN

We will use the VGG-19 CNN, pre-trained on the ImageNet dataset. VGG-19 is a deep CNN architecture that consists of 19 layers, including convolutional layers, pooling layers, and fully connected layers. It has been shown to perform well on image classification tasks.

We will use the pre-trained CNN to extract features from the content and style images. These features are then used to compute the content loss and the style loss, respectively.

We can load the VGG-19 CNN using the following Python code:

import torch
import torch.nn as nn
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the convolutional layers of VGG-19, pre-trained on ImageNet
vgg = models.vgg19(pretrained=True).features

# Freeze the network's weights; NST optimizes the generated image, not the CNN
for param in vgg.parameters():
    param.requires_grad_(False)

# Move the model to the selected device
vgg.to(device)

# Set the model to evaluation mode
vgg.eval()
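
The loss functions defined in the next steps compare feature maps taken from specific layers of this network rather than its final output. Here is a minimal sketch of a helper that collects both sets of feature maps in a single forward pass, assuming the layer choices from Gatys et al. (conv4_2 for content, conv1_1 through conv5_1 for style); the name get_features() and the layer indices are our assumptions, not part of the original code:

# Indices into vgg19().features: 21 is conv4_2; 0, 5, 10, 19, 28 are conv1_1-conv5_1
CONTENT_LAYER = 21
STYLE_LAYERS = {0, 5, 10, 19, 28}

def get_features(img, model=vgg):
    # Pass the image through the network, recording the chosen feature maps
    content_feats, style_feats = None, []
    x = img
    for idx, layer in enumerate(model):
        x = layer(x)
        if idx == CONTENT_LAYER:
            content_feats = x
        if idx in STYLE_LAYERS:
            style_feats.append(x)
    return content_feats, style_feats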

Preprocess the input images

Before extracting features from the input images, we need to preprocess them. This involves resizing the images to a standard size and normalizing the pixel values.

We can define a function to preprocess the input images as follows:

import torchvision.transforms as transforms
from PIL import Image

# Define the image preprocessing function
def preprocess_image(image_path, img_size):
    # Open the image and ensure it has three channels
    img = Image.open(image_path).convert('RGB')
    # Resize, convert to a tensor, and normalize with ImageNet statistics
    transform = transforms.Compose([
        transforms.Resize(img_size),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    # Apply the transforms and return the resulting tensor
    return transform(img)

After defining the preprocessing function, we can apply it to the content and style images. Since preprocess_image() opens the file itself, we only pass it a path; unsqueeze(0) then adds the batch dimension the CNN expects, and .to(device) moves each tensor to the GPU if one is available.

# Define the image sizes
content_size = 512
style_size = 512

# Define the paths to the content and style images
content_path = "content.jpg"
style_path = "style.jpg"

# Preprocess the content image
content_img = preprocess_image(content_path, content_size).unsqueeze(0).to(device)

# Preprocess the style image
style_img = preprocess_image(style_path, style_size).unsqueeze(0).to(device)

Define the content loss

The content loss is a measure of how much the stylized image differs from the original content image. We will use the mean squared error (MSE) loss to measure the difference between the features of the content image and the features of the stylized image.

To define the content loss, we will extract the features from a convolutional layer of the pre-trained CNN using the content and stylized images. We will then calculate the MSE loss between the feature maps of the two images.

# Define the content loss function
def content_loss(content_features, stylized_features):
    # Compute the mean squared error (MSE) loss between the content features and stylized features
    mse_loss = nn.MSELoss()
    loss = mse_loss(content_features, stylized_features)
    return loss

Define the style loss

The style loss measures how closely the stylized image matches the style of the style image. We will compute it from the Gram matrix of the features: a matrix of inner products that captures the correlations between a layer’s feature maps, and therefore the image’s textures and colors independently of their spatial arrangement.

source: Gram Matrix computation

To define the style loss, we will extract the features from several convolutional layers of the pre-trained CNN using the style image and the stylized image. We will then calculate the Gram matrix of the features and use the MSE loss to measure the difference between the Gram matrices of the two images.

# Define the Gram matrix and style loss functions
def gram_matrix(features):
    # Reshape the features into a 2D matrix
    batch_size, channels, height, width = features.size()
    features = features.view(batch_size * channels, height * width)

    # Compute the Gram matrix of the features
    gram = torch.mm(features, features.t())

    # Normalize the Gram matrix by dividing by the number of elements
    return gram.div(batch_size * channels * height * width)

def style_loss(style_features, stylized_features):
    # Compute the Gram matrices of the style features and the stylized features
    style_grams = [gram_matrix(feature) for feature in style_features]
    stylized_grams = [gram_matrix(feature) for feature in stylized_features]

    # Sum the MSE losses between corresponding Gram matrices
    mse_loss = nn.MSELoss()
    loss = 0
    for style_gram, stylized_gram in zip(style_grams, stylized_grams):
        loss += mse_loss(style_gram, stylized_gram)
    return loss
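
As a quick sanity check (on a random tensor, so only the shape is meaningful), note that the Gram matrix’s size depends on the number of feature maps, not on their spatial resolution:

# One image with 64 feature maps of size 32x32 yields a 64x64 Gram matrix
features = torch.randn(1, 64, 32, 32)
print(gram_matrix(features).shape)  # torch.Size([64, 64])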

Define the total loss

The total loss is a weighted combination of the content loss and the style loss; the weighting factors balance how strongly the output preserves content versus style. Because the content and style losses compare features drawn from different layers, the stylized image’s content-layer and style-layer features are passed in separately. You can define the total loss with the following code:

def total_loss(content_features, style_features, stylized_content_features, stylized_style_features, content_weight, style_weight):
    # Calculate the content loss from the content-layer features
    content_loss_value = content_loss(content_features, stylized_content_features)

    # Calculate the style loss from the style-layer features
    style_loss_value = style_loss(style_features, stylized_style_features)

    # Combine the two losses using the weighting factors
    total_loss_value = content_weight * content_loss_value + style_weight * style_loss_value

    return total_loss_value

Training the model

Now that we have defined the content, style, and total loss functions, we can run the optimization that produces the stylized image. Despite the name of this step, NST does not update the network’s weights; the pixels of the generated image are the parameters being optimized. We will use the Adam optimizer to minimize the total loss, together with the get_features() helper sketched earlier to pull out the relevant feature maps. Adam is a popular gradient-based optimization algorithm in deep learning, an extension of stochastic gradient descent (SGD) that adapts each parameter’s learning rate based on past gradients.

import torch.optim as optim

# Initialize the stylized image from the content image and make it trainable
stylized_img = content_img.clone().requires_grad_(True)

# Define the optimizer and learning rate; the image pixels are the parameters
optimizer = optim.Adam([stylized_img], lr=0.01)

# Define the number of training iterations
num_iterations = 2000

# Define the weighting factors for the content and style losses
content_weight = 1
style_weight = 100

# Extract the fixed targets from the content and style images once
content_features, _ = get_features(content_img)
_, style_features = get_features(style_img)

# Run the optimization
for i in range(num_iterations):
    # Zero out the gradients
    optimizer.zero_grad()

    # Extract the features of the current stylized image
    stylized_content, stylized_style = get_features(stylized_img)

    # Calculate the total loss
    loss = total_loss(content_features, style_features, stylized_content, stylized_style, content_weight, style_weight)

    # Backpropagate the gradients
    loss.backward()

    # Update the pixels of the stylized image
    optimizer.step()

Display the stylized image

After the optimization finishes, we can convert the stylized tensor back into a displayable image. The normalization applied during preprocessing must be reversed first, which is the job of the deprocess_image() function.
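
The deprocess_image() helper is not defined in the original code, so here is a minimal sketch, assuming it simply reverses the ImageNet normalization applied in preprocess_image():

def deprocess_image(tensor):
    # Detach from the computation graph and move to the CPU
    img = tensor.detach().cpu().clone()
    # Reverse the ImageNet normalization applied during preprocessing
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    img = img * std + mean
    # Clamp to [0, 1] and reorder to (height, width, channels) for plotting
    return img.clamp(0, 1).permute(1, 2, 0).numpy()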

import matplotlib.pyplot as plt

# Deprocess the stylized image (dropping the batch dimension first)
final_img = deprocess_image(stylized_img.squeeze(0))

# Display the stylized image
plt.imshow(final_img)
plt.axis('off')
plt.show()

After completing these steps, you should have a rendered image that preserves the structure of the content image while expressing the textures and hues of the style image. Its resolution is determined by the img_size used during preprocessing (512 pixels in this example).
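
If you want to keep the result, here is a short sketch that writes the deprocessed array (values in [0, 1], per the deprocess_image() helper above) to disk; the filename is arbitrary:

import numpy as np
from PIL import Image

# Scale the [0, 1] float array to 8-bit values and save it as a JPEG
out = Image.fromarray((final_img * 255).astype(np.uint8))
out.save("stylized.jpg")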

Various Use Cases of Neural Style Transfer

Many successful applications of neural style transfer have been developed and deployed in practical settings. Some of the most popular use cases include:

Artistic Style Transfer

Neural Style Transfer is widely used in the field of digital art to create visually stunning and creative images. Artists and designers can use this technique to transform ordinary photographs into artwork inspired by the style of famous artists such as Vincent van Gogh, Pablo Picasso, and Claude Monet.

Advertisements

Advertisers are increasingly using neural style transfer to create visually appealing advertisements. This technique can help capture potential customers’ attention and create a strong impression that encourages people to take action.

Fashion

Neural style transfer is also used in the fashion industry to create fashionable designs that incorporate elements from different styles. The technique can generate unique designs for clothing, shoes, and other accessories.

Medical Imaging

Neural style transfer also appears in medical imaging research, where it is used to augment training data and harmonize images acquired with different scanners or protocols. This can support tasks such as detecting abnormalities and diagnosing diseases.

Video Games and Film

Style transfer is also ideal for creating video games and films with visually stunning and unique worlds. By blending different styles, game and film developers can create immersive and memorable experiences for their audiences.

Conclusion

Neural Style Transfer is a fascinating topic in the field of computer vision, and it has opened up new possibilities for the creative use of images. By following the steps outlined in this article, you can easily implement NST and create stunning images with your own style. We hope this article has given you a good understanding of NST and inspired you to explore its potential further.

Resources

  1. Neural Style Transfer: Everything You Need to Know [Guide]
  2. In Defiance of Painting: Cubism, Futurism, and the Invention of Collage
  3. Cubism | Artists, Characteristics, & Facts | Britannica
  4. A Neural Algorithm of Artistic Style
  5. Image Style Transfer Using Convolutional Neural Networks
  6. VGG-19 Convolutional Neural Network, MATLAB
  7. Neural Algorithm of Artistic Style: A Modern Form of Creation
  8. Gram Matrix, from Wolfram MathWorld
  9. Gentle Introduction to the Adam Optimization Algorithm for Deep Learning
  10. Deep Learning in Medical Imaging
