PyTorch Tutorial: Develop Deep Learning Models with Python

In this tutorial, you’ll learn how to use PyTorch for an end-to-end deep learning project. Learning PyTorch can seem intimidating, with its specialized classes and workflows, but it doesn’t have to be. This tutorial will abstract away the math behind neural networks and deep learning. Instead, we’ll focus on the mechanics of implementing deep learning in PyTorch!

By the end of this guide, you’ll have learned the following:

  • What the deep learning workflow looks like in PyTorch
  • How to load data into PyTorch Datasets and batch it with DataLoaders
  • How to define deep learning classes using PyTorch
  • How to create and use optimizers and criterion (loss functions)
  • How to build a training and validation loop
  • How to save, serialize, and load deep learning models
  • How to use models to make predictions (inferences) using PyTorch

We have a lot to learn! Let’s dive right in!

Understanding the PyTorch Deep Learning Workflow

In general, most deep learning projects follow a similar process, regardless of whether you’re working on tabular data, sequential data (like text), or vision projects (such as images). The nuances of the particular task may vary, but the overall approach is the same.

A deep learning project in PyTorch typically follows the steps below:

  1. Loading and preparing the data: using Datasets and DataLoaders, PyTorch makes it simple to load, transform, and batch your data
  2. Building a model: PyTorch relies on an object-oriented approach to define your models, making it easy to structure your projects
  3. Fitting the model (training) and validating the results: by using training and validation loops, PyTorch lets you access data and use it to fit your model
  4. Making predictions: by using an aptly-named inference mode, you can make predictions using new data
  5. Saving and loading the model: PyTorch provides a simple framework for serializing your models and loading them in a memory-efficient way

Throughout this tutorial, you’ll learn how to use this workflow to run through your first deep learning project. Let’s get started by creating some data and defining it in a PyTorch Dataset class.

Creating a PyTorch Dataset for Easy Loading

PyTorch uses a Dataset class to define ways in which to represent your data. These datasets inherit from the torch.utils.data.Dataset class and provide you with great functionality, such as integration with DataLoaders (which you’ll learn about soon).

In short, PyTorch Datasets provide the following benefits:

  1. Data Loading: The Dataset class provides a simple and intuitive way to load your data from various sources such as files, databases, or APIs. Furthermore, it allows you to use encapsulation to load your data in a single class, making it easier to manage and reuse.
  2. Data Preprocessing: With the Dataset class, you can define custom preprocessing operations on your data. This includes tasks such as data augmentation, normalization, scaling, or any other transformations required to prepare your data for training or inference. Again, because it’s abstracted away in a class, this process can be hidden away from end-users.
  3. Data Access and Indexing: The Dataset class enables you to access individual data samples by their index. It provides a unified interface to retrieve samples, which is useful during training or when evaluating the model’s performance.
  4. Integration with Data Loaders: The Dataset class is designed to work easily with PyTorch’s DataLoader class, which handles efficient and parallel data loading. The DataLoader takes a Dataset instance as input and provides options for batch size, shuffling, and parallelism.
  5. Customizability: By subclassing the Dataset class, you can create your own custom dataset with specific functionality tailored to your task. This allows you to implement any specialized behavior required for your data, such as handling multi-modal inputs, complex labeling schemes, or handling imbalanced datasets.
  6. Integration with PyTorch Ecosystem: PyTorch’s ecosystem provides a wide range of tools and libraries that are compatible with the Dataset class. For example, you can use popular libraries like torchvision or torchaudio to directly load common datasets, or use third-party libraries that offer pre-processing functions compatible with the Dataset class.

Now that you have a good sense of the benefits that PyTorch Datasets provide, let’s begin by loading some data. We’ll then move into creating a PyTorch Dataset.

# Loading a Sample Dataset
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=10000, n_features=3, random_state=123)

In the example above, we used the make_blobs() function to generate three different clusters, along three dimensions. We asked the function to create 10,000 samples. The function returns a tuple of data, containing our feature matrix, X, and our target vector, y.

Let’s now see what this data looks like by plotting it using a 3D scatterplot in Matplotlib.

# Plotting the Data
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.scatter3D(X[:, 0], X[:, 1], X[:, 2], c=y)
plt.show()

In the code block above, we used Matplotlib to plot our dataset. Let’s see what it looks like:

Dataset for PyTorch Tutorial

We can see that we have three different clusters of data, which we’ve broken out by color. This allows us to visualize how the data are broken out. We’ll be a little cheeky here and refer to the data by their blobs of color, meaning that we have yellow blobs, teal blobs, and purple blobs.

Let’s now see how we can create a PyTorch Dataset. The main requirement of a PyTorch dataset is that it implements the following:

  • A custom __len__() method that returns the length of the dataset, and
  • A custom __getitem__() method that allows you to index an item

Let’s implement our first Dataset now!

# Creating a PyTorch Dataset
import torch
from torch.utils.data import Dataset

class BlobDataset(Dataset):
    def __init__(self, features, targets):
        super().__init__()
        self.features = torch.from_numpy(features).type(torch.float32)
        self.targets = torch.tensor(targets, dtype=torch.long)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.targets[index]

Let’s break down what we did in the code block above:

  1. We imported the Dataset class, allowing us to subclass our own dataset class
  2. We then define our dataset, inheriting from the Dataset class
  3. We use the super().__init__() method to initialize the parent class
  4. We declare two instance attributes, features and targets, to hold our features and targets. Note that we’re converting our values here to PyTorch tensors of specific data types
  5. We declare our __len__() method that returns the length of the dataset
  6. We declare our __getitem__() method that allows us to index a value. The method returns a tuple containing our features and targets.

Wow. Ok. That class actually does quite a lot! Importantly, it helps abstract away a lot of the complexity of creating and manipulating our data.

It might seem like a lot to create a specific class containing our data. However, it allows you to cleanly separate the concerns of loading data and training your model.

So far, we haven’t actually created our dataset, but rather just defined it. Let’s now create the dataset and see what we can do with it.

# Creating Our Dataset
dataset = BlobDataset(X, y)
print(dataset)

# Returns:
# <__main__.BlobDataset object at 0x7fe49f656770>

By passing our variables X and y into the class, we instantiated our dataset. Printing it shows the default object representation, since we didn’t define a custom __repr__() method.

Now, let’s take a look at how we can use the methods that we previously defined to get more information about our dataset. Let’s start by taking a look at the length of the dataset:

# Printing the Length of the Dataset
print(len(dataset))

# Returns: 10000

We can see that, as expected, our dataset is 10,000 records long. Now, let’s see how we can index the dataset by accessing the first item in the dataset:

# Indexing Items in Our Dataset
print(dataset[0])

# Returns: (tensor([ 5.1478, -3.7830, -4.8408]), tensor(0))

We can see that by indexing the dataset, we’re able to access an individual item. What’s great about this is that it also allows us to slice the dataset.
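As a small illustration of slicing, the sketch below rebuilds the same kind of dataset from randomly generated stand-in data (demo_X and demo_y are assumptions for this example, not the blobs from earlier) and slices the first three records. Because __getitem__() forwards the index to the underlying tensors, a slice returns a tuple of stacked tensors rather than a list of individual samples.

```python
# Slicing a Dataset (sketch with stand-in data)
import numpy as np
import torch
from torch.utils.data import Dataset

class BlobDataset(Dataset):
    def __init__(self, features, targets):
        super().__init__()
        self.features = torch.from_numpy(features).type(torch.float32)
        self.targets = torch.tensor(targets, dtype=torch.long)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.targets[index]

# Stand-in data (an assumption for this sketch): 10 samples, 3 features, 3 classes
rng = np.random.default_rng(0)
demo_X = rng.normal(size=(10, 3))
demo_y = rng.integers(0, 3, size=10)

demo_dataset = BlobDataset(demo_X, demo_y)

# The slice is passed straight through to the feature and target tensors
first_features, first_targets = demo_dataset[:3]
print(first_features.shape)  # torch.Size([3, 3])
print(first_targets.shape)   # torch.Size([3])
```

Note that this works only because the data is stored as tensors; a Dataset that loads each sample from disk in __getitem__() would need to handle slices explicitly.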

Let’s now take a look at how we can split the dataset into training and testing datasets.

Using PyTorch to Split Data into Training and Validation Data

Splitting your dataset into training and testing partitions is an important step in checking that your model isn’t overfitting to the training data. The popular Scikit-Learn library provides the train_test_split() function to split your dataset.

However, PyTorch itself provides a helpful function for splitting your dataset. I prefer this approach as it allows you to keep working within the PyTorch ecosystem.

Let’s see how we can use PyTorch to split our dataset into two:

# Splitting a Dataset into Training and Testing
from torch.utils.data import random_split
train, test = random_split(dataset=dataset, lengths=[0.8, 0.2])

In the code block above, we imported the random_split function. The function works by passing in a dataset and the lengths we want to use.

The function gives you two options to define lengths:

  1. Passing in floats that add up to 1.0, which define the proportions for the splits,
  2. Passing in integers, which define the number of records to use in each partition

Personally, I prefer using the floats, since it allows you to more easily understand how the data are split proportionally.
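For comparison, here is a minimal sketch of the integer form, using a stand-in TensorDataset (an assumption for this example) in place of our BlobDataset. It also passes a seeded generator so that the split is reproducible between runs. Fractional lengths are only supported in relatively recent PyTorch releases (1.13 and later, as far as the documentation indicates), so the integer form is the safer choice on older installations.

```python
# Splitting with Integer Lengths and a Fixed Seed (sketch)
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset (an assumption for this sketch): 10,000 samples, 3 features
demo_dataset = TensorDataset(torch.randn(10_000, 3), torch.randint(0, 3, (10_000,)))

# Integer lengths must add up to the size of the dataset
generator = torch.Generator().manual_seed(42)
demo_train, demo_test = random_split(demo_dataset, [8_000, 2_000], generator=generator)

print(len(demo_train), len(demo_test))  # 8000 2000
```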

Let’s now take a look at how we can use PyTorch DataLoaders to efficiently batch data.

Using PyTorch DataLoaders for Batching Data Efficiently

PyTorch DataLoaders are often used in combination with Datasets to provide efficient data loading and batching. Because working with deep learning often requires massive amounts of data, being able to batch data allows you to work more efficiently.

In particular, PyTorch DataLoaders provide the following benefits:

  1. Batching: DataLoader allows you to automatically create batches of data from your Dataset. Batching is crucial for training machine learning models as it enables parallel processing and efficient memory utilization. With DataLoader, you can specify the batch size, and it will automatically generate mini-batches of data for training or inference.
  2. Data Shuffling: DataLoader supports shuffling of data, which is important to prevent any bias or order dependency in the training process. Shuffling the data helps to ensure that the model does not learn from any sequential patterns present in the dataset. You can specify the shuffle parameter while creating a DataLoader to randomize the order of data samples.
  3. Parallel Data Loading: DataLoader provides an option to load data in parallel using multiple workers. This feature can significantly speed up the data loading process, especially when dealing with large datasets or complex data preprocessing. By specifying the num_workers parameter, you can leverage multiple CPU cores to load and preprocess data simultaneously.
  4. Iterability: DataLoader is an iterable, meaning you can use it in a for loop to iterate over the data batches. This allows for easy and clean code implementation during training or evaluation. You can iterate over the DataLoader object and obtain batches of data, which can be directly fed to the model for processing.
  5. Data Transformation: Transformations such as data augmentation, normalization, or resizing are typically applied on-the-fly as each sample is loaded. Note that the DataLoader itself does not take a transforms parameter; transformations are usually defined in the Dataset (for example, through the transform argument of torchvision datasets or inside __getitem__()), and the DataLoader’s collate_fn parameter can be used to customize how individual samples are combined into a batch.
  6. Compatibility with GPUs: PyTorch seamlessly integrates with GPUs, and DataLoader can help speed up transfers to GPU memory. By setting the pin_memory parameter to True, the DataLoader places batches in pinned (page-locked) host memory, which allows faster transfers to a CUDA-enabled device. The batches themselves are still moved to the GPU with the .to() method.
  7. Integration with Training Loop: DataLoader simplifies the integration of data loading with the training loop. You can easily combine DataLoader with other PyTorch components, such as loss functions, optimizers, and model training loops, to create an end-to-end training pipeline.

We can see that the DataLoader class provides a huge slew of benefits! Let’s see how we can define a data loader, by defining its batch size and more.

# Creating DataLoaders
from torch.utils.data import DataLoader
batch_size = 128
train_loader = DataLoader(train, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test, batch_size=batch_size, shuffle=True)

In the example above, we defined two DataLoaders. We passed in our datasets, a batch size, and instructed PyTorch to shuffle the data. Notice that we defined the batch size separately. This is good practice, as it allows you to keep your configurations separate and easily tracked.

Because these items are objects in themselves, we can access specific attributes, such as the batch size. Let’s print out the batch size of our training loader:

# Printing the Batch Size of a DataLoader
print(train_loader.batch_size)

# Returns: 128

Because the DataLoaders themselves offer the ability to iterate, we can inspect a batch by passing it into the iter() function. Let’s see how we can inspect one of our batches:

# Iterating over a DataLoader
data_iter = iter(train_loader)
first = next(data_iter)
print(first[0])

# Returns:
# [ 1.8840e+00,  3.0326e+00, -2.5314e+00],
# [-1.3279e-01,  3.9492e+00, -2.8709e+00],
# [ 3.2058e+00, -5.2252e+00, -5.4700e+00],
# [ 3.3243e+00, -4.1180e+00, -4.5077e+00],
# [ 9.6662e+00,  2.3717e+00, -1.2761e+00],
# ...

In the code block above, we created a variable first, which contains the first batch of data. We printed out the first item in that batch, which represents our batch’s features. I truncated the result, as it contains 128 records (our batch size!).

Similarly, we can access the targets by indexing the second item in the variable:

# Accessing the Targets of the Batch
print(first[1])

# Returns:
# tensor([0, 2, 0, 0, 2...

Similar to the previous example, I have truncated the output for clarity. Keep in mind that this will return the same number of records as our batch size!
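One detail worth checking is what happens when the dataset size isn’t a multiple of the batch size. The sketch below uses a stand-in dataset of 8,000 samples (an assumption that mirrors an 80% split of 10,000 records) and loops over the loader to collect the size of each batch.

```python
# Counting Batches and Inspecting the Final Batch (sketch)
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in training set (an assumption for this sketch): 8,000 samples, 3 features
demo_train = TensorDataset(torch.randn(8_000, 3), torch.randint(0, 3, (8_000,)))
demo_loader = DataLoader(demo_train, batch_size=128, shuffle=True)

# len() of a DataLoader is the number of batches, not the number of samples
print(len(demo_loader))  # 63: 62 full batches plus one partial batch

batch_sizes = [features.shape[0] for features, targets in demo_loader]
print(batch_sizes[0], batch_sizes[-1])  # 128 64
```

If a partial final batch is a problem for your model, the DataLoader’s drop_last=True option discards it.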

So far, we have defined our dataset structure (using a PyTorch Dataset) and provided mechanisms by which to iterate over our data in efficient batches (using PyTorch DataLoaders). Let’s now dive into how to define a neural network, using PyTorch!

Defining a PyTorch Neural Network Class

In order to define a neural network in PyTorch, the recommended approach is to define yet another class. The class itself will inherit from the nn.Module class.

This class allows you to focus on the architecture of your model while abstracting away many of the complexities. While nn.Module is not a formal Python abstract base class, it behaves much like one: as you’ll learn, you are expected to implement certain methods yourself, most notably forward()!

Let’s dive into exploring some of the benefits of using the nn.Module class to build your neural networks:

  1. Model Organization: The nn.Module class provides a convenient way to organize your model’s architecture. You can define different layers, operations, and parameters as attributes within your custom module class. This allows for a modular and hierarchical representation of your neural network.
  2. Parameter Tracking: The nn.Module class automatically tracks and manages the learnable parameters of your model. By using PyTorch’s Parameter class, you can define model parameters within your module’s attributes. These parameters are registered and can be accessed and updated during training or inference.
  3. Forward Propagation: The nn.Module class requires you to define a forward method, which specifies how the input data flows through the layers of your model. By implementing the forward method, you define the computation graph of your neural network, enabling the model to perform forward propagation during training or inference.
  4. Automatic Differentiation: PyTorch’s automatic differentiation engine (autograd) is seamlessly integrated with nn.Module. Autograd keeps track of the operations performed on tensors within the forward method. This allows you to compute gradients using backpropagation by calling the backward method on the loss, making it easier to train your model and update the learnable parameters.
  5. Model Serialization: The nn.Module class provides functionalities for model serialization and loading. You can save the state of a model (its learned parameters and buffers, via the state_dict() method) to disk using PyTorch’s serialization utilities, and later load it back into a model with the same architecture. This allows you to save and load models for future use, sharing with others, or deployment in production environments.
  6. Pre-built Layers and Modules: PyTorch’s nn module provides a variety of pre-built layers and modules that you can use within your custom nn.Module class. These include common layers like convolutional layers, linear layers, recurrent layers, activation functions, loss functions, and more. By leveraging these pre-built components, you can easily construct complex neural network architectures.
  7. Parallel and Distributed Computing: PyTorch’s nn.Module class is designed to support parallel and distributed computing. You can use PyTorch’s DataParallel or DistributedDataParallel wrappers to parallelize your model across multiple GPUs or multiple machines, respectively. This can significantly accelerate training and inference for large-scale models.

Now that you have a strong understanding of why you should use the nn.Module class, let’s dive into how to architect your model. Keep in mind, this isn’t a tutorial on the best architecture, but rather the workflow overall. Let’s see how we can build our model:

# Building a PyTorch Neural Network
import torch.nn as nn

class BlobClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)

        return x

In the code above, we defined a fairly simple model, implementing both an __init__() method and a forward() method. Recall that nn.Module expects subclasses to define a forward() method; calling a model that doesn’t define one raises an error.

In the model, we define two linear layers and use the rectified linear unit (ReLU) as our activation function. The model’s constructor expects three arguments:

  1. input_size, which identifies the dimensionality of our input,
  2. hidden_size, which defines how many units (neurons) we want in our hidden layer, and
  3. output_size, which defines the dimensionality of our targets

Let’s see how we can instantiate our model using our data:

# Instantiating Our Model
input_size = len(X[0])
hidden_size = 128
output_size = len(set(y))

model = BlobClassifier(input_size, hidden_size, output_size)
print(model)

# Returns:
# BlobClassifier(
#   (fc1): Linear(in_features=3, out_features=128, bias=True)
#   (fc2): Linear(in_features=128, out_features=3, bias=True)
#   (relu): ReLU()
# )

In the code block above, we first declared our parameters. We were able to use our dataset X and y to help define these:

  • The input size is simply the dimensionality of our features, which can be calculated using the length of any record (in this case the first)
  • The hidden size is a hyperparameter, which we set to a conventional 128
  • Finally, the output size is the number of different classes, which in this case is 3. We can define this by calculating the length of the set of values in our target vector.

We passed these parameters into the BlobClassifier class, creating a model object. When we print it, the high-level architecture is displayed, allowing us to see the different layers of the model.
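A quick way to sanity-check an architecture is to pass a dummy batch through it and look at the shape of the result. The sketch below repeats the class definition so that it runs on its own; demo_model and dummy_batch are names made up for this example.

```python
# Checking the Output Shape with a Dummy Batch (sketch)
import torch
import torch.nn as nn

class BlobClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

demo_model = BlobClassifier(input_size=3, hidden_size=128, output_size=3)

# A made-up batch of 5 samples with 3 features each
dummy_batch = torch.randn(5, 3)

# Calling the model invokes forward() and returns one raw score (logit) per class
logits = demo_model(dummy_batch)
print(logits.shape)  # torch.Size([5, 3])
```

The outputs are unnormalized scores rather than probabilities, since the model doesn’t apply a softmax in its forward pass.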

The last step before we train our model is to establish our optimizers and criterion. Let’s dive into this in the following section.

Using PyTorch Optimizers to Update a Model’s Parameters

PyTorch uses objects called optimizers to handle gradient descent calculation and parameter optimization. Let’s explore this in a bit more detail:

  1. Parameter Updates: The primary purpose of an optimizer is to update the parameters of the neural network based on the gradients computed during backpropagation. The optimizer takes care of adjusting the parameter values according to the specified optimization algorithm, such as stochastic gradient descent (SGD), Adam, RMSprop, etc. Over many updates, this moves the model towards a set of parameters that minimizes the loss function.
  2. Gradient Computation: The optimizer works in conjunction with PyTorch’s automatic differentiation engine. It leverages the computed gradients of the loss function with respect to the model parameters. The optimizer extracts and applies these gradients to update the parameters effectively. This eliminates the need to manually compute and update gradients, saving time and effort.

While it’s technically possible to update parameters and gradients directly in PyTorch, this process becomes difficult to manage once you have more than a handful of parameters. To put this into perspective, the simple model we defined earlier has 899 parameters (512 in the first linear layer and 387 in the second)!
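The parameter count can be checked directly rather than estimated. The sketch below uses an nn.Sequential stand-in with the same layer sizes as our model (an assumption made to keep the example short) and sums the number of elements in each parameter tensor.

```python
# Counting a Model's Parameters (sketch)
import torch.nn as nn

# Stand-in with the same layers as the model above: Linear(3, 128) -> ReLU -> Linear(128, 3)
stand_in = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 3))

# Each linear layer holds in_features * out_features weights plus out_features biases
for name, param in stand_in.named_parameters():
    print(name, tuple(param.shape), param.numel())

total_params = sum(param.numel() for param in stand_in.parameters())
print(total_params)
```

The same two lines at the end work on any nn.Module, including the model we instantiated earlier.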

Let’s see how we can define an optimizer in PyTorch:

# Defining an Optimizer in PyTorch
import torch.optim as optim

learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

In the code block above, we imported the optim module, aliasing it to make it easier to call. This also follows PyTorch convention, making it easier for others to read your code. Then, we defined our learning rate to be 0.001. This is another tuneable hyperparameter that you can use to modify how our model is fitted.

Finally, we create our optimizer by instantiating an Adam object, passing in our model’s parameters and the learning rate. This allows the optimizer to update the parameters as we train our model.
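To make the optimizer’s role concrete, here is a minimal sketch of a single update on a tiny stand-in layer (the layer, the random inputs, and the squared-output loss are assumptions chosen only for illustration). It shows where the learning rate is stored and confirms that calling step() changes the weights.

```python
# Performing a Single Optimizer Step (sketch)
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Tiny stand-in layer and optimizer
stand_in = nn.Linear(3, 3)
demo_optimizer = optim.Adam(stand_in.parameters(), lr=0.001)

# The learning rate is stored in the optimizer's parameter groups
print(demo_optimizer.param_groups[0]['lr'])  # 0.001

weights_before = stand_in.weight.detach().clone()

# A made-up loss, used only to produce gradients
demo_loss = stand_in(torch.randn(4, 3)).pow(2).mean()
demo_loss.backward()
demo_optimizer.step()

weights_after = stand_in.weight.detach().clone()
print(torch.equal(weights_before, weights_after))  # False
```

With Adam, the very first step moves each weight by at most roughly the learning rate, which is one reason the learning rate has such a direct effect on how quickly a model trains.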

Using PyTorch Criterion to Measure Loss

In PyTorch, a criterion (or loss function) is used to compute the loss, or objective function, while training a neural network. A criterion plays a crucial role in training models and offers several benefits:

  1. Loss Computation: The primary purpose of a criterion is to compute the loss or objective function that measures the discrepancy between the model’s predictions and the ground truth labels. The criterion takes the predicted outputs of the model and the target labels as input and calculates the loss value. The loss value provides a quantitative measure of how well the model is performing on the given task.
  2. Backpropagation and Gradients: The criterion is essential for backpropagation, which is used to compute gradients and update the model’s parameters during training. By computing the loss, the criterion enables the gradients to flow backward through the network, allowing for efficient parameter updates through gradient descent optimization algorithms. The gradients are computed with respect to the model’s parameters, and the criterion facilitates this calculation.
  3. Various Loss Functions: PyTorch provides a wide range of built-in loss functions in the nn module. These include common loss functions such as mean squared error (nn.MSELoss), binary cross-entropy (nn.BCELoss), cross-entropy for multi-class problems (nn.CrossEntropyLoss), and more. Each loss function has its own characteristics and is suited for different types of tasks, such as regression, classification, or semantic segmentation. Using the appropriate loss function is crucial for training models effectively on specific tasks.

Similar to the optimizer, we could calculate the loss of a model manually by defining our own functions. In most cases, though, the tight integration that PyTorch’s built-in loss functions have with the broader ecosystem makes them a much better fit.

Let’s see how we can define a loss function using a PyTorch criterion. Since we have a multi-class classification problem, we’ll use cross-entropy loss to calculate the loss of our model during training and validation.

# Defining a PyTorch Criterion
criterion = nn.CrossEntropyLoss()

In the code block above, we defined our criterion using the CrossEntropyLoss class from the nn module. In this case, we instantiated the object without any non-default parameters. However, the class provides different parameters you can use to customize its behavior.
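To get a feel for the values this criterion produces, the sketch below applies it to two hand-made sets of scores (the tensors are assumptions for this example). nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and integer class labels, and applies the log-softmax internally.

```python
# Exploring Cross-Entropy Loss Values (sketch)
import math
import torch
import torch.nn as nn

demo_criterion = nn.CrossEntropyLoss()

# Integer class labels for two samples
demo_targets = torch.tensor([0, 2])

# Equal scores for all 3 classes: the model is "guessing"
uniform_logits = torch.zeros(2, 3)
uniform_loss = demo_criterion(uniform_logits, demo_targets)
print(uniform_loss.item())  # approximately 1.0986, which is ln(3)

# High scores on the correct classes: the loss is close to zero
confident_logits = torch.tensor([[10.0, 0.0, 0.0],
                                 [0.0, 0.0, 10.0]])
confident_loss = demo_criterion(confident_logits, demo_targets)
print(confident_loss.item() < uniform_loss.item())  # True
```

A loss near ln(3) on a three-class problem is therefore a useful reference point: it is what a model that has learned nothing would score.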

Moving Between CPU and GPU (CUDA) Using PyTorch

Finally, before diving into how to actually train our model, let’s cover one final thing: moving between a CPU and a GPU using PyTorch. PyTorch makes this very straightforward and abstracts away many of the complexities.

In PyTorch, you can move different objects (such as batches of data, models, and more) to a respective device by using the .to() method. In order to find out what device you have available, you can use the code block below:

# Defining a Device in PyTorch
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In the code block above, we defined a new variable, device. This uses the ternary operator to determine whether a GPU is available. If it is, it uses the string 'cuda', otherwise 'cpu'. From there we can simply move items to the provided device by applying the .to() method.
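As a minimal sketch of the .to() method in action, the example below moves a small stand-in layer and a made-up batch to whichever device is available (demo_model and demo_batch are names invented for this example). It runs on a CPU-only machine as well.

```python
# Moving a Model and a Batch to a Device (sketch)
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# For modules, .to() moves the parameters and returns the module itself
demo_model = nn.Linear(3, 3).to(device)

# For tensors, .to() returns a tensor on the target device, so reassign the result
demo_batch = torch.randn(4, 3)
demo_batch = demo_batch.to(device)

# The model and its inputs must be on the same device for the forward pass to work
demo_output = demo_model(demo_batch)
print(demo_output.device.type == device)  # True
```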

Now that we have the foundational elements for our neural network model fitting, let’s dive into creating a training loop for our model.

Creating a PyTorch Training Loop to Train Your Model

PyTorch provides a ton of flexibility in how to fit your model. In practice, this is most often done by creating a Python for loop and iterating for a set number of times. In deep learning terminology, you fit a model for a set number of epochs. An epoch is defined as allowing your model to see the entire dataset one full time.

In practice, training on the entire dataset at once can be computationally expensive and memory-intensive, especially for large datasets. Mini-batch training addresses this by dividing the dataset into smaller subsets called mini-batches. Each mini-batch contains a fixed number of samples (e.g., 32, 64, or 128) randomly sampled from the dataset. The model performs forward and backward computations on each mini-batch separately, updating the parameters based on the gradients computed for that mini-batch.

This works in tandem with some of the objects we defined earlier:

  • One full pass over a dataset corresponds to an epoch,
  • A DataLoader allows you to create batches of data by defining a batch size

Let’s take a look at the standard structure of how a model is trained using a for loop in PyTorch:

# Defining a Basic Training Loop in PyTorch
num_epochs = 50

for epoch in range(num_epochs):
    train()

In the code block above, we created a very simple for loop. We defined a number of epochs and then used the range function to call our train() function a set number of times.

You may have noticed that we called a function named train(). But, this function doesn’t yet exist. That’s the next part of the puzzle, so let’s get started on defining the function.

In defining our train() function, we will work with a number of different objects that we’ve defined so far:

  • model, our instantiated model that contains the parameters and architecture as we’d defined it
  • train_loader, which allows us to iterate over our training dataset in a shuffled, batched format
  • criterion, which defines how we measure the model’s loss as it’s being trained
  • optimizer, which allows PyTorch to easily update our model’s parameters

We’ll define a few more parameters to make our training loop a little more informative. In particular, we’ll define:

  • The interval at which we want to print training information, print_every
  • The current epoch and the total num_epochs to measure progress

Let’s see what our function looks like. It’s going to be long and there’ll be some interesting stuff happening, but don’t be intimidated!

# Creating our Training Function
def train(model, train_loader, criterion, optimizer, print_every, epoch, num_epochs, train_loss_values):
    # Note: this function relies on the global `device` defined earlier
    model.train()
    train_loss = 0.0
    batches_seen = 0  # Batches accumulated since the last progress update

    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        train_loss += loss.item()
        batches_seen += 1

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        if batch_idx % print_every == 0:
            # Average over the batches actually accumulated (only one on the first update)
            print(f'Epoch [{epoch + 1:03}/{num_epochs:03}] Batch [{batch_idx+1:03}/{len(train_loader):03}], Train Loss: {train_loss/batches_seen:.4f}')

            train_loss = 0.0
            batches_seen = 0

At the beginning of the function, the model is set to training mode using model.train(). This ensures that certain layers within the model, such as dropout or batch normalization, behave correctly during the training process. The variable train_loss is initialized to keep a running total of the training loss between progress updates.

The function then enters a loop that iterates over the mini-batches of the training data. Each mini-batch consists of a batch of input data and their corresponding target labels. The inputs and targets are moved to the appropriate device, such as a GPU, using inputs = inputs.to(device) and targets = targets.to(device) for faster computation if available.

Before computing the gradients in the backward pass, optimizer.zero_grad() is called to clear the gradients of the model’s parameters. This step ensures that the gradients are not accumulated from previous iterations. The model performs a forward pass by passing the input data through it to generate predictions. The loss between the predicted outputs and the target labels is computed using the specified loss function (criterion).

The current batch’s loss value is then added to the running total of the training loss using train_loss += loss.item(). This accumulation allows the function to report an average loss each time it prints a progress update.

Backpropagation is performed next by calling loss.backward(), which computes the gradients of the loss with respect to the model’s parameters. Finally, the optimizer’s step() method is called to update the model’s parameters using the computed gradients. This step is crucial for optimizing the model and improving its performance.

To provide progress updates, the function checks if the current batch index is a multiple of the specified print_every value. If it is, a progress update is printed, displaying the current epoch number, batch index, total number of batches, and the average training loss for the recent batches.

After printing the progress update, the train_loss variable is reset to zero in preparation for the next set of mini-batches in the next iteration.
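The reason for clearing gradients on every iteration is that PyTorch adds new gradients to any that are already stored. The sketch below demonstrates this on a single made-up parameter w, without a model or an optimizer; zeroing w.grad by hand stands in for what optimizer.zero_grad() does for all of a model’s parameters.

```python
# Why Gradients Need to Be Cleared (sketch)
import torch

# A single made-up parameter
w = torch.tensor(1.0, requires_grad=True)

# The gradient of (w * 2) with respect to w is 2
(w * 2).backward()
grad_first = w.grad.item()
print(grad_first)  # 2.0

# A second backward pass adds to the stored gradient instead of replacing it
(w * 2).backward()
grad_accumulated = w.grad.item()
print(grad_accumulated)  # 4.0

# After zeroing, the next backward pass starts from a clean slate
w.grad.zero_()
(w * 2).backward()
grad_after_reset = w.grad.item()
print(grad_after_reset)  # 2.0
```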

Creating a PyTorch Validation Loop to Prevent Overfitting

Validation allows us to assess the model’s performance on a separate dataset that is not used for training. This dataset, often referred to as the validation set, consists of examples that are distinct from the training data. By evaluating the model on unseen data, we can gain insights into its generalization ability and understand how well it performs on real-world examples.

We can build this process into the training loop, which allows us to validate our model’s performance on unseen data on each epoch. Let’s see how we can build a validate function that we can run in the same loop:

# Defining a Validation Function
def validate(model, val_loader, criterion, device, val_loss_values):
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            inputs = inputs.to(device)
            targets = targets.to(device)

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            val_loss += loss.item()

            # Compute accuracy
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

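    avg_loss = val_loss / len(val_loader)
    accuracy = correct / total

    # Record this epoch's average loss so it can be inspected or plotted later
    val_loss_values.append(avg_loss)

    print(f'Validation Loss: {avg_loss:.4f}, Accuracy: {accuracy * 100:.2f}%')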

Let’s break down what the function does step by step. At the beginning of the function, the model is set to evaluation mode using model.eval(). This switches layers such as dropout and batch normalization to their evaluation behavior: dropout is disabled, and batch normalization uses its stored running statistics instead of the statistics of the current batch. The variables val_loss, correct, and total are initialized to keep track of the accumulated validation loss, the number of correctly predicted samples, and the total number of samples in the validation set, respectively.
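As a small illustration of what evaluation mode changes, the sketch below applies a standalone dropout layer to a tensor of ones in both modes. The layer and the input are made up for this example.

```python
import torch
from torch import nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: about half of the values are zeroed, the rest are scaled by 1 / (1 - p)
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout passes the input through unchanged
dropout.eval()
eval_out = dropout(x)

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))
```

Calling model.eval() on a full model sets this flag on every layer it contains, and model.train() switches it back.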

The function then enters a loop that iterates over the mini-batches of the validation data. Each mini-batch consists of a batch of input data and their corresponding target labels. The inputs and targets are moved to the same device as the model (such as a GPU, if one is available) using inputs = inputs.to(device) and targets = targets.to(device).

During each iteration of the loop, a forward pass is performed by passing the input data through the model to generate predictions. The loss between the predicted outputs and the target labels is computed using the specified loss function (criterion) with loss = criterion(outputs, targets). The current batch’s loss value is added to the running total of the validation loss using val_loss += loss.item().

To compute the accuracy, the predicted labels are obtained by finding the index of the maximum value along the second (class) dimension of the output tensor using _, predicted = torch.max(outputs.data, 1). The first return value, which holds the maximum scores themselves, is discarded. The total count of samples is incremented by the number of samples in the current batch using total += targets.size(0). The number of correctly predicted samples is calculated by comparing the predicted labels to the target labels and summing the matches with correct += (predicted == targets).sum().item().
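Here is a minimal worked example of the accuracy calculation on one hand-written batch. The scores and labels are invented for illustration: four samples and three classes.

```python
import torch

# Each row holds the model's raw scores for one sample (made-up values)
outputs = torch.tensor([[2.0, 0.5, 0.1],
                        [0.2, 0.3, 1.5],
                        [0.9, 1.1, 0.0],
                        [3.0, 0.1, 0.2]])
targets = torch.tensor([0, 2, 0, 0])

# torch.max returns (values, indices); the indices are the predicted classes
_, predicted = torch.max(outputs, 1)

correct = (predicted == targets).sum().item()
total = targets.size(0)
accuracy = correct / total

print(predicted.tolist(), correct, total, accuracy)
# Prints: [0, 2, 1, 0] 3 4 0.75
```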

After the loop, the average validation loss is computed by dividing the accumulated loss by the number of mini-batches (len(val_loader)). The accuracy is calculated by dividing the number of correctly predicted samples by the total number of samples. The validation loss and accuracy are then printed to provide an overview of the model’s performance on the validation dataset.
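One detail worth knowing: dividing by len(val_loader) averages the per-batch losses, which matches the per-sample average only when every batch has the same size. The sketch below uses made-up batch losses and sizes to show the difference when the last batch is smaller.

```python
# Hypothetical mean loss of each batch and the number of samples in it
batch_losses = [0.5, 0.5, 2.0]
batch_sizes = [4, 4, 2]

# What the validation function computes: the mean of the batch means
mean_of_batches = sum(batch_losses) / len(batch_losses)

# A sample-weighted average, which counts every sample equally
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

print(mean_of_batches, weighted)
# Prints: 1.0 0.8
```

For most datasets the gap is small, so the simpler per-batch average is usually good enough for monitoring.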

Overall, this validation function evaluates the performance of a trained model on unseen data. It computes the validation loss, tracks the number of correctly predicted samples, and calculates the accuracy. By analyzing the validation results, we can gain insights into the model’s ability to generalize to new data and make informed decisions regarding its performance and potential improvements.

Using Our Loop to Train the Model

Now that we have our functions defined, we can pass these into the loop we defined earlier. In some cases, you’ll see these functions implemented in the loop itself. However, I find it more intuitive to define these as separate functions. Let’s see what our loop looks like:

# Creating a Training / Validation Loop
num_epochs = 50
model.to(device)
for epoch in range(num_epochs):
    train(model, train_loader, criterion, optimizer, 500, epoch, num_epochs, train_loss_values)
    validate(model, test_loader, criterion, device, val_loss_values)
    print('\n----------------------\n')

In the code block above, we train the model for 50 epochs. Before the loop starts, we move the model to the device. Then, in each epoch, we run the train() function followed by the validate() function. This allows us to see the progress through both the training and validation processes.

For example, the training loop will print the output below:

Epoch [001/050] Batch [001/063], Train Loss: 0.0034
Validation Loss: 0.0523, Accuracy: 100.00%
----------------------
Epoch [002/050] Batch [001/063], Train Loss: 0.0001
Validation Loss: 0.0209, Accuracy: 100.00%
----------------------
...
Epoch [050/050] Batch [001/063], Train Loss: 0.0000
Validation Loss: 0.0001, Accuracy: 100.00%

Once all the epochs are processed, the model is fitted. If you’re happy with the accuracy it provides, you can go ahead and save the model – which you’ll learn in the next section.

How to Save a PyTorch Model

PyTorch provides two main ways in which you can save your model:

  1. Saving the entire model using the torch.save() function and passing in the entire model
  2. Saving only the model’s learnable parameters and weights by saving only the state_dict of the model

Which approach you use is up to you. Personally, I prefer saving the state dictionary. The file is often smaller, and loading the model is more explicit because you have to define the model’s architecture yourself.

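The state_dict is a Python dictionary object that maps the name of each parameter (e.g., fc1.weight and fc1.bias) to the tensor holding its values. It contains the model’s learnable parameters and persistent buffers (such as the running statistics of batch normalization layers), but it does not contain the architecture itself.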

By saving the state_dict, you separate the model’s architecture from its parameters. This means that you can define the model separately and load the saved state_dict into it, which is particularly useful when you have a pre-defined model architecture and want to load the learned parameters. Let’s see how we can save only the model’s state dictionary:

# Saving only the model's state dictionary
torch.save(model.state_dict(), 'file_path.p')
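To get a feel for what a state dictionary holds, you can loop over its entries. The sketch below uses a small nn.Sequential model as a stand-in; it is not the tutorial's classifier.

```python
import torch
from torch import nn

# Stand-in model: two linear layers with a ReLU in between
model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))

# Each entry maps a parameter name to the tensor holding its values
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))

keys = list(model.state_dict().keys())
# Prints:
# 0.weight (4, 3)
# 0.bias (4,)
# 2.weight (2, 4)
# 2.bias (2,)
```

The ReLU layer has no entry because it has no learnable parameters.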

On the other hand, you can also save the entire model. In this case, torch.save() pickles the complete model object, that is, its parameters together with a reference to the class that defines its architecture. It does not include the optimizer state; if you want to resume training later, you need to save optimizer.state_dict() as well.

When you save the entire model, you can later load it using torch.load() without instantiating the model yourself. However, the class definition must still be importable when you load the file, because only a reference to it is stored. Since PyTorch 2.6, torch.load() also defaults to weights_only=True, so loading a full model object requires passing weights_only=False, which you should only do for files you trust.

Let’s see how we can save the entire model:

# Saving the entire model
torch.save(model, 'file_path.p')
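If you plan to continue training later, a common pattern is to bundle several state dictionaries into one regular dictionary. The following is a minimal sketch: the dictionary keys, the epoch number, the temporary file path, and the small nn.Linear model are all assumptions made for illustration.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(3, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Hypothetical checkpoint layout: pick whatever keys suit your project
checkpoint = {
    'epoch': 5,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint.pt')
    torch.save(checkpoint, path)
    restored = torch.load(path)

# Rebuild the objects, then copy the saved state into them
new_model = nn.Linear(3, 2)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.01)
new_model.load_state_dict(restored['model_state_dict'])
new_optimizer.load_state_dict(restored['optimizer_state_dict'])
start_epoch = restored['epoch'] + 1
```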

Now that you have learned how to save a model, let’s look at how to load the model.

How to Load a PyTorch Model

Being able to load a PyTorch model allows you to make use of your model for inference later on. As you learned in the previous section, there are two main approaches to working with saved models:

  1. Saving only the state_dict of the model, which includes only the learned weights and parameters
  2. Saving the entire model, which includes the model’s architecture

Let’s explore the first case here, since it’s more space-efficient and allows your model’s users to understand the inner workings of the model.

In order to make this work, we need to define the model’s architecture. Let’s see what this looks like:

# Loading a Model and a state_dict
class BlobClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)

        return x

input_size = len(X[0])
hidden_size = 128
output_size = len(set(y))

model = BlobClassifier(input_size, hidden_size, output_size)
state_dict = torch.load('file_path.p')
# Copy the saved weights into the model (assigning to model.state_dict would not load them)
model.load_state_dict(state_dict)

In order to load a model from its state dictionary, we first need to instantiate the model. This is what we do in the code block above. Following that, we can load the state dictionary from disk using the torch.load() function. Finally, we copy the learned weights and parameters into the model by passing the dictionary into the model’s .load_state_dict() method.
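A quick way to check that the weights were restored is to compare the outputs of the original and the reloaded model on the same input. The sketch below assumes a hypothetical build_model() helper and a temporary file; map_location='cpu' is an optional argument that lets weights saved on a GPU load on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    # Hypothetical stand-in for the classifier: 3 features in, 4 classes out
    return nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 4))


trained = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'weights.pt')
    torch.save(trained.state_dict(), path)
    state_dict = torch.load(path, map_location='cpu')

# A fresh model starts with random weights until the state dict is loaded
restored = build_model()
result = restored.load_state_dict(state_dict)

trained.eval()
restored.eval()
sample = torch.randn(5, 3)
with torch.inference_mode():
    same_outputs = torch.equal(trained(sample), restored(sample))

print(result.missing_keys, result.unexpected_keys, same_outputs)
```

load_state_dict() also reports any missing or unexpected keys, which helps catch a mismatch between the saved weights and the architecture you defined.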

Now that you have learned how to save and load a model, let’s move on to making predictions.

How to Use PyTorch to Make Predictions Using a Deep Learning Model

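The entire tutorial has taken us to this point: being able to make predictions. PyTorch provides a helpful context manager for making predictions, torch.inference_mode(). Inside this context, PyTorch disables gradient tracking, which makes the forward pass faster and lighter on memory. Note that it does not switch the model into evaluation mode. That is done separately with model.eval(), which makes layers such as dropout and batch normalization use their inference behavior.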
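The difference between inference mode and evaluation mode can be checked directly. This is a small sketch with a placeholder nn.Linear model and random input, not the tutorial's classifier.

```python
import torch
from torch import nn

model = nn.Linear(3, 2)
x = torch.randn(4, 3)

# Outside inference mode, the output is connected to the autograd graph
tracked = model(x)

# Inside inference mode, no gradient information is recorded
with torch.inference_mode():
    untracked = model(x)

# inference_mode() leaves the train/eval flag alone; model.eval() changes it
training_flag_before = model.training
model.eval()
training_flag_after = model.training

print(tracked.requires_grad, untracked.requires_grad)
print(training_flag_before, training_flag_after)
# Prints:
# True False
# True False
```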

In order to make an inference, we need to make sure that we pass in tensors in the expected format and on the same device. Let’s see how we can make a prediction by passing in some data.

When we initially defined our data, we created a dataset with three features. This means that we need to pass in a tensor of length 3.

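# Making predictions with PyTorch
data = torch.tensor([0.0, 0.0, -8.0], device=device)
model.to(device)  # the model and the data must be on the same device
model.eval()      # use evaluation behavior for layers like dropout
with torch.inference_mode():
    preds = model(data)                  # raw scores, one per class
    preds = torch.softmax(preds, dim=0)  # convert the scores to probabilities
    max_idx = torch.argmax(preds)        # index of the most likely class
    print(max_idx.item())

# Returns: 1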

In the code block above, we first define a data tensor. This moves our list of data into a tensor on the specified device.

Then, we enter inference mode, which disables gradient tracking while we make predictions. There are a few new elements here, so let’s explain them in detail:

  1. We first make predictions by passing our data into the model, which runs it through the forward pass.
  2. Then, we pass this result into the softmax function, which converts the raw scores into a probability distribution over the classes.
  3. Finally, we find the index of the highest probability value, which returns 1.

By returning 1, we see that this is the second class available. This corresponds to our purple blobs! Congratulations – you just made your first prediction!
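In practice, you will often predict for several samples at once. In that case the output has one row per sample, so softmax and argmax are applied along dim=1 instead of dim=0. The scores below are made up to stand in for a model's output on a batch of two samples.

```python
import torch

# Made-up raw scores for two samples and three classes
logits = torch.tensor([[1.0, 3.0, 0.5],
                       [2.0, 0.1, 0.1]])

# dim=1 normalizes each row separately, so every row sums to 1
probs = torch.softmax(logits, dim=1)
predicted_classes = torch.argmax(probs, dim=1)

print(predicted_classes.tolist())
# Prints: [1, 0]
```

Because softmax keeps the order of the scores, taking the argmax of the raw scores gives the same classes; the softmax step is only needed when you want the probabilities themselves.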

Conclusion

In conclusion, this tutorial has provided a comprehensive overview of the PyTorch deep learning workflow. We started by understanding the essential components of a deep learning project, including data loading and preparation, model building, training and validation, making predictions, and saving and loading models. By following this tutorial, you have gained a solid foundation in implementing deep learning projects using PyTorch.

Throughout the tutorial, we abstracted away the complex mathematics behind neural networks and focused on the mechanics of using PyTorch effectively. By leveraging PyTorch’s specialized classes and workflows, you can streamline your deep learning projects and make them more manageable.

By the end of this guide, you have learned how to load data into PyTorch Datasets and batch it using DataLoaders. You have also gained the skills to define deep learning classes, create and use optimizers and criterion (loss functions), build training and validation loops, and save, serialize, and load deep learning models. Additionally, you have discovered how to make predictions (inferences) using PyTorch.

With this knowledge, you are well-equipped to tackle a wide range of deep learning tasks using PyTorch. Remember to continue exploring and experimenting with the concepts and techniques covered in this tutorial to enhance your understanding and proficiency in the field of deep learning.

Now it’s time for you to apply what you have learned and embark on your own deep learning projects using PyTorch. Keep learning, practicing, and pushing the boundaries of what you can achieve with this powerful framework. Happy coding!

To learn more about how to work with PyTorch and how to install it, check out the official documentation on installing.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
