Calculate a Rolling Average (Mean) in Pandas


In this post, you’ll learn how to calculate a rolling mean in Pandas using the rolling() function. Rolling averages are also known as moving averages.

Creating a rolling average allows you to “smooth” out small fluctuations in datasets, while gaining insight into trends. It’s often used with macroeconomic and financial time series, such as unemployment, gross domestic product, and stock prices.

Rolling average in Pandas visualized

Loading Our Dataset

Let’s create a small dataset to explore the rolling() function with:

import pandas as pd

# Build a daily date range covering January 2020
dates = pd.date_range(start='2020-01-01', end='2020-01-31', freq='D').to_list()
numbers = [43, 3, 31, 1, 39, 18, 15, 49, 6, 14, 46, 45, 15, 13, 5, 4, 23, 25, 43, 4, 18, 3, 27, 38, 43, 38, 45, 39, 19, 30, 50]

# Combine the dates and prices into a DataFrame and preview the first five rows
df = pd.DataFrame({'Dates': dates, 'Price': numbers})
print(df.head())

Printing the first five rows with the head() method returns:

       Dates  Price
0 2020-01-01     43
1 2020-01-02      3
2 2020-01-03     31
3 2020-01-04      1
4 2020-01-05     39

Explaining the Pandas Rolling() Function

To calculate a moving average in Pandas, you combine the rolling() function with the mean() function. Let’s take a moment to explore the rolling() function in Pandas:

DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)

Let’s explore what these parameters do:

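  • The window parameter determines the number of observations used to calculate each statistic. It can also be a time offset, such as '5D', when the data has a datetime-like index or on column.
  • Min_periods represents the minimum number of observations required to return a value. For an integer window, it defaults to the window size; for an offset window, it defaults to 1.
  • Center determines whether the window label is set at the center of the window rather than at its right edge.
  • Win_type determines the weighting of each item. If left as None, each item is weighted equally.
  • On names a datetime-like column to calculate the window over, instead of the DataFrame’s index.
  • Axis determines whether the function works along rows or columns. It has been deprecated since Pandas 2.1.
  • Closed determines which endpoints of the window are included ('right', 'left', 'both', or 'neither'). If left as None, it behaves like 'right'.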
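
To make a few of these parameters concrete, here is a minimal sketch on a tiny Series. The data and the names s, frame, when, and value are made up for illustration and are not part of the dataset used in this post.

```python
import pandas as pd

# A tiny illustrative Series (not the post's dataset)
s = pd.Series([1, 2, 3, 4, 5])

# Default: a value is only returned once a full window of 3 observations exists
default = s.rolling(3).mean()                  # NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1: a value is returned as soon as one observation is available
partial = s.rolling(3, min_periods=1).mean()   # 1.0, 1.5, 2.0, 3.0, 4.0

# center=True: each label sits in the middle of its window
centered = s.rolling(3, center=True).mean()    # NaN, 2.0, 3.0, 4.0, NaN

print(pd.DataFrame({'default': default,
                    'min_periods=1': partial,
                    'centered': centered}))

# on: define the window over a datetime column using a time offset.
# The 'when' and 'value' column names are hypothetical.
frame = pd.DataFrame({
    'when': pd.date_range('2020-01-01', periods=5, freq='D'),
    'value': [1, 2, 3, 4, 5],
})
time_based = frame.rolling('3D', on='when')['value'].mean()
print(time_based)
```

Because an offset window defaults to min_periods=1, the time-based result has no leading NaN values, unlike the default integer window.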

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Creating a Rolling Average in Pandas

Let’s use Pandas to create a rolling average. First, choose the window size, that is, the number of observations used to calculate each statistic. Let’s create a rolling mean with a window size of 5:

df['Rolling'] = df['Price'].rolling(5).mean()
print(df.head(10))

This returns:

       Dates  Price  Rolling
0 2020-01-01     43      NaN
1 2020-01-02      3      NaN
2 2020-01-03     31      NaN
3 2020-01-04      1      NaN
4 2020-01-05     39     23.4
5 2020-01-06     18     18.4
6 2020-01-07     15     20.8
7 2020-01-08     49     24.4
8 2020-01-09      6     25.4
9 2020-01-10     14     20.4

There are a couple of items to note here:

  • We’ve assigned a new column (Rolling) that is calculated from the Price column
  • Only one argument has been passed (the window size)
  • By default, the window is not centered, so each value is the mean of the current row and the four rows before it
    • Because min_periods defaults to the window size, the first four rows don’t have enough observations and are returned as NaN
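
If you’d like to check these values by hand, the short sketch below recomputes the first two rolling means with plain Python. It reuses the first ten prices from the dataset; the variable names are only for illustration.

```python
import pandas as pd

# The first ten prices from the dataset above
numbers = [43, 3, 31, 1, 39, 18, 15, 49, 6, 14]
price = pd.Series(numbers)
rolling = price.rolling(5).mean()

# Row 4 is the mean of rows 0 through 4: (43 + 3 + 31 + 1 + 39) / 5
manual_first = sum(numbers[0:5]) / 5
print(manual_first, rolling[4])

# The window then slides down one row: rows 1 through 5
manual_second = sum(numbers[1:6]) / 5
print(manual_second, rolling[5])

# The first four rows have fewer than five observations, so they are NaN
print(rolling.isna().sum())
```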

Visualizing a Moving Average in Pandas

Let’s create a visualization in order to demonstrate the benefit of the rolling average.

The data without the rolling average looks like this:

The original data

The data as a rolling average looks like this:

A rolling average in Pandas

Combined, they look like this:

Overlaying a rolling average over the original values
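
The plotting code isn’t shown in this post, so here is one possible way to draw the combined chart. It’s a sketch that assumes Matplotlib is installed; the figure size, labels, and output filename are arbitrary choices.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the dataset and the 5-day rolling mean from earlier in the post
dates = pd.date_range(start='2020-01-01', end='2020-01-31', freq='D')
numbers = [43, 3, 31, 1, 39, 18, 15, 49, 6, 14, 46, 45, 15, 13, 5, 4, 23, 25, 43, 4, 18, 3, 27, 38, 43, 38, 45, 39, 19, 30, 50]
df = pd.DataFrame({'Dates': dates, 'Price': numbers})
df['Rolling'] = df['Price'].rolling(5).mean()

# Overlay the rolling mean on the original values
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df['Dates'], df['Price'], label='Price')
ax.plot(df['Dates'], df['Rolling'], label='5-day rolling mean')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they don't overlap

# Hypothetical output filename
fig.savefig('rolling_average.png')
```

The rolling line starts four days later than the original line because the first four rolling values are NaN and Matplotlib skips them.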

Conclusion

In this post, you learned how to create a moving average in Pandas. To do this, you chain the rolling() and mean() methods.

To learn more about the rolling function, check out the official documentation.
