Calculate a Rolling Average (Mean) in Pandas

In this post, you’ll learn how to calculate a rolling mean in Pandas using the rolling() function. Rolling averages are also known as moving averages.

Creating a rolling average allows you to “smooth” out small fluctuations in datasets, while gaining insight into trends. It’s often used with macroeconomic and financial data, such as unemployment figures, gross domestic product, and stock prices. A moving average takes a rolling subset (a window) of the full data and calculates the average of that subset. The window then moves forward one observation at a time, which smooths out data with high degrees of fluctuation.
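
To make the idea concrete, here is a minimal sketch that calculates a moving average by hand and compares it with Pandas. The sample values and the window size of 3 are made up for illustration only.

```python
import pandas as pd

values = [10, 20, 30, 40, 50]   # made-up sample data
window = 3                      # illustrative window size

# Average each run of `window` consecutive values by hand
manual = [
    sum(values[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(values))
]
print(manual)  # [20.0, 30.0, 40.0]

# The same calculation with Pandas; the first window - 1 results are NaN
rolling = pd.Series(values).rolling(window).mean()
print(rolling.dropna().tolist())  # [20.0, 30.0, 40.0]
```

Each result is the average of the current value and the two values before it, which is why the first two positions have no result.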

By the end of this tutorial, you’ll have learned:

  • How to calculate a rolling average in Pandas
  • How to understand the syntax of the .rolling() method
  • How to change the center of your rolling average

[Figure: The effect of a rolling average on your data]

Loading Our Dataset

Let’s load a dataset to explore the rolling function with:

import pandas as pd

prices = [62, 64, 63, 69, 71, 73, 74, 76, 75, 74, 72, 86, 98, 85, 103, 92, 93, 96, 96, 75, 84, 91, 71, 108, 106, 106, 115, 116, 122, 108, 101, 125, 119, 107, 123, 109, 163, 149, 99, 137, 110, 187, 116, 123, 144, 119, 176, 155, 179, 179, 123, 133, 200, 193, 136, 167, 131, 179, 200, 192, 138, 164, 210, 174, 257, 180, 173, 221, 204, 187, 283, 198, 223, 218, 198, 168, 279, 187, 261, 210, 221, 201, 257, 160, 312, 169, 239, 277, 148, 236, 255]

dates = pd.date_range('2022-04-01', periods=len(prices))

df = pd.DataFrame(data=zip(dates, prices), columns=['Date', 'Price'])
print(df.head())

We printed out the first five rows using the .head() method:

#         Date  Price
# 0 2022-04-01     62
# 1 2022-04-02     64
# 2 2022-04-03     63
# 3 2022-04-04     69
# 4 2022-04-05     71

Explaining the Pandas Rolling() Function

To calculate a moving average in Pandas, you combine the .rolling() method with the .mean() method. Let’s take a moment to explore the .rolling() method in Pandas:

df.rolling(
    window,             # Size of the moving window
    min_periods=None,   # Min number of observations
    center=False,       # Whether to use the center or right-edge
    win_type=None,      # Weighting of windows
    on=None,            # What column to use
    axis=0,             # For columns or rows
    closed=None,        # How to close windows
    method='single'     # To apply on column/row or DataFrame
)

Let’s explore what these parameters do:

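  • window= determines the number of observations used to calculate each statistic. It can also be a time offset, such as '7D', when the data has a datetime-like index or on= column.
  • min_periods= represents the minimum number of observations required to return a value. For integer windows, it defaults to the window size.
  • center= determines whether each result is labeled at the center of its window rather than at the right edge.
  • win_type= determines the weighting of each item. If left alone, each item will be weighted equally.
  • on= specifies a datetime-like column of a DataFrame to build the windows on, instead of the index.
  • axis= determines whether the function works along rows or columns. This parameter is deprecated in recent versions of Pandas.
  • closed= determines which endpoints of the window are included: 'right', 'left', 'both', or 'neither'.
  • method= determines whether the calculation runs one column or row at a time ('single') or over the entire DataFrame ('table').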
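
As a rough illustration of a few of these parameters, the sketch below compares the default behavior with min_periods=1 and with a time-based window that uses on=. The small DataFrame and the '3D' offset are assumptions chosen to keep the output easy to check.

```python
import pandas as pd

# Small made-up dataset for illustration
sample = pd.DataFrame({
    'Date': pd.date_range('2022-04-01', periods=5),
    'Price': [10, 20, 30, 40, 50],
})

# Default: min_periods equals the window size, so the first two rows are NaN
strict = sample['Price'].rolling(window=3).mean()

# min_periods=1: return a value as soon as one observation is available
lenient = sample['Price'].rolling(window=3, min_periods=1).mean()

# Offset window built on the 'Date' column: average of the trailing 3 days
by_date = sample.rolling(window='3D', on='Date').mean()['Price']

print(strict.tolist())   # [nan, nan, 20.0, 30.0, 40.0]
print(lenient.tolist())  # [10.0, 15.0, 20.0, 30.0, 40.0]
print(by_date.tolist())  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

With an offset window such as '3D', Pandas returns a value from the first row onward, which is why the last two results match.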

Now that you have a strong understanding of the .rolling() method, let’s start calculating the rolling average in Pandas.

Creating a Rolling Average in Pandas

Let’s use Pandas to create a rolling average. In the previous section, you learned that the Pandas .rolling() method creates a rolling window of a given size. To calculate a rolling window of size 7, we can simply pass in the integer 7.

# Calculating a Rolling Mean with Pandas
df['Rolling'] = df['Price'].rolling(7).mean()

print(df.head(10))

This returns:

        Date  Price    Rolling
0 2022-04-01     62        NaN
1 2022-04-02     64        NaN
2 2022-04-03     63        NaN
3 2022-04-04     69        NaN
4 2022-04-05     71        NaN
5 2022-04-06     73        NaN
6 2022-04-07     74  68.000000
7 2022-04-08     76  70.000000
8 2022-04-09     75  71.571429
9 2022-04-10     74  73.142857

Let’s break down what we did here:

  1. We created a new column using the rolling() method
  2. We passed in the value of 7 to create a rolling 7 day window
  3. We then applied the .mean() method to calculate the mean of this rolling window

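It’s important to note here that .rolling() doesn’t calculate anything on its own. It returns a Rolling object, and you need to chain an aggregation method such as .mean() to get values back. This is also where you could chain different methods, such as .sum(), .median(), or .std(), in order to calculate other rolling statistics.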
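
The sketch below shows a few other rolling statistics side by side. It uses only the first ten prices from the dataset so that the output stays short; the column names are illustrative.

```python
import pandas as pd

# First ten prices from the dataset above
prices = pd.Series([62, 64, 63, 69, 71, 73, 74, 76, 75, 74])

# .rolling() alone returns a Rolling object, not calculated values
window = prices.rolling(7)
print(type(window).__name__)  # Rolling

# Chain different aggregation methods onto the same rolling window
stats = pd.DataFrame({
    'Mean': window.mean(),
    'Median': window.median(),
    'Max': window.max(),
})
print(stats.dropna())
```

Every statistic follows the same pattern as the mean: the first six rows are missing because they don’t yet have a full window of seven values.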

Visualizing a Moving Average in Pandas

Let’s create a visualization in order to demonstrate the benefit of the rolling average. To visualize the data without the rolling average, we can write the following code:

# Visualizing data without rolling averages
import matplotlib.pyplot as plt
plt.plot(df['Date'], df['Price'])
plt.title('Data Without Rolling Average')

plt.show()

[Figure: The data without a rolling average]

To visualize what effect the rolling average has on smoothing the data, we can plot the two columns in the same chart:

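# Plotting the effect of a rolling average
import matplotlib.pyplot as plt
plt.plot(df['Date'], df['Price'], label='Price')
plt.plot(df['Date'], df['Rolling'], label='7-Day Rolling Average')
plt.title('Data With Rolling Average')
plt.legend()  # Label the two lines so they can be told apart

plt.show()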

This returns the following image:

[Figure: The effect of a rolling mean in Pandas]
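
If you’d like a number to go along with the chart, one rough way to gauge the smoothing is to compare the average day-to-day change before and after applying the rolling mean. This is only a sketch: the zigzag values and the window of 2 are made up, and the measure itself is an illustrative choice rather than a standard metric.

```python
import pandas as pd

# Made-up values that zigzag around a slow upward trend
prices = pd.Series([10, 30, 12, 32, 14, 34, 16, 36])

# Illustrative window of 2; each result averages one low and one high value
smoothed = prices.rolling(2).mean()

# Average absolute change between consecutive values, before and after
raw_change = prices.diff().abs().mean()
smooth_change = smoothed.diff().abs().mean()

print(round(raw_change, 2))     # 19.14
print(round(smooth_change, 2))  # 1.0
```

The raw values jump by roughly 19 from one day to the next, while the smoothed values move by 1, which is the underlying trend.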

Modifying the Center of a Rolling Average in Pandas

By default, Pandas uses the right-most edge of the window to label the resulting values. This is why our rolling values started on the 7th day: the first six rows didn’t yet have a full window of seven observations. We can modify this behavior by setting the center= argument to True. This will result in “shifting” the value to the center of the window index.

Let’s see how we can do this in Pandas:

# Modifying the Window Center
df = pd.DataFrame(data=zip(dates, prices), columns=['Date', 'Price'])
df['Rolling'] = df['Price'].rolling(7).mean()

df['Rolling Center'] = df['Price'].rolling(7, center=True).mean()

print(df.head(10))

# Returns:
#         Date  Price    Rolling  Rolling Center
# 0 2022-04-01     62        NaN             NaN
# 1 2022-04-02     64        NaN             NaN
# 2 2022-04-03     63        NaN             NaN
# 3 2022-04-04     69        NaN       68.000000
# 4 2022-04-05     71        NaN       70.000000
# 5 2022-04-06     73        NaN       71.571429
# 6 2022-04-07     74  68.000000       73.142857
# 7 2022-04-08     76  70.000000       73.571429
# 8 2022-04-09     75  71.571429       75.714286
# 9 2022-04-10     74  73.142857       79.285714

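We can see that the window was adjusted. Because of this, the rolling data started at the center of the first full window (which in this case was the 4th record). For the same reason, the last three records of the centered column will be missing values, since their windows extend past the end of the data.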
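
One way to check your understanding of center=True is to compare it with shifting the default, right-aligned result. The sketch below assumes an odd window size (7, as in this tutorial) and uses only the first ten prices; with an even window the two wouldn’t line up this neatly.

```python
import pandas as pd

# First ten prices from the dataset above
prices = pd.Series([62, 64, 63, 69, 71, 73, 74, 76, 75, 74])

right = prices.rolling(7).mean()
centered = prices.rolling(7, center=True).mean()

# For a window of 7, centering moves each label back by (7 - 1) // 2 = 3 rows
shifted = right.shift(-3)

# Raises an AssertionError if the two Series differ
pd.testing.assert_series_equal(centered, shifted)
print(centered.tolist())
```

In other words, centering doesn’t change the values that are calculated, only the rows they are attached to.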

Conclusion

In this tutorial, you learned how to calculate a rolling average in Pandas. You learned what a rolling average is and why it’s useful. You then learned how to use the Pandas .rolling() method to create a rolling window and how to apply the .mean() method to it. You also learned how to visualize the data as well as how to change the center of the rolling window.
