In this post, you’ll learn multiple ways to calculate a cumulative sum on a Pandas Dataframe, including calculating a cumulative sum on a single column, dealing with null values, groups within a dataframe column, and calculating cumulative percentages.
Table of Contents
Loading a sample dataframe
To begin, let’s load a sample Pandas Dataframe. If you want to follow along, copy the code from below and paste it into your favourite editor:
import pandas as pd
df = pd.DataFrame.from_dict(
{
'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'],
'Date': ['01-Jan-21', '01-Jan-21', '02-Jan-21', '03-Jan-21', '05-Jan-21', '07-Jan-21', '09-Jan-21', '10-Jan-21', '11-Jan-21'],
'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25],
'Profits': [3, 5, 2, 7, 6, 2, 1, 3, 8]
}
)
print(df)
This returns the following dataframe:
Type Date Sales Profits
0 A 01-Jan-21 10 3
1 B 01-Jan-21 15 5
2 A 02-Jan-21 7 2
3 A 03-Jan-21 23 7
4 A 05-Jan-21 18 6
5 B 07-Jan-21 7 2
6 A 09-Jan-21 3 1
7 B 10-Jan-21 10 3
8 B 11-Jan-21 25 8
Calculating a Pandas Cumulative Sum on a Single Column
Pandas makes it easy to calculate a cumulative sum on a column by using the .cumsum()
method.
Let’s say we wanted to calculate the cumulative sum on the Sales
column. We can accomplish this by writing:
df['Sales'] = df['Sales'].cumsum()
print(df)
This returns the following dataframe:
Type Date Sales Profits
0 A 01-Jan-21 10 3
1 B 01-Jan-21 25 5
2 A 02-Jan-21 32 2
3 A 03-Jan-21 55 7
4 A 05-Jan-21 73 6
5 B 07-Jan-21 80 2
6 A 09-Jan-21 83 1
7 B 10-Jan-21 93 3
8 B 11-Jan-21 118 8
Creating a Pandas Cumulative Sum as a New Column
Now let’s say that we wanted to create a new column, rather than overwriting the column, we can create a new column. To learn more about creating columns in Pandas, check out my video here.
df['Cumulative Sales'] = df['Sales'].cumsum()
print(df)
This returns the following dataframe:
Type Date Sales Profits Cumulative Sales
0 A 01-Jan-21 10 3 10
1 B 01-Jan-21 15 5 25
2 A 02-Jan-21 7 2 32
3 A 03-Jan-21 23 7 55
4 A 05-Jan-21 18 6 73
5 B 07-Jan-21 7 2 80
6 A 09-Jan-21 3 1 83
7 B 10-Jan-21 10 3 93
8 B 11-Jan-21 25 8 118
Now let’s take a look at calculating cumulative sums for columns with null values.
Calculating a Pandas Cumulative Sum with Null Values
The Pandas .cumsum()
also allows you to work with missing data. To test this out, let’s first insert a missing value into our dataframe.
import numpy as np
df.loc[5, 'Sales'] = np.NaN
print(df)
This returns the following dataframe:
Type Date Sales Profits
0 A 01-Jan-21 10.0 3
1 B 01-Jan-21 15.0 5
2 A 02-Jan-21 7.0 2
3 A 03-Jan-21 23.0 7
4 A 05-Jan-21 18.0 6
5 B 07-Jan-21 NaN 2
6 A 09-Jan-21 3.0 1
7 B 10-Jan-21 10.0 3
8 B 11-Jan-21 25.0 8
The Pandas .cumsum()
method has a skipna=
parameter which defaults to True
. What it does, is ignore those missing values (essentially treating them as zeroes).
If we set this parameter to False
, then any values following the missing value will be ignored.
df['Sales'] = df['Sales'].cumsum(skipna=False)
print(df)
This returns the following dataframe:
Type Date Sales Profits
0 A 01-Jan-21 10.0 3
1 B 01-Jan-21 25.0 5
2 A 02-Jan-21 32.0 2
3 A 03-Jan-21 55.0 7
4 A 05-Jan-21 73.0 6
5 B 07-Jan-21 NaN 2
6 A 09-Jan-21 NaN 1
7 B 10-Jan-21 NaN 3
8 B 11-Jan-21 NaN 8
Calculating a Pandas Cumulative Sum on Groups
There may be times when you want to calculate cumulative sums on groups in a Pandas Dataframe.
Let’s say we wanted to calculate the cumulative sums separately for the Type
column, so that the cumulative sum is only calculated for particular groups. In the example below, we’ll use the Pandas .groupby()
method, which you can learn more about in my video here.
Let’s try this out using some Pandas code:
df['Grouped Cumulative Sum'] = df[['Type', 'Sales']].groupby('Type').cumsum()
print(df)
This returns the following dataframe:
Type Date Sales Profits Grouped Cumulative Sum
0 A 01-Jan-21 10 3 10
1 B 01-Jan-21 15 5 15
2 A 02-Jan-21 7 2 17
3 A 03-Jan-21 23 7 40
4 A 05-Jan-21 18 6 58
5 B 07-Jan-21 7 2 22
6 A 09-Jan-21 3 1 61
7 B 10-Jan-21 10 3 32
8 B 11-Jan-21 25 8 57
What we’ve done here is apply the .groupby()
method to both the Type and Price column, and then group the two columns by the Type column. Finally, we call the .cumsum()
method on the remaining column (Price) to calculate its cumulative sum.
What this accomplishes, is that the cumulative sum is calculated only for the specific groups, meaning rows are effectively “skipped” when they don’t belong to a particular group.
Calculating Pandas Cumulative Percentages
Finally, let’s take a look at calculating cumulative percentages in Pandas. This can be particularly effective in seeing how much an item makes up the cumulative total.
While there is no dedicated function for calculating cumulative percentages, we can use the Pandas .cumsum()
method in conjunction with the .sum()
method.
df['Cumulative Sales Percentage'] = df['Sales'].cumsum() / df['Sales'].sum()
print(df)
What we’ve done here is first calculate the cumulative sum, then divided it by the sum of the entire column.
This returns the following dataframe, where the cumulative percentage is shown as a proportion of 1:
Type Date Sales Profits Cumulative Sales Percentage
0 A 01-Jan-21 10 3 0.084746
1 B 01-Jan-21 15 5 0.211864
2 A 02-Jan-21 7 2 0.271186
3 A 03-Jan-21 23 7 0.466102
4 A 05-Jan-21 18 6 0.618644
5 B 07-Jan-21 7 2 0.677966
6 A 09-Jan-21 3 1 0.703390
7 B 10-Jan-21 10 3 0.788136
8 B 11-Jan-21 25 8 1.000000
If we wanted to apply some style to the formatting of the cumulative percentage, we can apply some f-string formatting to the column.
df['Cumulative Sales Percentage'] = df['Sales'].cumsum() / df['Sales'].sum()
df['Cumulative Sales Percentage'] = df['Cumulative Sales Percentage'].apply(lambda x: f'{x:.2%}')
print(df)
This returns the following dataframe:
Type Date Sales Profits Cumulative Sales Percentage
0 A 01-Jan-21 10 3 8.47%
1 B 01-Jan-21 15 5 21.19%
2 A 02-Jan-21 7 2 27.12%
3 A 03-Jan-21 23 7 46.61%
4 A 05-Jan-21 18 6 61.86%
5 B 07-Jan-21 7 2 67.80%
6 A 09-Jan-21 3 1 70.34%
7 B 10-Jan-21 10 3 78.81%
8 B 11-Jan-21 25 8 100.00%
Conclusion
In this post, you learned all about calculating cumulative sums in Pandas. You learned how to calculate it on a single column, dealing with missing values, on groups within a dataframe, and calculating Pandas cumulative percentages. To learn more about the .cumsum()
method, check out the official documentation here.