Skip to content

4 Ways to Calculate Pandas Cumulative Sum

Pandas Cumulative Sum Cover Image

In this post, you’ll learn multiple ways to calculate a cumulative sum on a Pandas Dataframe, including calculating a cumulative sum on a single column, dealing with null values, groups within a dataframe column, and calculating cumulative percentages.

Loading a sample dataframe

To begin, let’s load a sample Pandas Dataframe. If you want to follow along, copy the code from below and paste it into your favourite editor:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Type': ['A', 'B', 'A', 'A', 'A', 'B', 'A', 'B', 'B'], 
        'Date': ['01-Jan-21', '01-Jan-21', '02-Jan-21', '03-Jan-21', '05-Jan-21', '07-Jan-21', '09-Jan-21', '10-Jan-21', '11-Jan-21'], 
        'Sales': [10, 15, 7, 23, 18, 7, 3, 10, 25], 
        'Profits': [3, 5, 2, 7, 6, 2, 1, 3, 8]
    }
)

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits
0    A  01-Jan-21     10        3
1    B  01-Jan-21     15        5
2    A  02-Jan-21      7        2
3    A  03-Jan-21     23        7
4    A  05-Jan-21     18        6
5    B  07-Jan-21      7        2
6    A  09-Jan-21      3        1
7    B  10-Jan-21     10        3
8    B  11-Jan-21     25        8

Calculating a Pandas Cumulative Sum on a Single Column

Pandas makes it easy to calculate a cumulative sum on a column by using the .cumsum() method.

Let’s say we wanted to calculate the cumulative sum on the Sales column. We can accomplish this by writing:

df['Sales'] = df['Sales'].cumsum()

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits
0    A  01-Jan-21     10        3
1    B  01-Jan-21     25        5
2    A  02-Jan-21     32        2
3    A  03-Jan-21     55        7
4    A  05-Jan-21     73        6
5    B  07-Jan-21     80        2
6    A  09-Jan-21     83        1
7    B  10-Jan-21     93        3
8    B  11-Jan-21    118        8

Creating a Pandas Cumulative Sum as a New Column

Now let’s say that we wanted to create a new column, rather than overwriting the column, we can create a new column. To learn more about creating columns in Pandas, check out my video here.

df['Cumulative Sales'] = df['Sales'].cumsum()

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits  Cumulative Sales
0    A  01-Jan-21     10        3                10
1    B  01-Jan-21     15        5                25
2    A  02-Jan-21      7        2                32
3    A  03-Jan-21     23        7                55
4    A  05-Jan-21     18        6                73
5    B  07-Jan-21      7        2                80
6    A  09-Jan-21      3        1                83
7    B  10-Jan-21     10        3                93
8    B  11-Jan-21     25        8               118

Now let’s take a look at calculating cumulative sums for columns with null values.

Calculating a Pandas Cumulative Sum with Null Values

The Pandas .cumsum() also allows you to work with missing data. To test this out, let’s first insert a missing value into our dataframe.

import numpy as np

df.loc[5, 'Sales'] = np.NaN

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits
0    A  01-Jan-21   10.0        3
1    B  01-Jan-21   15.0        5
2    A  02-Jan-21    7.0        2
3    A  03-Jan-21   23.0        7
4    A  05-Jan-21   18.0        6
5    B  07-Jan-21    NaN        2
6    A  09-Jan-21    3.0        1
7    B  10-Jan-21   10.0        3
8    B  11-Jan-21   25.0        8

The Pandas .cumsum() method has a skipna= parameter which defaults to True. What it does, is ignore those missing values (essentially treating them as zeroes).

If we set this parameter to False, then any values following the missing value will be ignored.

df['Sales'] = df['Sales'].cumsum(skipna=False)

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits
0    A  01-Jan-21   10.0        3
1    B  01-Jan-21   25.0        5
2    A  02-Jan-21   32.0        2
3    A  03-Jan-21   55.0        7
4    A  05-Jan-21   73.0        6
5    B  07-Jan-21    NaN        2
6    A  09-Jan-21    NaN        1
7    B  10-Jan-21    NaN        3
8    B  11-Jan-21    NaN        8

Calculating a Pandas Cumulative Sum on Groups

There may be times when you want to calculate cumulative sums on groups in a Pandas Dataframe.

Let’s say we wanted to calculate the cumulative sums separately for the Type column, so that the cumulative sum is only calculated for particular groups. In the example below, we’ll use the Pandas .groupby() method, which you can learn more about in my video here.

Let’s try this out using some Pandas code:

df['Grouped Cumulative Sum'] = df[['Type', 'Sales']].groupby('Type').cumsum()

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits  Grouped Cumulative Sum
0    A  01-Jan-21     10        3                      10
1    B  01-Jan-21     15        5                      15
2    A  02-Jan-21      7        2                      17
3    A  03-Jan-21     23        7                      40
4    A  05-Jan-21     18        6                      58
5    B  07-Jan-21      7        2                      22
6    A  09-Jan-21      3        1                      61
7    B  10-Jan-21     10        3                      32
8    B  11-Jan-21     25        8                      57

What we’ve done here is apply the .groupby() method to both the Type and Price column, and then group the two columns by the Type column. Finally, we call the .cumsum() method on the remaining column (Price) to calculate its cumulative sum.

What this accomplishes, is that the cumulative sum is calculated only for the specific groups, meaning rows are effectively “skipped” when they don’t belong to a particular group.

Calculating Pandas Cumulative Percentages

Finally, let’s take a look at calculating cumulative percentages in Pandas. This can be particularly effective in seeing how much an item makes up the cumulative total.

While there is no dedicated function for calculating cumulative percentages, we can use the Pandas .cumsum() method in conjunction with the .sum() method.

df['Cumulative Sales Percentage'] = df['Sales'].cumsum() / df['Sales'].sum()

print(df)

What we’ve done here is first calculate the cumulative sum, then divided it by the sum of the entire column.

This returns the following dataframe, where the cumulative percentage is shown as a proportion of 1:

  Type       Date  Sales  Profits  Cumulative Sales Percentage
0    A  01-Jan-21     10        3                     0.084746
1    B  01-Jan-21     15        5                     0.211864
2    A  02-Jan-21      7        2                     0.271186
3    A  03-Jan-21     23        7                     0.466102
4    A  05-Jan-21     18        6                     0.618644
5    B  07-Jan-21      7        2                     0.677966
6    A  09-Jan-21      3        1                     0.703390
7    B  10-Jan-21     10        3                     0.788136
8    B  11-Jan-21     25        8                     1.000000

If we wanted to apply some style to the formatting of the cumulative percentage, we can apply some f-string formatting to the column.

df['Cumulative Sales Percentage'] = df['Sales'].cumsum() / df['Sales'].sum()
df['Cumulative Sales Percentage'] = df['Cumulative Sales Percentage'].apply(lambda x: f'{x:.2%}')

print(df)

This returns the following dataframe:

  Type       Date  Sales  Profits Cumulative Sales Percentage
0    A  01-Jan-21     10        3                       8.47%
1    B  01-Jan-21     15        5                      21.19%
2    A  02-Jan-21      7        2                      27.12%
3    A  03-Jan-21     23        7                      46.61%
4    A  05-Jan-21     18        6                      61.86%
5    B  07-Jan-21      7        2                      67.80%
6    A  09-Jan-21      3        1                      70.34%
7    B  10-Jan-21     10        3                      78.81%
8    B  11-Jan-21     25        8                     100.00%

Conclusion

In this post, you learned all about calculating cumulative sums in Pandas. You learned how to calculate it on a single column, dealing with missing values, on groups within a dataframe, and calculating Pandas cumulative percentages. To learn more about the .cumsum() method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.View Author posts

Leave a Reply

Your email address will not be published. Required fields are marked *