In this tutorial, you’ll learn how use Pandas to calculate a sum, including how to add up dataframe columns and rows. Being able to add up values to calculate either column totals or row totals allows you to generate some helpful summary statistics.
By the end of this tutorial, you’ll have learned how to:
- Calculate the Sum of a Pandas Dataframe Column
- Calculate the Sum of a Pandas Dataframe Row
- Add Pandas Dataframe Columns Together
- Add Pandas Dataframe Columns That Meet a Condition
- Calculate the Sum of a Pandas GroupBy Dataframe
Quick Answer: Use Pandas .sum() To Add Dataframe Columns and Rows
How to add values | Description | Example |
---|---|---|
Column-wise | Add all numeric values in a Pandas column or a dataframe’s columns | df['column name'].sum() |
Row-wise | Add all numeric values in a Pandas row | df.sum(axis=1) |
Specific Columns | Add values of specific columns | df['column 1'] + df['column 2'] |
Table of Contents
Loading a Sample Pandas Dataframe
If you want to follow along with the tutorial line by line, copy the code below. This code loads a sample Pandas Dataframe that we’ll reference throughout the tutorial. If you have your own data, feel free to use that follow along, but your results will vary.
# Loading a Sample Pandas Dataframe
import pandas as pd
df = pd.DataFrame.from_dict({
'Name': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
'January_Sales': [90, 95, 75, 93, 60, 85, 75],
'February_Sales': [95, 95, 75, 65, 50, 85, 100],
'March_Sales': [100, 95, 50, 75, 90, 50, 80]
})
print(df.head())
# Returns:
# Name January_Sales February_Sales March_Sales
# 0 Nik 90 95 100
# 1 Kate 95 95 95
# 2 Kevin 75 75 50
# 3 Evan 93 65 75
# 4 Jane 60 50 90
We can see that we have four columns: 1 that contains the name of a salesperson and three that contain the sales values of each salesperson.
In the next section, you’ll learn how to use Pandas to add up all the values in a dataframe column.
Calculate the Sum of a Pandas Dataframe Column
A common task you may need to do is add up all the values in a Pandas Dataframe column. Thankfully, Pandas makes this very easy with the sum
method. We can apply this method to either a Pandas series (meaning, a column) or an entire dataframe.
Let’s start by learning how to how to add up all the values in a Pandas column:
# Add all values in a Pandas column
january_sum = df['January_Sales'].sum()
print(january_sum)
# Returns: 572
Similarly, we can calculate the sum of all columns in a Pandas Dataframe. We can do this by simply applying the sum
method on the entire dataframe.
Let’s give this a shot:
# Calculate the sum of all columns in a Pandas Dataframe
dataframe_sum = df.sum()
print(dataframe_sum)
# Returns:
# Name NikKateKevinEvanJaneKyraMelissa
# January_Sales 573
# February_Sales 565
# March_Sales 540
This returns a Pandas series that easily query, if we wanted to return the sum of a particular column. By default, Pandas will only add up numeric columns, meaning that we don’t add up our Name column.
In the next section, you’ll learn how to calculate the sum of a Pandas Dataframe row.
Calculate the Sum of a Pandas Dataframe Row
In many cases, you’ll want to add up values across rows in a Pandas Dataframe. Similar to the example above, we can make use of the .sum
method. By default, Pandas will apply an axis=0
argument, which will add up values index-wise. If we can change this to axis=1
, values will be added column-wise.
Let’s see how we can add up values across rows in Pandas:
# Adding up dataframe rows using .sum()
dataframe_sum = df.sum(axis=1, numeric_only=True)
print(dataframe_sum)
# Returns:
# 0 285
# 1 285
# 2 200
# 3 233
# 4 200
# 5 220
# 6 255
# dtype: int64
You may be wondering why we apply the numeric_only=True
argument here. In future versions of Pandas, a TypeError
will be thrown if non-numeric columns are included.
What if we wanted to assign an index to make the rows easier to tell apart? We can do this using the Pandas set_index
method. Let’s see what this looks like:
# Setting an index and adding up all dataframe rows
dataframe_sum = df.set_index('Name').sum(axis=1, numeric_only=True)
print(dataframe_sum)
# Returns:
# Name
# Nik 285
# Kate 285
# Kevin 200
# Evan 233
# Jane 200
# Kyra 220
# Melissa 255
# dtype: int64
This is much cleaner result that allows us to better see the row’s identifier, which in this case is the name of the salesperson.
In the next section, you’ll learn how to just add some columns of a Pandas Dataframe together.
Add Pandas Dataframe Columns Together
Pandas makes it easy to add different columns together, selectively. Say we only wanted to add two columns together row-wise, rather than all of them, we can simply add the columns directly. The benefit of this approach is that we can assign a new column that stores these values.
Let’s see what this looks like:
# Add two columns in a Pandas dataframe
df['Jan_Feb_Sum'] = df['January_Sales'] + df['February_Sales']
print(df.head())
# Returns:
# Name January_Sales February_Sales March_Sales Jan_Feb_Sum
# 0 Nik 90 95 100 185
# 1 Kate 95 95 95 190
# 2 Kevin 75 75 50 150
# 3 Evan 93 65 75 158
# 4 Jane 60 50 90 110
We can see that the we’ve created a new column that stores the sum of two of our columns. A great thing about this operation is that its vectorized, meaning that its very fast and takes advantage of the power of Pandas.
In the next section, you’ll learn how to add dataframe columns conditionally.
Add Pandas Dataframe Columns That Meet a Condition
There may be times when you want to add multiple columns in a dataframe, but not all of them. We can do this by adding Pandas columns conditionally, with the help of a list comprehension.
For this example, let’s modify our dataframe to include an additional numerical column:
# Loading a modify Pandas Dataframe
# Loading a Sample Pandas Dataframe
import pandas as pd
df = pd.DataFrame.from_dict({
'Name': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
'January_Sales': [90, 95, 75, 93, 60, 85, 75],
'February_Sales': [95, 95, 75, 65, 50, 85, 100],
'March_Sales': [100, 95, 50, 75, 90, 50, 80],
'Some Random Number': [1,2,3,4,5,6,7]
})
print(df.head())
# Returns:
# Name January_Sales February_Sales March_Sales Some Random Number
# 0 Nik 90 95 100 1
# 1 Kate 95 95 95 2
# 2 Kevin 75 75 50 3
# 3 Evan 93 65 75 4
# 4 Jane 60 50 90 5
Now, when we’re adding the values of our rows, it may not make sense to include the last column. Say, we only wanted to include the columns that include the word Sales. What we can do is create a list comprehension that checks if the word Sales exists in a column or not.
Let’s see how we can do this:
sales_columns = [col for col in df.columns if 'Sales' in col]
print(sales_columns)
# Returns: ['January_Sales', 'February_Sales', 'March_Sales']
You can learn more about how to do this in this tutorial, by learning how to iterate over columns and checking for a condition. In order to do this, we’ll first use Pandas to get our DataFrame’s columns as a list.
Now that we have our columns selected, we can use the axis=1
argument and add up only the columns that contain Sales. Let’s see what this looks like:
# Adding columns conditionally in Pandas
sales_columns = [col for col in df.columns if 'Sales' in col]
df['Total Sales'] = df[sales_columns].sum(axis=1)
print(df.head())
# Returns:
# Name January_Sales February_Sales March_Sales Some Random Number Total Sales
# 0 Nik 90 95 100 1 285
# 1 Kate 95 95 95 2 285
# 2 Kevin 75 75 50 3 200
# 3 Evan 93 65 75 4 233
# 4 Jane 60 50 90 5 200
This way, we can safely add up values across columns row-wise without adding in columns that we don’t want to include.
In the next section, you’ll learn how to calculate the sum of a Pandas Dataframe when data are grouped using groupby
.
Calculate the Sum of a Pandas GroupBy Dataframe
In this final section, you’ll learn how to calculate the sum of a Pandas Dataframe when grouping data using the groupby
method. For this, we’ll modify our dataframe to include a column to a salesperson’s gender. This can allow us to group the data by gender and calculating totals by gender.
# Loading a Sample Pandas Dataframe
import pandas as pd
df = pd.DataFrame.from_dict({
'Name': ['Nik', 'Kate', 'Kevin', 'Evan', 'Jane', 'Kyra', 'Melissa'],
'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Female', 'Female'],
'January_Sales': [90, 95, 75, 93, 60, 85, 75],
'February_Sales': [95, 95, 75, 65, 50, 85, 100],
'March_Sales': [100, 95, 50, 75, 90, 50, 80],
})
print(df.head())
# Returns:
# Name Gender January_Sales February_Sales March_Sales
# 0 Nik Male 90 95 100
# 1 Kate Female 95 95 95
# 2 Kevin Male 75 75 50
# 3 Evan Male 93 65 75
# 4 Jane Female 60 50 90
We can now group our data using the group by method, in order to group it by gender. To learn more about how to group data with the groupby
method, check out my video here:
Let’s group our data and add all the numeric columns:
grouped = df.groupby('Gender').sum()
print(grouped)
# Returns:
# January_Sales February_Sales March_Sales
# Gender
# Female 315 330 315
# Male 258 235 225
We can see that first grouping our data by Gender
and then adding the values in the dataframe returns a column-wise sum based on the groupings of Gender.
Conclusion
In this tutorial, you learned how to use the Pandas sum method to calculate sums across dataframes. You learned how to add values row-wise and column-wise. You also learned how add columns conditionally and how to add values in a grouped Pandas Dataframe.
To learn more about the Pandas sum
function, check out the official documentation here.
Additional Resources
Check out the links below to learn about related topics: