The Pandas groupby method is incredibly powerful and even lets you group by and aggregate multiple columns. In this tutorial, you’ll learn how to use the Pandas groupby method to aggregate multiple columns. The syntax of the method can be a little confusing at first. Don’t worry – this tutorial will simplify this. If you’re looking to get a deep dive on the Pandas groupby method, we’ve got you covered, too.
By the end of this tutorial, you’ll have learned:
- How to use Pandas groupby for multiple columns
- How to use multiple aggregations for multiple columns in Pandas groupby
- How to use different aggregations for multiple columns in groupby
How to use Pandas Groupby with Multiple Columns
To use Pandas groupby with multiple columns, you can pass in a list of column headers directly into the method. The order in which you pass columns into the list determines the hierarchy of columns you use.
To start, let’s load a sample Pandas DataFrame. We’ll use the same dataset as we did in our in-depth guide to Pandas pivot tables.
import pandas as pd
df = pd.read_excel(
'https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx')
print(df.head())
# Date Region Type Units Sales
# 0 2020-07-11 East Children's Clothing 18.0 306
# 1 2020-09-23 North Children's Clothing 14.0 448
# 2 2020-04-02 South Women's Clothing 17.0 425
# 3 2020-02-28 East Children's Clothing 26.0 832
# 4 2020-03-19 West Women's Clothing 3.0 33
We can see that we have five columns, two of which are numeric, two are strings, and one is a date. Now, let’s take a look at a simple example and break down what’s happening:
# Grouping by Multiple Columns in Pandas
import pandas as pd
df = pd.read_excel(
'https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx')
grouped = df.groupby(['Region', 'Type']).sum()
print(grouped.head())
# Returns:
# Units Sales
# Region Type
# East Children's Clothing 2318.0 45849
# Men's Clothing 2420.0 51685
# Women's Clothing 3372.0 70229
# North Children's Clothing 1763.0 37306
# Men's Clothing 0.0 39975
Let’s break down what is happening in the code block above:
- We import Pandas and load our dataset
- We then create a new grouped DataFrame by passing in
['Region', 'Type']
into the.groupby()
method - Finally, we apply the
.sum()
method to calculate the sum for each aggregation
We can see that by passing in a list of multiple columns, we create a hierarchy in which columns are to be grouped. This is very important and determines the layers in which your data will be grouped.
Using GroupBy with Multiple Columns to Aggregate a Single Columns
In the example above, we used the Pandas .groupby()
method to aggregate multiple columns. However, we aggregated all of the numeric columns. To use Pandas groupby with multiple columns and aggregate only a single column, we need to index that column.
Let’s see how we can do this and then break down how it works:
# Using GroupBy with Multiple Columns to Aggregate a Single Columns
import pandas as pd
df = pd.read_excel(
'https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx')
grouped = df.groupby(['Region', 'Type'])['Units'].sum()
print(grouped.head())
# Returns:
# Region Type
# East Children's Clothing 2318.0
# Men's Clothing 2420.0
# Women's Clothing 3372.0
# North Children's Clothing 1763.0
# Men's Clothing 0.0
In the example above, we are grouping by multiple columns, but only aggregating the 'Units'
column.
How to Use Multiple Aggregations with Pandas GroupBy with Multiple Columns
Similar to the example above, we can also use multiple aggregations when using the groupby method with multiple columns. In order to do this, we can pass a list into the .aggregate()
method that defines the aggregations we want to use. Let’s see how we can calculate the minimum and maximum values for the Units columns:
# Multiple Aggregations with Pandas GroupBy with Multiple Columns
import pandas as pd
df = pd.read_excel(
'https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx')
grouped = df.groupby(['Region', 'Type'])['Units'].aggregate(['min', 'max'])
print(grouped.head())
# Returns:
# min max
# Region Type
# East Children's Clothing 3.0 35.0
# Men's Clothing 3.0 35.0
# Women's Clothing 3.0 35.0
# North Children's Clothing 3.0 35.0
# Men's Clothing NaN NaN
In order to use multiple aggregations, we have to use the .aggregate()
method on our groupby object. This allows you to calculate multiple aggregations for the grouping you’re using. Finally, let’s take a look at how to use different aggregations per column when using Pandas groupby with multiple columns.
How to Use Different Aggregations Per Column When Using Pandas GroupBy with Multiple Columns
Pandas also allows you to use different aggregations per column when using groupby with multiple columns. In the example above, we used a list to pass multiple strings into the .aggregate()
function. Similarly, we can pass in a dictionary that defined the aggregations we want to use per column.
Let’s take a look at an example and then dive into how this works:
# Different Aggregations Per Column When Using Pandas GroupBy with Multiple Columns
import pandas as pd
df = pd.read_excel(
'https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx')
grouped = df.groupby(['Region', 'Type']).aggregate(
{'Units': 'mean', 'Sales': ['min', 'max']})
print(grouped.head())
# Returns:
# Units Sales
# mean min max
# Region Type
# East Children's Clothing 20.513274 36 1122
# Men's Clothing 19.836066 36 1054
# Women's Clothing 19.159091 52 1054
# North Children's Clothing 20.741176 55 1088
# Men's Clothing NaN 66 1155
Let’s break down what we’re doing in the code above:
- We create our groupby object as before, grouping by the Region and Type fields
- We then apply the
.aggregate()
method to this groupby object - In the
.aggregate()
method, we pass in a dictionary. The keys are the column labels and the values represent the aggregations we want to apply.
We can see that we can pass in a single aggregation, as we did for Units, or pass in a list of aggregations, as we did for Sales.
Conclusion
In this tutorial, you learned how to use Pandas groupby with multiple columns. The groupby method is an incredibly powerful and versatile method that allows you to aggregate values in a similar way to SQL GROUP BY statements.
You first learned how to use the .groupby()
method with multiple columns. Then, you learned how to aggregate only a single column, when grouping by multiple columns. Then, you learned how to specify multiple aggregations for all columns. Finally, you learned how to specify different aggregations for each column when grouping by multiple columns.
Additional Resources
To learn more about related topics, check out the tutorials below: