Pandas: Count Unique Values in a GroupBy Object

In this tutorial, you’ll learn how to use Pandas to count unique values in a groupby object. The Pandas .groupby() method is an essential tool in your data analysis toolkit: it lets you split your data into groups and perform aggregations on each group.

By the end of this tutorial, you’ll have learned how to count unique values in a Pandas groupby object, using the incredibly useful .nunique() Pandas method.

The Quick Answer: Use .nunique() to Count Unique Values in a Pandas GroupBy Object
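
As a minimal sketch of the quick answer, the snippet below groups a small, made-up dataframe by one column and counts the distinct values of another. The data is hypothetical and only stands in for the dataset used later in this tutorial; the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's dataset)
df = pd.DataFrame({
    'Major_category': ['Engineering', 'Engineering', 'Engineering', 'Health', 'Health'],
    'ShareWomen': [0.12, 0.10, 0.12, 0.90, 0.85],
})

# Group by category, select one column, count its distinct values per group
unique_counts = df.groupby('Major_category')['ShareWomen'].nunique()

print(unique_counts)
# Engineering -> 2 (0.12 appears twice but is counted once), Health -> 2
```

The result is a Series indexed by the grouping column, with one count per group.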

Loading a Sample Dataframe

If you want to follow along with this tutorial, feel free to load the sample dataframe provided below by simply copying and pasting the code into your favourite code editor.

Let’s dive right in:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/women-stem.csv')

print(df.head())

This returns the following dataframe:

   Rank  Major_code                                      Major Major_category  Total   Men  Women  ShareWomen  Median
0     1        2419                      PETROLEUM ENGINEERING    Engineering   2339  2057    282    0.120564  110000
1     2        2416             MINING AND MINERAL ENGINEERING    Engineering    756   679     77    0.101852   75000
2     3        2415                  METALLURGICAL ENGINEERING    Engineering    856   725    131    0.153037   73000
3     4        2417  NAVAL ARCHITECTURE AND MARINE ENGINEERING    Engineering   1258  1123    135    0.107313   70000
4     5        2418                        NUCLEAR ENGINEERING    Engineering   2573  2200    373    0.144967   65000

This dataset comes from FiveThirtyEight and contains information on women’s representation across different STEM majors.

Understanding Pandas GroupBy

Before we dive into how to use Pandas .groupby() to count unique values in a group, let’s explore how the .groupby() method actually works. Understanding why the solution works will make it easier to apply to different scenarios. The method is versatile and fast, letting you answer relatively complex questions with ease.

The Pandas .groupby() method works in three parts:

  1. Split – split the data into different groups
  2. Apply – apply some form of aggregation
  3. Combine – recombine the data
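
To make the three parts concrete, here is a rough sketch that performs each one by hand on a small, made-up dataframe and compares the result with the one-line .groupby() call. The data and the variable names (pieces, maxima, manual, builtin) are assumptions for illustration only; in practice you would use the one-liner.

```python
import pandas as pd

# Hypothetical toy data (not the tutorial's dataset)
df = pd.DataFrame({
    'Major_category': ['Engineering', 'Engineering', 'Health', 'Health'],
    'ShareWomen': [0.12, 0.15, 0.90, 0.85],
})

# 1. Split: iterating over a groupby object yields (group name, sub-dataframe) pairs
pieces = {name: group for name, group in df.groupby('Major_category')}

# 2. Apply: compute one aggregate value per group
maxima = {name: group['ShareWomen'].max() for name, group in pieces.items()}

# 3. Combine: collect the per-group results into a single Series
manual = pd.Series(maxima, name='ShareWomen')

# The same three steps, done by Pandas in one call
builtin = df.groupby('Major_category')['ShareWomen'].max()

print(manual)
print(builtin)
```

Both Series hold the same maximum for each group, which is why the chained call can be read as split, apply, and combine.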

Let’s see how you can use the .groupby() method to find the maximum share of women among the majors in each major category:

# How to use the Pandas .groupby() method
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/women-stem.csv')
group_by = df.groupby('Major_category')['ShareWomen'].max()

print(group_by.head())

This returns the following:

Major_category
Biology & Life Science     0.707719
Computers & Mathematics    0.526476
Engineering                0.451465
Health                     0.967998
Physical Sciences          0.750473
Name: ShareWomen, dtype: float64

Now that you know how to use the Pandas .groupby() method, let’s see how we can use the method to count the number of unique values in each group.

Pandas: Count Unique Values in a GroupBy Object

Let’s learn how you can count the number of unique values in a Pandas groupby object. Similar to the example shown above, you’re able to apply a particular aggregation to each group.

In order to do this, we can use the helpful Pandas .nunique() method, which allows us to easily count the number of unique values in each group. By default, missing values (NaN) are not included in the count. To learn more about this function, check out my tutorial here.

Let’s see how we can do this with Python and Pandas:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/women-stem.csv')
group_by = df.groupby('Major_category')['ShareWomen'].nunique()

print(group_by.head())

This returns the following:

Major_category
Biology & Life Science     14
Computers & Mathematics    11
Engineering                29
Health                     12
Physical Sciences          10
Name: ShareWomen, dtype: int64

Let’s see what we’ve done here:

  1. We first used the .groupby() method and passed in the Major_category column, indicating we want to split by that column
  2. We then selected the ShareWomen column to indicate we want the number of unique values for that column
  3. We then applied the .nunique() method to count the number of unique values in that column within each group
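
Two variations on these steps are sketched below, again on a small, made-up dataframe rather than the tutorial's dataset: counting missing values as their own distinct value with dropna=False, and counting unique values for several columns at once by selecting a list of columns. The Major values ('A' to 'E') are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the NaN stands in for a missing value
df = pd.DataFrame({
    'Major_category': ['Engineering', 'Engineering', 'Engineering', 'Health', 'Health'],
    'Major': ['A', 'B', 'C', 'D', 'E'],
    'ShareWomen': [0.12, 0.12, np.nan, 0.90, 0.85],
})

grouped = df.groupby('Major_category')

# Default behaviour: NaN is not counted as a unique value
default_counts = grouped['ShareWomen'].nunique()

# dropna=False counts NaN as one additional distinct value
counts_with_nan = grouped['ShareWomen'].nunique(dropna=False)

# Selecting a list of columns returns a dataframe with one count per column
multi_counts = grouped[['Major', 'ShareWomen']].nunique()

print(default_counts)
print(counts_with_nan)
print(multi_counts)
```

Here the Engineering group has one distinct ShareWomen value by default and two once NaN is counted, so it is worth checking for missing data before interpreting the counts.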

Conclusion

In this post, you learned how to count the number of unique values in a Pandas group. You learned a little bit about the Pandas .groupby() method and how to use it to aggregate data. Finally, you learned how to combine .groupby() with the .nunique() method to count the number of unique values in each group.

To learn more about the Pandas groupby method, check out the official documentation here.