Pandas Value_counts to Count Unique Values

An important step in exploring a dataset is to see how often each unique value appears. Pandas makes this easy with the value_counts function.

In this post, you’ll learn how to use the Pandas value_counts function to count the unique values in a column of a Pandas dataframe.

Loading libraries and dataset

Let’s begin by loading the Pandas and NumPy libraries and the dataset you’ll use to learn the value_counts function. NumPy is only needed here to insert a missing value (np.nan) into the Students column.

import pandas as pd
import numpy as np

data = {'Level': ['Beginner', 'Intermediate', 'Advanced', 'Beginner', 'Intermediate', 'Advanced', 'Beginner', 'Intermediate', 'Advanced', 'Beginner', 'Intermediate', 'Advanced', 'Beginner', 'Intermediate', 'Advanced', 'Beginner', 'Intermediate', 'Advanced'], 'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0, 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0]}

df = pd.DataFrame.from_dict(data)

Let’s print out the first five records using the .head() method:

print(df.head())

Using the .head() method returns the following:

          Level  Students
0      Beginner      10.0
1  Intermediate      20.0
2      Advanced      10.0
3      Beginner      40.0
4  Intermediate      20.0

Pandas value_counts Explored

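Let’s take a moment to explore the different parameters of the value_counts function. The examples in this post apply it to a series (a single column of a dataframe); recent versions of Pandas also provide a DataFrame.value_counts method that counts unique rows.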

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Let’s explore these parameters:

  • The normalize parameter returns relative frequencies (proportions) instead of raw counts,
  • The sort parameter sorts the result by frequency, which is the default behavior,
  • The ascending parameter controls the direction of that sort: lowest counts first when True, highest counts first when False (the default),
  • The bins parameter groups values into half-open bins instead of counting each distinct value, and only works with numeric data,
  • The dropna parameter controls whether missing values are excluded (True, the default) or counted (False).

Let’s begin by creating a value_counts series of the Students column:

df['Students'].value_counts()

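This returns the following (in Pandas 2.0 and later, the result is named count and the index carries the name Students instead):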

10.0    7
20.0    5
40.0    3
30.0    2
Name: Students, dtype: int64
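As a rough sketch of the sort and ascending parameters described earlier, the snippet below rebuilds the same dataset in a more compact form and shows two orderings. The variable names rarest_first and by_value are only illustrative, and sorting by the values themselves relies on the general-purpose sort_index() method rather than on a value_counts parameter.

```python
import numpy as np
import pandas as pd

# Rebuild the article's dataset so the snippet runs on its own.
df = pd.DataFrame({
    'Level': ['Beginner', 'Intermediate', 'Advanced'] * 6,
    'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0,
                 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0],
})

# ascending=True still sorts by frequency, but puts the rarest value first.
rarest_first = df['Students'].value_counts(ascending=True)

# To order by the values themselves rather than their counts, sort the index.
by_value = df['Students'].value_counts().sort_index()

print(rarest_first)
print(by_value)
```

Note that ascending changes only the direction of the frequency sort; it never orders the result by the values in the index.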

Pandas value_counts Normalize for Percentages

The value_counts function has a useful parameter (the normalize parameter) to return relative frequencies.

Let’s create relative frequencies of the Students column:

df['Students'].value_counts(normalize=True)

This returns:

10.0    0.411765
20.0    0.294118
40.0    0.176471
30.0    0.117647
Name: Students, dtype: float64

If you want to turn these into percentages, you can multiply the result by 100:

df['Students'].value_counts(normalize=True) * 100

Which returns:

10.0    41.176471
20.0    29.411765
40.0    17.647059
30.0    11.764706
Name: Students, dtype: float64
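If the long decimals are distracting, one possible approach is to round the percentages or format them as strings. This is only a sketch: the names pct and labels are illustrative, and the one-decimal precision is an arbitrary choice. The missing value is excluded by default, so the percentages are computed over the 17 non-missing records.

```python
import numpy as np
import pandas as pd

# Rebuild the article's dataset so the snippet runs on its own.
df = pd.DataFrame({
    'Level': ['Beginner', 'Intermediate', 'Advanced'] * 6,
    'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0,
                 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0],
})

# Relative frequencies, scaled to percentages and rounded to one decimal.
pct = df['Students'].value_counts(normalize=True).mul(100).round(1)

# Optional: turn the numbers into display strings such as '41.2%'.
labels = pct.map('{:.1f}%'.format)

print(pct)
print(labels)
```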

Pandas value_counts dropna to Include Missing Values

By default, the value_counts function does not include missing values in the resulting series. It can be helpful to know how many values are missing, however.

To include missing values, simply set the dropna= parameter to False.

df['Students'].value_counts(dropna=False)

This returns:

10.0    7
20.0    5
40.0    3
30.0    2
NaN     1
Name: Students, dtype: int64
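The dropna parameter can also be combined with normalize, in which case the missing value counts toward the total. The sketch below illustrates this and, for comparison, counts missing values directly with isna().sum(). The names with_missing and n_missing are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the article's dataset so the snippet runs on its own.
df = pd.DataFrame({
    'Level': ['Beginner', 'Intermediate', 'Advanced'] * 6,
    'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0,
                 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0],
})

# With dropna=False, proportions are computed over all 18 records,
# so the NaN row gets its own share of the total.
with_missing = df['Students'].value_counts(dropna=False, normalize=True)

# If only the number of missing values is needed, isna().sum() is enough.
n_missing = df['Students'].isna().sum()

print(with_missing)
print(n_missing)
```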

Use Pandas value_counts to bin data

If you’re working with a numeric column that has many distinct values, it can be helpful to group the values into bins to get a more general overview of the data.

For example, you can use the bins= argument to split the resulting series into bins. Let’s split the data into three bins:

df['Students'].value_counts(bins=3)

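This returns the following. The lowest edge is nudged slightly below the minimum value of 10.0 so that the minimum falls inside the first bin: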

(9.969000000000001, 20.0]    12
(30.0, 40.0]                  3
(20.0, 30.0]                  2
Name: Students, dtype: int64
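When the automatically chosen edges are awkward to read, one alternative is to define the bins yourself with pd.cut and then count the result. This is a sketch of a different technique from the bins= argument: the edges [0, 20, 40] and the labels 'up to 20' and 'over 20' are assumptions chosen for illustration, not values from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the article's dataset so the snippet runs on its own.
df = pd.DataFrame({
    'Level': ['Beginner', 'Intermediate', 'Advanced'] * 6,
    'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0,
                 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0],
})

# pd.cut assigns each value to a half-open bin: (0, 20] and (20, 40].
# The labels are hypothetical names for those two bins.
binned = pd.cut(df['Students'], bins=[0, 20, 40],
                labels=['up to 20', 'over 20'])

# Counting the binned column gives one row per bin; NaN is dropped by default.
counts = binned.value_counts()

print(counts)
```

Choosing the edges explicitly keeps the bin boundaries stable even if the minimum or maximum of the data changes later.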

Combining Pandas value_counts and groupby

A really useful trick is to combine the value_counts function with groupby to count the values within each group. Let’s group the data by the Level column and then generate counts for the Students column:

df.groupby('Level')['Students'].value_counts()

This returns:

Level         Students
Advanced      10.0        3
              20.0        2
              30.0        1
Beginner      10.0        2
              40.0        2
              30.0        1
Intermediate  20.0        3
              10.0        2
              40.0        1
Name: Students, dtype: int64
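The grouped result has a two-level index, which can be hard to scan. One possible way to make it easier to read, sketched below, is to pivot the inner level into columns with unstack(). The name table is illustrative, and fill_value=0 is an assumption about how absent combinations should be displayed.

```python
import numpy as np
import pandas as pd

# Rebuild the article's dataset so the snippet runs on its own.
df = pd.DataFrame({
    'Level': ['Beginner', 'Intermediate', 'Advanced'] * 6,
    'Students': [10.0, 20.0, 10.0, 40.0, 20.0, 10.0, np.nan, 20.0, 20.0,
                 40.0, 10.0, 30.0, 30.0, 10.0, 10.0, 10.0, 40.0, 20.0],
})

# Counts of each Students value within each Level (a MultiIndex series).
counts = df.groupby('Level')['Students'].value_counts()

# Move the Students level into columns; combinations that never occur
# (for example Beginner with 20.0) are shown as 0.
table = counts.unstack(fill_value=0)

print(table)
```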

Conclusion

In this post, you learned how to use the value_counts function to create counts of unique values. You also learned how to use the different parameters available and how to combine the groupby() function with the value_counts function.

To learn more about the Pandas value_counts function, check out the official documentation.
