All the Ways to Get Pandas Unique Values


In this post, you’ll learn all the different ways to find unique values in your Pandas dataframe.


Often, while working with large datasets, you’ll want to know how many unique elements exist in a column (or in multiple columns).


Loading Our Data Set

Let’s begin by loading a dataset we can use throughout this tutorial. We’ll use the FiveThirtyEight data set on the 2018 World Cup as it’ll have nicely structured data with lots of unique values to explore!

import pandas as pd

df = pd.read_csv('https://projects.fivethirtyeight.com/soccer-api/international/2018/wc_matches.csv')

# Print a summary of the columns, their dtypes, and non-null counts
df.info()

# Returns
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 64 entries, 0 to 63
# Data columns (total 20 columns):
# date           64 non-null object
# league_id      64 non-null int64
# league         64 non-null object
# team1          64 non-null object
# team2          64 non-null object
# spi1           64 non-null float64
# spi2           64 non-null float64
# prob1          64 non-null float64
# prob2          64 non-null float64
# probtie        64 non-null float64
# proj_score1    64 non-null float64
# proj_score2    64 non-null float64
# score1         64 non-null int64
# score2         64 non-null int64
# xg1            64 non-null float64
# xg2            64 non-null float64
# nsxg1          64 non-null float64
# nsxg2          64 non-null float64
# adj_score1     64 non-null float64
# adj_score2     64 non-null float64
# dtypes: float64(13), int64(3), object(4)
# memory usage: 10.1+ KB

Get Unique Values from a Column

To print out all unique values in a specific column, you can use the Pandas unique() method.

For example, if you want to see the unique values of the team1 column, you can write:

print(df['team1'].unique())

# Returns:
# ['Russia' 'Egypt' 'Morocco' 'Portugal' 'France' 'Argentina' 'Peru'
#  'Croatia' 'Costa Rica' 'Germany' 'Brazil' 'Sweden' 'Belgium' 'Tunisia'
#  'Colombia' 'Poland' 'Uruguay' 'Iran' 'Denmark' 'Nigeria' 'Serbia'
#  'South Korea' 'England' 'Japan' 'Saudi Arabia' 'Spain' 'Australia'
#  'Iceland' 'Mexico' 'Switzerland' 'Senegal' 'Panama']

While this looks like a list, unique() actually returns a NumPy array (numpy.ndarray), with the values listed in the order in which they first appear.
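As a quick check of the return type, here is a minimal sketch. The teams Series is hypothetical sample data rather than the World Cup file, and it is given dtype=object explicitly to mirror the object columns shown in the summary above; with pandas extension dtypes, unique() can return an extension array instead.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the team1 column.
# dtype=object is an assumption that mirrors the object columns above.
teams = pd.Series(['Russia', 'Egypt', 'Russia', 'Morocco', 'Egypt'],
                  name='team1', dtype=object)

unique_teams = teams.unique()

# The result is a NumPy array, not a Python list
print(isinstance(unique_teams, np.ndarray))  # True

# Values are kept in order of first appearance, not sorted
print(unique_teams)  # ['Russia' 'Egypt' 'Morocco']
```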


Get Unique Values as a List

To return the unique values as a list, you can combine the list function and the unique method:

unique_list = list(df['team1'].unique())

Alternatively, you can call the tolist() method on the array returned by unique() to accomplish the same thing:

unique_list2 = df['team1'].unique().tolist()
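The two approaches give equal lists, but for numeric columns the element types differ. The sketch below uses a hypothetical scores Series (not the World Cup data) to illustrate: list() keeps NumPy scalar types, while tolist() converts them to plain Python types.

```python
import pandas as pd

# Hypothetical numeric column used only for illustration
scores = pd.Series([2, 0, 2, 1], name='score1')

as_list = list(scores.unique())      # elements stay NumPy integers
as_tolist = scores.unique().tolist() # elements become Python ints

# The lists compare equal element by element
print(as_list == as_tolist)  # True

# Only tolist() yields built-in Python ints
print(type(as_tolist[0]) is int)  # True
print(type(as_list[0]) is int)    # False
```

This difference can matter when the list is passed to code that expects built-in types, for example when serializing to JSON.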

Count Unique Values

To get a count of unique values in a certain column, you can pass the result of the unique method to Python’s built-in len function:

unique_list = list(df['team1'].unique())
print(len(unique_list))

# Returns
# 32
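A Series also has a nunique method that returns the count directly. The two approaches can disagree when the column contains missing values, because nunique excludes NaN by default. A small sketch, using a hypothetical values Series:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
values = pd.Series(['Russia', 'Egypt', np.nan, 'Russia'])

# unique() keeps NaN, so it is counted here
print(len(values.unique()))          # 3

# nunique() ignores NaN unless told otherwise
print(values.nunique())              # 2
print(values.nunique(dropna=False))  # 3
```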

Get Unique Values from Multiple Columns

To get the unique combinations of values across multiple columns, you can apply the drop_duplicates method to those columns.

Let’s create a small new dataframe for this example. We’ll call it df2 so that the World Cup data in df stays available for the sections that follow:

df2 = pd.DataFrame({'a':[1,2,1,2], 'b':[3,4,3,5], 'c':[1,2,3,4]})
print(df2)

# Returns
#    a  b  c
# 0  1  3  1
# 1  2  4  2
# 2  1  3  3
# 3  2  5  4

To get the unique combinations of values from columns a and b, you can use the drop_duplicates method:

print(df2[['a','b']].drop_duplicates())

# Returns
#    a  b
# 0  1  3
# 1  2  4
# 3  2  5
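Note that drop_duplicates returns unique rows, that is, unique combinations of a and b. If you instead want the distinct values pooled across both columns, one possible approach (a sketch, not the only way) is to flatten the columns and pass them to pd.unique. The pairs dataframe below is a stand-in for the example above.

```python
import pandas as pd

# Stand-in for the two columns used in the example above
pairs = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [3, 4, 3, 5]})

# Unique combinations of (a, b): one row per distinct pair
unique_rows = pairs.drop_duplicates()
n_combos = len(unique_rows)
print(n_combos)  # 3

# Distinct values pooled across both columns:
# flatten row by row, then keep the first appearance of each value
pooled = pd.unique(pairs[['a', 'b']].to_numpy().ravel())
print(pooled)  # [1 3 2 4 5]
```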

Get Unique Values with Frequencies

To get the frequency of each unique value in a column, you can use the value_counts method.

To get the frequencies for each value in the date column of the World Cup data, you could write:

print(df['date'].value_counts())

# Returns
# 2018-06-16    4
# 2018-06-25    4
# 2018-06-26    4
# 2018-06-27    4
# 2018-06-28    4
# ...

This is a helpful way of understanding how often different values appear.
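The value_counts method also accepts a few useful parameters. The sketch below uses a hypothetical dates Series to show normalize=True, which returns proportions instead of counts, and dropna=False, which includes missing values in the tally.

```python
import numpy as np
import pandas as pd

# Hypothetical date column with one missing value
dates = pd.Series(['2018-06-16', '2018-06-16', '2018-06-17', np.nan],
                  name='date')

# Default: counts per value, most frequent first, NaN excluded
counts = dates.value_counts()
print(counts['2018-06-16'])  # 2

# Include NaN as its own entry
counts_with_nan = dates.value_counts(dropna=False)
print(counts_with_nan.sum())  # 4

# Proportions of the non-missing values instead of raw counts
shares = dates.value_counts(normalize=True)
print(round(shares['2018-06-16'], 2))  # 0.67
```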

Count Unique Values Per Column

To return a count of unique values per column, you can use the nunique method.

Applied to a dataframe, it returns the number of unique values in each column, excluding NaN by default:

print(df.nunique())

# Returns
# date           25
# league_id       1
# league          1
# team1          32
# ...
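Two parameters of nunique are worth knowing: dropna controls whether NaN counts as a value, and axis=1 counts unique values per row instead of per column. A minimal sketch on a small hypothetical dataframe named sample:

```python
import numpy as np
import pandas as pd

# Small hypothetical dataframe with one NaN in column b
sample = pd.DataFrame({'a': [1, 2, 1, 2],
                       'b': [np.nan, 4, 3, 5],
                       'c': [1, 2, 3, 4]})

# Per column, NaN excluded (the default)
per_column = sample.nunique()
print(per_column.to_dict())  # {'a': 2, 'b': 3, 'c': 4}

# Per column, counting NaN as a distinct value
per_column_nan = sample.nunique(dropna=False)
print(per_column_nan.to_dict())  # {'a': 2, 'b': 4, 'c': 4}

# Per row instead of per column
per_row = sample.nunique(axis=1)
print(per_row.tolist())  # [1, 2, 2, 3]
```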


Get Unique Values without NaN

To ignore NaN values while returning unique values, you can chain the dropna method and the unique method.

To try this, let’s load a dataframe that includes NaNs:

import pandas as pd
import numpy as np

# Named df_nan so that the World Cup data in df is not overwritten
df_nan = pd.DataFrame({'a':[1,2,1,2], 'b':[np.nan,4,3,5], 'c':[1,2,3,4]})

print(df_nan)

# Returns
#    a    b  c
# 0  1  NaN  1
# 1  2  4.0  2
# 2  1  3.0  3
# 3  2  5.0  4

If you were to return the unique values for column b, the result would include the NaN value:

print(df_nan['b'].unique())

# Returns
# [nan  4.  3.  5.]

Now, to return the unique values without the NaN, you can chain the dropna and unique methods together:

print(df_nan['b'].dropna().unique())

# Returns:
# [4. 3. 5.]

Get Unique Values with a Condition

If you want to get unique values where a certain condition is true, you can do this by combining the loc indexer with the unique method.

Say you want to return only the teams in the World Cup data whose names start with S. You can first filter your data using loc and then apply the unique method:

print(df.loc[df['team1'].str[0]=='S', 'team1'].unique())

# Returns:
# ['Sweden' 'Serbia' 'South Korea' 'Saudi Arabia' 'Spain' 'Switzerland' 'Senegal']

Let’s break this down a little:

  1. The dataframe is filtered using loc to only return the team1 column, based on the condition that the first letter (.str[0]) of the team1 column is S.
  2. The unique method is then applied to the filtered column, so each matching team is returned once.
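The same pattern works with other boolean masks. As a sketch, the example below uses a small hypothetical matches dataframe (not the full World Cup file), swaps .str[0] for the equivalent str.startswith method, and combines two conditions with the & operator.

```python
import pandas as pd

# Hypothetical sample with the same column names as the World Cup data
matches = pd.DataFrame({
    'team1': ['Sweden', 'Brazil', 'Serbia', 'Sweden', 'Spain'],
    'score1': [1, 2, 0, 3, 1],
})

# Boolean mask: True where the team name starts with S
starts_with_s = matches['team1'].str.startswith('S')

# Filter rows with the mask, select the column, then deduplicate
s_teams = matches.loc[starts_with_s, 'team1'].unique()
print(list(s_teams))  # ['Sweden', 'Serbia', 'Spain']

# Combine conditions with & (each condition wrapped in parentheses)
scored = matches['score1'] > 0
s_scored = matches.loc[starts_with_s & scored, 'team1'].unique()
print(list(s_scored))  # ['Sweden', 'Spain']
```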

Conclusion

In this post, we learned all about finding unique values in a Pandas dataframe, both for a single column and across multiple columns. We also covered how to count unique values and how to get the frequency of each unique value. Finally, we learned how to find unique values that meet a condition and how to exclude NaN values from the results.

To learn more about the unique method, check out the official Pandas documentation.
