
Pandas unique(): Get Unique Values in a DataFrame


In this tutorial, you’ll learn how to get unique values in a Pandas DataFrame, both for a single column and across multiple columns. Knowing how to work with unique values is an important skill for data scientists and data engineers of any skill level.

By the end of this tutorial, you’ll have learned the following:

  • How to use the Pandas .unique() method to get unique values in a Pandas DataFrame column
  • How to get unique values across multiple columns
  • How to count unique values and generate frequency tables for unique values
  • And more

The Quick Answer: Use Pandas unique()

You can use the Pandas .unique() method to get the unique values in a Pandas DataFrame column. The values are returned in order of appearance and are unsorted.

Take a look at the code block below for how this method works:

# Get Unique Values in a Pandas DataFrame Column
import pandas as pd
df = pd.DataFrame({'Education': ['Graduate','Graduate','Undergraduate','Postgraduate']})
unique_vals = df['Education'].unique()
print(unique_vals)

# Returns: ['Graduate' 'Undergraduate' 'Postgraduate']

If you’d like to learn more, read on! This guide will teach you the ins and outs of working with unique data in a Pandas DataFrame.

Real-World Applications of Unique Data

Let’s dive into some real-world applications of working with unique data and why it matters. Take a look at the sample DataFrame that we’re creating below. We’ll be using this dataset throughout the tutorial.

# Loading a Sample Dataset
import pandas as pd
dataset = {
 'Education Status': ['Graduate','Graduate','Undergraduate','Postgraduate','Graduate','Undergraduate','Postgraduate','Graduate','Undergraduate','Postgraduate','Graduate','Undergraduate','Graduate','Postgraduate','Postgraduate'],
 'Employment Status': ['Employed','employed','Unemployed','Employed','Employed','Unemployed','Employed','Employed','Employed','Employed','Unemployed','Employed','Employed','Employed','Employed'],
 'Gender': ['F','M','M','F','M','F','M','F','M','F','M','F','M','F','F']}

df = pd.DataFrame(dataset)
print(df.head())

# Returns:
#   Education Status Employment Status Gender
# 0         Graduate          Employed      F
# 1         Graduate          employed      M
# 2    Undergraduate        Unemployed      M
# 3     Postgraduate          Employed      F
# 4         Graduate          Employed      M

Understanding unique data within a DataFrame allows you to understand:

  1. The data itself, such as what data are included and what data aren’t
  2. Whether or not data quality issues exist. For example, we can see that the Employment Status column has two capitalizations for the word Employed. Knowing which unique values exist helps us decide whether we need to clean our data.
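As a minimal sketch of that data-quality check, the snippet below uses a shortened, hypothetical stand-in for the Employment Status column. Normalizing with .str.capitalize() is only one possible cleanup, assumed here for illustration; the right fix depends on your data.

```python
import pandas as pd

# Shortened, hypothetical stand-in for the 'Employment Status' column
status = pd.Series(['Employed', 'employed', 'Unemployed', 'Employed'])

# unique() exposes the inconsistent capitalization
before = list(status.unique())
print(before)  # ['Employed', 'employed', 'Unemployed']

# One possible cleanup (an assumption, not the only option):
# uppercase the first letter and lowercase the rest
cleaned = status.str.capitalize()
after = list(cleaned.unique())
print(after)  # ['Employed', 'Unemployed']
```

After the cleanup, the two spellings of Employed collapse into a single unique value.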

Let’s now dive into how to understand the Pandas .unique() method.

Understanding the Pandas unique() Method

The Pandas unique() method takes no parameters. It is a Series method, so you call it on a single DataFrame column, and it returns an array of the unique values present in that column.

Here’s a breakdown of how the unique() method works:

  • Select the column on which unique() will be applied by specifying the column name in brackets after the DataFrame name.
  • Call the unique() method without any input parameters or arguments.
  • Obtain an array of unique values found in the selected column.

Let’s take a look at the unique() method using the sample dataset we created earlier.

Get Unique Values for a Pandas DataFrame Column

In order to get the unique values in a Pandas DataFrame column, you can simply apply the .unique() method to the column. The method will return a NumPy array, in the order in which the values appear.

Let’s take a look at how we can get the unique values in the Education Status column:

# Get Unique Values for a Column in Pandas
print(df['Education Status'].unique())

# Returns:
# ['Graduate' 'Undergraduate' 'Postgraduate']

In the example above, we applied the .unique() method to the df['Education Status'] column. This returned the three unique values as a NumPy array.

Let’s explore how we can return the unique values as a list in the next section.

Get Unique Values for a Pandas Column as a List

By default, the Pandas .unique() method returns a NumPy array of the unique values. In order to return a list instead, we can apply the .tolist() method to the array to convert it to a Python list.

Let’s see what this looks like:

# Get Unique Values for a Column in Pandas as a List
print(df['Education Status'].unique().tolist())

# Returns:
# ['Graduate', 'Undergraduate', 'Postgraduate']

In the example above, we applied the .tolist() method to our NumPy array, converting it to a list.

Let’s now take a look at how we can get unique values for multiple Pandas DataFrame columns.

Get Unique Values for Multiple Pandas DataFrame Columns

The Pandas .unique() method can only be applied to a single column. This is because it is a Pandas Series method, rather than a DataFrame method.

In order to get the unique combinations of values across multiple DataFrame columns, we can use the .drop_duplicates() method. This will return a DataFrame containing each unique combination once, keeping the index of its first occurrence.

Let’s take a look at what this looks like:

# Get Unique Values for Multiple DataFrame Columns
unique = df[['Education Status', 'Gender']].drop_duplicates()
print(unique)

# Returns:
#   Education Status Gender
# 0         Graduate      F
# 1         Graduate      M
# 2    Undergraduate      M
# 3     Postgraduate      F
# 5    Undergraduate      F
# 6     Postgraduate      M

The Pandas .drop_duplicates() method can be a helpful way to identify only the unique values across two or more columns.
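If you instead want the unique values of each column separately, rather than unique combinations, one option is to call .unique() on each column in turn. The sketch below assumes a small, hypothetical subset of the sample data and collects the results in a dictionary; the name unique_by_column is purely illustrative.

```python
import pandas as pd

# Small, hypothetical subset of the tutorial's sample data
df = pd.DataFrame({
    'Education Status': ['Graduate', 'Graduate', 'Undergraduate', 'Postgraduate'],
    'Gender': ['F', 'M', 'M', 'F'],
})

# Call .unique() on each column and store the values as plain lists
unique_by_column = {col: list(df[col].unique()) for col in df.columns}
print(unique_by_column)
```

Each key is a column name and each value lists that column’s unique entries in order of appearance.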

Count Unique Values in a Pandas DataFrame Column

In order to count how many unique values exist in a given DataFrame column (or columns), we can apply the .nunique() method. The method will return a single value if applied to a single column, and a Pandas Series if applied to multiple columns.

Let’s see how we can use the .nunique() method to count how many unique values exist in a column:

# Count Unique Values in a Pandas DataFrame Column
num_statuses = df['Employment Status'].nunique()
print(num_statuses)

# Returns: 3

The .nunique() method can be incredibly helpful for understanding how many unique values exist in a column.
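To illustrate the multi-column case, here is a minimal sketch that calls .nunique() on a whole DataFrame. It assumes a small, hypothetical subset of the sample data rather than the full dataset.

```python
import pandas as pd

# Small, hypothetical subset of the tutorial's sample data
df = pd.DataFrame({
    'Employment Status': ['Employed', 'employed', 'Unemployed', 'Employed'],
    'Gender': ['F', 'M', 'M', 'F'],
})

# Calling .nunique() on a DataFrame returns a Series indexed by column name
counts = df.nunique()
print(counts)
```

Here, Employment Status has three unique values (because of the inconsistent capitalization) and Gender has two.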

Count Occurrences of Unique Values in a Pandas DataFrame Column

In this section, we’ll explore how to count how often each unique value occurs. This, in essence, generates a frequency table for the unique values in a DataFrame column.

Let’s see how we can use the .value_counts() method to count occurrences of unique values in a Pandas DataFrame column:

# Count Occurrences of Unique Values in a Pandas DataFrame Column
print(df['Education Status'].value_counts())

# Returns:
# Graduate         6
# Postgraduate     5
# Undergraduate    4
# Name: Education Status, dtype: int64

When we applied the .value_counts() method to our DataFrame column, it returned a Series containing the count of each unique value, sorted from most to least frequent.
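If proportions are more useful than raw counts, .value_counts() also accepts normalize=True. The sketch below rebuilds a Series with the same counts as the Education Status column (6, 5, and 4); the row order is simplified for brevity, and rounding to two decimals is only a display choice.

```python
import pandas as pd

# Same value counts as the sample 'Education Status' column (order simplified)
education = pd.Series(
    ['Graduate'] * 6 + ['Postgraduate'] * 5 + ['Undergraduate'] * 4,
    name='Education Status',
)

# normalize=True returns proportions instead of raw counts
shares = education.value_counts(normalize=True).round(2)
print(shares)
```

The resulting proportions are roughly 0.40, 0.33, and 0.27 for Graduate, Postgraduate, and Undergraduate.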

Frequently Asked Questions

What is the unique() method in Pandas?

The unique() method is a Pandas method that is used to find the unique values in a Series object. It can be applied to a specific DataFrame column to return an array of the unique values present in that column.

How are NaN values handled by the unique() method?

By default, the unique() method includes NaN values in its output array. In order to exclude missing values, you can first apply the .dropna() method to the column.
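A minimal sketch of this behavior, using a hypothetical numeric column that is not part of the tutorial’s dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with missing values
scores = pd.Series([1.0, np.nan, 2.0, np.nan, 1.0])

# NaN is kept as one of the unique values
with_nan = scores.unique()

# Drop missing values first to exclude NaN
without_nan = scores.dropna().unique()

print(len(with_nan))         # 3
print(without_nan.tolist())  # [1.0, 2.0]
```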

How can I sort the unique values of a DataFrame column when using the unique() method?

After using the unique() method to obtain the unique values in a DataFrame column, you can sort the resulting array by employing Python’s built-in sorted() function. This function accepts a sequence (such as the array returned by unique()) and returns a sorted list of elements.
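For example, here is a short sketch assuming a small, hypothetical subset of the Education Status column:

```python
import pandas as pd

# Small, hypothetical subset of the tutorial's sample data
education = pd.Series(['Graduate', 'Undergraduate', 'Postgraduate', 'Graduate'])

# unique() keeps order of appearance; sorted() returns a new, alphabetized list
sorted_vals = sorted(education.unique())
print(sorted_vals)  # ['Graduate', 'Postgraduate', 'Undergraduate']
```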

How can I find the total number of unique values in a DataFrame column?

To find the total number of unique values in a DataFrame column, use the nunique() method. It is applied the same way as unique() but returns an integer count of distinct values rather than an array of unique values. By default, nunique() does not count NaN values.

Conclusion

In this tutorial, you learned how to get unique values in a Pandas DataFrame, including getting unique values for a single column and across multiple columns. You first learned how to get the unique values for a single column, as well as for multiple columns. Then, you learned how to count unique values, as well as the occurrences of unique values. To learn more about the .unique() method, check out the official documentation.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
