Skip to content

Pandas Isin to Filter a Dataframe like SQL IN and NOT IN

Pandas IsIn Cover Image

Pandas isin makes it easy to emulate the SQL IN and NOT IN operators to filter your dataframe using the Pandas .isin() method. In this post, you’ll learn how the .isin() method works, how to filter a single column, how to filter multiple columns, and how to filter based on conditions not being true.

Loading a Sample Dataframe

To follow along with this tutorial, feel free to load the same dataframe below. Simply copy the code below into your favourite code editor:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Name': ['Joan', 'Devi', 'Melissa', 'Dave'],
        'Age':[19, 43, 27, 32],
        'Gender': ['Female', 'Female', 'Female', 'Male'],
        'Education': ['High School', 'College', 'PhD', 'High School'],
        'City': ['Atlanta', 'Toronto', 'New York City', 'Madrid']
    }
)

print(df)

This returns the following dataframe:

      Name  Age  Gender    Education           City
0     Joan   19  Female  High School        Atlanta
1     Devi   43  Female      College        Toronto
2  Melissa   27  Female          PhD  New York City
3     Dave   32    Male  High School         Madrid

Want to learn more? If you want to learn other ways of filtering a Pandas Dataframe, check out my post on filtering data with Pandas here.

Pandas Isin Syntax

Let’s explore the syntax for the .isin() method before diving into some examples:

DataFrame.isin(values)

The function takes a single parameter values, where you can pass in an iterable, a Series, a DataFrame or a dictionary. Whatever you pass into the values parameter is run against a vectorized boolean expression (meaning it’s fast!) and filters your dataframe.

There are some particular notes to keep in mind when you want to filter your dataframe:

  • If you pass in a dictionary, the keys must match the column names for which you want to filter the data.
  • If you pass in a dataframe, both the columns and the index must match for which you want to filter the data.

Filtering a Single Column with Pandas Isin

Let’s begin filter our dataframe by first focusing on a single column. The .isin() method is very helpful when you’re trying to filter on multiple options.

For example, let’s say you wanted to filter your data to include people with an Education of either College or PhD. To accomplish this, you could write:

df = df[df['Education'].isin(['College', 'PhD'])]

print(df)

This returns the following dataframe:

      Name  Age  Gender Education           City
1     Devi   43  Female   College        Toronto
2  Melissa   27  Female       PhD  New York City

Let’s get a better understanding of what’s actually going on here. Running just df['Education'].isin(['College', 'PhD']) actually returns a boolean array that looks like this:

0    False
1     True
2     True
3    False

That array is then applied to the dataframe df to filter the dataframe more easily. This allows us to apply the filtering in a vectorized format, allowing it to be much, much faster.

Now let’s take a look at filtering on multiple columns with the .isin() method.

Filtering Multiple Columns with Pandas Isin

Filtering on multiple columns adds a little bit of complexity to the equation, but it’s nothing you can’t handle! Let’s look at an example first and then break down what’s going on.

In the code below, we want to filter rows where any of the following conditions are met:

  • Education is college
  • Gender is female
df = df[df[['Education', 'Gender']].isin(['College', 'Female']).any(axis=1)]

print(df)

What we do is we look at the two columns Education and Gender by wrapping them in double-square bracket. We then apply the .isin() method as before. However, we must chain it with the .any() method in order to filter the dataframe down.

This in essence acts as an or operator and returns the following dataframe:

      Name  Age  Gender    Education           City
0     Joan   19  Female  High School        Atlanta
1     Devi   43  Female      College        Toronto
2  Melissa   27  Female          PhD  New York City

If we wanted to apply an and condition, we would simply use chain the .all() method, rather than the .any():

df = df[df[['Education', 'Gender']].isin(['College', 'Female']).all(axis=1)]

print(df)

This returns the following dataframe:

   Name  Age  Gender Education     City
1  Devi   43  Female   College  Toronto

Filtering an Entire Dataframe with Pandas Isin

Similarly, you can use the .isin() method to filter data in an entire dataframe. Rather than passing it onto a series, we pass the method onto the dataframe itself.

Let’s explore this with another example. Here we want to filter down to any of the following criteria:

  • Gender is male
  • Education is college
df = df[df.isin(['College', 'Male']).any(axis=1)]

print(df)

Note, that we’re using the .any() method again to ensure that any of the conditions can be met. If we did not do this, we’d need to match criteria in every column.

This method can be helpful if you don’t want to specify the columns in which your data can be found (or you just don’t know where it is).

This returns the following dataframe:

   Name  Age  Gender    Education     City
1  Devi   43  Female      College  Toronto
3  Dave   32    Male  High School   Madrid

Filtering Using Pandas Isin Not Matching Condition

Similar to comparing the .isin() method to SQL’s IN statement, we can use the Pandas unary operator (~) to perform a NOT IN selection.

Let’s take a look at how we can accomplish this using the Pandas .isin() method on a single column.

Here we want to select any record where education is not equal to ‘College’ or ‘PhD’:

df = df[~df['Education'].isin(['College', 'PhD'])]

print(df)

This returns the following dataframe:

   Name  Age  Gender    Education     City
0  Joan   19  Female  High School  Atlanta
3  Dave   32    Male  High School   Madrid

The same use of the unary operator applies to all other sections of this tutorial as well, in terms of filtering on multiple columns, or the entire dataframe.

Conclusion

In this post, you learned how to emulate the SQL IN and NOT IN operators to filter a dataframe using the .isin() method. You learned how to filter a single column, multiple columns, entire dataframes, and filtering items not matching a condition. If you want to learn more about the .isin() method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.View Author posts

2 thoughts on “Pandas Isin to Filter a Dataframe like SQL IN and NOT IN”

  1. might want to mention that `df = df[df[[‘Col1’, ‘Col2’]].isin([2, 3]).all(axis=1)]` wont care if 2 is in col1 or col2 andthe same for 3

    import pandas as pd
    df = pd.DataFrame.from_dict(
    {
    ‘Name’: [‘Joan’, ‘Devi’, ‘Melissa’, ‘Dave’],
    ‘Col1’: [1, 2, 3, 4],
    ‘Col2’: [4, 3, 2, 1],
    }
    )
    # print(df)

    df = df[df[[‘Col1’, ‘Col2’]].isin([2, 3]).all(axis=1)]
    print(df)

    this doesnt return the result where col1 is 2 and col2 is 3 per se

    Name Col1 Col2
    1 Devi 2 3
    2 Melissa 3 2

    (just something im stuck with at the moment and ended up on this page trying to find an answer (where multiple columns match diff dataframes))

Leave a Reply

Your email address will not be published. Required fields are marked *