Set Pandas Conditional Column Based on Values of Another Column

Learn how to create a pandas conditional column cover image

There are many times when you may need to set a Pandas column value based on the condition of another column. In this post, you’ll learn all the different ways in which you can create Pandas conditional columns.

Video Tutorial

If you prefer to follow along with a video tutorial, check out my video below:

Loading a Sample Dataframe

Let’s begin by loading a sample Pandas dataframe that we can use throughout this tutorial.

We’ll begin by import pandas and loading a dataframe using the .from_dict() method:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Name': ['Jane', 'Melissa', 'John', 'Matt'],
        'Age': [23, 45, 35, 64],
        'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta'],
        'Gender': ['F', 'F', 'M', 'M']
    }
)

print(df)

This returns the following dataframe:

      Name  Age Birth City Gender
0     Jane   23     London      F
1  Melissa   45      Paris      F
2     John   35    Toronto      M
3     Matt   64    Atlanta      M

Using Pandas loc to Set Pandas Conditional Column

Pandas loc is incredibly powerful! If you need a refresher on loc (or iloc), check out my tutorial here. Pandas’ loc creates a boolean mask, based on a condition. Sometimes, that condition can just be selecting rows and columns, but it can also be used to filter dataframes. These filtered dataframes can then have values applied to them.

Let’s explore the syntax a little bit:

df.loc[df[‘column’] condition, ‘new column name’] = ‘value if condition is met’

With the syntax above, we filter the dataframe using .loc and then assign a value to any row in the column (or columns) where the condition is met.

Let’s try this out by assigning the string ‘Under 30’ to anyone with an age less than 30, and ‘Over 30’ to anyone 30 or older.

df['Age Category'] = 'Over 30'
df.loc[df['Age'] < 30, 'Age Category'] = 'Under 30'

Let's take a look at what we did here:

  1. We assigned the string 'Over 30' to every record in the dataframe. To learn more about this, check out my post here or creating new columns.
  2. We then use .loc to create a boolean mask on the Age column to filter down to rows where the age is less than 30. When this condition is met, the Age Category column is assigned the new value 'Under 30'

But what happens when you have multiple conditions? You could, of course, use .loc multiple times, but this is difficult to read and fairly unpleasant to write. Let's see how we can accomplish this using numpy's .select() method.

Using Numpy Select to Set Values using Multiple Conditions

Similar to the method above to use .loc to create a conditional column in Pandas, we can use the numpy .select() method.

Let's begin by importing numpy and we'll give it the conventional alias np :

import numpy as np

Now, say we wanted to apply a number of different age groups, as below:

  • <20 years old,
  • 20-39 years old,
  • 40-59 years old,
  • 60+ years old

In order to do this, we'll create a list of conditions and corresponding values to fill:

conditions = [
    (df['Age'] < 20),
    (df['Age'] >= 20) & (df['Age'] < 40),
    (df['Age'] >= 40) & (df['Age'] < 59),
    (df['Age'] >= 60)
]

values = ['<20 years old', '20-39 years old', '40-59 years old', '60+ years old']

df['Age Group'] = np.select(conditions, values)

print(df)

Running this returns the following dataframe:

      Name  Age Birth City Gender        Age Group
0     Jane   23     London      F  20-39 years old
1  Melissa   45      Paris      F  40-59 years old
2     John   35    Toronto      M  20-39 years old
3     Matt   64    Atlanta      M    60+ years old

Let's break down what happens here:

  • We first define a list of conditions in which the criteria are specified. Recall that lists are ordered meaning that they should be in the order in which you would like the corresponding values to appear.
  • We then define a list of values to use, which corresponds to the values you'd like applied in your new column.

Something to consider here is that this can be a bit counterintuitive to write. You can similarly define a function to apply different values. We'll cover this off in the section of using the Pandas .apply() method below.

One of the key benefits is that using numpy as is very fast, especially when compared to using the .apply() method.

Using Pandas Map to Set Values in Another Column

The Pandas .map() method is very helpful when you're applying labels to another column. In order to use this method, you define a dictionary to apply to the column.

For our sample dataframe, let's imagine that we have offices in America, Canada, and France. We want to map the cities to their corresponding countries and apply and "Other" value for any other city.

city_dict = {
    'Paris': 'France', 
    'Toronto': 'Canada', 
    'Atlanta': 'USA'
}

df['Country'] = df['Birth City'].map(city_dict)

print(df)

When we print this out, we get the following dataframe returned:

      Name  Age Birth City Gender Country
0     Jane   23     London      F     NaN
1  Melissa   45      Paris      F  France
2     John   35    Toronto      M  Canada
3     Matt   64    Atlanta      M     USA

What we can see here, is that there is a NaN value associated with any City that doesn't have a corresponding country. If we want to apply "Other" to any missing values, we can chain the .fillna() method:

city_dict = {
    'Paris': 'France', 
    'Toronto': 'Canada', 
    'Atlanta': 'USA'
}

df['Country'] = df['Birth City'].map(city_dict).fillna('Other')

print(df)

This returns the following dataframe:

      Name  Age Birth City Gender Country
0     Jane   23     London      F   Other
1  Melissa   45      Paris      F  France
2     John   35    Toronto      M  Canada
3     Matt   64    Atlanta      M     USA

Using Pandas Apply to Apply a function to a column

Finally, you can apply built-in or custom functions to a dataframe using the Pandas .apply() method.

Let's take a look at both applying built-in functions such as len() and even applying custom functions.

Applying Python Built-in Functions to a Column

We can easily apply a built-in function using the .apply() method. Let's see how we can use the len() function to count how long a string of a given column.

df['Name Length'] = df['Name'].apply(len)

print(df)

This returns the following dataframe:

      Name  Age Birth City Gender  Name Length
0     Jane   23     London      F            4
1  Melissa   45      Paris      F            7
2     John   35    Toronto      M            4
3     Matt   64    Atlanta      M            4

Take note of a few things here:

  • We apply the .apply() method to a particular column,
  • We omit the parentheses "()"

Using Third-Party Packages in Pandas Apply

Similarly, you can use functions from using packages. Let's use numpy to apply the .sqrt() method to find the scare root of a person's age.

import numpy as np

df['Age Squareroot'] = df['Age'].apply(np.sqrt)

print(df)

This returns the following dataframe:

      Name  Age Birth City Gender  Age Squareroot
0     Jane   23     London      F        4.795832
1  Melissa   45      Paris      F        6.708204
2     John   35    Toronto      M        5.916080
3     Matt   64    Atlanta      M        8.000000

Using Custom Functions with Pandas Apply

Something that makes the .apply() method extremely powerful is the ability to define and apply your own functions.

Let's revisit how we could use an if-else statement to create age categories as in our earlier example:

def age_groups(x):
    if x < 20:
        return '<20 years old'
    elif x < 40:
        return '20-39 years old'
    elif x < 60:
        return '40-59 years old'
    else:
        return '60+ years old'

df['Age Group'] = df['Age'].apply(age_groups)

print(df)

This returns the following dataframe:

      Name  Age Birth City Gender        Age Group
0     Jane   23     London      F  20-39 years old
1  Melissa   45      Paris      F  40-59 years old
2     John   35    Toronto      M  20-39 years old
3     Matt   64    Atlanta      M    60+ years old

Conclusion

In this post, you learned a number of ways in which you can apply values to a dataframe column to create a Pandas conditional column, including using .loc, .np.select(), Pandas .map() and Pandas .apply(). Each of these methods has a different use case that we explored throughout this post.

Learn more about Pandas methods covered here by checking out their official documentation: