In this post, you’ll learn how to use the Pandas .replace() method to replace data in your DataFrame. The Pandas DataFrame.replace() method can be used to replace strings, numeric values, and even patterns matched by regular expressions (regex) in your DataFrame.
Update for 2023
The Quick Answer:
# Replace a Single Value
df['Age'] = df['Age'].replace(23, 99)
# Replace Multiple Values
df['Age'] = df['Age'].replace([23, 45], [99, 999])
# Also works in the Entire DataFrame
df = df.replace(23, 99)
df = df.replace([23, 45], [99, 999])
# Replace Multiple Values with a Single Value
df['Age'] = df['Age'].replace([23, 45, 35], 99)
# Using a Dictionary (Dict is passed into to_replace=)
df['Age'] = df['Age'].replace({23:99, 45:999})
# Using a Dictionary for Column Replacements (key:value = column:value)
df = df.replace({'Name': 'Jane', 'Age': 45}, 99)
Pandas Replace Method Syntax
The Pandas .replace() method takes a number of different parameters. Let’s take a look at them:
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
The list below breaks down what the parameters of the .replace() method expect and what they represent:
- to_replace=: takes a string, list, dictionary, regex, int, float, etc., and describes the values to replace
- value=: the value to replace with
- inplace=: whether to perform the operation in place
- limit=: the maximum size gap to backward or forward fill
- regex=: whether to interpret to_replace and/or value as regex
- method=: the method to use for replacement
Note that the signature above reflects older Pandas releases. To my knowledge, the limit= and method= parameters are deprecated as of Pandas 2.1, so you’ll rarely need them.
Let’s dive into how to use the method, starting by loading a sample Pandas DataFrame.
Loading Sample DataFrame
To start things off, let’s begin by loading a Pandas DataFrame. We’ll keep things simple so it’s easier to follow exactly what we’re replacing.
# Loading a Sample DataFrame
import pandas as pd
df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Age': [23, 45, 35, 64],
    'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta'],
    'Gender': ['F', 'F', 'M', 'M']
})
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M
Let’s now dive into how to use the method, starting by looking at how to replace a single value in a given column.
Replace a Single Value in a Pandas DataFrame Column
Let’s learn how to replace a single value in a Pandas column. In the example below, we’ll replace the value 'Jane' with 'Joan'. To do this, we pass the value we want to replace into the to_replace= parameter and the value we want to replace it with into the value= parameter.
# Replace a Single Value with Another Value Using Pandas .replace()
df['Name'] = df['Name'].replace(to_replace='Jane', value='Joan')
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Joan   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M
In the code block above, we applied the .replace() method to the column directly, reassigning the column to itself. Because to_replace= and value= are the first and second parameters, positionally, we don’t actually need to name them.
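As a quick check of the positional form, here is a minimal sketch. The Series name `names` is a stand-in for the Name column and isn't part of the original example:

```python
import pandas as pd

# Hypothetical stand-in for the Name column used above
names = pd.Series(['Jane', 'Melissa', 'John', 'Matt'])

# Keyword form and positional form of the same replacement
with_keywords = names.replace(to_replace='Jane', value='Joan')
positional = names.replace('Jane', 'Joan')

print(with_keywords.equals(positional))  # True
print(positional.tolist())               # ['Joan', 'Melissa', 'John', 'Matt']
```

Naming the parameters is still a reasonable habit when the call gets longer, since it makes the direction of the replacement obvious.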
Replace Multiple Values with the Same Value in a Pandas DataFrame
Now, you may want to replace multiple values with the same value. This is also easy to do using the .replace() method.
Of course, you could simply run the method twice, but there’s a more efficient way to accomplish this. Here, we’ll replace 'London' and 'Paris' with 'Europe':
# Replace Multiple Values with Another Value Using Pandas .replace()
df['Birth City'] = df['Birth City'].replace(
    to_replace=['London', 'Paris'],
    value='Europe')
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     Europe      F
# 1  Melissa   45     Europe      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M
In the code block above, we passed a list of values into the to_replace= parameter. Pandas looks for each of these values in the column. Since we passed only a single value into the value= parameter, that value replaces every match.
Now let’s look at how to replace multiple values with different ones in the following section.
Replace Multiple Values with Different Values in a Pandas DataFrame
Like the example above, you can replace a list of multiple values with a list of different ones.
In order to do this, you can pass a list of values into the to_replace= parameter as well as a list of equal length into the value= parameter. The two lists are paired up by position.
In the example below, we’ll replace 'London' with 'England' and 'Paris' with 'France':
# Replace Multiple Values with Different Values Using Pandas .replace()
df['Birth City'] = df['Birth City'].replace(
    to_replace=['London', 'Paris'],
    value=['England', 'France'])
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23    England      F
# 1  Melissa   45     France      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M
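The "equal length" requirement is worth seeing in action. The sketch below uses a hypothetical Series named `cities` and assumes that Pandas raises a ValueError when the two lists differ in length, which is the behavior I'd expect from current releases:

```python
import pandas as pd

# Hypothetical stand-in for the Birth City column
cities = pd.Series(['London', 'Paris', 'Toronto', 'Atlanta'])

# Two values to replace but only one replacement: the lists can't be paired up
try:
    cities.replace(to_replace=['London', 'Paris'], value=['England'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)

# Equal-length lists are paired by position
countries = cities.replace(
    to_replace=['London', 'Paris'],
    value=['England', 'France'])
print(countries.tolist())  # ['England', 'France', 'Toronto', 'Atlanta']
```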
In the following section, we’ll explore how to accomplish this for values across the entire DataFrame, rather than a single column.
Replace Values in the Entire DataFrame
In the previous examples, you learned how to replace values in a single column. Similar to those examples, we can easily replace values in the entire DataFrame.
Let’s take a look at replacing the letter M with P in the entire DataFrame:
# Replace Values Across an Entire DataFrame
df = df.replace(
    to_replace='M',
    value='P')
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      P
# 3     Matt   64    Atlanta      P
In the example above, we applied the .replace() method to the entire DataFrame. This may not be the result you expected: the M in Melissa and Matt was left untouched. By default, only cells whose entire value matches to_replace are replaced.
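To make the whole-cell matching explicit, here is a small sketch on a trimmed-down, hypothetical version of the sample DataFrame:

```python
import pandas as pd

# Trimmed-down, hypothetical version of the sample DataFrame
df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Gender': ['F', 'F', 'M', 'M'],
})

replaced = df.replace(to_replace='M', value='P')

# Names that merely contain an 'M' are left alone
print(replaced['Name'].tolist())    # ['Jane', 'Melissa', 'John', 'Matt']
# Cells that are exactly 'M' are replaced
print(replaced['Gender'].tolist())  # ['F', 'F', 'P', 'P']
```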
Replacing Values with Regex (Regular Expressions)
In order to replace substrings in a Pandas DataFrame, you can instruct Pandas to use regular expressions (regex). To replace substrings (such as the M in Melissa), we simply pass in regex=True:
# Replace Values Using Regex
df = df.replace(
    to_replace='M',
    value='P',
    regex=True)
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Pelissa   45      Paris      F
# 2     John   35    Toronto      P
# 3     Patt   64    Atlanta      P
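Because the pattern is applied to every string column, a regex replacement on the whole DataFrame can touch more than you intended. One way to narrow it down, sketched here on a hypothetical two-column DataFrame, is to call the method on a single column and anchor the pattern:

```python
import pandas as pd

# Hypothetical two-column version of the sample DataFrame
df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Gender': ['F', 'F', 'M', 'M'],
})

# '^M' only matches an M at the start of a cell, and only Name is touched
df['Name'] = df['Name'].replace(to_replace=r'^M', value='P', regex=True)

print(df['Name'].tolist())    # ['Jane', 'Pelissa', 'John', 'Patt']
print(df['Gender'].tolist())  # ['F', 'F', 'M', 'M']
```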
Let’s also take a closer look at more complex regular expression replacements.
Using Pandas .replace() With More Complex Regex
We can use regular expressions to make complex replacements.
We’ll cover a fairly simple example, where we replace any four-letter word in the Name column with “Four letter name”.
The following .replace() method call does just that:
# Using More Complex Regex with Pandas .replace()
df['Name'] = df['Name'].replace(
    to_replace=r'\b\w{4}\b',
    value='Four letter name',
    regex=True)
print(df)
# Returns:
#                Name  Age Birth City Gender
# 0  Four letter name   23     London      F
# 1           Melissa   45      Paris      F
# 2  Four letter name   35    Toronto      M
# 3  Four letter name   64    Atlanta      M
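Regex replacements can also reuse part of the matched text through capture groups. The sketch below is my own illustration rather than part of the original example: it keeps the first letter of each four-letter name in a hypothetical Series called `names`, relying on the \1 backreference behaving as it does in Python's re.sub:

```python
import pandas as pd

# Hypothetical stand-in for the Name column
names = pd.Series(['Jane', 'Melissa', 'John', 'Matt'])

# (\w) captures the first letter; \1 inserts it back into the replacement
shortened = names.replace(
    to_replace=r'\b(\w)\w{3}\b',
    value=r'\1.',
    regex=True)

print(shortened.tolist())  # ['J.', 'Melissa', 'J.', 'M.']
```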
In the following section, you’ll learn how to replace values in place.
Replace Values In Place with Pandas
We can also replace values in place, rather than having to re-assign them. This is done by setting inplace= to True.
Let’s revisit an earlier example:
# Replacing Values In Place
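# Calling the method on the DataFrame itself avoids the chained-assignment
# warnings newer Pandas versions raise for df['col'].replace(..., inplace=True)
df.replace(
    to_replace='Paris',
    value='France',
    inplace=True)
print(df)
# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45     France      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M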
While replacing in place saves you from re-assigning the result, it doesn’t necessarily save memory, since Pandas may still create a copy behind the scenes. It’s often better to stay consistent with how the rest of your code is written.
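One practical consequence of replacing in place is that the call modifies the object it is called on, so its return value should not be assigned back to the variable. In the Pandas versions I'm familiar with, the call returns None. A minimal sketch, using a hypothetical one-column DataFrame:

```python
import pandas as pd

# Hypothetical one-column DataFrame
df = pd.DataFrame({'Birth City': ['London', 'Paris']})

# The DataFrame is modified directly; the return value is not the result
result = df.replace(to_replace='Paris', value='France', inplace=True)

print(result)
print(df['Birth City'].tolist())  # ['London', 'France']
```

Writing `df = df.replace(..., inplace=True)` is therefore a common mistake, since it can overwrite your DataFrame with the call's return value.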
Using Dictionaries to Replace Values with Pandas replace
The Pandas .replace() method also allows you to use dictionaries to replace values. This can often be a convenient way of handling many replacements. However, it’s not my preferred approach, as the behavior can be difficult to read.
Let’s take a look at how the method can replace values:
# Using a Dictionary (Dict is passed into to_replace=)
df['Age'] = df['Age'].replace({23:99, 45:999})
# Using a Dictionary for Column Replacements (key:value = column:value)
df = df.replace({'Name': 'Jane', 'Age': 45}, 99)
We can see that the dictionary can be used in two different ways:
- To map values to replace, so that the dictionary represents {original value: new value}
- To map replacements by column, following the structure to_replace={column1: value1, column2: value2}, value=new value
While the first approach is more concise, I would prefer using the Pandas map() method for this approach.
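If you do reach for map(), keep in mind that the two methods treat unmatched values differently. The sketch below, on a hypothetical Series named `ages`, shows that .replace() leaves unmatched values alone while .map() turns them into missing values:

```python
import pandas as pd

# Hypothetical stand-in for the Age column
ages = pd.Series([23, 45, 35, 64])
mapping = {23: 99, 45: 999}

# .replace() keeps values that aren't keys in the dictionary
replaced = ages.replace(mapping)
print(replaced.tolist())       # [99, 999, 35, 64]

# .map() returns NaN for values that aren't keys in the dictionary
mapped = ages.map(mapping)
print(mapped.isna().tolist())  # [False, False, True, True]
```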
The second method provides more flexibility for using the method across different columns but can be a little harder to read. In these cases, I would personally just call the method twice for different columns.
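For comparison, here is a sketch of both options on a hypothetical two-column DataFrame: calling the method once per column, and a nested dictionary of the form {column: {old value: new value}}, which Pandas also accepts and which lets each column have its own replacement value:

```python
import pandas as pd

# Hypothetical two-column version of the sample DataFrame
df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Age': [23, 45, 35, 64],
})

# Option 1: one call per column
per_column = df.copy()
per_column['Name'] = per_column['Name'].replace('Jane', 'Joan')
per_column['Age'] = per_column['Age'].replace(45, 99)

# Option 2: a nested dictionary, {column: {old value: new value}}
nested = df.replace({'Name': {'Jane': 'Joan'}, 'Age': {45: 99}})

print(per_column.equals(nested))  # True
print(nested['Name'].tolist())    # ['Joan', 'Melissa', 'John', 'Matt']
print(nested['Age'].tolist())     # [23, 99, 35, 64]
```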
Conclusion
In this post, you learned how to use the Pandas replace method to, well, replace values in a Pandas DataFrame. The .replace() method is extremely powerful and lets you replace values across a single column, multiple columns, and an entire DataFrame. The method also supports regular expressions to make complex replacements easier.
To learn more about the Pandas .replace() method, check out the official documentation.