In this post, you’ll learn how to use the Pandas .replace() method to replace data in your dataframe. The Pandas DataFrame.replace() method can replace strings, numbers, and values that match regular expressions (regex) in your dataframe. It’s an immensely powerful method, so let’s dive right in!
Loading Sample Dataframe
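To start things off, let’s begin by loading a Pandas dataframe. We’ll keep things simple so it’s easier to follow exactly what we’re replacing. Each example below starts from this original dataframe, rather than from the result of the previous example.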
import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Name': ['Jane', 'Melissa', 'John', 'Matt'],
        'Age': [23, 45, 35, 64],
        'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta'],
        'Gender': ['F', 'F', 'M', 'M']
    }
)
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Jane 23 London F
1 Melissa 45 Paris F
2 John 35 Toronto M
3 Matt 64 Atlanta M
Pandas Replace Method Syntax
The Pandas .replace() method takes a number of different parameters. Let’s take a look at them:
DataFrame.replace(
to_replace=None,
value=None,
inplace=False,
limit=None,
regex=False,
method='pad')
Let’s take a closer look at what these actually mean:
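- to_replace: takes a string, list, dictionary, regex, int, float, etc. describing the values to replace
- value: the value to replace with (this can also be a list or dictionary)
- inplace: whether to modify the dataframe in place instead of returning a new one
- limit: the maximum size gap to backward or forward fill when a fill method is used
- regex: whether to interpret to_replace and/or value as regular expressions
- method: the fill method ('pad', 'ffill', or 'bfill') used when to_replace is a scalar, list, or tuple and no value is given; both method and limit are deprecated as of pandas 2.1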
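Because this signature has changed between pandas releases (for example, recent versions make the arguments after value keyword-only), it can help to check which version your installation uses. A minimal check using Python’s standard inspect module:

```python
import inspect

import pandas as pd

# Look up the parameters accepted by the installed version's DataFrame.replace()
sig = inspect.signature(pd.DataFrame.replace)

print(pd.__version__)
print(sig)
```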
Replace a Single Value in Pandas
Let’s learn how to replace a single value in a Pandas column.
In the example below, we’ll look to replace the value Jane with Joan:
df['Name'] = df['Name'].replace(to_replace='Jane', value='Joan')
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Joan 23 London F
1 Melissa 45 Paris F
2 John 35 Toronto M
3 Matt 64 Atlanta M
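The same call works on numbers, too. Note that .replace() returns a new object instead of changing the one it’s called on, which is why the example above assigns the result back to df['Name']. A small sketch on a standalone Series of the ages:

```python
import pandas as pd

ages = pd.Series([23, 45, 35, 64], name='Age')

# .replace() returns a new Series; `ages` itself is left unchanged
new_ages = ages.replace(to_replace=23, value=24)

print(new_ages.tolist())  # [24, 45, 35, 64]
print(ages.tolist())      # [23, 45, 35, 64]
```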
Replace Multiple Values with the Same Value in Pandas
Now, you may want to replace multiple values with the same value. This is also extremely easy to do using the .replace() method.
Of course, you could simply run the method twice, but it’s simpler to pass a list of values to the to_replace parameter. Here, we’ll replace London and Paris with Europe:
df['Birth City'] = df['Birth City'].replace(
to_replace=['London', 'Paris'],
value='Europe')
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Jane 23 Europe F
1 Melissa 45 Europe F
2 John 35 Toronto M
3 Matt 64 Atlanta M
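A common real-world use of this pattern is collapsing several placeholder strings for missing data into a single missing-value marker. The series below is hypothetical sample data, and np.nan is just one reasonable choice of replacement:

```python
import numpy as np
import pandas as pd

# Hypothetical data that uses two different placeholders for a missing city
cities = pd.Series(['London', 'N/A', 'Toronto', 'unknown'])

# Every value in the list is replaced by the same single value
cleaned = cities.replace(to_replace=['N/A', 'unknown'], value=np.nan)

print(cleaned.isna().tolist())  # [False, True, False, True]
```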
Now let’s take a look at how to replace multiple values with different values.
Replace Multiple Values with Different Values in Pandas
Similar to the example above, you can replace a list of multiple values with a list of different values.
This is as easy as passing a list to each of the to_replace and value parameters. The values are paired by position, so the two lists must be the same length.
In the example below, we’ll replace London with England and Paris with France:
df['Birth City'] = df['Birth City'].replace(
to_replace=['London', 'Paris'],
value=['England', 'France'])
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Jane 23 England F
1 Melissa 45 France F
2 John 35 Toronto M
3 Matt 64 Atlanta M
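If keeping two lists aligned feels error-prone, a dictionary that maps each old value to its new value does the same job and can’t drift out of sync. The sketch below shows the dict form, plus the ValueError pandas raises when two lists have different lengths:

```python
import pandas as pd

cities = pd.Series(['London', 'Paris', 'Toronto', 'Atlanta'])

# Each key is an old value and each dict value is its replacement
mapped = cities.replace({'London': 'England', 'Paris': 'France'})
print(mapped.tolist())  # ['England', 'France', 'Toronto', 'Atlanta']

# Two lists of different lengths can't be paired up, so pandas rejects them
try:
    cities.replace(to_replace=['London', 'Paris'], value=['England'])
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```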
Replace Values in the Entire Dataframe
In the previous examples, you learned how to replace values in a single column. Similar to those examples, we can easily replace values in the entire dataframe.
Let’s take a look at replacing the letter M with P in the entire dataframe:
df = df.replace(
to_replace='M',
value='P')
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Jane 23 London F
1 Melissa 45 Paris F
2 John 35 Toronto P
3 Matt 64 Atlanta P
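Notice that only the cells whose entire value is M were replaced. By default, .replace() matches whole values, so the M at the start of Melissa and Matt was left untouched.
In order to replace substrings (such as the M in Melissa), we pass in regex=True: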
df = df.replace(
to_replace='M',
value='P',
regex=True)
print(df)
This returns the expected dataframe:
Name Age Birth City Gender
0 Jane 23 London F
1 Pelissa 45 Paris F
2 John 35 Toronto P
3 Patt 64 Atlanta P
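Replacing across the whole dataframe touches every column, which isn’t always what you want. Passing a dict of the form {column: old_value} limits the search to the listed columns, and it also works together with regex=True. In this sketch, the Size column is hypothetical, added only so that another column contains the value M:

```python
import pandas as pd

# Hypothetical 'Size' column that also contains the value 'M'
df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Gender': ['F', 'F', 'M', 'M'],
    'Size': ['S', 'M', 'L', 'M'],
})

# Without a dict, every exact 'M' in every column is replaced
everywhere = df.replace(to_replace='M', value='P')

# A {column: old_value} dict restricts the replacement to Gender
gender_only = df.replace(to_replace={'Gender': 'M'}, value='P')

# The same dict form with regex=True replaces substrings in Name only
names_only = df.replace(to_replace={'Name': 'M'}, value='P', regex=True)

print(everywhere)
print(gender_only)
print(names_only)
```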
Finally, let’s take a closer look at more complex regular expression replacements.
Replacing Values with Regex (Regular Expressions)
We can use regular expressions to make complex replacements.
We’ll cover a fairly simple example, where we replace any four-letter word with “Four letter name”. The following .replace() call runs on the entire dataframe, but because Name is the only column containing four-letter words, only that column changes:
df = df.replace(
to_replace=r'\b\w{4}\b',
value='Four letter name',
regex=True)
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Four letter name 23 London F
1 Melissa 45 Paris F
2 Four letter name 35 Toronto M
3 Four letter name 64 Atlanta M
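Because regex replacement works on substrings, the \b word boundaries above will also match a four-letter word inside a longer value. Anchoring the pattern with ^ and $ requires the whole value to match, and capture groups can be reused in the replacement string. The names Mary Ann and Sam Mills below are hypothetical additions for illustration:

```python
import pandas as pd

# 'Mary Ann' and 'Sam Mills' are hypothetical names added for illustration
names = pd.Series(['Jane', 'Melissa', 'Mary Ann', 'Sam Mills'])

# \b...\b matches a four-letter word anywhere inside a value
partial = names.replace(r'\b\w{4}\b', 'Four letter name', regex=True)

# ^...$ only matches when the entire value is four word characters
whole = names.replace(r'^\w{4}$', 'Four letter name', regex=True)

# Capture groups can be reused in the replacement as \1, \2, ...
swapped = names.replace(r'^(\w+) (\w+)$', r'\2, \1', regex=True)

print(partial.tolist())
# ['Four letter name', 'Melissa', 'Four letter name Ann', 'Sam Mills']
print(whole.tolist())
# ['Four letter name', 'Melissa', 'Mary Ann', 'Sam Mills']
print(swapped.tolist())
# ['Jane', 'Melissa', 'Ann, Mary', 'Mills, Sam']
```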
Replace Values In Place with Pandas
We can also replace values in place, rather than having to re-assign them. This is done simply by setting inplace=True.
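Let’s revisit an earlier example. Calling .replace(..., inplace=True) on a selected column such as df['Birth City'] is chained assignment: recent pandas versions warn about it, and under Copy-on-Write it doesn’t update df at all. Instead, we call the method on the dataframe itself and use a dict to target the column:
# {column: old_value} limits the replacement to the 'Birth City' column
df.replace(
    to_replace={'Birth City': 'Paris'},
    value='France',
    inplace=True)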
print(df)
This returns the following dataframe:
Name Age Birth City Gender
0 Jane 23 London F
1 Melissa 45 France F
2 John 35 Toronto M
3 Matt 64 Atlanta M
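One thing to watch for: with inplace=True the method returns None, so writing df = df.replace(..., inplace=True) would overwrite your dataframe with None. A quick check on a one-column version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta']})

# With inplace=True, df is modified directly and the call itself returns None
result = df.replace(to_replace='Paris', value='France', inplace=True)

print(result)                     # None
print(df['Birth City'].tolist())  # ['London', 'France', 'Toronto', 'Atlanta']
```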
Conclusion
In this post, you learned how to use the Pandas replace method to, well, replace values in a Pandas dataframe. The .replace() method is extremely powerful and lets you replace values in a single column, in multiple columns, or across an entire dataframe. It also supports regular expressions, which make complex replacements easier.
To learn more about the Pandas .replace() method, check out the official pandas documentation for DataFrame.replace().