Pandas replace() – Replace Values in Pandas Dataframe

In this post, you’ll learn how to use the Pandas .replace() method to replace data in your DataFrame. The Pandas DataFrame.replace() method can be used to replace strings, numeric values, and even patterns matched by regular expressions (regex) in your DataFrame.

Update for 2023

The entire post has been rewritten to make the content clearer and easier to follow. The tutorial now also covers the method= parameter and provides a cheat sheet of how to use the function (found below).

The Quick Answer:

# Replace a Single Value
df['Age'] = df['Age'].replace(23, 99)

# Replace Multiple Values
df['Age'] = df['Age'].replace([23, 45], [99, 999])

# Also works in the Entire DataFrame
df = df.replace(23, 99)
df = df.replace([23, 45], [99, 999])

# Replace Multiple Values with a Single Value
df['Age'] = df['Age'].replace([23, 45, 35], 99)

# Using a Dictionary (Dict is passed into to_replace=)
df['Age'] = df['Age'].replace({23:99, 45:999})

# Using a Dictionary for Column Replacements (key:value = column:value)
df = df.replace({'Name': 'Jane', 'Age': 45}, 99)

Pandas Replace Method Syntax

The Pandas .replace() method takes a number of different parameters. Let’s take a look at them:

DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

The list below breaks down what the parameters of the .replace() method expect and what they represent:

  • to_replace=: takes a string, list, dictionary, regex, int, float, etc., and describes the values to replace
  • value=: the value to replace with
  • inplace=: whether to perform the operation in place
  • limit=: the maximum size gap to backward or forward fill (deprecated as of Pandas 2.1)
  • regex=: whether to interpret to_replace and/or value as regex
  • method=: the method to use for replacement (deprecated as of Pandas 2.1)

Let’s dive into how to use the method, starting by loading a sample Pandas DataFrame.

Loading Sample DataFrame

To start things off, let’s begin by loading a Pandas DataFrame. We’ll keep things simple so it’s easier to follow exactly what we’re replacing.

# Loading a Sample DataFrame
import pandas as pd
df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Age': [23, 45, 35, 64],
    'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta'],
    'Gender': ['F', 'F', 'M', 'M']})
print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M

Let’s now look at how to replace a single value in a given column. Each example below starts from this original DataFrame, so re-create it before running a new section if you want your output to match.

Replace a Single Value in a Pandas DataFrame Column

Let’s learn how to replace a single value in a Pandas column. In the example below, we’ll look to replace the value Jane with Joan. In order to do this, we simply need to pass the value we want to replace into the to_replace= parameter and the value we want to replace with into the value= parameter.

# Replace a Single Value with Another Value Using Pandas .replace()
df['Name'] = df['Name'].replace(to_replace='Jane', value='Joan')
print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Joan   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M

In the code block above, we applied the .replace() method to the column directly, reassigning the column to itself. Because the two parameters are the first and second parameters, positionally, we don’t actually need to name them.
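
As a quick check of the point about positional arguments, the sketch below rebuilds only the Name column from the sample data and compares the keyword form with the positional form. The variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Only the Name column from the sample DataFrame is needed here
names = pd.Series(['Jane', 'Melissa', 'John', 'Matt'], name='Name')

# Keyword form, as used in the example above
with_keywords = names.replace(to_replace='Jane', value='Joan')

# Positional form: to_replace comes first, value second
positional = names.replace('Jane', 'Joan')

print(with_keywords.equals(positional))
# True
print(positional.tolist())
# ['Joan', 'Melissa', 'John', 'Matt']
```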

Replace Multiple Values with the Same Value in a Pandas DataFrame

Now, you may want to replace multiple values with the same value. This is also extremely easy to do using the .replace() method.

Of course, you could simply run the method twice, but there’s a much more efficient way to accomplish this. Here, we’ll look to replace London and Paris with Europe:

# Replace Multiple Values with Another Value Using Pandas .replace()
df['Birth City'] = df['Birth City'].replace(
    to_replace=['London', 'Paris'], 
    value='Europe')
print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     Europe      F
# 1  Melissa   45     Europe      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M

In the code block above, we passed a list of values into the to_replace= parameter, so Pandas looks for each of those values in the column. Since we only passed a single value into the value= parameter, that value replaces every match.

Now let’s look at how to replace multiple values with different ones in the following section.

Replace Multiple Values with Different Values in a Pandas DataFrame

Like the example above, you can replace a list of multiple values with a list of different ones.

In order to do this, you can pass in a list of values into the to_replace= parameter as well as a list of equal length into the value= parameter.

In the example below, we’ll replace London with England and Paris with France:

# Replace Multiple Values with Different Values Using Pandas .replace()
df['Birth City'] = df['Birth City'].replace(
    to_replace=['London', 'Paris'], 
    value=['England', 'France'])

print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23    England      F
# 1  Melissa   45     France      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M
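
The requirement that both lists have equal length can be checked with a short sketch. It assumes that mismatched lists raise a ValueError, which is the behavior I’d expect from Pandas here; the exact error message may differ between versions, so it isn’t reproduced below.

```python
import pandas as pd

cities = pd.Series(['London', 'Paris', 'Toronto', 'Atlanta'], name='Birth City')

# Matching lengths: each value is replaced by the item at the same position
replaced = cities.replace(['London', 'Paris'], ['England', 'France'])
print(replaced.tolist())
# ['England', 'France', 'Toronto', 'Atlanta']

# Mismatched lengths: two values to find, but only one replacement in the list
try:
    cities.replace(['London', 'Paris'], ['England'])
    error_raised = False
except ValueError as err:
    error_raised = True
    print(f'ValueError: {err}')
```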

In the following section, we’ll explore how to accomplish this for values across the entire DataFrame, rather than a single column.

Replace Values in the Entire DataFrame

In the previous examples, you learned how to replace values in a single column. Similar to those examples, we can easily replace values in the entire DataFrame.

Let’s take a look at replacing the letter M with P in the entire DataFrame:

# Replace Values Across an Entire DataFrame
df = df.replace(
    to_replace='M', 
    value='P')

print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45      Paris      F
# 2     John   35    Toronto      P
# 3     Matt   64    Atlanta      P

In the example above, we applied the .replace() method to the entire DataFrame. You might have expected the M in Melissa and Matt to change as well, but the Name column is untouched. That’s because, by default, .replace() only replaces cells whose entire value matches to_replace, not substrings within a cell.
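
To make the whole-cell matching behavior concrete, here is a minimal sketch that uses just the Name and Gender columns from the sample data and inspects each column after the replacement.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Gender': ['F', 'F', 'M', 'M'],
})

replaced = df.replace('M', 'P')

# Cells that are exactly 'M' are replaced
print(replaced['Gender'].tolist())
# ['F', 'F', 'P', 'P']

# Cells that merely contain 'M' are left untouched
print(replaced['Name'].tolist())
# ['Jane', 'Melissa', 'John', 'Matt']
```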

Replacing Values with Regex (Regular Expressions)

In order to replace substrings in a Pandas DataFrame, you can instruct Pandas to use regular expressions (regex). To replace substrings (such as the M in Melissa), we simply pass in regex=True, which treats to_replace as a pattern that can match anywhere within a string:

# Replace Values Using Regex
df = df.replace(
    to_replace='M', 
    value='P',
    regex=True)

print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Pelissa   45      Paris      F
# 2     John   35    Toronto      P
# 3     Patt   64    Atlanta      P
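
Because to_replace is treated as a pattern, you can also control where in the string a match is allowed. The sketch below uses a few hypothetical values (not from the sample DataFrame) to contrast an unanchored pattern with one anchored to the start of the string.

```python
import pandas as pd

# Hypothetical values chosen to show where each pattern matches
values = pd.Series(['Matt', 'Tom M', 'MM'])

# Unanchored: every uppercase M is replaced, wherever it appears
everywhere = values.replace('M', 'P', regex=True)
print(everywhere.tolist())
# ['Patt', 'Tom P', 'PP']

# Anchored with ^: only an M at the start of the string is replaced
leading_only = values.replace('^M', 'P', regex=True)
print(leading_only.tolist())
# ['Patt', 'Tom M', 'PM']
```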

Let’s also take a closer look at more complex regular expression replacements.

Using Pandas .replace() With More Complex Regex

We can use regular expressions to make complex replacements.

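We’ll cover a fairly simple example, where we replace any four-letter name with “Four letter name”. Note that the call below runs against the entire DataFrame; only the Name column changes because it’s the only column in the sample data that contains four-letter words.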

The following .replace() method call does just that:

# Using More Complex Regex with Pandas .replace()
df = df.replace(
    to_replace=r'\b\w{4}\b', 
    value='Four letter name',
    regex=True)

print(df)

# Returns:
#                Name  Age Birth City Gender
# 0  Four letter name   23     London      F
# 1           Melissa   45      Paris      F
# 2  Four letter name   35    Toronto      M
# 3  Four letter name   64    Atlanta      M
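
The call above is applied to the whole DataFrame, so any four-letter word in any string column would be replaced. If you only want to touch the Name column, one option is to call the method on that column alone. The sketch below uses hypothetical city values (Rome and Lima) purely to make the difference visible.

```python
import pandas as pd

# Birth City values here are hypothetical and include four-letter words
df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Birth City': ['Rome', 'Paris', 'Toronto', 'Lima'],
})

pattern = r'\b\w{4}\b'

# Whole DataFrame: four-letter cities are replaced too
whole = df.replace(pattern, 'Four letter name', regex=True)
print(whole['Birth City'].tolist())
# ['Four letter name', 'Paris', 'Toronto', 'Four letter name']

# Name column only: the cities are left alone
name_only = df.copy()
name_only['Name'] = name_only['Name'].replace(pattern, 'Four letter name', regex=True)
print(name_only['Name'].tolist())
# ['Four letter name', 'Melissa', 'Four letter name', 'Four letter name']
print(name_only['Birth City'].tolist())
# ['Rome', 'Paris', 'Toronto', 'Lima']
```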

In the following section, you’ll learn how to replace values in place.

Replace Values In Place with Pandas

We can also replace values in place, rather than having to re-assign them. This is done simply by setting inplace= to True.

Let’s revisit an earlier example:

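# Replacing Values In Place
# Called on the DataFrame itself: calling it on df['Birth City'] may act on a
# temporary copy in recent Pandas versions and leave df unchanged
df.replace(
    to_replace={'Birth City': 'Paris'},
    value='France',
    inplace=True)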

print(df)

# Returns:
#       Name  Age Birth City Gender
# 0     Jane   23     London      F
# 1  Melissa   45     France      F
# 2     John   35    Toronto      M
# 3     Matt   64    Atlanta      M

While this approach may save some memory (as it doesn’t need to return a new object), the savings aren’t guaranteed, and it’s often better to be consistent with how the rest of your code is written.
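
As a sanity check, the sketch below compares the in-place form with the reassignment form on a one-column version of the sample data. Both should leave you with the same result; the variable names are illustrative.

```python
import pandas as pd

data = {'Birth City': ['London', 'Paris', 'Toronto', 'Atlanta']}

# Reassignment: replace() returns a new DataFrame
reassigned = pd.DataFrame(data)
reassigned = reassigned.replace('Paris', 'France')

# In place: the existing DataFrame is modified directly
in_place = pd.DataFrame(data)
in_place.replace('Paris', 'France', inplace=True)

print(reassigned.equals(in_place))
# True
print(in_place['Birth City'].tolist())
# ['London', 'France', 'Toronto', 'Atlanta']
```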

Using Dictionaries to Replace Values with Pandas replace

The Pandas .replace() method also allows you to use dictionaries to replace values. This can often be a convenient way of handling many replacements. However, it’s not my preferred approach as the behavior can often be difficult to read.

Let’s take a look at how the method can replace values:

# Using a Dictionary (Dict is passed into to_replace=)
df['Age'] = df['Age'].replace({23:99, 45:999})

# Using a Dictionary for Column Replacements (key:value = column:value)
df = df.replace({'Name': 'Jane', 'Age': 45}, 99)

We can see that the dictionary can be used in two different ways:

  1. To map values to replace so that the dictionary represents {original value : new value}
  2. To map replacements from columns so that it follows the structure shown here: to_replace={column1: value1, column2: value2}, value=new value

While the first approach is more concise, I would prefer using the Pandas map() method for this kind of value-to-value mapping.
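
If you do reach for map(), keep in mind that the two methods treat values missing from the dictionary differently. A minimal sketch using the Age values from the sample data:

```python
import pandas as pd

ages = pd.Series([23, 45, 35, 64], name='Age')
mapping = {23: 99, 45: 999}

# replace() keeps values that are missing from the dictionary
print(ages.replace(mapping).tolist())
# [99, 999, 35, 64]

# map() returns NaN for values that are missing from the dictionary
mapped = ages.map(mapping)
print(mapped.isna().tolist())
# [False, False, True, True]

# Falling back to the original values restores the unmapped entries
# (the result is float because of the intermediate NaN values)
filled = ages.map(mapping).fillna(ages)
print(filled.tolist())
# [99.0, 999.0, 35.0, 64.0]
```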

The second method provides more flexibility for using the method across different columns but can be a little harder to read. In these cases, I would personally just call the method twice for different columns.
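
Here is a rough sketch of what calling the method once per column looks like, next to a nested dictionary of the form {column: {original value: new value}}, which .replace() also accepts. The replacement values (Joan and 99) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Melissa', 'John', 'Matt'],
    'Age': [23, 45, 35, 64],
})

# One call per column, which keeps each replacement easy to read
two_calls = df.copy()
two_calls['Name'] = two_calls['Name'].replace('Jane', 'Joan')
two_calls['Age'] = two_calls['Age'].replace(45, 99)

# Nested dictionary: {column: {original value: new value}}
nested = df.replace({'Name': {'Jane': 'Joan'}, 'Age': {45: 99}})

print(two_calls.equals(nested))
# True
print(nested['Name'].tolist())
# ['Joan', 'Melissa', 'John', 'Matt']
print(nested['Age'].tolist())
# [23, 99, 35, 64]
```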

Conclusion

In this post, you learned how to use the Pandas replace method to, well, replace values in a Pandas DataFrame. The .replace() method is extremely powerful and lets you replace values across a single column, multiple columns, and an entire DataFrame. The method also incorporates regular expressions to make complex replacements easier.

To learn more about the Pandas .replace() method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
