VLOOKUP in Python and Pandas using .map() or .merge()

vlookup in python

In this tutorial, you’ll learn how to use Python and Pandas to vlookup data in a Pandas dataframe. VLOOKUPs are common functions in Excel that allow you to map data from one table to another. In many cases, this can be used to lookup data from a reference table, such as mapping in, say, a town’s region or a client’s gender.

Throughout this tutorial, you’ll learn how to use the Pandas map() and merge() functions that allow you to map in data using a Python dictionary and merge in another Pandas dataframe of reference data. Doing this can have tremendous benefits in your data preparation, especially if you’re working with highly normalized datasets from databases and need to denormalize your data.

Want to learn how to get a file’s extension in Python? This tutorial will teach you how to use the os and pathlib libraries to do just that!

Loading a Sample Dataframe

In order to follow along with this tutorial, feel free to import the dataframe listed below. If you have your own datasets, feel free to use those. However, if you want to follow along line for line, copy the code below and we’ll get started!

# Loading a Sample Pandas Dataframe
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Nik', 'Kate', 'James', 'Nik', 'Kate', 'James', 'Nik', 'Kate', 'James'],
    'Month': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Gender': ['M', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'M']
})

print(df.head())

# Returns
#     Name  Month Gender
# 0    Nik      1      M
# 1   Kate      1      F
# 2  James      1      M
# 3    Nik      2      M
# 4   Kate      2      F

What we’ve done here is the following:

  1. We imported pandas using the alias pd, following convention and making it easier to reference the library
  2. We loaded a dataframe, using the .from_dict() method, that allows us to create a dataframe using a Pandas dictionary
  3. We then printed the first five records of the dataframe, using the .head() dataframe method

We can see that by having printed out the first five rows of the Pandas dataframe using the Pandas .head() method, that we have a fairly small dataframe. This is done intentionally to give you as much oversight of the data as possible.

Now that you have your Pandas dataframe loaded, let’s learn how to use the Pandas .map() method to allow you emulate using the VLOOKUP function in Pandas.

Want to learn how to use the Python zip() function to iterate over two lists? This tutorial teaches you exactly what the zip() function does and shows you some creative ways to use the function.

Using the .map() Method to Replicate VLOOKUP

The Pandas .map() method allows us to, well, map values to a Pandas series, or a column in our dataframe. We can map values to a Pandas dataframe column using a dictionary, where the key of our dictionary is the corresponding value in our Pandas column and the dictionary’s value that is the value we want to map into it.

In our dataframe, we have an abbreviated column for a person’s gender, using the values ‘m’ and ‘f’. We can map in a dictionary where the dataframe values for gender are our keys and the new values are dictionary’s values.

Let’s see what this dictionary would look like:

gender_map = {
    'M': 'Male',
    'F': 'Female'
}

If we wanted to be sure that we’re getting all the values in a column, we can first check what all the unique values are in that column. To learn more about how to do this using the Pandas .unique() method, check out my in-depth tutorial here.

Want to learn more about Python list comprehensions? Check out this in-depth tutorial that covers off everything you need to know, with hands-on examples. More of a visual learner, check out my YouTube tutorial here.

Now that we have our dictionary defined, we can proceed with mapping in these values. Let’s see how we can do this using Pandas:

gender_map = {
    'M': 'Male',
    'F': 'Female'
}

df['Gender (Expanded)'] = df['Gender'].map(gender_map)
print(df.head())

# Returns:
#     Name  Month Gender Gender (Expanded)
# 0    Nik      1      M              Male
# 1   Kate      1      F            Female
# 2  James      1      M              Male
# 3    Nik      2      M              Male
# 4   Kate      2      F            Female

Let’s explore what we’ve done here:

  1. We created a new column using direct assignment. To do this, we applied the .map() method to the column that we want to map data onto.
  2. As the only argument, we passed in a dictionary that contained our mapping values.

We can see here that this essentially completed a VLOOKUP using the dictionary.

This method works extremely well and efficiently if the data isn’t stored in another dataframe. However, say you’re working with a relational database (like those covered in our SQL tutorials), and the data exists in another dataframe. Then, instead of generating a dictionary first, you can simply use the .merge() method to join the dataframes together.

Want to learn more about Python for-loops? Check out my in-depth tutorial that takes your from beginner to advanced for-loops user! Want to watch a video instead? Check out my YouTube tutorial here.

Using Pandas’ .merge() Method to Replicate VLOOKUP

There may be many times when you’re working with highly normalized data tables and need to merge them together. Pandas, thankfully, provides an incredibly helpful method, .merge(), that allows us to merge two dataframes together.

In the dataframe we loaded above, we have a column that identifies that month using an integer value. This started at 1 for January and would continue through to 12 for December.

We can create another dataframe that contains the mapping values for our months.

Let’s see how we can do this using Pandas:

months = pd.DataFrame.from_dict({
    'Number': [1,2,3,4,5,6,7,8,9,10,11,12],
    'Months': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
})

print(months.head())

# Returns:
#    Number    Months
# 0       1   January
# 1       2  February
# 2       3     March
# 3       4     April
# 4       5       May

Let’s explore what we’ve done here:

  1. We loaded a new dataframe, months, using the .from_dict() dataframe method
  2. We then printed out the first five records using the .head() method

To merge our two dataframes, lets see how we can use the Pandas merge() function:

df = pd.merge(
    left=df,
    right=months,
    left_on='Month',
    right_on='Number',
    how='left'
)

print(df.head())

# Returns:
#     Name  Month Gender  Number    Months
# 0    Nik      1      M       1   January
# 1   Kate      1      F       1   January
# 2  James      1      M       1   January
# 3    Nik      2      M       2  February
# 4   Kate      2      F       2  February

Remember, a VLOOKUP is essentially a left-join between two tables. This is what we’ve done here, using the pandas merge() function. The function takes a number of helpful arguments:

  • left= and right=: the dataframes to use as your left and right tables in the join
  • left_on= and right_on=: identifies which pandas columns to use to merge the dataframes. If the columns are the same across the dataframes, you can simply use on=
  • how=: how to merge the tables (either a left, right, inner, or outer join)

In the example above, we used a left join to join our tables, thereby emulating a VLOOKUP in Python!

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Conclusion: VLOOKUP in Python and Pandas using .map() or .merge()

Pandas makes it incredibly easy to replicate VLOOKUP style functions. In many ways, they remove a lot of the issues that VLOOKUP has, including not only merging on the left-most column.

In this tutorial, you learned how to use Python and Pandas to emulate the popular Excel VLOOKUP function. The VLOOKUP function creates a left-join between two tables, allowing you to look up values from another table.

You learned how to use the Pandas .map() method to map a dictionary to another Pandas dataframe column. You also learned how to use the Pandas merge() function which allows you to merge two dataframes based on a key or multiple keys.

For more Python tutorials, check out our other tutorials. The official documentation can be found here for .map() and .merge().

Want to learn how to pretty print a JSON file using Python? Learn three different methods to accomplish this using this in-depth tutorial here.