In this tutorial, you’ll learn how to use Python and Pandas to VLOOKUP data in a Pandas DataFrame. VLOOKUPs are common functions in Excel that allow you to map data from one table to another. In many cases, this can be used to lookup data from a reference table, such as mapping in, say, a town’s region or a client’s gender.
Throughout this tutorial, you’ll learn how to use the Pandas map()
and merge()
functions that allow you to map in data using a Python dictionary and merge in another Pandas DataFrame of reference data. Doing this can have tremendous benefits in your data preparation, especially if you’re working with highly normalized datasets from databases and need to denormalize your data.
Table of Contents
Loading a Sample Dataframe
In order to follow along with this tutorial, feel free to import the DataFrame listed below. If you have your own datasets, feel free to use those. However, if you want to follow along line-by-line, copy the code below and we’ll get started!
# Loading a Sample Pandas Dataframe
import pandas as pd
df = pd.DataFrame.from_dict({
'Name': ['Nik', 'Kate', 'James', 'Nik', 'Kate', 'James', 'Nik', 'Kate', 'James'],
'Month': [1, 1, 1, 2, 2, 2, 3, 3, 3],
'Gender': ['M', 'F', 'M', 'M', 'F', 'M', 'M', 'F', 'M']
})
print(df.head())
# Returns
# Name Month Gender
# 0 Nik 1 M
# 1 Kate 1 F
# 2 James 1 M
# 3 Nik 2 M
# 4 Kate 2 F
What we’ve done here is the following:
- We imported pandas using the alias
pd
, following convention and making it easier to reference the library - We loaded a dataframe, using the
.from_dict()
method, that allows us to create a dataframe using a Pandas dictionary - We then printed the first five records of the dataframe, using the
.head()
dataframe method
We can see that by having printed out the first five rows of the Pandas DataFrame using the Pandas .head()
method, that we have a fairly small DataFrame. This is done intentionally to give you as much oversight of the data as possible.
Now that you have your Pandas DataFrame loaded, let’s learn how to use the Pandas .map() method to allow you to emulate using the VLOOKUP function in Pandas.
Using the .map() Method to Replicate VLOOKUP
The Pandas .map()
method allows us to, well, map values to a Pandas series, or a column in our DataFrame. We can map values to a Pandas DataFrame column using a dictionary, where the key of our dictionary is the corresponding value in our Pandas column and the dictionary’s value that is the value we want to map into it.
In our DataFrame, we have an abbreviated column for a person’s gender, using the values ‘m’ and ‘f’. We can map in a dictionary where the DataFrame values for gender are our keys and the new values are dictionary’s values.
Let’s see what this dictionary would look like:
gender_map = {
'M': 'Male',
'F': 'Female'
}
If we wanted to be sure that we’re getting all the values in a column, we can first check what all the unique values are in that column. The Pandas .unique() method allows you to easily get all of the unique values in a DataFrame column.
Now that we have our dictionary defined, we can proceed with mapping these values. Let’s see how we can do this using Pandas:
gender_map = {
'M': 'Male',
'F': 'Female'
}
df['Gender (Expanded)'] = df['Gender'].map(gender_map)
print(df.head())
# Returns:
# Name Month Gender Gender (Expanded)
# 0 Nik 1 M Male
# 1 Kate 1 F Female
# 2 James 1 M Male
# 3 Nik 2 M Male
# 4 Kate 2 F Female
Let’s explore what we’ve done here:
- We created a new column using direct assignment. To do this, we applied the
.map()
method to the column that we want to map data onto. - As the only argument, we passed in a dictionary that contained our mapping values.
We can see here that this essentially completed a VLOOKUP using the dictionary.
This method works extremely well and efficiently if the data isn’t stored in another DataFrame. However, say you’re working with a relational database (like those covered in our SQL tutorials), and the data exists in another DataFrame. Then, instead of generating a dictionary first, you can simply use the .merge() method to join the DataFrames together.
Using Pandas’ .merge() Method to Replicate VLOOKUP
There may be many times when you’re working with highly normalized data tables and need to merge them together. Pandas, thankfully, provides an incredibly helpful method, .merge()
, that allows us to merge two DataFrames together.
In the DataFrame we loaded above, we have a column that identifies that month using an integer value. This started at 1 for January and would continue through to 12 for December.
We can create another DataFrame that contains the mapping values for our months.
Let’s see how we can do this using Pandas:
months = pd.DataFrame.from_dict({
'Number': [1,2,3,4,5,6,7,8,9,10,11,12],
'Months': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
})
print(months.head())
# Returns:
# Number Months
# 0 1 January
# 1 2 February
# 2 3 March
# 3 4 April
# 4 5 May
Let’s explore what we’ve done here:
- We loaded a new DataFrame,
months
, using the.from_dict()
DataFrame method - We then printed out the first five records using the
.head()
method
To merge our two DataFrames, let’s see how we can use the Pandas merge()
function:
df = pd.merge(
left=df,
right=months,
left_on='Month',
right_on='Number',
how='left'
)
print(df.head())
# Returns:
# Name Month Gender Number Months
# 0 Nik 1 M 1 January
# 1 Kate 1 F 1 January
# 2 James 1 M 1 January
# 3 Nik 2 M 2 February
# 4 Kate 2 F 2 February
Remember, a VLOOKUP is essentially a left-join between two tables. This is what we’ve done here, using the pandas merge()
function. The function takes a number of helpful arguments:
left=
andright=
: the DataFrames to use as your left and right tables in the joinleft_on=
andright_on=
: identifies which pandas columns to use to merge the DataFrames. If the columns are the same across the DataFrames, you can simply useon=
how=
: how to merge the tables (either a left, right, inner, or outer join)
In the example above, we used a left join to join our tables, thereby emulating a VLOOKUP in Python!
Conclusion: VLOOKUP in Python and Pandas using .map() or .merge()
Pandas make it incredibly easy to replicate VLOOKUP style functions. In many ways, they remove a lot of the issues that VLOOKUP has, including not only merging on the left-most column.
In this tutorial, you learned how to use Python and Pandas to emulate the popular Excel VLOOKUP function. The VLOOKUP function creates a left-join between two tables, allowing you to lookup values from another table.
You learned how to use the Pandas .map()
method to map a dictionary to another Pandas DataFrame column. You also learned how to use the Pandas merge()
function which allows you to merge two DataFrames based on a key or multiple keys.
Additional Resources
To learn more about related topics, check out the tutorials below:
- Combine Data in Pandas with merge, join, and concat
- Python Merge Dictionaries – Combine Dictionaries (7 Ways)
- Python: Combine Lists – Merge Lists (8 Ways)
The official documentation can be found here for .map() and .merge().
Pingback: Transforming Pandas Columns with map and apply • datagy