In this tutorial, you’ll learn how to add a new column to a Pandas DataFrame. The Pandas library provides a helpful way of working with tabular data. One of the most common tasks you’ll encounter is the need to add more data to a Pandas DataFrame.
By the end of this tutorial, you’ll have learned:
- How to add a new column to a Pandas DataFrame
- How to create a new column of constant values in a Pandas DataFrame
- How to create a new column based on the values of another column
Creating a Sample Pandas DataFrame
To follow along with this tutorial, you can copy and paste the code below into your favorite code editor. If you have your own dataset, feel free to use that, though your results will vary. Let’s take a look at our dataset:
# Creating a Sample Pandas DataFrame
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
print(df)
# Returns:
# Name Location Amount
# 0 Jane Toronto 99.99
# 1 Mitch New York 123.12
# 2 Alex Los Angeles 150.23
# 3 Evan Vancouver 52.34
# 4 Melissa Seattle 12.34
In the DataFrame above, we have three columns: ['Name', 'Location', 'Amount']. Now that we have a DataFrame, let’s get started with adding new columns!
How to Add a Column to a Pandas DataFrame with a Constant Value
In this section, you’ll learn how to add a column to a Pandas DataFrame that contains a constant value. The simplest way to do this is to directly assign a value to a new column. This assigns the value to every record in the DataFrame’s column.
Let’s see what this looks like:
# Adding a Constant Value to a Pandas DataFrame
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
df['Company'] = 'datagy'
print(df)
# Returns:
# Name Location Amount Company
# 0 Jane Toronto 99.99 datagy
# 1 Mitch New York 123.12 datagy
# 2 Alex Los Angeles 150.23 datagy
# 3 Evan Vancouver 52.34 datagy
# 4 Melissa Seattle 12.34 datagy
In the code block above, we assigned a single value (in this case, the string 'datagy') to an entire DataFrame column.
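Direct assignment always places the new column at the end of the DataFrame. If the position matters, one option is the DataFrame insert() method, which takes a position, a column name, and a value. The sketch below is a minimal illustration on a shortened version of the sample data; the 'Currency' column and the 'CAD' value are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Amount': [99.99, 123.12, 150.23]})

# Direct assignment appends the constant column at the end
df['Company'] = 'datagy'

# insert() places the constant column at a chosen position (here, index 0)
# 'Currency' and 'CAD' are hypothetical values used only for illustration
df.insert(0, 'Currency', 'CAD')

print(df.columns.tolist())
```

Note that insert() modifies the DataFrame in place rather than returning a new one.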
Adding a single, constant value to a Pandas DataFrame is not a very common task, since the information is often redundant. In the following section, you’ll learn how to add a column to a Pandas DataFrame from a list of values.
How to Add a Column to a Pandas DataFrame From a List
A simple way to add a new column to a Pandas DataFrame is to assign a list to a new column. This allows you to directly assign a new column based on existing or new data.
Let’s take a look at how to add a new column from a list:
# Add a New Column to a Pandas DataFrame from a List
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
df['Country'] = ['Canada', 'USA', 'USA', 'Canada', 'USA']
print(df)
# Returns:
# Name Location Amount Country
# 0 Jane Toronto 99.99 Canada
# 1 Mitch New York 123.12 USA
# 2 Alex Los Angeles 150.23 USA
# 3 Evan Vancouver 52.34 Canada
# 4 Melissa Seattle 12.34 USA
In the code above, we assigned a list to a new Pandas DataFrame column. It’s important to note here that the length of the list must match the number of records in the DataFrame exactly. Without this, Pandas will raise a ValueError, indicating that the lengths do not match.
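To see the length requirement in action, the sketch below deliberately assigns a list that is too short and catches the resulting error. The variable name countries is an assumption made for the example, and the exact wording of the error message may differ between Pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa']})

# Only three values for five rows, so the assignment should fail
countries = ['Canada', 'USA', 'USA']

try:
    df['Country'] = countries
except ValueError as error:
    print(f'Could not add column: {error}')

# Checking the length first avoids the exception entirely
if len(countries) == len(df):
    df['Country'] = countries
```

Because the assignment fails, the DataFrame is left without a 'Country' column.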
How to Add a Column to a Pandas DataFrame From a Dictionary
A simple way to add a new column to a Pandas DataFrame based on other columns is to map in a dictionary. This allows you to easily replicate a VLOOKUP in Pandas. This method is particularly helpful when you have a set number of items that correspond with other categories.
Let’s see how we can map in countries based on the city that a person is from:
# Add a Column Based on a Dictionary
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
df['Country'] = df['Location'].map({'Toronto':'Canada', 'New York': 'USA', 'Los Angeles': 'USA', 'Vancouver': 'Canada', 'Seattle': 'USA'})
print(df)
# Returns:
# Name Location Amount Country
# 0 Jane Toronto 99.99 Canada
# 1 Mitch New York 123.12 USA
# 2 Alex Los Angeles 150.23 USA
# 3 Evan Vancouver 52.34 Canada
# 4 Melissa Seattle 12.34 USA
In the code block above, we used the map() method to map in a dictionary of values. We applied the method directly to another column: each value in that column is looked up as a key in the dictionary, and the corresponding value is returned.
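One thing to watch for with this approach is a value that has no matching key in the dictionary. In that case, map() returns a missing value (NaN) for that row. The sketch below shows one possible way to handle this with fillna(); the 'Chicago' row and the 'Unknown' placeholder are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Location': ['Toronto', 'New York', 'Chicago']})

# 'Chicago' is deliberately left out of the dictionary
country_map = {'Toronto': 'Canada', 'New York': 'USA'}

# Unmatched values come back as NaN; fillna() swaps in a placeholder
df['Country'] = df['Location'].map(country_map).fillna('Unknown')

print(df)
```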
In the following section, you’ll learn how to add multiple columns to a Pandas DataFrame.
How to Add Multiple Columns to a Pandas DataFrame
In many cases you will want to add multiple columns to a Pandas DataFrame. Any of the methods above will work. For example, you can assign two columns by passing in two lists of data.
Let’s see how we can use a list of lists to create two columns in Pandas:
# Creating Two Columns Using Pandas
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
df['Sample'], df['Sample2'] = [[1,2,3,4,5], [6,7,8,9,0]]
print(df)
# Returns:
# Name Location Amount Sample Sample2
# 0 Jane Toronto 99.99 1 6
# 1 Mitch New York 123.12 2 7
# 2 Alex Los Angeles 150.23 3 8
# 3 Evan Vancouver 52.34 4 9
# 4 Melissa Seattle 12.34 5 0
Let’s break down what we did above:
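- We assigned two columns, df['Sample'] and df['Sample2']
- We passed in a list of lists, each containing five values. Python unpacks the outer list, so the first inner list goes to df['Sample'] and the second to df['Sample2']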
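Another way to add several columns in one step is the DataFrame assign() method, which accepts one keyword argument per new column. The sketch below is a minimal alternative to the unpacking approach above, using a shortened version of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

# assign() returns a new DataFrame with one column per keyword argument
df = df.assign(
    Sample=[1, 2, 3, 4, 5],
    Sample2=[6, 7, 8, 9, 0])

print(df)
```

Since assign() returns a new DataFrame rather than changing the original, the result is assigned back to df. Keyword arguments must be valid Python names, so this style suits column names without spaces.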
How to Add a New Column Derived From Another Column of a Pandas DataFrame
In this section, you’ll learn how to add a new column derived from another column. This allows you to add a new column that is calculated based on the values of another column. For example, you can multiply the values in one column by a constant to calculate a new column. In the example below, you’ll learn how to create a column with sales tax added, based on the values in the 'Amount' column:
# Calculating a New Column in Pandas
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})
df['With Tax'] = df['Amount'] * 1.13
print(df)
# Returns:
# Name Location Amount With Tax
# 0 Jane Toronto 99.99 112.9887
# 1 Mitch New York 123.12 139.1256
# 2 Alex Los Angeles 150.23 169.7599
# 3 Evan Vancouver 52.34 59.1442
# 4 Melissa Seattle 12.34 13.9442
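The calculated values above carry four decimal places, which is more precision than a currency amount normally needs. One possible refinement is to chain the Series round() method onto the calculation, as sketched below. The TAX_RATE name is an assumption for the example, with the value taken from the 1.13 multiplier used above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

# Assumed tax rate, matching the 1.13 multiplier in the example above
TAX_RATE = 0.13

# The arithmetic is applied to every row; round(2) keeps two decimal places
df['With Tax'] = (df['Amount'] * (1 + TAX_RATE)).round(2)

print(df)
```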
How to Add a New Column to a Pandas DataFrame by Merging From Another DataFrame
In this final section, you’ll learn how to add a new column in a Pandas DataFrame by merging from another DataFrame. This can be helpful when working with relational data from a database, such as data that you download from a SQL database.
The Pandas merge() function allows you to perform database-style joins, including different merge types (such as left, right, inner, and outer). Let’s take a look at how to add a new column by merging two DataFrames:
# Add a New Pandas Column by Merging Two DataFrames
import pandas as pd
df = pd.DataFrame({
'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle']})
df_locations = pd.DataFrame({
'City': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
'Country': ['Canada', 'USA', 'USA', 'Canada', 'USA']})
df = pd.merge(
left=df,
right=df_locations,
left_on='Location',
right_on='City',
how='left'
).drop(columns='City')
print(df)
# Returns:
# Name Location Country
# 0 Jane Toronto Canada
# 1 Mitch New York USA
# 2 Alex Los Angeles USA
# 3 Evan Vancouver Canada
# 4 Melissa Seattle USA
Let’s break down what we did in the code above:
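- We loaded two DataFrames, one of which we’ll merge into the other
- We then used the Pandas merge() function to perform a left join, matching the 'Location' column of df to the 'City' column of df_locations
- Finally, we used the .drop() method to drop the duplicate column that’s brought in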
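With a left merge, rows in the left DataFrame that have no match in the right DataFrame are kept, and the merged-in column is left as a missing value. The sketch below illustrates this with a made-up 'Chicago' row that is absent from the lookup table. It also passes indicator=True, which adds a '_merge' column describing where each row was found.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Location': ['Toronto', 'New York', 'Chicago']})

# 'Chicago' is deliberately missing from the lookup table
df_locations = pd.DataFrame({
    'City': ['Toronto', 'New York'],
    'Country': ['Canada', 'USA']})

merged = pd.merge(
    left=df,
    right=df_locations,
    left_on='Location',
    right_on='City',
    how='left',
    indicator=True  # adds a '_merge' column describing each row's match
).drop(columns='City')

print(merged)
```

Rows marked left_only in the '_merge' column are the ones that found no match, which makes gaps in the lookup table easy to spot.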
Conclusion
In this tutorial, you learned how to use Pandas to add a new DataFrame column. You first learned how to directly assign a constant value. Then, you learned how to add different values based on values in a list or from a dictionary. Next, you learned how to add multiple columns to a Pandas DataFrame at once, and how to add columns derived from another column. Finally, you learned how to merge two DataFrames to add a column to a DataFrame.
Additional Resources
To learn more about related topics, check out the tutorials below: