
How to Add a New Column to a Pandas DataFrame


In this tutorial, you’ll learn how to add a new column to a Pandas DataFrame. The Pandas library provides a helpful way of working with tabular data. One of the most common tasks you’ll encounter is the need to add more data to a Pandas DataFrame.

By the end of this tutorial, you’ll have learned:

  • How to add a new column to a Pandas DataFrame
  • How to create a new column of constant values in a Pandas DataFrame
  • How to create a new column based on the values of another column

Creating a Sample Pandas DataFrame

To follow along with this tutorial, you can copy and paste the code below into your favorite code editor. If you have your own dataset, feel free to use that, though your results will vary. Let’s take a look at our dataset:

# Creating a Sample Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

print(df)

# Returns:
#       Name     Location  Amount
# 0     Jane      Toronto   99.99
# 1    Mitch     New York  123.12
# 2     Alex  Los Angeles  150.23
# 3     Evan    Vancouver   52.34
# 4  Melissa      Seattle   12.34

In the DataFrame above, we have three columns: ['Name', 'Location', 'Amount']. Now that we have a DataFrame, let’s get started with adding new columns!

How to Add a Column to a Pandas DataFrame with a Constant Value

In this section, you’ll learn how to add a column to a Pandas DataFrame that contains a constant value. The simplest way to do this is to directly assign a value to a new column. This assigns the value to every record in the DataFrame’s column.

Let’s see what this looks like:

# Adding a Constant Value to a Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

df['Company'] = 'datagy'

print(df)

# Returns:
#       Name     Location  Amount Company
# 0     Jane      Toronto   99.99  datagy
# 1    Mitch     New York  123.12  datagy
# 2     Alex  Los Angeles  150.23  datagy
# 3     Evan    Vancouver   52.34  datagy
# 4  Melissa      Seattle   12.34  datagy

In the code block above, we assigned a single value (in this case, the string 'datagy') to an entire DataFrame column.
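Direct assignment always places the new column at the end of the DataFrame. If you need the constant column in a specific position, one option is the DataFrame’s insert() method. The snippet below is a minimal sketch that uses a shortened version of the sample data; the position 0 is just an illustrative choice.

```python
import pandas as pd

# Shortened version of the sample DataFrame (illustrative only)
df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Amount': [99.99, 123.12, 150.23]})

# insert() modifies df in place and returns None.
# Arguments: position, column name, value (a scalar is repeated for every row).
df.insert(0, 'Company', 'datagy')

print(df.columns.tolist())
```

Because insert() works in place, there is no need to reassign the result back to df.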

Adding a single, constant value to a Pandas DataFrame is not a very common task, since the information is often redundant. In the following section, you’ll learn how to add a column to a Pandas DataFrame from a list of values.

How to Add a Column to a Pandas DataFrame From a List

A simple way to add a new column to a Pandas DataFrame is to assign a list to a new column. This allows you to directly assign a new column based on existing or new data.

Let’s take a look at how to add a new column from a list:

# Add a New Column to a Pandas DataFrame from a List
import pandas as pd
df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

df['Country'] = ['Canada', 'USA', 'USA', 'Canada', 'USA']
print(df)

# Returns:
#       Name     Location  Amount Country
# 0     Jane      Toronto   99.99  Canada
# 1    Mitch     New York  123.12     USA
# 2     Alex  Los Angeles  150.23     USA
# 3     Evan    Vancouver   52.34  Canada
# 4  Melissa      Seattle   12.34     USA

In the code above, we assigned a list to a new Pandas DataFrame column. It’s important to note here that the length of the list must match the number of records in the DataFrame exactly. Without this, Pandas will raise a ValueError, indicating that the lengths do not match.
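To see the length requirement in action, here is a small sketch that deliberately assigns a list that is too short. The variable names too_short and error_raised are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa']})

# Only three values for a DataFrame with five rows
too_short = ['Canada', 'USA', 'USA']

try:
    df['Country'] = too_short
    error_raised = False
except ValueError as error:
    # Pandas refuses the assignment because the lengths differ
    error_raised = True
    print(f'Could not add column: {error}')
```

Checking len(your_list) == len(df) before assigning is a simple way to catch this early.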

How to Add a Column to a Pandas DataFrame From a Dictionary

A simple way to add a new column to a Pandas DataFrame based on other columns is to map in a dictionary. This allows you to easily replicate a VLOOKUP in Pandas. This method is particularly helpful when you have a set number of items that correspond with other categories.

Let’s see how we can map in countries based on the city that a person is from:

# Add a Column Based on a Dictionary
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

df['Country'] = df['Location'].map({'Toronto':'Canada', 'New York': 'USA', 'Los Angeles': 'USA', 'Vancouver': 'Canada', 'Seattle': 'USA'})

print(df)

# Returns:
#       Name     Location  Amount Country
# 0     Jane      Toronto   99.99  Canada
# 1    Mitch     New York  123.12     USA
# 2     Alex  Los Angeles  150.23     USA
# 3     Evan    Vancouver   52.34  Canada
# 4  Melissa      Seattle   12.34     USA

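In the code block above, we used the map() method to map in a dictionary of values. We applied the method directly to the 'Location' column: for each value in that column, Pandas looks up the matching key in the dictionary and returns the corresponding value. Any value that doesn’t appear as a key in the dictionary is returned as NaN.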
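It helps to know what happens when a value has no matching key. The sketch below uses a hypothetical 'Chicago' entry that is missing from the dictionary, and then fills the gap with a placeholder using fillna(). The 'Unknown' label and the 'Country Filled' column name are assumptions made for this example.

```python
import pandas as pd

# 'Chicago' is a hypothetical city that is not in the lookup dictionary
df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Location': ['Toronto', 'New York', 'Chicago']})

country_lookup = {'Toronto': 'Canada', 'New York': 'USA'}

# Unmatched values become NaN
df['Country'] = df['Location'].map(country_lookup)

# Replace the missing values with a placeholder label
df['Country Filled'] = df['Country'].fillna('Unknown')

print(df)
```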

In the following section, you’ll learn how to add multiple columns to a Pandas DataFrame.

How to Add Multiple Columns to a Pandas DataFrame

In many cases you will want to add multiple columns to a Pandas DataFrame. Any of the methods above will work. For example, you can assign two columns by passing in two lists of data.

Let’s see how we can use a list of lists to create two columns in Pandas:

# Creating Two Columns Using Pandas
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

df['Sample'], df['Sample2'] = [[1,2,3,4,5], [6,7,8,9,0]]

print(df)

# Returns:
#       Name     Location  Amount  Sample  Sample2
# 0     Jane      Toronto   99.99       1        6
# 1    Mitch     New York  123.12       2        7
# 2     Alex  Los Angeles  150.23       3        8
# 3     Evan    Vancouver   52.34       4        9
# 4  Melissa      Seattle   12.34       5        0

Let’s break down what we did above:

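  1. We assigned two columns, df['Sample'] and df['Sample2'], on the left side of the equals sign
  2. We passed in a list of lists, each containing five values. Python unpacks the outer list, so the first inner list goes to df['Sample'] and the second goes to df['Sample2']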
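Another option for adding several columns at once is the assign() method, which takes one keyword argument per new column and returns a new DataFrame. This is a sketch on a trimmed-down copy of the sample data rather than the only way to do it.

```python
import pandas as pd

# Trimmed-down copy of the sample DataFrame (illustrative only)
df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

# Each keyword becomes a column name; assign() returns a new DataFrame,
# so the result is reassigned to df
df = df.assign(
    Sample=[1, 2, 3, 4, 5],
    Sample2=[6, 7, 8, 9, 0])

print(df)
```

Note that keyword arguments must be valid Python identifiers, so column names containing spaces need the bracket syntax instead.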

How to Add a New Column Derived From Another Column of a Pandas DataFrame

In this section, you’ll learn how to add a new column derived from another column. This allows you to add a new column that is calculated from the values of an existing one. For example, you can multiply the values in one column by a constant to calculate a new column. In the example below, you’ll learn how to add a 13% sales tax to the 'Amount' column:

# Calculating a New Column in Pandas
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

df['With Tax'] = df['Amount'] * 1.13

print(df)

# Returns:
#       Name     Location  Amount  With Tax
# 0     Jane      Toronto   99.99  112.9887
# 1    Mitch     New York  123.12  139.1256
# 2     Alex  Los Angeles  150.23  169.7599
# 3     Evan    Vancouver   52.34   59.1442
# 4  Melissa      Seattle   12.34   13.9442
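The multiplication in the example above is applied to every row of the 'Amount' column at once, so no loop is needed. Since currency values are usually shown with two decimal places, you may also want to round the result. The sketch below chains the Series round() method onto the same calculation; rounding to two places is an assumption about how you want to display the values.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Amount': [99.99, 123.12, 150.23, 52.34, 12.34]})

# Multiply every value by 1.13, then round each result to two decimals
df['With Tax'] = (df['Amount'] * 1.13).round(2)

print(df)
```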

How to Add a New Column to a Pandas DataFrame by Merging From Another DataFrame

In this final section, you’ll learn how to add a new column in a Pandas DataFrame by merging from another DataFrame. This can be helpful when working with relational data from a database, such as data that you download from a SQL database.

The Pandas merge() function lets you combine DataFrames much like a SQL join, including different merge types such as left, right, inner, and outer. Let’s take a look at how to add a new column by merging two DataFrames:

# Add a New Pandas Column by Merging Two DataFrames
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex', 'Evan', 'Melissa'],
    'Location': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle']})

df_locations = pd.DataFrame({
    'City': ['Toronto', 'New York', 'Los Angeles', 'Vancouver', 'Seattle'],
    'Country': ['Canada', 'USA', 'USA', 'Canada', 'USA']})

df = pd.merge(
    left=df,
    right=df_locations,
    left_on='Location',
    right_on='City',
    how='left'
).drop(columns='City')

print(df)

# Returns:
#       Name     Location Country
# 0     Jane      Toronto  Canada
# 1    Mitch     New York     USA
# 2     Alex  Los Angeles     USA
# 3     Evan    Vancouver  Canada
# 4  Melissa      Seattle     USA

Let’s break down what we did in the code above:

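  1. We created two DataFrames: df, and df_locations, which holds the column we want to bring in
  2. We then used the Pandas merge() function with how='left', matching df’s 'Location' column against df_locations’ 'City' column
  3. Finally, we used the .drop() method to drop the 'City' column, which duplicates the values already in 'Location'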
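With a left merge, rows that have no match in the second DataFrame are kept, and the new column is filled with NaN for them. The sketch below uses a hypothetical 'Chicago' row that is missing from df_locations, and passes indicator=True so that merge() adds a '_merge' column showing where each row came from. The name merged is chosen only for this example.

```python
import pandas as pd

# 'Chicago' is a hypothetical city with no entry in df_locations
df = pd.DataFrame({
    'Name': ['Jane', 'Mitch', 'Alex'],
    'Location': ['Toronto', 'New York', 'Chicago']})

df_locations = pd.DataFrame({
    'City': ['Toronto', 'New York'],
    'Country': ['Canada', 'USA']})

merged = pd.merge(
    left=df,
    right=df_locations,
    left_on='Location',
    right_on='City',
    how='left',
    indicator=True  # adds a '_merge' column: 'both', 'left_only' or 'right_only'
).drop(columns='City')

print(merged)
```

Filtering on merged['_merge'] == 'left_only' is a quick way to list the rows that did not find a match.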

Conclusion

In this tutorial, you learned how to use Pandas to add a new DataFrame column. You first learned how to directly assign a constant value. Then, you learned how to add different values from a list or from a dictionary. Next, you learned how to add multiple columns to a Pandas DataFrame at once, and how to add columns derived from another column. Finally, you learned how to merge two DataFrames to add a column to a DataFrame.
