Create New Columns in Pandas

Pandas is one of the quintessential libraries for data science in Python. A useful skill is the ability to create new columns, either by adding your own data or calculating data based on existing data.

Loading Dataset

Let’s start by loading the dataset we’ll use throughout the tutorial. We can use the pd.DataFrame.from_dict() method to build a dataframe from a dictionary. We can then print out the dataframe to see what it looks like:

import pandas as pd

df = pd.DataFrame.from_dict({'First Name': ['Dolores', 'Maeve', 'Robert', 'Charlotte'],
      'Last Name': ['Abernathy', 'Millay', 'Ford', 'Hale'],
      'Age': [31, 40, 60, 35],
      'Height (cm)': [170, 165, 178, 162]})

print(df)

This returns the following:

  First Name  Last Name  Age  Height (cm)
0    Dolores  Abernathy   31          170
1      Maeve     Millay   40          165
2     Robert       Ford   60          178
3  Charlotte       Hale   35          162
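
As a small aside, the plain pd.DataFrame() constructor also accepts a dictionary of lists. The sketch below uses a shortened, two-row copy of the data above, and the names data, df_from_dict and df_constructor are illustrative only:

```python
import pandas as pd

# Shortened copy of the tutorial's data (two rows only, to keep the sketch small)
data = {'First Name': ['Dolores', 'Maeve'],
        'Last Name': ['Abernathy', 'Millay'],
        'Age': [31, 40],
        'Height (cm)': [170, 165]}

df_from_dict = pd.DataFrame.from_dict(data)  # approach used in this tutorial
df_constructor = pd.DataFrame(data)          # plain constructor, same dictionary

# Compare the two dataframes value by value
print(df_from_dict.equals(df_constructor))
```

Either approach should work for the examples in this tutorial.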

Assign a Custom Value to a Column in Pandas

To create a new column in which every row holds the same value, assign that value directly to the new column name.

For example, if we wanted to add a column for what show each record is from (Westworld), then we can simply write:

df['Show'] = 'Westworld'
print(df)

This returns the following:

  First Name  Last Name  Age  Height (cm)       Show
0    Dolores  Abernathy   31          170  Westworld
1      Maeve     Millay   40          165  Westworld
2     Robert       Ford   60          178  Westworld
3  Charlotte       Hale   35          162  Westworld
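
If you would rather not modify the original dataframe, one possible alternative is the assign() method, which returns a new dataframe. This is a minimal sketch on a reduced copy of the data; the Season column and its value are made up purely for illustration:

```python
import pandas as pd

# Reduced copy of the tutorial's data
df = pd.DataFrame({'First Name': ['Dolores', 'Maeve'], 'Age': [31, 40]})

# Direct assignment changes df itself; the single string is repeated for every row
df['Show'] = 'Westworld'

# assign() returns a new dataframe and leaves df untouched
# ('Season' and the value 1 are hypothetical, for illustration only)
df_with_season = df.assign(Season=1)

print(df)
print(df_with_season)
```

Direct assignment is the shorter option, while assign() can be handy when you want to keep the original dataframe unchanged.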

Assign Multiple Values to a Column in Pandas

If you want to assign specific values to a new column, you can pass a list of values directly to the new column.

Some important things to note here:

  • The order matters – the items in your list are assigned to the rows of the dataframe in order, and
  • The length of the list must match the length of the dataframe.

To demonstrate this, let’s add a column of numbers:

df['Number'] = [1, 2, 3, 4]
print(df)

This returns the following:

  First Name  Last Name  Age  Height (cm)  Number
0    Dolores  Abernathy   31          170       1
1      Maeve     Millay   40          165       2
2     Robert       Ford   60          178       3
3  Charlotte       Hale   35          162       4
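
To illustrate the length requirement noted above, here is a minimal sketch on a one-column copy of the data. The error_raised flag is just a helper name for this example:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Dolores', 'Maeve', 'Robert', 'Charlotte']})

# Three values for four rows: pandas refuses the assignment with a ValueError
try:
    df['Number'] = [1, 2, 3]
    error_raised = False
except ValueError as error:
    error_raised = True
    print(f'Assignment failed: {error}')

# A list with one value per row is assigned in row order
df['Number'] = [1, 2, 3, 4]
print(df)
```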

Calculate a New Column in Pandas

It’s also possible to apply mathematical operations to columns in Pandas. This is done by assigning the result of a mathematical operation to a new column.

As an example, let’s calculate how many inches each person is tall. This is done by dividing the height in centimeters by 2.54:

df['Height (inches)'] = df['Height (cm)'] / 2.54
print(df)

This returns the following:

  First Name  Last Name  Age  Height (cm)  Height (inches)
0    Dolores  Abernathy   31          170        66.929134
1      Maeve     Millay   40          165        64.960630
2     Robert       Ford   60          178        70.078740
3  Charlotte       Hale   35          162        63.779528
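
If six decimal places are more than you need, the same calculation can be rounded before it is stored. The sketch below assumes one decimal place is enough and uses the round() method on the calculated column:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Dolores', 'Maeve', 'Robert', 'Charlotte'],
                   'Height (cm)': [170, 165, 178, 162]})

# Same conversion as above, rounded to one decimal place
df['Height (inches)'] = (df['Height (cm)'] / 2.54).round(1)
print(df)
```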

Add or Subtract Columns in Pandas

Similar to calculating a new column in Pandas, you can add or subtract (or multiply and divide) columns in Pandas.

If we wanted to add and subtract the Age and Number columns, we can write:

df['Add'] = df['Age'] + df['Number']
df['Subtract'] = df['Age'] - df['Number']
print(df)

This returns:

  First Name  Last Name  Age  Height (cm)  Number  Add  Subtract
0    Dolores  Abernathy   31          170       1   32        30
1      Maeve     Millay   40          165       2   42        38
2     Robert       Ford   60          178       3   63        57
3  Charlotte       Hale   35          162       4   39        31
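
Multiplication and division follow the same pattern. A short sketch on a reduced copy of the data, where the column names Multiply and Divide are chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame({'Age': [31, 40, 60, 35],
                   'Number': [1, 2, 3, 4]})

# Each operation is applied row by row
df['Multiply'] = df['Age'] * df['Number']
df['Divide'] = df['Age'] / df['Number']
print(df)
```

Note that dividing two integer columns produces floating-point values.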

Combine String Columns in Pandas

There may be many times when you want to combine different columns that contain strings. For example, the columns for First Name and Last Name can be combined to create a new column called “Name”.

This can be done by writing the following:

df['Name'] = df['First Name'] + ' ' + df['Last Name']
print(df)

This returns the following:

  First Name  Last Name  Age  Height (cm)               Name
0    Dolores  Abernathy   31          170  Dolores Abernathy
1      Maeve     Millay   40          165       Maeve Millay
2     Robert       Ford   60          178        Robert Ford
3  Charlotte       Hale   35          162     Charlotte Hale
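
Two related variations, sketched on a reduced copy of the data: the str.cat() method offers another way to join string columns, and a numeric column needs to be converted with astype(str) before it can be joined to text. The Label column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Dolores', 'Maeve'],
                   'Last Name': ['Abernathy', 'Millay'],
                   'Age': [31, 40]})

# str.cat() joins two string columns with the given separator
df['Name'] = df['First Name'].str.cat(df['Last Name'], sep=' ')

# Numbers must be converted to strings before concatenating them with text
df['Label'] = df['First Name'] + ' (' + df['Age'].astype(str) + ')'
print(df)
```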

Split String Columns in Pandas

Similar to joining two string columns, a string column can also be split. If we wanted to split the Name column into two columns, we can use the str.split() method and assign the result to two columns directly.

We can do this by writing:

df[['First Name New', 'Last Name New']] = df['Name'].str.split(' ', expand=True)
print(df)

This returns:

  First Name  Last Name  Age  ...               Name First Name New Last Name New
0    Dolores  Abernathy   31  ...  Dolores Abernathy        Dolores     Abernathy
1      Maeve     Millay   40  ...       Maeve Millay          Maeve        Millay
2     Robert       Ford   60  ...        Robert Ford         Robert          Ford
3  Charlotte       Hale   35  ...     Charlotte Hale      Charlotte          Hale

It’s important to note a few things here:

  • We immediately assign two columns using double square brackets,
  • The data is delimited by a space, and
  • We assign True to the expand argument.
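
One case worth keeping in mind is a name that contains more than one space. A possible way to handle it, sketched here, is the n argument of str.split(), which limits the number of splits. The value 'Robert Ford Jr' is made up for this example:

```python
import pandas as pd

# 'Robert Ford Jr' is a hypothetical value with more than one space
df = pd.DataFrame({'Name': ['Dolores Abernathy', 'Robert Ford Jr']})

# n=1 splits only on the first space, so each name yields at most two pieces
df[['First Name New', 'Last Name New']] = df['Name'].str.split(' ', n=1, expand=True)
print(df)
```

With n=1, everything after the first space ends up in the second column.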

Conclusion

In this post, you learned many different ways of creating columns in Pandas. This can be done by directly inserting data, by applying mathematical operations to columns, and by working with strings.

To learn more about string operations like split, check out the official Pandas documentation.
