Pandas is one of the quintessential libraries for data science in Python. A useful skill is the ability to create new columns, either by adding your own data or calculating data based on existing data.
Loading Dataset
Let’s start by loading the dataset we’ll use throughout the tutorial. We can use the pd.DataFrame.from_dict() method to build a dataframe from a dictionary, then print it out to see what it looks like:
import pandas as pd
df = pd.DataFrame.from_dict({'First Name': ['Dolores', 'Maeve', 'Robert', 'Charlotte'],
                             'Last Name': ['Abernathy', 'Millay', 'Ford', 'Hale'],
                             'Age': [31, 40, 60, 35],
                             'Height (cm)': [170, 165, 178, 162]})
print(df)
This returns the following:
   First Name  Last Name  Age  Height (cm)
0     Dolores  Abernathy   31          170
1       Maeve     Millay   40          165
2      Robert       Ford   60          178
3   Charlotte       Hale   35          162
Assign a Custom Value to a Column in Pandas
To create a new column where every row holds the same value, assign that value directly to the new column.
For example, if we wanted to add a column for what show each record is from (Westworld), then we can simply write:
df['Show'] = 'Westworld'
print(df)
This returns the following:
   First Name  Last Name  Age  Height (cm)       Show
0     Dolores  Abernathy   31          170  Westworld
1       Maeve     Millay   40          165  Westworld
2      Robert       Ford   60          178  Westworld
3   Charlotte       Hale   35          162  Westworld
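As an aside, the same constant column can be added without changing the original dataframe. The sketch below assumes a two-row stand-in dataframe (the `small` and `with_show` names are illustrative, not part of the tutorial) and uses the `DataFrame.assign()` method, which returns a new dataframe rather than modifying the existing one:

```python
import pandas as pd

# Illustrative stand-in data, not the tutorial's full dataset
small = pd.DataFrame({'First Name': ['Dolores', 'Maeve']})

# assign() returns a copy with the extra column; `small` itself is unchanged
with_show = small.assign(Show='Westworld')

print(small.columns.tolist())
print(with_show.columns.tolist())
```

Direct assignment with square brackets is usually the simpler choice; assign() is mostly useful when you want to keep the original dataframe intact or chain several steps together.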
Assign Multiple Values to a Column in Pandas
If you want to assign specific values to a new column, you can pass a list of values directly to the new column.
Some important things to note here:
- The order matters: values are assigned by position, so the first item in your list goes to the first row of the dataframe, the second item to the second row, and so on, and
- The length of the list must match the length of the dataframe.
To demonstrate this, let’s add a column of numbers:
df['Number'] = [1, 2, 3, 4]
print(df)
This returns the following:
   First Name  Last Name  Age  Height (cm)  Number
0     Dolores  Abernathy   31          170       1
1       Maeve     Millay   40          165       2
2      Robert       Ford   60          178       3
3   Charlotte       Hale   35          162       4
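The two notes about order and length can be checked with a short sketch. It assumes a three-row stand-in dataframe (the `people` name, the `Too Short` column, and the values are illustrative only) and shows that a list of the wrong length is rejected with a ValueError:

```python
import pandas as pd

# Illustrative stand-in data, not the tutorial's full dataset
people = pd.DataFrame({'First Name': ['Dolores', 'Maeve', 'Robert']})

# Values are assigned by position: the first item goes to the first row
people['Number'] = [10, 20, 30]

# A list with the wrong number of items raises a ValueError
try:
    people['Too Short'] = [1, 2]
    error_message = None
except ValueError as error:
    error_message = str(error)

print(people)
print(error_message)
```

Because the failed assignment raises before anything is stored, the dataframe is left with only the columns that were assigned successfully.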
Calculate a New Column in Pandas
It’s also possible to apply mathematical operations to columns in Pandas. This is done by assigning the result of a mathematical operation to a new column.
As an example, let’s calculate how many inches each person is tall. This is done by dividing the height in centimeters by 2.54:
df['Height (inches)'] = df['Height (cm)'] / 2.54
print(df)
This returns the following:
   First Name  Last Name  Age  Height (cm)  Height (inches)
0     Dolores  Abernathy   31          170        66.929134
1       Maeve     Millay   40          165        64.960630
2      Robert       Ford   60          178        70.078740
3   Charlotte       Hale   35          162        63.779528
You can also create conditional columns in Pandas using complex if-else statements.
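As a minimal sketch of a conditional column, the example below uses NumPy's `np.where()` function, which picks one of two values for each row depending on a condition. The `Age Group` column name, the cutoff of 40, and the labels are arbitrary choices made for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Name': ['Dolores', 'Maeve', 'Robert', 'Charlotte'],
                   'Age': [31, 40, 60, 35]})

# Rows where the condition is True get the first label, all others the second.
# The cutoff (40) and both labels are illustrative assumptions.
df['Age Group'] = np.where(df['Age'] >= 40, '40 and over', 'Under 40')

print(df)
```

This covers a simple two-way choice; conditions with more than two outcomes need a different approach.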
Add or Subtract Columns in Pandas
Similar to calculating a new column in Pandas, you can add or subtract (or multiply and divide) columns in Pandas.
If we wanted to add and subtract the Age and Number columns we can write:
df['Add'] = df['Age'] + df['Number']
df['Subtract'] = df['Age'] - df['Number']
print(df)
This returns:
   First Name  Last Name  Age  Height (cm)  Number  Add  Subtract
0     Dolores  Abernathy   31          170       1   32        30
1       Maeve     Millay   40          165       2   42        38
2      Robert       Ford   60          178       3   63        57
3   Charlotte       Hale   35          162       4   39        31
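Multiplying and dividing columns works the same way. The sketch below rebuilds only the Age and Number columns so that it runs on its own; the `Multiply` and `Divide` column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Age': [31, 40, 60, 35],
                   'Number': [1, 2, 3, 4]})

# Element-wise multiplication and division, row by row
df['Multiply'] = df['Age'] * df['Number']
df['Divide'] = df['Age'] / df['Number']

print(df)
```

Note that dividing two integer columns with / produces a column of floats, even when every result is a whole number.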
Combine String Columns in Pandas
There may be many times when you want to combine different columns that contain strings. For example, the columns for First Name and Last Name can be combined to create a new column called “Name”.
This can be done by writing the following:
df['Name'] = df['First Name'] + ' ' + df['Last Name']
print(df)
This returns the following:
   First Name  Last Name  Age  Height (cm)               Name
0     Dolores  Abernathy   31          170  Dolores Abernathy
1       Maeve     Millay   40          165       Maeve Millay
2      Robert       Ford   60          178        Robert Ford
3   Charlotte       Hale   35          162     Charlotte Hale
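The + operator only joins strings to strings, so a numeric column such as Age needs to be converted first. A minimal sketch, assuming a two-row stand-in dataframe and an illustrative `Label` column:

```python
import pandas as pd

df = pd.DataFrame({'First Name': ['Dolores', 'Maeve'],
                   'Age': [31, 40]})

# Convert the integer column to strings before concatenating
df['Label'] = df['First Name'] + ' (' + df['Age'].astype(str) + ')'

print(df)
```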
Split String Columns in Pandas
Similar to joining two string columns, a string column can also be split. If we wanted to split the Name column into two columns we can use the str.split() method and assign the result to two columns directly.
We can do this by writing:
df[['First Name New', 'Last Name New']] = df['Name'].str.split(' ', expand=True)
print(df)
This returns:
   First Name  Last Name  Age  ...               Name  First Name New  Last Name New
0     Dolores  Abernathy   31  ...  Dolores Abernathy         Dolores      Abernathy
1       Maeve     Millay   40  ...       Maeve Millay           Maeve         Millay
2      Robert       Ford   60  ...        Robert Ford          Robert           Ford
3   Charlotte       Hale   35  ...     Charlotte Hale       Charlotte           Hale
It’s important to note a few things here:
- We immediately assign two columns using double square brackets,
- The data is delimited by a space, and
- We assign True to the expand argument.
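Assigning the result to exactly two columns only works when every value splits into exactly two parts. As a hedged sketch of one way to handle longer names, the example below limits the split to the first space with the n argument of str.split(). The `names` dataframe, the third name in it, and the `First` and `Rest` column names are hypothetical and not part of the tutorial's dataset:

```python
import pandas as pd

# Hypothetical data: the last name contains more than one space
names = pd.DataFrame({'Name': ['Dolores Abernathy', 'Robert Ford', 'Anna Maria Lopez']})

# n=1 splits on the first space only, so each row yields exactly two parts
names[['First', 'Rest']] = names['Name'].str.split(' ', n=1, expand=True)

print(names)
```

Everything after the first space ends up in the second column, which may or may not be what you want for a given dataset.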
Conclusion
In this post, you learned several ways of creating columns in Pandas: directly inserting data, applying mathematical operations to existing columns, and working with strings.
To learn more about string operations like split, check out the official Pandas documentation for Series.str.split().