In this tutorial, you’ll learn how to use Pandas to drop an index column. You’ll learn how to do this using the .reset_index() dataframe method and the .set_index() method, as well as how to read and write CSV files without an index.
Pandas Dataframes are tabular data structures that let you manipulate tabular data, such as data from CSV files. As you import your data, Pandas will attempt to interpret an index column. While these indices are often relevant, there may be times when you simply want to remove the index. Pandas provides a number of helpful ways of doing this, either after a dataframe is loaded or while it is being loaded.
The Quick Answer: Use Pandas .reset_index(drop=True) to Drop an Index Column
What is a Pandas Index Column?
The Pandas index is analogous to an Excel row number. But saying only that would do the index a great disservice, because it’s much, much more than a row number. We can think of the row index as the way in which we access a dataframe’s records, similar to an address or a dictionary’s key.
By default, unless a specific index is passed in, Pandas will simply generate an index for you. This index will start at the number 0 and go up to the length of the dataframe minus 1. However, if you’re working with specific data, such as time series data, you may want to index your data by another column.
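The default index described above can be inspected directly. The following is a minimal sketch, assuming a small illustrative dataframe (the name people and its values are made up for this example):

```python
import pandas as pd

# A small illustrative dataframe created without an explicit index
people = pd.DataFrame({"Name": ["Jane", "Nik", "Kate"], "Age": [10, 35, 34]})

# Pandas generates a default index running from 0 to len(people) - 1
print(people.index)  # RangeIndex(start=0, stop=3, step=1)

# Indexing by another column replaces the default index with that column's values
by_name = people.set_index("Name")
print(by_name.index.name)  # Name
```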
Technically speaking, label lookups on a Pandas index are backed by a hash table, similar to how Python dictionaries work (the data itself is stored in arrays). Because of this, using an index to locate your data can be significantly faster than searching across an entire column’s values.
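To make the difference between the two access patterns concrete, here is a rough sketch using an illustrative dataframe (the names people and indexed are assumptions for this example). It only shows that both approaches return the same value. It does not measure speed, which depends on the size of your data and on whether the index values are unique.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Jane", "Nik", "Kate"], "Age": [10, 35, 34]})

# Searching a column: every value in Name is compared against "Kate"
scanned_age = people.loc[people["Name"] == "Kate", "Age"].iloc[0]

# Looking up by label: Name is the index, so .loc goes straight to the row
indexed = people.set_index("Name")
looked_up_age = indexed.loc["Kate", "Age"]

print(scanned_age, looked_up_age)  # 34 34
```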
Note: While indices technically exist across the dataframe as well (i.e., along axis 1), when this article refers to an index, we’re only referring to the row index.
Loading a Sample Pandas Dataframe
To follow along with this tutorial, I have provided a sample Pandas dataframe below. Feel free to copy the code below into your favourite text editor to follow along. Alternatively, use your own data, though your results will, of course, vary.
# Loading a Sample Pandas Dataframe
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Nik', 'Kate', 'Melissa', 'Evan', 'Doug', 'Joe'],
    'Age': [10, 35, 34, 23, 70, 55, 89],
    'Height': [130, 178, 155, 133, 195, 150, 205],
    'Weight': [80, 200, 220, 150, 140, 95, 180]
}).set_index('Name')
We can take a look at what this dataframe looks like by printing out the first five records using the df.head() method. This returns the following:
         Age  Height  Weight
Name
Jane      10     130      80
Nik       35     178     200
Kate      34     155     220
Melissa   23     133     150
Evan      70     195     140
We can see here that we now have a dataframe that has an index of Name and three other columns. We used the .set_index() method to set the dataframe index.
Dropping a Pandas Index Column Using reset_index
The most straightforward way to drop a Pandas dataframe index is to use the Pandas .reset_index() method. By default, the method only resets the index, using the values from 0 through len(df) - 1 as the new index. It also inserts the old dataframe index into the dataframe as a column.
Let’s see what this looks like:
# Resetting a dataframe index with .reset_index()
# Stored under a new name so that df keeps Name as its index for the next example
df_reset = df.reset_index()
print(df_reset.head())
# Returns:
#       Name  Age  Height  Weight
# 0     Jane   10     130      80
# 1      Nik   35     178     200
# 2     Kate   34     155     220
# 3  Melissa   23     133     150
# 4     Evan   70     195     140
But what if we wanted to drop the dataframe index rather than keep it? We could then pass in the drop=True argument, asking Pandas to reset the index and to drop the original values. Let’s see what this looks like:
# Drop a Pandas Dataframe index with .reset_index()
df = df.reset_index(drop=True)
print(df.head())
# Returns:
#    Age  Height  Weight
# 0   10     130      80
# 1   35     178     200
# 2   34     155     220
# 3   23     133     150
# 4   70     195     140
We can see here that the dataframe’s index is reset to the default behaviour and that the original index is completely removed from the dataframe.
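One detail worth checking is that .reset_index() returns a new dataframe rather than changing the original, which is why the examples assign the result back to a variable. The sketch below assumes a shortened version of the sample data (the variable name dropped is made up for this example):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Jane", "Nik", "Kate"], "Age": [10, 35, 34]}
).set_index("Name")

# drop=True discards the old index instead of turning it into a column
dropped = df.reset_index(drop=True)

print(list(dropped.columns))  # ['Age']
print(dropped.index)  # RangeIndex(start=0, stop=3, step=1)

# The original dataframe is untouched until the result is assigned back
print(df.index.name)  # Name
```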
In the next section, you’ll learn how to use the Pandas .set_index() method to drop a dataframe’s index in Pandas.
Dropping a Pandas Index Column Using set_index
We can also use a workaround: setting the index to a column that simply mirrors the default index pattern. We can do this by first creating a column that contains the values from 0 through to the length of the dataframe minus 1. We can do this directly using the .assign() method, which can be used to add a column to a Pandas Dataframe. We then use the .set_index() method to set that new column as the dataframe’s index.
Let’s see what this looks like:
# Delete a Pandas Dataframe Index with .set_index()
df = df.assign(Index=range(len(df))).set_index('Index')
print(df.head())
# Returns (the new index keeps the column's name, Index):
#        Age  Height  Weight
# Index
# 0       10     130      80
# 1       35     178     200
# 2       34     155     220
# 3       23     133     150
# 4       70     195     140
What we’ve done here is first create a column called “Index” by using the .assign() method. We then chain in the .set_index() method to assign this new column to the index. This overwrites and deletes the former index.
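One side effect of this workaround is that the new index carries the name of the column it was built from. If that label is unwanted, one option is to chain .rename_axis(None), which clears the index name. This is an addition to the approach above rather than part of it, and the variable names with_name and unnamed are made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({"Age": [10, 35, 34]}, index=["Jane", "Nik", "Kate"])

# The workaround from above: the index takes on the column's name
with_name = df.assign(Index=range(len(df))).set_index("Index")
print(with_name.index.name)  # Index

# Clearing the index name so that the result resembles a default index
unnamed = with_name.rename_axis(None)
print(unnamed.index.name)  # None
print(list(unnamed.index))  # [0, 1, 2]
```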
In the next section, you’ll learn how to read a CSV file into a Pandas Dataframe without the implied index.
Read a CSV File into a Pandas Dataframe without an Index
You may encounter CSV files that are malformed, such as those that have a delimiter at the end of a given row. These may look like this:
Age,Height,Weight
10,130,80,
35,178,200,
34,155,220,
23,133,150,
70,195,140,
55,150,95,
89,205,180,
Because each data row ends with a trailing comma, it contains one more field than the header, so Pandas incorrectly interprets the first value in each row as the index value. When we read the file into a dataframe, it will look like this:
# Reading a malformed .csv file with Pandas
df = pd.read_csv('file.csv')
print(df.head())
# Returns:
#     Age  Height  Weight
# 10  130      80     NaN
# 35  178     200     NaN
# 34  155     220     NaN
# 23  133     150     NaN
# 70  195     140     NaN
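If you don’t have such a file on hand, the same behaviour can be reproduced without touching the disk. This is a minimal sketch that assumes the first three rows of the file above are held in a string and read through io.StringIO (the variable names are made up for this example):

```python
import io

import pandas as pd

# The first rows of the malformed file, with a trailing comma on every data row
malformed_csv = "Age,Height,Weight\n10,130,80,\n35,178,200,\n34,155,220,\n"

shifted = pd.read_csv(io.StringIO(malformed_csv))

# The first value of each row became the index and every column moved one to the left
print(list(shifted.index))  # [10, 35, 34]
print(shifted["Age"].tolist())  # [130, 178, 155]
print(shifted["Weight"].isna().all())  # True
```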
Of course, this is not what we want. We want the data to be properly aligned with the columns, with the empty trailing field ignored. Because files like this are fairly common, Pandas provides a parameter that allows us to override the default behaviour.
Let’s see what happens when we pass index_col=False into our function:
# Reading a malformed CSV file correctly with Pandas
df = pd.read_csv('file.csv', index_col=False)
print(df.head())
# Returns:
#    Age  Height  Weight
# 0   10     130      80
# 1   35     178     200
# 2   34     155     220
# 3   23     133     150
# 4   70     195     140
We can see that by using the index_col=False argument, Pandas overrides the default behaviour and assigns a proper index.
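The same idea applies in the other direction. By default, .to_csv() writes the index as the first column of the file, and passing index=False leaves it out. The sketch below assumes a small illustrative dataframe and writes to an in-memory io.StringIO buffer instead of a real file, so no file path is involved:

```python
import io

import pandas as pd

df = pd.DataFrame({"Age": [10, 35], "Height": [130, 178]})

# index=False keeps the 0, 1, ... index out of the written CSV
buffer = io.StringIO()
df.to_csv(buffer, index=False)
written_lines = buffer.getvalue().splitlines()
print(written_lines)  # ['Age,Height', '10,130', '35,178']

# Reading the text back yields the same columns with a fresh default index
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(list(round_trip.columns))  # ['Age', 'Height']
```

Without index=False, the file would start with an unnamed first column holding the index values, which then shows up as an extra column when the file is read back.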
Conclusion
In this tutorial, you learned how to use Pandas to drop an index column. You learned how to use the Pandas .reset_index() and .set_index() methods to drop an index. You also learned how to read and write CSV files without an index. Being able to work with Pandas indices is a useful skill as you learn how to manipulate data using Pandas.
To learn more about the Pandas .reset_index() method, check out the official documentation here.