Pandas: How to Drop a Dataframe Index Column

In this tutorial, you’ll learn how to use Pandas to drop an index column. You’ll learn how to do this using the .reset_index() and .set_index() dataframe methods, as well as how to read and write CSV files without an index.

Pandas Dataframes are tabular data structures that let you manipulate tabular data, such as data from CSV files. As you import your data, Pandas will attempt to interpret an index column. While these indices are often relevant, there may be times when you simply want to remove the index. Pandas provides a number of helpful ways of doing this, either after a dataframe is loaded or while it is being loaded.

The Quick Answer: Use Pandas .reset_index(drop=True) to Drop an Index Column

What is a Pandas Index Column?

The Pandas index is analogous to an Excel row number, but describing it only that way would do the index a great disservice, because it’s much more than a row number. We can think of the row index as the way in which we access a dataframe’s records, similar to an address or a dictionary’s key.

By default, unless a specific index is passed in, Pandas will simply generate an index for you. This index will start at the number 0 and go up to the length of the dataframe minus 1. However, if you’re working with specific data, such as time series data, you may want to index your data by another column.
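As a minimal sketch of the default behaviour described above (the small `people` dataframe is a made-up example, not part of the tutorial’s sample data), we can inspect the index Pandas generates and then swap it for a column of our choosing:

```python
import pandas as pd

# Hypothetical three-row dataframe, created without passing an index
people = pd.DataFrame({'Name': ['Jane', 'Nik', 'Kate'], 'Age': [10, 35, 34]})

# Pandas generates a default index running from 0 to len(people) - 1
print(people.index)         # RangeIndex(start=0, stop=3, step=1)
print(list(people.index))   # [0, 1, 2]

# Indexing the data by another column instead
by_name = people.set_index('Name')
print(list(by_name.index))  # ['Jane', 'Nik', 'Kate']
```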

Technically speaking, lookups through a Pandas index are backed by a hash table, similar to how Python dictionaries work. Because of this, using an index to locate your data is typically much faster than searching across an entire column’s values.
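To illustrate the difference between the two kinds of lookup, here is a small sketch (the `ages` dataframe is a hypothetical example). Both approaches return the same value; the first goes through the index, while the second compares every value in a column:

```python
import pandas as pd

# Hypothetical dataframe indexed by name
ages = pd.DataFrame({
    'Name': ['Jane', 'Nik', 'Kate'],
    'Age': [10, 35, 34]
}).set_index('Name')

# Label-based lookup through the index
age_by_label = ages.loc['Nik', 'Age']
print(age_by_label)  # 35

# Equivalent lookup without an index: build a boolean mask over a column
unindexed = ages.reset_index()
age_by_mask = unindexed.loc[unindexed['Name'] == 'Nik', 'Age'].iloc[0]
print(age_by_mask)   # 35
```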

Note: While indices technically exist across the dataframe as well (i.e., along axis 1), when this article refers to an index, we’re only referring to the row index.

Loading a Sample Pandas Dataframe

To follow along with this tutorial, I have provided a sample Pandas dataframe below. Feel free to copy the code below into your favourite text editor to follow along. Alternatively, use your own data, though your results will, of course, vary.

# Loading a Sample Pandas Dataframe
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Jane', 'Nik', 'Kate', 'Melissa', 'Evan', 'Doug', 'Joe'],
    'Age': [10, 35, 34, 23, 70, 55, 89],
    'Height': [130, 178, 155, 133, 195, 150, 205],
    'Weight': [80, 200, 220, 150, 140, 95, 180]
}).set_index('Name')

We can take a look at what this dataframe looks like by printing out the first five records using the df.head() method. This returns the following:

         Age  Height  Weight
Name                        
Jane      10     130      80
Nik       35     178     200
Kate      34     155     220
Melissa   23     133     150
Evan      70     195     140

We can see here that we now have a dataframe that has an index of Name and three other columns. We used the .set_index() method to set the dataframe index.

Dropping a Pandas Index Column Using reset_index

The most straightforward way to drop a Pandas dataframe index is to use the Pandas .reset_index() method. By default, the method only resets the index, replacing it with integer values from 0 through len(df) - 1. It also inserts the old dataframe index into the dataframe as a regular column.

Let’s see what this looks like:

# Resetting a dataframe index with .reset_index()
# The result is not assigned back to df, so df keeps its Name index
# for the next example
print(df.reset_index().head())

# Returns:
#       Name  Age  Height  Weight
# 0     Jane   10     130      80
# 1      Nik   35     178     200
# 2     Kate   34     155     220
# 3  Melissa   23     133     150
# 4     Evan   70     195     140

But what if we wanted to drop the dataframe index and not keep it? We can pass in the drop=True argument, asking Pandas to reset the index and to drop the original values. Let’s see what this looks like:

# Drop a Pandas Dataframe index with .reset_index()
df = df.reset_index(drop=True)
print(df.head())

# Returns:
#    Age  Height  Weight
# 0   10     130      80
# 1   35     178     200
# 2   34     155     220
# 3   23     133     150
# 4   70     195     140

We can see here that the dataframe’s index is reset to the default behaviour and that the original index is completely removed from the dataframe.
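One detail worth checking is that .reset_index() returns a new dataframe rather than modifying the one it is called on. The sketch below uses a hypothetical two-row dataframe to show that the original keeps its index until the result is assigned back:

```python
import pandas as pd

# Hypothetical two-row dataframe indexed by name
original = pd.DataFrame({
    'Name': ['Jane', 'Nik'],
    'Age': [10, 35]
}).set_index('Name')

# reset_index() returns a new dataframe; the original is left untouched
dropped = original.reset_index(drop=True)

print(list(original.index))   # ['Jane', 'Nik']
print(list(dropped.index))    # [0, 1]
print(list(dropped.columns))  # ['Age']
```

This is why the examples in this section assign the result back with df = df.reset_index(drop=True).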

In the next section, you’ll learn how to use the Pandas .set_index() method to drop a dataframe’s index in Pandas.

Dropping a Pandas Index Column Using set_index

We can also use a workaround of setting an index with a column that simply mirrors the default index pattern. We first create a column that contains the values from 0 through the length of the dataframe minus 1, using the .assign() method, which adds a column to a Pandas Dataframe. We then use the .set_index() method to set that new column as the dataframe’s index.

Let’s see what this looks like:

# Delete a Pandas Dataframe Index with .set_index()
df = df.assign(Index=range(len(df))).set_index('Index')
print(df.head())

# Returns:
#        Age  Height  Weight
# Index
# 0       10     130      80
# 1       35     178     200
# 2       34     155     220
# 3       23     133     150
# 4       70     195     140

What we’ve done here is first create a column called “Index” by using the .assign() method. We then chain in the .set_index() method to assign this new column to the index. This overwrites and deletes the former index. Note that the new index keeps the name of the column it was created from, which is why “Index” is printed above the row labels.
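As a quick check, the sketch below compares this workaround with .reset_index(drop=True) on a shortened, hypothetical copy of the sample data. The row labels and values match; the one difference is that the workaround leaves the index named after the helper column, which .rename_axis(None) can clear:

```python
import pandas as pd

# Shortened, hypothetical copy of the tutorial's sample dataframe
df = pd.DataFrame({
    'Name': ['Jane', 'Nik', 'Kate'],
    'Age': [10, 35, 34],
    'Height': [130, 178, 155]
}).set_index('Name')

via_set_index = df.assign(Index=range(len(df))).set_index('Index')
via_reset = df.reset_index(drop=True)

# Both approaches produce the same row labels and the same data
print(list(via_set_index.index) == list(via_reset.index))               # True
print(via_set_index.values.tolist() == via_reset.values.tolist())       # True

# The workaround keeps the helper column's name on the index
print(via_set_index.index.name)                    # Index
print(via_set_index.rename_axis(None).index.name)  # None
```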

In the next section, you’ll learn how to read a CSV file into a Pandas Dataframe without the implied index.

Read a CSV File into a Pandas Dataframe without an Index

You may encounter CSV files that are malformed, such as those that have a delimiter at the end of a given row. These may look like this:

Age,Height,Weight
10,130,80,
35,178,200,
34,155,220,
23,133,150,
70,195,140,
55,150,95,
89,205,180,

Because each data row ends with a trailing comma, it contains one more field than the header row, so Pandas interprets the first value in each row as the index. When we read the file into a dataframe, it will look like this:

# Reading a malformed .csv file with Pandas
df = pd.read_csv('file.csv')
print(df.head())

# Returns:
#     Age  Height  Weight
# 10  130      80     NaN
# 35  178     200     NaN
# 34  155     220     NaN
# 23  133     150     NaN
# 70  195     140     NaN
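To reproduce this without creating a file on disk, the same text can be read from memory. This is only a sketch: the malformed_csv string is a shortened, hypothetical stand-in for file.csv, wrapped in io.StringIO so that read_csv() can treat it like a file.

```python
import io
import pandas as pd

# Shortened, in-memory stand-in for the malformed file (note the trailing commas)
malformed_csv = (
    "Age,Height,Weight\n"
    "10,130,80,\n"
    "35,178,200,\n"
    "34,155,220,\n"
)

shifted = pd.read_csv(io.StringIO(malformed_csv))

# The Age values have become the index, and every column has shifted left
print(list(shifted.index))             # [10, 35, 34]
print(list(shifted['Age']))            # [130, 178, 155]
print(shifted['Weight'].isna().all())  # True
```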

Of course, this is not what we want. We want the data to be properly aligned with the columns, with the empty trailing field discarded. Because files like this are fairly common, Pandas provides a parameter that allows us to override the default behaviour.

Let’s see what happens when we pass index_col=False into the read_csv() function:

# Reading a malformed CSV file correctly with Pandas
df = pd.read_csv('file.csv', index_col=False)
print(df.head())

# Returns:
#    Age  Height  Weight
# 0   10     130      80
# 1   35     178     200
# 2   34     155     220
# 3   23     133     150
# 4   70     195     140

We can see that by using the index_col=False argument, Pandas overrides the default behaviour and assigns a proper index.
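A related task is writing a dataframe back to CSV without its index. The sketch below uses a small, hypothetical dataframe and the index=False argument of .to_csv(). Calling .to_csv() without a file path returns the CSV text as a string, which makes the difference easy to inspect:

```python
import io
import pandas as pd

# Hypothetical dataframe with a default integer index
df = pd.DataFrame({'Age': [10, 35], 'Height': [130, 178], 'Weight': [80, 200]})

with_index = df.to_csv()                # the index is written as an unnamed first column
without_index = df.to_csv(index=False)  # the index is left out

print(with_index.splitlines()[0])     # ,Age,Height,Weight
print(without_index.splitlines()[0])  # Age,Height,Weight
print(without_index.splitlines()[1])  # 10,130,80

# Reading the index-free text back gives the same columns and a fresh default index
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))  # ['Age', 'Height', 'Weight']
```

Passing a file path as the first argument, for example df.to_csv('file.csv', index=False), writes the same text to disk instead.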

Conclusion

In this tutorial, you learned how to use Pandas to drop an index column. You learned how to use the Pandas .reset_index() and .set_index() methods to drop an index. You also learned how to read a CSV file into a Pandas Dataframe without an unwanted index. Being able to work with Pandas indices is a useful skill as you learn how to manipulate data using Pandas.

To learn more about the Pandas .reset_index() method, check out the official documentation.
