Skip to content

How to Drop One or More Columns in Pandas

How to Drop One or More Columns in Pandas

In this tutorial, you’ll learn how to use Pandas to drop one or more columns. When working with large DataFrames in Pandas, you’ll find yourself wanting to remove one or more of these columns. This allows you to drop data that isn’t relevant or needed for your analysis.

By the end of this tutorial, you’ll have learned:

  • How to drop a Pandas column by position
  • How to drop a Pandas column by name
  • How to drop a Pandas column by condition
  • How to drop multiple Pandas columns
  • How to drop Pandas columns safely, if they exist
  • And more

Loading a Sample DataFrame

To follow along with the tutorial line-by-line, I have provided a sample Pandas DataFrame below. Simply copy and paste the code into your favorite code editor. If you have your own DataFrame, feel free to use that, though your results will of course vary.

You can explore the sample Pandas DataFrame below:

# Loading a Sample Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

print(df)

# Returns:
#    Name   Age Location  Current
# 0   Nik  33.0   Canada     True
# 1  Kate  32.0      USA    False
# 2  Evan  36.0     None    False
# 3  Kyra   NaN     None     True

In the sample DataFrame above, we have a dataset with four columns:

  • Name: a column containing strings
  • Age: a column containing a user’s age
  • Location: the locations of a user
  • Current: whether the user is current or not, represented as booleans

Some of these columns also contain missing data, in order to later demonstrate how to drop columns with missing data.

How to Drop a Pandas Column by Name

To drop a Pandas DataFrame column, you can use the .drop() method, which allows you to pass in the name of a column to drop. Let’s take a look at the .drop() method and the parameters that it accepts:

# Understanding the Pandas .drop() Method
import pandas as pd

df.drop(
    labels=None, 
    axis=0, 
    index=None, 
    columns=None, 
    level=None, 
    inplace=False, 
    errors='raise'
)

In the code block above, the various parameters and their default arguments are shown.

In order to drop a single Pandas column, you can pass in a string, representing the column, into either:

  • The labels= parameter, or
  • The columns= parameter.

The parameters accept a column label or a list of column names (as you’ll see in the following section). Let’s see how we can use the labels= parameter to drop a single column:

# Dropping a Single Column Using the .drop() Method
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(labels='Current', axis=1)

print(df)

# Returns:
#    Name   Age Location
# 0   Nik  33.0   Canada
# 1  Kate  32.0      USA
# 2  Evan  36.0     None
# 3  Kyra   NaN     None

In the above example, we passed the column we wanted to drop ('Current') into the .drop() method. However, this method is a bit verbose. We can simplify this code by using the columns= parameter instead, which replaces specifying the axis=1 parameter.

Let’s see how we can simplify dropping a single column using the .drop() method:

# Dropping a Single Column Using the .drop() Method
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(columns='Current')

print(df)

# Returns:
#    Name   Age Location
# 0   Nik  33.0   Canada
# 1  Kate  32.0      USA
# 2  Evan  36.0     None
# 3  Kyra   NaN     None

In the code block above, we used the columns= parameter to specify the column to drop. This simplifies the code and makes it easier to read since it’s immediately clear that you want to drop columns (rather than rows).

How to Drop Pandas Columns by Name In Place

In the examples above, you learned how to drop Pandas columns by name. However, these operations happened by re-assigning the DataFrame to a variable of the same name. You can also use the operation in place, which can help with memory management. This can be especially valuable when working with a larger DataFrame.

In the code block below, you’ll learn how to drop a Pandas column by name in place:

# Dropping a Single Column Using the .drop() Method In Place
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df.drop(columns='Current', inplace=True)

print(df)

# Returns:
#    Name   Age Location
# 0   Nik  33.0   Canada
# 1  Kate  32.0      USA
# 2  Evan  36.0     None
# 3  Kyra   NaN     None

In the code block above, we used the .drop() method by passing in inplace=True. This allowed for the operation to occur in place. This means that the method didn’t return a DataFrame, but modified the original.

In the following section, you’ll learn how to drop multiple columns in Pandas.

How to Drop Multiple Pandas Columns by Names

When using the Pandas DataFrame .drop() method, you can drop multiple columns by name by passing in a list of columns to drop. This method works as the examples shown above, where you can either:

  1. Pass in a list of columns into the labels= argument and use index=1
  2. Pass in a list of columns into the columns= argument

Let’s see how you can use the .drop() method to drop multiple columns by name, by dropping the 'Current' and 'Location' columns:

# Drop Multiple Pandas Columns By Name
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(columns=['Location', 'Current'])
# Or use:
# df = df.drop(labels=['Location', 'Current'], axis=1)

print(df)

# Returns:
#    Name   Age
# 0   Nik  33.0
# 1  Kate  32.0
# 2  Evan  36.0
# 3  Kyra   NaN

Let’s break down what we did above:

  1. We used the .drop() method using the columns= parameter and passed in a list of columns
  2. Alternatively, you could use the method and pass in the labels= parameter, though you’ll need to also pass in the axis=1 argument.

Similarly, you could use the method in place, by passing in the inplace=True argument, as shown below:

# Drop Multiple Pandas Columns By Name In Place
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df.drop(columns=['Location', 'Current'], inplace=True)
# Or use:
# df.drop(labels=['Location', 'Current'], axis=1, inplace=True)

print(df)

# Returns:
#    Name   Age
# 0   Nik  33.0
# 1  Kate  32.0
# 2  Evan  36.0
# 3  Kyra   NaN

In the following section, you’ll learn how to use Pandas to drop a column by position or index.

How to Drop a Pandas Column by Position/Index

Dropping a Pandas column by its position (or index) can be done by using the .drop() method. The method allows you to access columns by their index position.

This is done using the df.columns attribute, which returns a list-like structure of all the columns in the DataFrame. From this, you can use list slicing to select the columns you want to drop.

To better understand how this works, let’s take a look at an example, where we’ll drop the third column:

# Dropping the Third Column by Index Position
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(columns=df.columns[2])

print(df)

# Returns:
#    Name   Age  Current
# 0   Nik  33.0     True
# 1  Kate  32.0    False
# 2  Evan  36.0    False
# 3  Kyra   NaN     True

In the .drop() method call, we pass in the selection made by indexing the df.columns list-like array to get the third item. Technically, this extracts the specific column name, which in this case would be 'Location'.

In the following section, you’ll learn how to drop multiple columns by position or index.

How to Drop Multiple Pandas Columns by Position/Index

Dropping multiple columns by position or index works in a similar way to what was shown in the previous section. By using the .drop() method, you can access columns by their position using the df.columns attribute.

To better illustrate this, let’s drop the first and third columns of our DataFrame in place:

# Dropping Multiple Columns by Index Positions
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df.drop(columns=df.columns[[0, 2]], inplace=True)

print(df)

# Returns:
#     Age  Current
# 0  33.0     True
# 1  32.0    False
# 2  36.0    False
# 3   NaN     True

It’s important to note in the example above that we used double square bracket indexing, [[0, 2]], in order to select more than one column.

In the following section, you’ll learn how to safely drop a column if it exists.

How to Drop a Pandas Column If It Exists

By default, the Pandas .drop() method will raise an error if you attempt to drop a column that doesn’t exist. This behavior is controlled by the errors= parameter, which defaults to the value of 'raise'.

The alternative argument that can be passed in is 'ignore', which will ignore any resulting errors if a column doesn’t exist.

Let’s see how we can safely remove a column if it exists in Pandas:

# Drop a Column in Pandas If It Exists
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(columns='FAKE COLUMN', errors='ignore')

print(df)

# Returns:
#    Name   Age Location  Current
# 0   Nik  33.0   Canada     True
# 1  Kate  32.0      USA    False
# 2  Evan  36.0     None    False
# 3  Kyra   NaN     None     True

In the example above, we tried to drop the column 'FAKE COLUMN'. However, since the column doesn’t exist nothing was dropped.

This removes the need to write any if-else or try-except blocks to handle these operations safely. That said, if you wanted to alert the user to a missing column, then you would need to do this.

In the code block below, we’ll use a try-except block to alert the user that a column doesn’t exist, rather than letting the error pass silently:

# Dropping a Column If It Exists Using Try-Except
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

try:
    df = df.drop(columns='FAKE COLUMN')
except KeyError:
    print("Column doesn't exist!")

# Returns: Column doesn't exist!

In the example above, we wrapped our .drop() method call in a try-except block. The except statement checks for a KeyError and alerts the user with a print statement.

In the following section, you’ll learn how to drop a Pandas column using a condition.

How to Drop Pandas Columns by Condition

In this section, you’ll learn how to drop a Pandas column by using a condition. For example, you’ll learn how to drop columns that don’t contain the letter 'a'. This can be done, similar to the examples above, using the .drop() method. We can filter the list of column names to only include column names that include the letter we want to include.

# Dropping Columns That Meet a Condition in Pandas
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.drop(columns=[col for col in df.columns if 'a' not in col])
print(df)

# Returns:
#    Name Location
# 0   Nik   Canada
# 1  Kate      USA
# 2  Evan     None
# 3  Kyra     None

Let’s break down what we did in the code block above:

  1. We used the .drop() method to drop specific columns, as indicated by using the columns= parameter.
  2. We passed in a list comprehension that returned only column names from the DataFrame if the letter 'a' was not in the column name.

In the following section, you’ll learn how to drop columns containing missing values.

How to Drop Pandas Columns Containing Missing Values

There are two ways in which you may want to drop columns containing missing values in Pandas:

  1. Drop any column containing any number of missing values
  2. Drop columns containing a specific amount of missing values

In order to drop columns that contain any number of missing values, you can use the .dropna() method. Let’s take a look at how this works:

# Dropping Columns that Contain Any Number of Missing Values
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.dropna(axis=1)
print(df)

# Returns:
#    Name  Current
# 0   Nik     True
# 1  Kate    False
# 2  Evan    False
# 3  Kyra     True

By default, the .dropna() method will drop columns (when the axis is specified to be 1) where any number of values are missing. This is controlled using the default how='any' parameter.

Dropping Pandas Columns Where a Number of Records Are Missing

If we wanted to drop columns that contained a specific number of missing values, we could use the thresh= parameter, which allows you to pass in an integer representing the minimum number of records that must be non-empty in a given column.

Let’s see how we can drop columns that contain at least two missing values:

# Dropping Columns That Contain A Minimum Number of Missing Values
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.dropna(axis=1, thresh=3)
print(df)

# Returns:
#    Name   Age  Current
# 0   Nik  33.0     True
# 1  Kate  32.0    False
# 2  Evan  36.0    False
# 3  Kyra   NaN     True

Let’s break down what the code above is doing, as it’s not immediately clear:

  • The thresh= parameter determines the number of of non-missing values that must exist at a minimum.
  • In this case, we specified that there must be at least three non-missing values. This implies, then, that we would have 2 or more missing values, given the length of the DataFrame.

Dropping Pandas Columns Where a Percentage of Records Are Missing

Arguably, a better way to set this threshold is by using the percentage of values that are missing. Say you wanted to drop columns where 50% or more values were missing. You could do this using the .isnull() method, combined with the .drop() method.

Using the .isnull() method, we can calculate the percentage of records that are missing in a column and filter this down to only columns with more values missing than a given threshold.

Let’s see how we can drop columns where 50% or more values are missing:

# Dropping Columns Where 50%+ Values are Missing
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

columns = df.columns[df.isnull().mean() >= 0.5]
df = df.drop(columns=columns)
print(df)

# Returns:
#    Name   Age  Current
# 0   Nik  33.0     True
# 1  Kate  32.0    False
# 2  Evan  36.0    False
# 3  Kyra   NaN     True

Let’s break down what we did in the code block above:

  1. We defined a new variable, columns , which chains the .isnull() and .mean() methods together to return a percentage of the number of missing values in each column.
  2. This returns an array of boolean values, representing if the given column meets the condition or not.
  3. We then sliced the list of columns using this boolean array to filter down to columns to drop.
  4. In this case, we only dropped the 'Location' column, since 50% of the values were missing.

In the following section, you’ll learn how to drop Pandas columns following a specific column.

How to Drop Pandas Columns After a Specific Column

We can also use the .loc selector to drop columns based on their name. This allows us to define columns we want to drop, following a specific column. The .loc accessor selects rows and columns and allows us to define slices of records to include.

Pandas Select Columns with loc
Understanding the .loc accessor

Say we wanted to drop all columns following the 'Age' column, we could write the following:

# Dropping Columns After a Specific Column Name
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.loc[:, :'Age']
print(df)

# Returns:
#    Name   Age
# 0   Nik  33.0
# 1  Kate  32.0
# 2  Evan  36.0
# 3  Kyra   NaN

The .loc accessor specifies that we want to keep all rows (indicated by the first :) and keep all columns up to the 'Age' column. As we can see, the columns that follow that column are dropped!

Pandas Select Columns with iloc
Understanding the iloc selector

Similarly, we can use the .iloc accessor to drop any columns before or after a given column number.

Say we wanted to drop any column before the third column. We could write the following:

# Dropping Columns Before a Specific Column Number
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.iloc[:, 2:]
print(df)

# Returns:
#   Location  Current
# 0   Canada     True
# 1      USA    False
# 2     None    False
# 3     None     True

The .iloc accessor allows us to provide slices of rows and columns we want to select. Any columns, in this case, that aren’t selected are dropped.

How to Drop Pandas Columns of Specific Data Types

In order to drop Pandas columns of a specific data type, you can use the .select_dtypes() method. This method allows you to pass in a list of data types that you want to drop from the DataFrame.

Let’s take a look at an example of how we can use the method to drop columns that are boolean data types:

# Dropping Pandas Columns of a Specific Data Type
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

df = df.select_dtypes(exclude=['boolean'])
print(df)

# Returns:
#    Name   Age Location
# 0   Nik  33.0   Canada
# 1  Kate  32.0      USA
# 2  Evan  36.0     None
# 3  Kyra   NaN     None

In the above example, we used the .select_dtypes() method with the exclude= parameter. This parameter accepts a list of different data types to exclude. Any column that matches the excluded data type will be dropped from the resulting DataFrame.

How to Pop Pandas Columns

In this section, you’ll learn how to use the Pandas .pop() method to drop a column and save it to a resulting variable. This can be particularly helpful in machine learning problems, where you may want to remove a column representing the target variable. In these cases, you’ll often want to save that column as a Series of its own.

Let’s see how we can use the .pop() method to drop the 'Name' column from the DataFrame and save it to its own Series:

# Popping a Pandas DataFrame Column
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Kate', 'Evan', 'Kyra'],
    'Age': [33, 32, 36, None],
    'Location': ['Canada', 'USA', None, None],
    'Current': [True, False, False, True]})

names = df.pop('Name')
print(df)

# Returns:
#     Age Location  Current
# 0  33.0   Canada     True
# 1  32.0      USA    False
# 2  36.0     None    False
# 3   NaN     None     True

In the code block above, we popped the column 'Name' and assigned it to the variable names. This means that the column was dropped from the DataFrame a Pandas Series was created containing the values from that column.

Conclusion

In this tutorial, you learned a number of useful ways in which you can use Pandas to drop columns from a DataFrame. Being able to work with DataFrames is an important skill for data analysts and data scientists alike. Understanding the vast flexibility that Pandas offers in working with columns can make your workflow significantly easier and more streamlined!

Throughout this tutorial, you learned how to use the Pandas .drop() method, the .dropna() method, the .pop() method and the .loc and .iloc accessors. All of these methods allow you to drop columns in a Pandas DataFrame in different ways.

Additional Resources

To learn more about related topics, check out the tutorials below:

Leave a Reply

Your email address will not be published.