How to Get Column Names in a Pandas DataFrame

In this tutorial, you’ll learn how to use Pandas to get the column names of a DataFrame. There are several ways to do this, and the right one depends on the result you need.

This tutorial covers all of these different scenarios and provides detailed steps. Being able to get and see all of the columns in a Pandas DataFrame can allow you to better work with your data.

Loading a Sample Pandas DataFrame

To follow along with this tutorial, we’ve provided a sample Pandas DataFrame. If you have your own data, feel free to use that – though your results will, of course, vary. To get started, simply copy and paste the code from the block below:

# Loading a Sample Pandas DataFrame
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')
print(df.head())

# Returns:
#         Date  Name  NumSold   Total  Active
# 0  01-Jan-23   Nik       14  329.67    True
# 1  02-Jan-23  Evan       12  475.71   False
# 2  03-Jan-23  Kyra       16  569.64    True
# 3  04-Jan-23  Kate       13  528.23   False
# 4  05-Jan-23   NaN       19  974.65    True

In the code block above, we loaded a Pandas DataFrame using the pd.read_csv() function. The DataFrame has five columns of mixed data types with some missing values.

How to Get Column Names from a Pandas DataFrame

Pandas provides a number of helpful ways to get column names. The simplest is to pass the DataFrame into the list() function, which returns a list of all the column names.

# Get all column names as a list
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')
print(list(df))

# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']

Doing this returns a list of all of the column names in the DataFrame, in the order in which they appear.

You can also access all of the column names in a Pandas DataFrame by using the .columns attribute of a DataFrame. This returns a Pandas Index object containing all of the columns.

While this object is iterable, if you want to convert it to a list, you need to either call its .tolist() method or pass it into the list() function.

# Using the .columns Attribute to Get Column Names
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')

print(df.columns.tolist())

# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']

This approach takes slightly more typing than list(df), but it returns the same list.

Why would you use the Pandas DataFrame .columns attribute to get DataFrame columns?

The Pandas .columns attribute allows you to check membership in DataFrame columns. This allows you to easily check whether a column exists or not, without needing to create a separate list of columns.

How to Get a List of Pandas Column Names from CSV File

Pandas also makes it very easy to get a list of column names from a CSV file. In order to do this, you can specify reading only a single row of data. From there, you can run the method described in the section above to get column names.

# Get List of Pandas Column Names from a CSV
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv', 
	nrows=1
)

print(list(df))

# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']

In the example above, we load only a single row of data using the nrows=1 argument. This allows you to load as little data as possible (thereby saving memory and time), while still being able to access the column names.
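If you only need the header, one possible variation is to pass nrows=0, which asks read_csv() for no data rows at all. The sketch below is a minimal illustration under an assumption: csv_text is a made-up, in-memory stand-in for the tutorial's file, used so the example runs without a network connection.

```python
from io import StringIO

import pandas as pd

# Hypothetical in-memory CSV standing in for the tutorial's file
csv_text = (
    "Date,Name,NumSold,Total,Active\n"
    "01-Jan-23,Nik,14,329.67,True\n"
    "02-Jan-23,Evan,12,475.71,False\n"
)

# nrows=0 parses the header row but loads no data rows
header_only = pd.read_csv(StringIO(csv_text), nrows=0)

column_names = list(header_only)
print(column_names)
print(len(header_only))  # number of data rows that were loaded
```

The resulting DataFrame is empty but still carries the column names, so the list() approach works unchanged.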

How to Check if a Column Exists in a Pandas DataFrame

Pandas makes it very easy to check if a column exists in a DataFrame using the .columns attribute. As mentioned above, the .columns attribute returns an Index object, which is list-like and can be checked for membership.

In order to check whether or not a column exists in a DataFrame, you can use the in keyword.

Let’s take a look at how we can check if the NumSold column exists:

# Checking if a Column Exists in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print('NumSold' in df.columns)

# Returns: True

In the code above, we used the in keyword to check if the string NumSold exists in the iterable df.columns.

If we wanted to print a message if a column does or doesn’t exist, then we can wrap this in an if-else block, as shown below:

# Checking if a Column Exists or Not
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

if 'NumSold' in df.columns:
    print('Column exists!')
else:
    print("Column doesn't exist!")

# Returns: Column exists!

In the example above, we wrote an if-else block that allows us to alert the user whether or not a column exists.
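The same idea extends to checking several columns at once. The sketch below is one possible approach, not the only one. It assumes an empty stand-in DataFrame that has the same column names as the sample data, and 'Region' is a hypothetical column name chosen because it doesn't exist.

```python
import pandas as pd

# Empty stand-in DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=['Date', 'Name', 'NumSold', 'Total', 'Active'])

# 'Region' is a hypothetical name that is not in the DataFrame
required = ['NumSold', 'Total', 'Region']

# Collect every required column that is absent
missing = [col for col in required if col not in df.columns]

# True only if every required column is present
all_present = set(required).issubset(df.columns)

print(missing)
print(all_present)
```

Collecting the missing names, rather than stopping at the first one, makes it easier to report every problem in a single message.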

How to Count the Number of Columns in a Pandas DataFrame

In this section, you’ll learn how to count the number of columns in a Pandas DataFrame. The object returned by the df.columns attribute can be counted by the len() function.

Let’s see how we can count the number of columns in a Pandas DataFrame:

# Counting the Number of Columns in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(len(df.columns))

# Returns: 5

Let’s break down what we did in the code block above:

  1. We passed the df.columns attribute into the len() function.
  2. This returned the length of the list-like object containing the column names.
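As an alternative sketch, the .shape attribute of a DataFrame returns a (rows, columns) tuple, so its second element gives the same count. The example assumes an empty stand-in DataFrame with the same column names as the sample data.

```python
import pandas as pd

# Empty stand-in DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=['Date', 'Name', 'NumSold', 'Total', 'Active'])

# .shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df.shape

print(n_cols)
print(n_cols == len(df.columns))
```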

In the next section, you’ll get a dictionary of column names and their corresponding data types.

How to Get a Dictionary of Column Names and Data Types

In this section, you’ll learn how to create a dictionary of column names and data types in a DataFrame. This allows you to better understand what types your columns are in your DataFrame.

The Pandas .dtypes attribute returns a Series object containing the column name and the data type of the column. Let’s see what this looks like when printed out:

# Accessing the Data Types of a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(df.dtypes)

# Returns:
# Date        object
# Name        object
# NumSold      int64
# Total      float64
# Active        bool
# dtype: object

If you want to convert this Series to a Python dictionary, you can simply pass it into the dict() function. Let’s see what this looks like:

# Get a Dictionary of Column Names and Data Types
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(dict(df.dtypes))

# Returns:
# {'Date': dtype('O'), 'Name': dtype('O'), 'NumSold': dtype('int64'), 'Total': dtype('float64'), 'Active': dtype('bool')}

Using the method above, we can generate a dictionary where the keys are the column names and the values are the data types.
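If you would rather have plain strings than dtype objects as the values (for example, to write them to JSON), one possible sketch is to convert each dtype with str(). The DataFrame below is an inline stand-in for the first five rows of the sample data, so the example runs offline. The exact name reported for the text columns can differ between Pandas versions, so only the numeric and boolean columns are printed.

```python
import pandas as pd

# Inline stand-in for the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Map each column name to the string name of its data type
dtype_names = {col: str(dtype) for col, dtype in df.dtypes.items()}

print(dtype_names['NumSold'])
print(dtype_names['Total'])
print(dtype_names['Active'])
```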

How to Get Column Names of Specific Data Types in Pandas

Pandas makes it very easy to get a list of column names of specific data types. This can be done using the .select_dtypes() method and the list() function. The .select_dtypes() method is applied to a DataFrame to select a single data type or multiple data types.

You can choose to include or exclude specific data types. Let’s see how we can select only columns belonging to boolean data types:

# Get Column Names of Boolean Data Types
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(list(df.select_dtypes('bool')))

# Returns: ['Active']

In the example above, we narrow our DataFrame down to only boolean columns. Then, we get the column names as a list by passing the DataFrame into the list() function.

The Pandas .select_dtypes() method also makes it easy to select all numeric columns. Rather than passing in a list of numeric data types, you can simply pass in 'number'.

# Get Column Names of Numeric Data Types in Pandas
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(list(df.select_dtypes('number')))

# Returns: ['NumSold', 'Total']

The method makes it incredibly simple to get the column names of a specific data type or of multiple data types.
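To illustrate including multiple data types and excluding a data type, here is a short sketch using the include= and exclude= parameters of .select_dtypes(). It assumes an inline stand-in DataFrame that mirrors the first five rows of the sample data.

```python
import pandas as pd

# Inline stand-in for the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Include several data types at once by passing a list
numeric_or_bool = list(df.select_dtypes(include=['number', 'bool']))

# Exclude a data type to keep everything else
non_numeric = list(df.select_dtypes(exclude='number'))

print(numeric_or_bool)  # ['NumSold', 'Total', 'Active']
print(non_numeric)      # ['Date', 'Name', 'Active']
```

Note that boolean columns are not treated as numeric by 'number', which is why 'Active' appears in the second list.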

How to Get Pandas Column Names by Index

In this section, you’ll learn how to get Pandas column names by index (or indices). This allows you to get the name of a column at a specific position (or positions).

In an earlier section, you learned how to get a list of all column names by passing the DataFrame into the list() function. Because lists are indexable, we can access the name of a column at a specific index position.

Let’s see how we can access the name of the column in the second position:

# Get the Pandas Column Name by Index
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(list(df)[1])

# Returns: Name

In the example above, we indexed the resulting list to access item 1, meaning the second item.

Similarly, we can use slicing to access multiple names. For example, if we wanted to access the last two column names, we can use a slice with a negative start index.

# Get the Last 2 Column Names in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(list(df)[-2:])

# Returns: ['Total', 'Active']

The example above shows that we can access the last two column names of a Pandas DataFrame using list slicing.
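The Index object returned by df.columns can also be indexed by position directly, without converting it to a list first. The sketch below assumes an empty stand-in DataFrame with the same column names as the sample data.

```python
import pandas as pd

# Empty stand-in DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=['Date', 'Name', 'NumSold', 'Total', 'Active'])

# A single position returns the column name itself
second = df.columns[1]

# A slice returns an Index, which .tolist() converts to a list
last_two = df.columns[-2:].tolist()

# A list of positions selects non-adjacent column names
picked = df.columns[[0, 2]].tolist()

print(second)    # Name
print(last_two)  # ['Total', 'Active']
print(picked)    # ['Date', 'NumSold']
```

Passing a list of positions is something a plain Python list doesn't support, which is one reason to index df.columns directly.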

How to Get Pandas Column Names in Multi-Index DataFrames

In this section, you’ll learn how to get Pandas column names from a multi-index DataFrame. For this, we’ll load a separate DataFrame, as shown below:

# Loading a Sample Multi-Index Pandas DataFrame
import pandas as pd
import numpy as np

col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
                                ['a', 'b', 'a', 'b']])

df = pd.DataFrame(np.random.randn(4, 4), columns=col)
print(df)

# Returns:
#         one                 two          
#           a         b         a         b
# 0  0.542193 -1.820652 -0.169705 -1.654994
# 1 -0.317596  0.446695  0.720554  0.801922
# 2  1.011754  0.126223 -0.340901  0.007976
# 3  0.657133 -1.029230 -1.262736  0.771902

Let’s see what happens when we try to access the column names using the previously shared methods:

  1. Using list(df)
  2. Using df.columns

# Accessing Columns of a Multi-Index DataFrame
import pandas as pd
import numpy as np

col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
                                ['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.random.randn(4, 4), columns=col)

option1 = list(df)
option2 = df.columns

print(f'{option1=}')
print(f'{option2=}')

# Returns:
# option1=[('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')]
# option2=MultiIndex([('one', 'a'),
#             ('one', 'b'),
#             ('two', 'a'),
#             ('two', 'b')],
#            )

We can see that both options return a list-like structure containing tuples of the pairs of column names.

Pandas makes it very simple to also access only a single level of the column names. For example, if we only wanted to access the top level of columns, we can use the .get_level_values() method on the .columns attribute, as shown below:

# Accessing a Single Level of Columns of a Multi-Index DataFrame
import pandas as pd
import numpy as np

col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
                                ['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.random.randn(4, 4), columns=col)

print(df.columns.get_level_values(0))

# Returns:
# Index(['one', 'one', 'two', 'two'], dtype='object')
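Two follow-up tasks often come up with multi-index columns: getting each top-level name only once, and flattening the tuples into single strings. The sketch below shows one possible way to do each. The underscore separator is an arbitrary choice, and the DataFrame is filled with zeros rather than random values because only the column names matter here.

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
                                 ['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.zeros((4, 4)), columns=col)

# Unique names in the top level, in order of appearance
top_level = df.columns.get_level_values(0).unique().tolist()

# Iterating a MultiIndex yields tuples, which can be joined into strings
flat_names = ['_'.join(pair) for pair in df.columns]

print(top_level)   # ['one', 'two']
print(flat_names)  # ['one_a', 'one_b', 'two_a', 'two_b']
```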

In the following section, you’ll learn how to get Pandas column names that meet a condition.

How to Get Pandas Column Names Meeting a Condition

In order to get Pandas column names that meet a condition, you can filter the list of column names using a list comprehension. Say we wanted to get the column names that contain either a lowercase or uppercase 'a'. In order to do this, we could write the following:

# Get Pandas Columns Meeting a Condition
import pandas as pd
import re

url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

cols = [col for col in list(df) if re.search(r'[aA]', col)]
print(cols)

# Returns: ['Date', 'Name', 'Total', 'Active']

Let’s break down what we did above:

  1. We imported both Pandas and the regular expression library re
  2. We used a list comprehension to return only columns containing either a lowercase or upper case 'a'
  3. We printed our resulting list
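The same filter can also be written without the re library by using the string methods available on df.columns. This is a sketch of an alternative, assuming an empty stand-in DataFrame with the same column names as the sample data.

```python
import pandas as pd

# Empty stand-in DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=['Date', 'Name', 'NumSold', 'Total', 'Active'])

# Boolean mask: True where the name contains 'a' in either case
mask = df.columns.str.contains('a', case=False)

# Apply the mask to the column names and convert to a list
cols = df.columns[mask].tolist()

print(cols)  # ['Date', 'Name', 'Total', 'Active']
```

The list comprehension is more flexible for arbitrary Python conditions, while the .str methods keep simple text matches compact.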

In the following section, you’ll learn how to get Pandas column names for columns containing missing values.

How to Get Pandas Column Names with Missing Values

To get the Pandas column names for columns containing missing values, we can index the df.columns object with a boolean mask. We build the mask by chaining the .isna() and .any() methods, which flags every column that contains at least one missing value.

Let’s see what this looks like:

# Get Pandas Column Names of Columns with Missing Values
import pandas as pd

url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(df.columns[df.isna().any()])

# Returns: 
# Index(['Name'], dtype='object')

We can see from the code block above that we were able to filter our resulting list of columns using a boolean mask. The array that gets returned from df.isna().any() contains only boolean values. This means that any column containing any number of missing values is labelled as True, and columns without missing values are marked as False.
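If you also want to know how many values are missing in each column, one possible sketch is to replace .any() with .sum() and filter on the counts. The DataFrame below is an inline stand-in for the first five rows of the sample data, in which only 'Name' has a missing value.

```python
import pandas as pd

# Inline stand-in for the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Count the missing values in each column
missing_counts = df.isna().sum()

# Keep only the columns with at least one missing value
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()

print(cols_with_missing)            # ['Name']
print(int(missing_counts['Name']))  # 1
```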

How to Get Pandas Column Names with Duplicate Values

Getting column names of columns containing duplicate values works in a very similar way to the example above. We can apply a mask to the resulting array of column names created by df.columns. This mask represents a boolean array of columns containing duplicate values.

Let’s see what this looks like:

# Get Pandas Column Names with Duplicate Values
import pandas as pd

url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
df.loc[1, 'Name'] = 'Nik'

mask = df.apply(lambda x: x.duplicated().any(), axis=0)
print(df.columns[mask])

# Returns: Index(['Name', 'Active'], dtype='object')

The above code block is a bit more complicated, so let’s break down what we did here:

  1. We assigned a new duplicate value using the .loc accessor
  2. We created a mask variable, which applies a function to each column to identify whether any duplicate values exist in each column
  3. We then apply the mask to the df.columns attribute to filter the array
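An alternative sketch uses the .is_unique attribute of each column, which is False whenever a column contains a repeated value. The DataFrame below is an inline stand-in for the first five rows of the sample data, so the result applies to those five rows only.

```python
import pandas as pd

# Inline stand-in for the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Introduce a duplicate value in the 'Name' column
df.loc[1, 'Name'] = 'Nik'

# Keep the names of columns whose values are not all unique
cols_with_duplicates = [col for col in df.columns if not df[col].is_unique]

print(cols_with_duplicates)  # ['Name', 'Active']
```

A boolean column with more than two rows will always contain duplicates, which is why 'Active' is included.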

In the following section, you’ll learn how to get an alphabetized list of column names in Pandas.

How to Get an Alphabetical List of Pandas Column Names

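Python makes it incredibly easy to get an alphabetical list of Pandas column names using the built-in sorted() function. Because iterating over a DataFrame yields its column names, and sorted() returns a new sorted list from any iterable, we can simply pass in our entire DataFrame to return an alphabetical list. Note that the sort is case-sensitive, so names starting with an uppercase letter come before names starting with a lowercase letter.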

Let’s see how to get an alphabetical list of column names in Pandas:

# Get an Alphabetical List of Column Names in Pandas
import pandas as pd

url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(sorted(df))

# Returns: 
# ['Active', 'Date', 'Name', 'NumSold', 'Total']

What’s great about this approach is that it’s very little code to write. Similarly, you could sort either the list returned by list(df) or the values returned from df.columns, but this saves you some typing!
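Because sorted() compares strings case-sensitively, column names with mixed capitalization may not come back in the order you expect. The sketch below shows one way to handle this with the key= parameter. The DataFrame and its mixed-case column names are hypothetical and are not part of the sample data.

```python
import pandas as pd

# Hypothetical DataFrame with mixed-case column names
df = pd.DataFrame(columns=['date', 'Name', 'active', 'Total'])

# Default sort: uppercase letters sort before lowercase letters
default_order = sorted(df)

# Case-insensitive sort: compare the lowercased names instead
case_insensitive = sorted(df, key=str.lower)

print(default_order)     # ['Name', 'Total', 'active', 'date']
print(case_insensitive)  # ['active', 'date', 'Name', 'Total']
```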

How to Get Column Names Starting with a Letter in Pandas

In this section, you’ll learn how to get column names of a Pandas DataFrame that start with a particular letter. In order to do this, we can apply a boolean mask to the array produced by the df.columns attribute.

To create our boolean mask, we can apply the .str.startswith() method to the array. This method returns True for each column name that starts with the given string and False for every other name.

Let’s see how we can get the column names that start with the letter 'N':

# Get Column Names Starting with a Letter in Pandas
import pandas as pd

url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)

print(df.columns[df.columns.str.startswith('N')])

# Returns:
# Index(['Name', 'NumSold'], dtype='object')
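Two related variations are sketched below: matching any of several starting letters, and matching on the end of a name instead. The first uses Python's own str.startswith(), which accepts a tuple of prefixes, inside a list comprehension. The example assumes an empty stand-in DataFrame with the same column names as the sample data.

```python
import pandas as pd

# Empty stand-in DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=['Date', 'Name', 'NumSold', 'Total', 'Active'])

# Python's str.startswith() accepts a tuple of possible prefixes
starts_n_or_t = [col for col in df.columns if col.startswith(('N', 'T'))]

# .str.endswith() builds a boolean mask based on the end of each name
ends_with_e = df.columns[df.columns.str.endswith('e')].tolist()

print(starts_n_or_t)  # ['Name', 'NumSold', 'Total']
print(ends_with_e)    # ['Date', 'Name', 'Active']
```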

Frequently Asked Questions

How can you get a list of all column names in a Pandas DataFrame?

The easiest way to get a list of all column names in a Pandas DataFrame is to use list(df). Alternatively, you can use the df.columns attribute.

Conclusion

In this comprehensive guide, you learned the many different ways to get column names of a Pandas DataFrame. You first learned the different ways to get all of the column names in a Pandas DataFrame. Then, you learned how to get column names in different, niche ways. For example, you learned how to get column names that match a particular data type or contain missing values.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.