In this tutorial, you’ll learn how to use Pandas to get the column names of a DataFrame. There are several ways to do this, and the right one depends on the result you’re looking for.
This tutorial covers all of these different scenarios and provides detailed steps. Being able to get and see all of the columns in a Pandas DataFrame can allow you to better work with your data.
Loading a Sample Pandas DataFrame
To follow along with this tutorial, we’ve provided a sample Pandas DataFrame. If you have your own data, feel free to use that – though your results will, of course, vary. To get started, simply copy and paste the code from the block below:
# Loading a Sample Pandas DataFrame
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')
print(df.head())
# Returns:
# Date Name NumSold Total Active
# 0 01-Jan-23 Nik 14 329.67 True
# 1 02-Jan-23 Evan 12 475.71 False
# 2 03-Jan-23 Kyra 16 569.64 True
# 3 04-Jan-23 Kate 13 528.23 False
# 4 05-Jan-23 NaN 19 974.65 True
In the code block above, we loaded a Pandas DataFrame using the pd.read_csv() function. The DataFrame has five columns of mixed data types with some missing values.
How to Get Column Names from a Pandas DataFrame
Pandas provides a number of helpful ways to get column names. The simplest is to pass the DataFrame into the list() function, which returns a list of all the column names.
# Get all column names as a list
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')
print(list(df))
# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']
Doing this returns a list of all of the column names in the DataFrame, in the order in which they appear.
You can also access all of the column names in a Pandas DataFrame by using the .columns attribute of a DataFrame. This returns a Pandas Index object containing all of the columns.
While this object is iterable, if you want to convert it to a list, you either need to apply the .tolist() method or pass the item into the list() function.
# Using the .columns Attribute to Get Column Names
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv')
print(df.columns.tolist())
# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']
This approach takes slightly more code to write. So why would you use the .columns attribute to get DataFrame columns? Because it returns an Index object, the Pandas .columns attribute lets you check membership in DataFrame columns directly. This allows you to easily check whether a column exists or not, without needing to create a separate list of columns.
How to Get a List of Pandas Column Names from CSV File
Pandas also makes it very easy to get a list of column names from a CSV file. In order to do this, you can specify reading only a single row of data. From there, you can run the method described in the section above to get column names.
# Get List of Pandas Column Names from a CSV
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv',
nrows=1
)
print(list(df))
# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']
In the example above, we load only a single row of data using the nrows=1 argument. This allows you to load as little data as possible (thereby saving memory and time), while still being able to access the column names.
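If you only need the header, you can go one step further and read zero data rows. The sketch below assumes a small in-memory CSV (the csv_text variable is a stand-in for the remote file, not part of the original dataset) so that it runs without a network connection:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for the remote file
csv_text = """Date,Name,NumSold,Total,Active
01-Jan-23,Nik,14,329.67,True
02-Jan-23,Evan,12,475.71,False
"""

# nrows=0 parses the header row only, so no data rows are loaded
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)
column_names = list(header_only)

print(column_names)
# Returns:
# ['Date', 'Name', 'NumSold', 'Total', 'Active']
print(len(header_only))
# Returns: 0
```

Keep in mind that with no rows loaded, Pandas has nothing to infer data types from, so this is only useful for reading the names themselves.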
How to Check if a Column Exists in a Pandas DataFrame
Pandas makes it very easy to check if a column exists in a DataFrame using the .columns attribute. As mentioned above, the .columns attribute returns an Index object, which is list-like and can be checked for membership.
In order to check whether or not a column exists in a DataFrame, you can use the in keyword.
Let’s take a look at how we can check if the NumSold column exists:
# Checking if a Column Exists in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print('NumSold' in df.columns)
# Returns: True
In the code above, we used the in keyword to check if the string NumSold exists in the iterable df.columns.
If we wanted to print a message if a column does or doesn’t exist, then we can wrap this in an if-else block, as shown below:
# Checking if a Column Exists or Not
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
if 'NumSold' in df.columns:
    print('Column exists!')
else:
    print("Column doesn't exist!")
# Returns: Column exists!
In the example above, we wrote an if-else block that allows us to alert the user whether or not a column exists.
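The same membership test extends to checking several columns at once. The sketch below assumes a small inline DataFrame mirroring the sample data, and the 'Region' column name is made up purely to show what a missing column looks like:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# 'Region' is a hypothetical column that does not exist in the data
required = ['Name', 'NumSold', 'Region']

# Keep only the required names that are absent from df.columns
missing = [col for col in required if col not in df.columns]

print(missing)
# Returns: ['Region']
```

An empty list means every required column is present, which makes this handy as a quick validation step before further processing.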
How to Count the Number of Columns in a Pandas DataFrame
In this section, you’ll learn how to count the number of columns in a Pandas DataFrame. The object returned by the df.columns attribute can be counted by the len() function.
Let’s see how we can count the number of columns in a Pandas DataFrame:
# Counting the Number of Columns in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(len(df.columns))
# Returns: 5
Let’s break down what we did in the code block above:
- We passed the df.columns attribute into the len() function.
- This returned the length of the list-like object containing the column names.
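Another option is the DataFrame’s .shape attribute, which is a tuple of (rows, columns). The following is a minimal sketch using a small inline DataFrame in place of the sample data:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# .shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape

print(n_cols)
# Returns: 5
```

Both approaches give the same count, so the choice mostly comes down to whether you also need the number of rows.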
In the next section, you’ll get a dictionary of column names and their corresponding data types.
How to Get a Dictionary of Column Names and Data Types
In this section, you’ll learn how to create a dictionary of column names and data types in a DataFrame. This allows you to better understand what types your columns are in your DataFrame.
The Pandas .dtypes attribute returns a Series whose index holds the column names and whose values are the data types of those columns. Let’s see what this looks like when printed out:
# Accessing the Data Types of a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(df.dtypes)
# Returns:
# Date object
# Name object
# NumSold int64
# Total float64
# Active bool
# dtype: object
If you want to convert this Series to a Python dictionary, you can simply pass it into the dict() function. Let’s see what this looks like:
# Get a Dictionary of Column Names and Data Types
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(dict(df.dtypes))
# Returns:
# {'Date': dtype('O'), 'Name': dtype('O'), 'NumSold': dtype('int64'), 'Total': dtype('float64'), 'Active': dtype('bool')}
Using the method above, we can generate a dictionary where the keys are the column names and the values are the data types.
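The values in that dictionary are dtype objects. If plain strings are easier to work with (for example, when logging a schema), one possible approach is to convert the dtypes to strings first. The sketch below uses a small inline DataFrame in place of the sample data. No expected output is shown for the text columns, because the name Pandas reports for them can differ between versions:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# Convert each dtype to its string name, then build a plain dictionary
dtype_names = df.dtypes.astype(str).to_dict()

print(dtype_names['NumSold'])
# Returns: int64
print(dtype_names['Total'])
# Returns: float64
```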
How to Get Column Names of Specific Data Types in Pandas
Pandas makes it very easy to get a list of column names of specific data types. This can be done using the .select_dtypes() method and the list() function. The .select_dtypes() method is applied to a DataFrame to select a single data type or multiple data types.
You can choose to include or exclude specific data types. Let’s see how we can select only columns belonging to boolean data types:
# Get Column Names of Boolean Data Types
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(list(df.select_dtypes('bool')))
# Returns: ['Active']
In the example above, we narrow our DataFrame down to only boolean columns. Then, we get the column names as a list by passing the DataFrame into the list() function.
The Pandas .select_dtypes() method also makes it easy to select all numeric columns. Rather than passing in a list of numeric data types, you can simply pass in 'number'.
# Get Column Names of Numeric Data Types in Pandas
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(list(df.select_dtypes('number')))
# Returns: ['NumSold', 'Total']
The method makes it incredibly simple to get the column names of a specific data type or of multiple data types.
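To illustrate selecting multiple data types and excluding a data type, here is a short sketch that uses the include and exclude parameters of .select_dtypes(). It assumes a small inline DataFrame mirroring the sample data:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# Include several data types by passing a list
numeric_or_bool = list(df.select_dtypes(include=['number', 'bool']))

# Exclude a data type to keep everything else
non_numeric = list(df.select_dtypes(exclude='number'))

print(numeric_or_bool)
# Returns: ['NumSold', 'Total', 'Active']
print(non_numeric)
# Returns: ['Date', 'Name', 'Active']
```

Note that boolean columns are not treated as numeric here, which is why 'Active' appears in the second list.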
How to Get Pandas Column Names by Index
In this section, you’ll learn how to get Pandas column names by index (or indices). This allows you to get the name of a column at a specific position (or positions).
In an earlier section, you learned how to get a list of all column names by passing the DataFrame into the list() function. Because lists are indexable, we can access the name of a column at a specific index position.
Let’s see how we can access the name of the column in the second position:
# Get the Pandas Column Name by Index
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(list(df)[1])
# Returns: Name
In the example above, we indexed the resulting list to access item 1, meaning the second item.
Similarly, we can use slicing to access multiple names. For example, if we wanted to access the last two column names, we can use negative indexing.
# Get the Last 2 Column Names in a Pandas DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(list(df)[-2:])
# Returns: ['Total', 'Active']
The example above shows that we can access the last two column names of a Pandas DataFrame using list slicing.
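The Index object returned by df.columns can also be indexed directly, without converting to a list first. The sketch below, which assumes a small inline DataFrame mirroring the sample data, also shows the reverse lookup with .get_loc(), which returns the position of a given column name:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# Index objects support positional indexing, just like lists
second_name = df.columns[1]

# Passing a list of positions selects several names at once
picked = df.columns[[0, 2]].tolist()

# Reverse lookup: find the position of a column by its name
total_position = df.columns.get_loc('Total')

print(second_name)
# Returns: Name
print(picked)
# Returns: ['Date', 'NumSold']
print(total_position)
# Returns: 3
```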
How to Get Pandas Column Names in Multi-Index DataFrames
In this section, you’ll learn how to get Pandas column names from a multi-index DataFrame. For this, we’ll load a separate DataFrame, as shown below:
# Loading a Sample Multi-Index Pandas DataFrame
import pandas as pd
import numpy as np
col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.random.randn(4, 4), columns=col)
print(df)
# Returns:
# one two
# a b a b
# 0 0.542193 -1.820652 -0.169705 -1.654994
# 1 -0.317596 0.446695 0.720554 0.801922
# 2 1.011754 0.126223 -0.340901 0.007976
# 3 0.657133 -1.029230 -1.262736 0.771902
Let’s see what happens when we try to access the column names using the previously shared methods:
- Using list(df), and
- Using df.columns
# Accessing Columns of a Multi-Index DataFrame
import pandas as pd
import numpy as np
col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.random.randn(4, 4), columns=col)
option1 = list(df)
option2 = df.columns
print(f'{option1=}')
print(f'{option2=}')
# Returns:
# option1=[('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')]
# option2=MultiIndex([('one', 'a'),
# ('one', 'b'),
# ('two', 'a'),
# ('two', 'b')],
# )
We can see that both options return a list-like structure containing tuples of the pairs of column names.
Pandas makes it very simple to also access only a single level of the column names. For example, if we only wanted to access the top level of columns, we can use the .get_level_values() method on the .columns attribute, as shown below:
# Accessing a Single Level of Columns of a Multi-Index DataFrame
import pandas as pd
import numpy as np
col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
['a', 'b', 'a', 'b']])
df = pd.DataFrame(np.random.randn(4, 4), columns=col)
print(df.columns.get_level_values(0))
# Returns:
# Index(['one', 'one', 'two', 'two'], dtype='object')
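Building on this, a few related operations are often useful: getting the second level, getting the unique top-level names, and flattening each tuple into a single string. The sketch below fills the DataFrame with zeros rather than random numbers (only the column labels matter here), and the underscore separator is an arbitrary choice:

```python
import numpy as np
import pandas as pd

col = pd.MultiIndex.from_arrays([['one', 'one', 'two', 'two'],
                                 ['a', 'b', 'a', 'b']])
# Zeros instead of random values, since only the labels are of interest
df = pd.DataFrame(np.zeros((4, 4)), columns=col)

# Names from the second level (level 1)
second_level = list(df.columns.get_level_values(1))

# Unique names in the top level, in order of appearance
top_unique = list(df.columns.get_level_values(0).unique())

# Join each (top, bottom) tuple into one flat name
flat_names = ['_'.join(pair) for pair in df.columns]

print(second_level)
# Returns: ['a', 'b', 'a', 'b']
print(top_unique)
# Returns: ['one', 'two']
print(flat_names)
# Returns: ['one_a', 'one_b', 'two_a', 'two_b']
```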
In the following section, you’ll learn how to get Pandas column names that meet a condition.
How to Get Pandas Column Names Meeting a Condition
To get the Pandas column names that meet a condition, you can filter the list of column names with a list comprehension. Say we wanted to get the column names that contain either a lowercase or uppercase 'a'. In order to do this, we could write the following:
# Get Pandas Columns Meeting a Condition
import pandas as pd
import re
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
cols = [col for col in list(df) if re.search(r'[aA]', col)]
print(cols)
# Returns: ['Date', 'Name', 'Total', 'Active']
Let’s break down what we did above:
- We imported both Pandas and the regular expression library re
- We used a list comprehension to return only columns containing either a lowercase or uppercase 'a'
- We printed our resulting list
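The steps above can also be written without the re library, using the string methods available on df.columns. This is a sketch of an alternative, not a replacement for the approach above, and it assumes a small inline DataFrame mirroring the sample data:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# case=False matches both 'a' and 'A'
mask = df.columns.str.contains('a', case=False)
cols = list(df.columns[mask])

print(cols)
# Returns: ['Date', 'Name', 'Total', 'Active']
```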
In the following section, you’ll learn how to get Pandas column names for columns containing missing values.
How to Get Pandas Column Names with Missing Values
To get a list of Pandas column names for columns containing missing values, we can index the df.columns object with a boolean mask. We can build a mask that identifies which columns contain missing values by chaining the .isna() and .any() methods.
Let’s see what this looks like:
# Get Pandas Column Names of Columns with Missing Values
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(df.columns[df.isna().any()])
# Returns:
# Index(['Name'], dtype='object')
We can see from the code block above that we were able to filter our columns using a boolean mask. The Series returned by df.isna().any() contains only boolean values, one per column. This means that any column containing any number of missing values is labelled as True, and columns without missing values are marked as False.
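A closely related sketch counts the missing values per column and also gets the opposite result, the columns with no missing values. It assumes a small inline DataFrame that mirrors the five sample rows shown earlier:

```python
import pandas as pd

# Inline stand-in mirroring the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Count missing values per column, then keep the names with at least one
missing_counts = df.isna().sum()
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()

# The opposite: columns where every value is present
complete_cols = df.columns[df.notna().all()].tolist()

print(cols_with_missing)
# Returns: ['Name']
print(complete_cols)
# Returns: ['Date', 'NumSold', 'Total', 'Active']
```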
How to Get Pandas Column Names with Duplicate Values
Getting column names of columns containing duplicate values works in a very similar way to the example above. We can apply a mask to the Index of column names returned by df.columns. This mask is a boolean array marking the columns that contain duplicate values.
Let’s see what this looks like:
# Get Pandas Column Names with Duplicate Values
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
df.loc[1, 'Name'] = 'Nik'
mask = df.apply(lambda x: x.duplicated().any(), axis=0)
print(df.columns[mask])
# Returns: Index(['Name', 'Active'], dtype='object')
The above code block is a bit more complicated, so let’s break down what we did here:
- We assigned a new duplicate value using the .loc accessor
- We created a mask variable, which applies a function to each column to identify whether any duplicate values exist in each column
- We then applied the mask to the df.columns attribute to filter the array
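The steps above can also be sketched with the .is_unique attribute of a Series, which is False whenever a column holds at least one repeated value. This is an alternative to the .apply() approach, shown here on a small inline DataFrame mirroring the five sample rows:

```python
import pandas as pd

# Inline stand-in mirroring the first five rows of the sample data
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23', '03-Jan-23', '04-Jan-23', '05-Jan-23'],
    'Name': ['Nik', 'Evan', 'Kyra', 'Kate', None],
    'NumSold': [14, 12, 16, 13, 19],
    'Total': [329.67, 475.71, 569.64, 528.23, 974.65],
    'Active': [True, False, True, False, True],
})

# Introduce a duplicate name, as in the example above
df.loc[1, 'Name'] = 'Nik'

# A column has duplicates when its values are not all unique
dup_cols = [col for col in df.columns if not df[col].is_unique]

print(dup_cols)
# Returns: ['Name', 'Active']
```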
In the following section, you’ll learn how to get an alphabetized list of column names in Pandas.
How to Get an Alphabetical List of Pandas Column Names
Pandas makes it easy to get an alphabetical list of column names using Python’s built-in sorted() function. Because iterating over a DataFrame yields its column names, and the sorted() function returns a new sorted list, we can simply pass in our entire DataFrame to return an alphabetical list.
Let’s see how to get an alphabetical list of column names in Pandas:
# Get an Alphabetical List of Column Names in Pandas
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(sorted(df))
# Returns:
# ['Active', 'Date', 'Name', 'NumSold', 'Total']
What’s great about this approach is that it’s very little code to write. You could equally sort list(df) or the values returned from df.columns, but passing in the DataFrame directly saves you some typing!
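One thing to watch for is that sorted() compares strings case-sensitively, so uppercase names sort before lowercase ones. The sketch below uses a hypothetical DataFrame with mixed-case column names (not the sample data) to show the difference, and passes key=str.lower for a case-insensitive sort:

```python
import pandas as pd

# Hypothetical DataFrame with mixed-case column names
df = pd.DataFrame(columns=['total', 'Date', 'active', 'Name'])

# Default sort: uppercase letters come before lowercase letters
default_sorted = sorted(df)

# Case-insensitive sort: compare the lowercased names instead
insensitive_sorted = sorted(df, key=str.lower)

print(default_sorted)
# Returns: ['Date', 'Name', 'active', 'total']
print(insensitive_sorted)
# Returns: ['active', 'Date', 'Name', 'total']
```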
How to Get Column Names Starting with a Letter in Pandas
In this section, you’ll learn how to get column names of a Pandas DataFrame that start with a particular letter. In order to do this, we can apply a boolean mask to the Index produced by the df.columns attribute.
To create our boolean mask, we can apply the .str.startswith() method to the Index. This method returns a boolean array that is True for every name starting with the given letter.
Let’s see how we can get the column names that start with the letter 'N':
# Get Column Names Starting with a Letter in Pandas
import pandas as pd
url = 'https://raw.githubusercontent.com/datagy/mediumdata/master/PandasColumns.csv'
df = pd.read_csv(url)
print(df.columns[df.columns.str.startswith('N')])
# Returns:
# Index(['Name', 'NumSold'], dtype='object')
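The same masking idea works for other string tests. As a sketch, the example below uses .str.endswith() to match a suffix and the DataFrame .filter() method with a regular expression to match names starting with either of two letters. It assumes a small inline DataFrame mirroring the sample data:

```python
import pandas as pd

# Small inline stand-in for the sample DataFrame
df = pd.DataFrame({
    'Date': ['01-Jan-23', '02-Jan-23'],
    'Name': ['Nik', 'Evan'],
    'NumSold': [14, 12],
    'Total': [329.67, 475.71],
    'Active': [True, False],
})

# Column names ending with the letter 'e'
ends_with_e = list(df.columns[df.columns.str.endswith('e')])

# Column names starting with 'N' or 'T', using a regular expression
starts_n_or_t = list(df.filter(regex=r'^(N|T)'))

print(ends_with_e)
# Returns: ['Date', 'Name', 'Active']
print(starts_n_or_t)
# Returns: ['Name', 'NumSold', 'Total']
```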
Frequently Asked Questions
The easiest way to get a list of all column names in a Pandas DataFrame is to use list(df). Alternatively, you can use the df.columns attribute.
Conclusion
In this comprehensive guide, you learned the many different ways to get column names of a Pandas DataFrame. You first learned the different ways to get all of the column names in a Pandas DataFrame. Then, you learned how to get column names in different, niche ways. For example, you learned how to get column names that match a particular data type or contain missing values.