In this post, you’ll learn how to get Pandas column names as a list. You’ll learn a number of different methods, including how to return a list of column names and a list that’s sorted alphabetically. You’ll learn which of the methods is fastest, even if it’s not the fastest to write. Finally, you’ll learn to check if a column exists in a dataframe.
Loading a Sample Dataframe
To follow along with this tutorial, load the sample dataframe provided below by copying this code into your favourite text editor! We use Seaborn’s load_dataset() function to load a built-in dataset.
If you don’t yet have Seaborn installed, you can install it by running pip install seaborn in your terminal. To learn more about Seaborn, check out my tutorial series here.
```python
import pandas as pd
from seaborn import load_dataset

df = load_dataset('penguins')
print(df.head())
```
This returns the following dataframe:
```
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
```
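Note that load_dataset() downloads the data, so it needs an internet connection the first time it runs. If Seaborn or a connection isn’t available, a minimal sketch like the one below builds a small stand-in dataframe by hand from the five rows shown above. It is not the full dataset, but it has the same column names, so the column-name examples in this post behave the same way on it.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the first five rows of the penguins dataset,
# for readers without Seaborn or network access (not the full dataset)
df = pd.DataFrame({
    'species': ['Adelie'] * 5,
    'island': ['Torgersen'] * 5,
    'bill_length_mm': [39.1, 39.5, 40.3, np.nan, 36.7],
    'bill_depth_mm': [18.7, 17.4, 18.0, np.nan, 19.3],
    'flipper_length_mm': [181.0, 186.0, 195.0, np.nan, 193.0],
    'body_mass_g': [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    'sex': ['Male', 'Female', 'Female', np.nan, 'Female'],
})
print(df.head())
```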
Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!
Get Pandas Column Names
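Pandas provides a very helpful attribute, the .columns attribute, to access column names. It returns a pandas Index object rather than a plain Python list. An Index can be iterated over, but it is immutable and comes with its own pandas-specific methods, so you’ll often want to convert it into a regular list.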
Before we dive any further, let’s take a look at what the .columns attribute returns:
```python
print(df.columns)
```
This returns:
```
Index(['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex'], dtype='object')
```
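To see what kind of object .columns actually is, here’s a minimal sketch that uses an empty dataframe with the same column names as a stand-in, so it runs without Seaborn. It shows that an Index is not a list, yet it can still be looped over and indexed by position, and that it refuses item assignment.

```python
import pandas as pd

# Stand-in dataframe with the penguins column names (no rows needed)
df = pd.DataFrame(columns=['species', 'island', 'bill_length_mm', 'bill_depth_mm',
                           'flipper_length_mm', 'body_mass_g', 'sex'])

cols = df.columns
print(isinstance(cols, pd.Index))  # True
print(isinstance(cols, list))      # False

# An Index can be iterated over directly...
for name in cols:
    print(name)

# ...and indexed by position
print(cols[0])  # species

# Unlike a list, an Index is immutable
try:
    cols[0] = 'penguin_species'
except TypeError as err:
    print(f'TypeError: {err}')
```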
Get Column Names as a List
To get a list of Pandas column names, we can simply turn our Index object into a list by using Python’s list() function.
Let’s take a look at how this works:
```python
>>> print(list(df.columns))
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
```
Similarly, you could use the Index .tolist() method, which works as below:
```python
>>> print(df.columns.tolist())
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
```
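Alternatively, you can pass the dataframe itself to list(), since iterating over a dataframe yields its column names. This tends to be slower than .tolist(), though the difference is usually only noticeable for dataframes with many columns: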
```python
>>> print(list(df))
['species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'sex']
```
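If you want to check which method is fastest on your own setup, a minimal sketch using Python’s timeit module is below. The wide dataframe and its col_0, col_1, … names are hypothetical; the cost of these calls grows with the number of columns rather than rows, so the sketch uses many columns. Timings will vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical wide dataframe: 3 rows, 10,000 columns
n_cols = 10_000
wide = pd.DataFrame(np.zeros((3, n_cols)), columns=[f'col_{i}' for i in range(n_cols)])

methods = {
    'list(df.columns)': lambda: list(wide.columns),
    'df.columns.tolist()': lambda: wide.columns.tolist(),
    'list(df)': lambda: list(wide),
}

# Sanity check: all three approaches return the same list of names
results = [func() for func in methods.values()]
assert all(r == results[0] for r in results)

# Time each approach over 100 repetitions
for name, func in methods.items():
    seconds = timeit.timeit(func, number=100)
    print(f'{name:<22}{seconds:.4f} s')
```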
Get Pandas Column Names as an Alphabetically Sorted List
Now that we have a list of dataframe column names, we can sort this list alphabetically. To accomplish this, we can use Python’s sorted() function.
By default, Python sorts strings case-sensitively, so names starting with an uppercase letter come before names starting with a lowercase letter. If you don’t want this to happen, you can pass the key=str.lower argument.
Let’s see how this looks:
```python
>>> print(sorted(list(df.columns)))
['bill_depth_mm', 'bill_length_mm', 'body_mass_g', 'flipper_length_mm', 'island', 'sex', 'species']
```
If the capitalization is causing you issues, simply write the following:
```python
>>> print(sorted(list(df.columns), key=str.lower))
```
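Because the penguins columns are all lowercase, both calls give the same result on them. To see the difference, here’s a minimal sketch with a hypothetical dataframe whose column names mix uppercase and lowercase. It passes df.columns straight to sorted(), which accepts any iterable, so the list() call is optional.

```python
import pandas as pd

# Hypothetical dataframe with mixed-case column names
df = pd.DataFrame(columns=['species', 'Island', 'bill_length_mm', 'Body_mass_g'])

# Default sort compares character codes, so uppercase names come first
print(sorted(df.columns))
# ['Body_mass_g', 'Island', 'bill_length_mm', 'species']

# key=str.lower compares lowercased names, ignoring capitalization
print(sorted(df.columns, key=str.lower))
# ['bill_length_mm', 'Body_mass_g', 'Island', 'species']
```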
Check if a Column Exists in a Pandas Dataframe
To see if a column exists in a Pandas dataframe, we can use the Python in operator. This returns a boolean: True if the label exists in df.columns and False otherwise.
Now, let’s see if the column species exists in our dataframe:
```python
>>> print('species' in df.columns)
True
```
Similarly, to check whether the column age exists in the dataframe, we can write:
```python
>>> print('age' in df.columns)
False
```
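Two related checks can be handy. A minimal sketch, using a small hypothetical dataframe: the in operator also works directly on a dataframe (where it checks column labels), and set.issubset() or a list comprehension can test several columns at once. The required list is just an example.

```python
import pandas as pd

# Hypothetical dataframe with a subset of the penguins columns
df = pd.DataFrame(columns=['species', 'island', 'bill_length_mm', 'sex'])

# `in` on the dataframe itself checks the column labels
print('species' in df)  # True

# Are all of the required columns present?
required = ['species', 'island', 'age']
print(set(required).issubset(df.columns))  # False

# Which required columns are missing?
missing = [col for col in required if col not in df.columns]
print(missing)  # ['age']
```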
Conclusion
In this post, you learned how to get a list of columns from a Pandas dataframe. You learned different ways to convert the column names into a list, sort that list alphabetically, and check whether or not a column exists in a given Pandas dataframe.
To learn more about the Pandas .columns attribute, check out the official documentation here.