4 Ways to Use Pandas to Select Columns in a Dataframe


This article explores the different ways to select columns in Pandas, including loc, iloc, and the indexing operator, and shows how to create copies of dataframes. You’ll learn each technique through follow-along examples.

Let’s get started!

Why Select Columns in Python?

Many tutorials work with very clean datasets that have a limited number of columns. But this isn’t true all the time.

In many cases, you’ll run into datasets that have many columns – most of which are not needed for your analysis.

In this case, you’ll want to select out a number of columns.

This often has the added benefit of using less memory on your computer (when you drop columns you don’t need), as well as reducing the number of columns you need to keep track of mentally.

Creating our Dataframe

To get started, let’s create our dataframe to use throughout this tutorial. We’ll create one that has multiple columns, but a small amount of data (to be able to print the whole thing more easily).

We’ll need to import pandas and load some data from a CSV file. Simply copy the code and paste it into your editor or notebook.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/datagy/pivot_table_pandas/master/select_columns.csv')
print(df.head())

This returns the following:

      Name  Age Height  Score  Random_A  Random_B  Random_C  Random_D  Random_E
0      Joe   28    5'9     30        73        59         5         4        31
1  Melissa   26    5'5     32        30        85        38        32        80
2      Nik   31   5'11     34        80        71        59        71        53
3   Andrea   33    5'6     38        16        63        86        81        42
4     Jane   32    5'8     29        19        40        48         5        68
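If the CSV link is unavailable, the same dataframe can be rebuilt by hand. The sketch below is an assumption-based stand-in: it only reproduces the five rows printed above, and the variable name data is purely illustrative.

```python
import pandas as pd

# Stand-in for the CSV: only the five rows shown in the printed output above.
data = {
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
    "Score": [30, 32, 34, 38, 29],
    "Random_A": [73, 30, 80, 16, 19],
    "Random_B": [59, 85, 71, 63, 40],
    "Random_C": [5, 38, 59, 86, 48],
    "Random_D": [4, 32, 71, 81, 5],
    "Random_E": [31, 80, 53, 42, 68],
}
df = pd.DataFrame(data)

print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column labels, in order
```

Checking df.shape and df.columns first is a quick way to see how many columns you are dealing with before you start selecting.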

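Let’s take a quick look at what makes up a dataframe in Pandas: the row labels down the left side form the index, the names across the top are the column labels, and the values in between are the data.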

Using loc to Select Columns

The loc indexer is a great way to select a single column or multiple columns in a dataframe if you know the column name(s).

This method is great for:

  • Selecting columns by column name,
  • Selecting rows along with columns,
  • Selecting columns using a single label, a list of labels, or a slice

The general form of loc is df.loc[rows, columns], where rows and columns can each be a single label, a list of labels, or a slice of labels.

Now, if you wanted to select only the name column and the first three rows, you would write:

selection = df.loc[:2,'Name']
print(selection)

This returns:

0        Joe
1    Melissa
2        Nik

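You’ll probably notice that this didn’t return the column header. That’s because selecting a single column label returns a Pandas series rather than a dataframe.

Note: loc selects by label, and a slice such as :2 includes its end label. Our dataframe uses the default index, which starts at 0, so :2 returns the rows labelled 0, 1, and 2. That means if you wanted to select the first row, you would use 0, not 1.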
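To make the difference between a single label and a list of labels concrete, here is a minimal sketch. It assumes a cut-down, two-column version of the tutorial’s dataframe, and the names as_series and as_frame are illustrative only.

```python
import pandas as pd

# Cut-down version of the tutorial's dataframe (assumption: two columns are enough here).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
})

as_series = df.loc[:2, "Name"]    # single label -> Series (no column header)
as_frame = df.loc[:2, ["Name"]]   # one-item list -> DataFrame (header kept)

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(len(as_series))             # loc includes the end label 2, so three rows
```

If you want to keep the header for a single column, wrapping the name in a list is usually the simplest option.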

If you wanted to select multiple columns, you can include their names in a list:

selection = df.loc[:2,['Name', 'Age', 'Height', 'Score']]
print(selection)

This returns:

      Name  Age Height  Score
0      Joe   28    5'9     30
1  Melissa   26    5'5     32
2      Nik   31   5'11     34

Additionally, you can slice columns if you want to return two columns as well as all of those in between. Unlike regular Python slices, loc slices include both endpoints. The same code we wrote above can be rewritten like this:

selection = df.loc[:2,'Name':'Score']
print(selection)

This returns:

      Name  Age Height  Score
0      Joe   28    5'9     30
1  Melissa   26    5'5     32
2      Nik   31   5'11     34
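As a quick check that the list and the slice really select the same data, the following sketch compares the two results. It assumes a five-column version of the tutorial’s dataframe. The variable names are illustrative.

```python
import pandas as pd

# Five-column version of the tutorial's dataframe (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
    "Score": [30, 32, 34, 38, 29],
    "Random_A": [73, 30, 80, 16, 19],
})

by_list = df.loc[:2, ["Name", "Age", "Height", "Score"]]
by_slice = df.loc[:2, "Name":"Score"]   # label slice includes 'Score'

print(by_list.equals(by_slice))         # both selections hold the same data

# A bare colon in the row position keeps every row.
all_rows = df.loc[:, "Name":"Score"]
print(all_rows.shape)
```

The slice form is shorter, but it depends on the column order. The list form keeps working even if the columns are rearranged.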

Now, let’s take a look at the iloc method for selecting columns in Pandas.

Using iloc to Select Columns

The iloc indexer is one of the primary ways of selecting data in Pandas. The name “iloc” stands for integer location indexing, where rows and columns are selected using their integer positions.

This method is great for:

  • Selecting columns by column position (index),
  • Selecting rows along with columns,
  • Selecting columns using a single position, a list of positions, or a slice of positions

The standard format of iloc is df.iloc[rows, columns], where rows and columns can each be a single integer position, a list of positions, or a slice of positions.

Now, for example, if we wanted to select the first two rows and first two columns of our dataframe, we could write:

selection = df.iloc[:2,:2]
print(selection)

This returns:

      Name  Age
0      Joe   28
1  Melissa   26

Note that we didn’t write df.iloc[0:2,0:2], but that would have yielded the same result.
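One detail worth checking for yourself: iloc slices follow normal Python rules and exclude the end position, while loc slices include the end label. The sketch below assumes a three-column version of the tutorial’s dataframe with the default index.

```python
import pandas as pd

# Three-column version of the tutorial's dataframe (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
})

short_form = df.iloc[:2, :2]
long_form = df.iloc[0:2, 0:2]
print(short_form.equals(long_form))  # same selection either way

# Same ':2' slice, different row counts:
print(len(df.iloc[:2]))  # positions 0 and 1 (end excluded)
print(len(df.loc[:2]))   # labels 0, 1 and 2 (end included)
```

This difference is an easy source of off-by-one mistakes when switching between the two indexers.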

If we wanted to select all columns with iloc, we could do that by writing:

selection = df.iloc[:2,]
print(selection)

This returns:

      Name  Age Height  Score  Random_A  Random_B  Random_C  Random_D  Random_E
0      Joe   28    5'9     30        73        59         5         4        31
1  Melissa   26    5'5     32        30        85        38        32        80

Similarly, we could select all rows by leaving out the row values (but keeping the colon before the comma).

selection = df.iloc[:,:2]
print(selection)

This returns:

      Name  Age
0      Joe   28
1  Melissa   26
2      Nik   31
3   Andrea   33
4     Jane   32
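Besides slices, iloc also accepts a single position or a list of positions, as mentioned in the list earlier in this section. Here is a small sketch, assuming a five-column version of the tutorial’s dataframe. The variable names are illustrative.

```python
import pandas as pd

# Five-column version of the tutorial's dataframe (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
    "Score": [30, 32, 34, 38, 29],
    "Random_A": [73, 30, 80, 16, 19],
})

single = df.iloc[:, 1]               # one position -> Series (the Age column)
first_and_last = df.iloc[:, [0, -1]] # list of positions; -1 counts from the end
every_other = df.iloc[:, ::2]        # slice with a step of 2

print(single.name)
print(first_and_last.columns.tolist())
print(every_other.columns.tolist())
```

Negative positions are handy when you know a column is last but don’t want to count how many columns there are.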

Select a Single Column in Pandas

Now, if you want to select just a single column, there’s a much easier way than using either loc or iloc.

This can be done by selecting the column as a series in Pandas. You can pass the column name as a string to the indexing operator.

For example, to select only the Name column, you can write:

selection = df['Name']
print(selection)

This returns the following:

0        Joe
1    Melissa
2        Nik
3     Andrea
4       Jane

Similarly, you can select columns by using the dot operator. To do the same as above using the dot operator, you could write:

selection = df.Name
print(selection)

This returns the same as above:

0        Joe
1    Melissa
2        Nik
3     Andrea
4       Jane

However, using the dot operator is often not recommended (even though it’s easier to type). This is because you can’t:

  1. Select columns with spaces in the name,
  2. Use columns that have the same names as dataframe methods or attributes (such as ‘count’ or ‘shape’),
  3. Pick columns whose names aren’t strings, and
  4. Select multiple columns.
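The first two limitations can be seen in a short sketch. The dataframe here is hypothetical: the columns count and Test Score are not part of the tutorial’s dataset and were chosen only to trigger the problem.

```python
import pandas as pd

# Hypothetical dataframe, built only to show where the dot operator falls short.
df = pd.DataFrame({
    "Name": ["Joe", "Melissa"],
    "count": [1, 2],          # same name as the DataFrame.count method
    "Test Score": [30, 32],   # label containing a space
})

# The dot operator finds the method first, not the column.
print(callable(df.count))

# The indexing operator always refers to the column.
print(df["count"].tolist())
print(df["Test Score"].tolist())  # df.Test Score would be a syntax error
```

Square brackets work for every column label, so they are the safer default in scripts you plan to reuse.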

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Select Multiple Columns in Pandas

Similar to the code you wrote above, you can select multiple columns.

To do this, pass a list of column names to the indexing operator, which is why the code ends up with double square brackets.

If you wanted to select the Name, Age, and Height columns, you would write:

selection = df[['Name', 'Age', 'Height']]
print(selection)

This returns:

      Name  Age Height
0      Joe   28    5'9
1  Melissa   26    5'5
2      Nik   31   5'11
3   Andrea   33    5'6
4     Jane   32    5'8

What’s great about this method is that you can return columns in whatever order you want. If you wanted to switch the order around, you could just change it in your list:

selection = df[['Name', 'Height', 'Age']]
print(selection)

Which returns:

      Name Height  Age
0      Joe    5'9   28
1  Melissa    5'5   26
2      Nik   5'11   31
3   Andrea    5'6   33
4     Jane    5'8   32
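Because the inner brackets are just a Python list, you can build the list separately and reuse it. The sketch below assumes a three-column version of the tutorial’s dataframe. The Weight column is hypothetical and is included only to show what happens when a name is missing.

```python
import pandas as pd

# Three-column version of the tutorial's dataframe (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
})

cols = ["Name", "Height", "Age"]   # any order you like
selection = df[cols]
print(selection.columns.tolist())

# A one-item list still returns a dataframe, not a series.
one_col_frame = df[["Name"]]
print(type(one_col_frame).__name__)

# Asking for a column that doesn't exist raises a KeyError.
missing_raised = False
try:
    df[["Name", "Weight"]]         # 'Weight' is a hypothetical, absent column
except KeyError as err:
    missing_raised = True
    print("Missing column:", err)
```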

Copying Columns vs. Selecting Columns

Something important to note about all the methods covered above: it might look like a fresh dataframe was created each time. However, that’s not guaranteed!

In Python, the equal sign (“=”) doesn’t copy anything. It only binds a name to whatever object the right-hand side returns, and depending on the selection and your Pandas version, that object may be a view that shares data with the original dataframe.

Because of this, you can run into issues (such as a SettingWithCopyWarning) when trying to modify a selection as if it were an independent dataframe.

In order to avoid this, you’ll want to use the .copy() method to create a brand new object that doesn’t share data with the original.

To accomplish this, simply append .copy() to the end of your selection when you create the new dataframe.

For example, if we wanted to create a filtered dataframe of our original that only includes the first four columns, we could write:

new_df = df.iloc[:,:4].copy()
print(new_df)

This returns:

      Name  Age Height  Score
0      Joe   28    5'9     30
1  Melissa   26    5'5     32
2      Nik   31   5'11     34
3   Andrea   33    5'6     38
4     Jane   32    5'8     29

This is incredibly helpful if you want to work with only a smaller subset of a dataframe.
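To see that a copy really is independent, you can change a value in it and check the original. This is a minimal sketch, assuming a three-column version of the tutorial’s dataframe. The value 99 is arbitrary.

```python
import pandas as pd

# Three-column version of the tutorial's dataframe (assumption for illustration).
df = pd.DataFrame({
    "Name": ["Joe", "Melissa", "Nik", "Andrea", "Jane"],
    "Age": [28, 26, 31, 33, 32],
    "Height": ["5'9", "5'5", "5'11", "5'6", "5'8"],
})

new_df = df.iloc[:, :2].copy()   # explicit copy of the first two columns
new_df.loc[0, "Age"] = 99        # modify the copy only

print(new_df.loc[0, "Age"])      # the copy holds the new value
print(df.loc[0, "Age"])          # the original is unchanged
```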

Conclusion: Using Pandas to Select Columns

Thanks for reading all the way to the end of this tutorial!

Using follow-along examples, you learned how to select columns using the loc method (to select based on names), the iloc method (to select based on column/row numbers), and, finally, how to create copies of your dataframes.

You also learned how to make column selection easier when you want to select all rows.
