Pandas: Iterate over a Pandas Dataframe Rows

In this tutorial, you’ll learn how to use Python and Pandas to iterate over the rows of a Pandas dataframe.

The tutorial will begin by exploring why iterating over Pandas dataframe rows is often unnecessary and is often much slower than alternatives like vectorization. That being said, there are times when you may need to iterate over a dataframe’s rows. Because of this, we’ll explore four different methods by which you can do this. You’ll learn how to use the Pandas .iterrows(), .itertuples(), and .items() methods. You’ll also learn how to use Python for loops to loop over each row in a Pandas dataframe.

The Quick Answer: Use Pandas .iterrows()

Quick Answer - Iterate over a Pandas Dataframe Rows

Why Iterating Over Pandas Dataframe Rows is a Bad Idea

Pandas itself warns against iterating over dataframe rows. The official documentation notes that iterating through Pandas objects is generally slow and that, in many cases, it isn’t needed. In practice, even dataframes of around 1,000 records can begin to show noticeable slowdowns. Pandas recommends using vectorization wherever possible. If, however, you need to apply a specific formula, then the .apply() method is an attractive alternative.

Iterating over rows may seem like a logical approach for those coming from tools like Excel. However, many processes are better handled with vectorization or the .apply() method. Iterating over rows, unless necessary, is a bad habit to fall into.

In order of preference, my recommended approach is to:

  1. Vectorize if possible,
  2. Use the .apply() method if you need to apply a function that requires row-level information

The alternatives listed above are much more idiomatic and easier to read. While using the .apply() method is slower than vectorization, it can often be easier for beginners to wrap their heads around.
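
As a rough illustration of the second option, here is a minimal sketch of .apply() with axis=1, which passes each row to a function as a Pandas series. The label_row function and its 2,000 threshold are hypothetical, made up purely for this example, and the dataframe mirrors the sample data used in the rest of the tutorial.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Sales': [1000, 2300, 1900, 3400],
    }
)

# Hypothetical rule for illustration: combine two columns of the same row
def label_row(row):
    level = 'High' if row['Sales'] > 2000 else 'Low'
    return f"{row['Year']}: {level}"

# axis=1 tells .apply() to pass each row (not each column) to the function
labels = df.apply(label_row, axis=1)

print(labels.tolist())

# Returns:
# ['2018: Low', '2019: High', '2020: Low', '2021: High']
```

A rule this simple could also be vectorized, so treat this only as a picture of how row-level information reaches the function.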

Loading a Sample Pandas Dataframe

If you want to follow along with a sample dataframe, feel free to copy the code below. We’ll load a small dataframe so that we can print it out in its entirety. You likely won’t encounter any major performance hiccups running this dataframe, but they’ll become more and more noticeable as your dataset grows.

Let’s start by loading the data and printing it out.

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Sales': [1000, 2300, 1900, 3400],
    }
)

print(df)

# Returns:
#    Year  Sales
# 0  2018   1000
# 1  2019   2300
# 2  2020   1900
# 3  2021   3400

In the next section, you’ll learn how to vectorize your dataframe operations in order to save some memory and time!

How to Vectorize Instead of Iterating Over Rows

In this section, you’ll learn (albeit, very briefly), how to vectorize a dataframe operation.

In the example below, you’ll learn how to square a number in a column. If you were to iterate over each row, you would perform the calculation as many times as there are records in the column. By vectorizing, however, you can apply a transformation directly to a column.

Let’s see what vectorization looks like by using some Python code:

df['Sales Squared'] = df['Sales'] ** 2

print(df)

# Returns:
#    Year  Sales  Sales Squared
# 0  2018   1000        1000000
# 1  2019   2300        5290000
# 2  2020   1900        3610000
# 3  2021   3400       11560000

Now that you know how to apply vectorization to your data, let’s explore how to use the Pandas .iterrows() method to iterate over a Pandas dataframe’s rows.

How to Use Pandas iterrows to Iterate over Dataframe Rows

To actually iterate over a Pandas dataframe’s rows, we can use the Pandas .iterrows() method. The method returns a generator that yields one tuple per row. Each tuple contains the row’s index label (from the dataframe) and the row’s values. One important thing to note here is that .iterrows() does not maintain data types across a row: because each row is returned as a Pandas series, values from columns with different data types may be converted to a common type. If you want to maintain data types, check out the next section on .itertuples().

Let’s see how the .iterrows() method works:

# Use .iterrows() to iterate over Pandas rows
for idx, row in df.iterrows():
    print(idx, row['Year'], row['Sales'])

# Returns:
# 0 2018 1000
# 1 2019 2300
# 2 2020 1900
# 3 2021 3400

As you can see, the method above generates a tuple, which we can unpack. The first item contains the index of the row and the second is a Pandas series containing the row’s data.
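
To make the data type caveat concrete, here is a small sketch. The mixed dataframe and its Growth column are hypothetical and are not part of the sample data. They are only there to place an integer column next to a float column.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
mixed = pd.DataFrame({'Year': [2018, 2019], 'Growth': [0.5, 1.3]})

# .iterrows() packs each row into a single series, so the integer is upcast
_, first_row = next(mixed.iterrows())
year_from_iterrows = first_row['Year']

# .itertuples() keeps each value as it is stored in its own column
first_tuple = next(mixed.itertuples())
year_from_itertuples = first_tuple.Year

print(mixed['Year'].dtype)
print(year_from_iterrows)
print(year_from_itertuples)

# Returns:
# int64
# 2018.0
# 2018
```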

The .iterrows() method is quite slow because it needs to generate a Pandas series for each row.
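
If you want to see the difference on your own machine, the sketch below times .iterrows() against a vectorized version of the same calculation. The 2,000-row dataframe is a hypothetical stand-in built only for this comparison, and the timings will vary from machine to machine, so no expected numbers are shown.

```python
import timeit

import pandas as pd

# Hypothetical larger frame, built only for this comparison
big_df = pd.DataFrame({'Sales': range(2_000)})

def with_iterrows():
    # One series is created per row, then one value is pulled out of it
    return [row['Sales'] ** 2 for _, row in big_df.iterrows()]

def vectorized():
    # The whole column is squared in a single operation
    return (big_df['Sales'] ** 2).tolist()

# Both approaches produce the same values
same_result = with_iterrows() == vectorized()

loop_time = timeit.timeit(with_iterrows, number=3)
vector_time = timeit.timeit(vectorized, number=3)

print(f'Same result: {same_result}')
print(f'.iterrows(): {loop_time:.4f}s, vectorized: {vector_time:.4f}s')
```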

In the next section, you’ll learn how to use the .itertuples() method to loop over a Pandas dataframe’s rows.

How to Use Pandas itertuples to Iterate over Dataframe Rows

The .itertuples() method, like the .iterrows() method, returns an iterator over the rows of a Pandas dataframe.

Unlike the previous method, the .itertuples() method returns a named tuple for each row in the dataframe. A named tuple is much like a normal tuple, only that each item is given an attribute name.

Let’s take a look at what this looks like by printing out each named tuple returned by the .itertuples() method:

# Use .itertuples() to iterate over dataframe rows
for row in df.itertuples():
    print(row)

# Returns:
# Pandas(Index=0, Year=2018, Sales=1000)
# Pandas(Index=1, Year=2019, Sales=2300)
# Pandas(Index=2, Year=2020, Sales=1900)
# Pandas(Index=3, Year=2021, Sales=3400)

We can see that each item in the tuple is given an attribute name. We can access a tuple’s items by referring to their attribute names.

Let’s see how we can print out each row’s Year attribute in Python:

# Use .itertuples() to iterate over dataframe rows
for row in df.itertuples():
    print(row.Year)

# Returns:
# 2018
# 2019
# 2020
# 2021
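
The .itertuples() method also accepts index and name parameters. The sketch below drops the index and renames the tuple. The name Record is an arbitrary choice for this example. It also shows one thing to watch for: a column name that isn’t a valid Python identifier, such as 'Sales Squared', is replaced with a positional name.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2019], 'Sales': [1000, 2300]})
df['Sales Squared'] = df['Sales'] ** 2

# index=False leaves out the index; name sets the named tuple's type name
rows = list(df.itertuples(index=False, name='Record'))

print(rows[0])
print(rows[0]._fields)

# Returns:
# Record(Year=2018, Sales=1000, _2=1000000)
# ('Year', 'Sales', '_2')
```

Because 'Sales Squared' contains a space, it is exposed as _2, its position in the tuple, rather than by name.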

In the next section, you’ll learn how to use the .items() method to loop over a dataframe’s items in Pandas.

How to Use Pandas items to Iterate over Dataframe Columns

The Pandas .items() method iterates over a dataframe column by column rather than row by row. It yields a tuple for each column, containing the column’s name and a Pandas series holding that column’s values.

To reach individual values this way, you need to loop over each column’s series as well, which adds a second level of iteration.

Let’s take a look at what this looks like:

# Use .items() to iterate over dataframe columns
for column_name, data in df.items():
    print(column_name, data)

# Returns:
# Year 0    2018
# 1    2019
# 2    2020
# 3    2021
# Name: Year, dtype: int64
# Sales 0    1000
# 1    2300
# 2    1900
# 3    3400
# Name: Sales, dtype: int64
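
To get from columns down to individual values, one option is to nest a second loop over each column’s series, which has an .items() method of its own. This is only a sketch on a shortened, two-row version of the sample data, and the cells list is a helper added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2019], 'Sales': [1000, 2300]})

cells = []
# Outer loop: one (column name, series) pair per column
for column_name, column in df.items():
    # Inner loop: one (index label, value) pair per entry in that column
    for idx, value in column.items():
        cells.append((column_name, idx, value))
        print(column_name, idx, value)

# Returns:
# Year 0 2018
# Year 1 2019
# Sales 0 1000
# Sales 1 2300
```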

In the next section, you’ll learn how to use a Python for loop to loop over a Pandas dataframe’s rows.

How to Use a For Loop to Iterate over Pandas Dataframe Rows

In this final section, you’ll learn how to use a Python for loop to loop over a Pandas dataframe’s rows.

We can use the Pandas .iloc accessor to access each row by its position while looping over the length of the dataframe.

Let’s see what this method looks like in Python:

for i in range(len(df)):
    print(df.iloc[i, :])

# Returns:
# Year     2018
# Sales    1000
# Name: 0, dtype: int64
# Year     2019
# Sales    2300
# Name: 1, dtype: int64
# Year     2020
# Sales    1900
# Name: 2, dtype: int64
# Year     2021
# Sales    3400
# Name: 3, dtype: int64

You could also access just a column, or a set of columns, by replacing the : with a column position or a list of column positions. To learn more about the iloc accessor, check out my in-depth tutorial here.
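
As a brief sketch of that idea, the loop below pulls single values by column position, and the last lines select a list of columns for one row. It uses a shortened, two-row version of the sample data, and the pairs list is a helper added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2019], 'Sales': [1000, 2300]})

pairs = []
for i in range(len(df)):
    # A single column position returns one value instead of a whole row
    year = df.iloc[i, 0]
    sales = df.iloc[i, 1]
    pairs.append((year, sales))
    print(year, sales)

# Returns:
# 2018 1000
# 2019 2300

# A list of column positions returns a series with just those columns
first_row_subset = df.iloc[0, [0, 1]]
print(first_row_subset.index.tolist())

# Returns:
# ['Year', 'Sales']
```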

Conclusion

In this tutorial, you learned all about iterating over rows in a Pandas dataframe. You began by learning why iterating over a dataframe row by row is a bad idea, and why vectorization is a much better alternative for most tasks. You also learned how to iterate over rows in a Pandas dataframe using three different dataframe methods as well as a for loop using the dataframe index.

To learn more about the Pandas .iterrows() method, check out the official documentation here.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
