Pandas Mean: Calculate the Pandas Average • datagy

In this post, you’ll learn how to calculate the Pandas mean (average) for one column, multiple columns, or an entire dataframe. You’ll also learn how missing (NA) values are skipped by default and how to include them in your calculation.

Loading a Sample Dataframe

If you want a sample dataframe to follow along with, load the sample dataframe below. The data represents people’s salaries over a period of four years:

import pandas as pd
df = pd.DataFrame.from_dict(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300]
    }
).set_index('Year')

print(df)

This returns the following dataframe:

      Carl  Jane  Melissa
Year
2018  1000  1500    800.0
2019  2300  1700   2300.0
2020  1900  1300      NaN
2021  3400   800   2300.0

Pandas Mean on a Single Column

It’s very easy to calculate a mean for a single column. We can simply call the .mean() method on a single column and it returns the mean of that column.

For example, let’s calculate the average salary Carl had over the years:

>>> carl = df['Carl'].mean()
>>> print(carl)

2150.0

We can see here that Carl’s average salary over the four years has been 2150.
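As a quick sanity check, the same number can be reproduced by hand. The sketch below rebuilds the sample dataframe and divides the column's sum by its count of non-missing values; the variable name manual_mean is only illustrative:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# The mean is the sum of the non-missing values divided by how many there are
manual_mean = df['Carl'].sum() / df['Carl'].count()

print(manual_mean)                       # 2150.0
print(manual_mean == df['Carl'].mean())  # True
```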

Pandas Mean on a Row

Now, say you wanted to calculate the average for a dataframe row. We can do this by selecting the row and calling the .mean() method on it.

Let’s say we wanted to return the average for everyone’s salaries for the year 2018. We can access the 2018 row data by using .loc (which you can learn more about by checking out my tutorial here).

YOUTUBE: https://www.youtube.com/watch?v=VIa1ETYnFuc

>>> year_2018 = df.loc[2018,:].mean()
>>> print(year_2018)

1100.0

Alternatively, you could return the mean for every row. You can do this by leaving out the row selection and setting the axis= parameter to 1.

Let’s give this a shot:

row_averages = df.mean(axis=1)
print(row_averages)

This returns the following series:

Year
2018    1100.000000
2019    2100.000000
2020    1600.000000
2021    2166.666667
dtype: float64
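If you want to keep the row averages next to the original data, one option is to attach them as a new column. This is a minimal sketch: the column name Average and the rounding to two decimals are assumptions made for illustration, and axis='columns' is simply the spelled-out equivalent of axis=1:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# axis='columns' averages across the columns, giving one value per row
row_averages = df.mean(axis='columns')

# .assign() returns a new dataframe and leaves df itself unchanged
with_average = df.assign(Average=row_averages.round(2))

print(with_average['Average'].tolist())  # [1100.0, 2100.0, 1600.0, 2166.67]
```

Note that the 2020 average is based on only two salaries, because Melissa's missing value is skipped.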

Pandas Average on Multiple Columns

If you wanted to calculate the average of multiple columns, you can select those columns and call the .mean() method on the selection.

In the example below, we return the average salaries for Carl and Jane. Note that you need to use double square brackets in order to properly select the data:

averages = df[['Carl', 'Jane']].mean()
print(averages)

This returns the following:

Carl    2150.0
Jane    1325.0
dtype: float64
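To see why the double square brackets matter, the sketch below compares the two forms. The inner brackets build a list of column names; with single brackets, Pandas looks for one column whose label is the tuple ('Carl', 'Jane'), which does not exist in this dataframe. The names cols and raised_key_error are only illustrative:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# A list of column names selects a sub-dataframe
cols = ['Carl', 'Jane']
averages = df[cols].mean()  # equivalent to df[['Carl', 'Jane']].mean()

# Single brackets treat 'Carl', 'Jane' as one tuple label, which is not a column
raised_key_error = False
try:
    df['Carl', 'Jane']
except KeyError:
    raised_key_error = True

print(averages['Carl'], averages['Jane'])  # 2150.0 1325.0
print(raised_key_error)                    # True
```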

Pandas Mean on Entire Dataframe

Finally, if you wanted to return the mean for every column in a Pandas dataframe, you can simply apply the .mean() method to the entire dataframe.

Let’s give this a shot by writing the code below:

>>> entire_dataframe = df.mean()
>>> print(entire_dataframe)

Carl       2150.0
Jane       1325.0
Melissa    1800.0
dtype: float64

Now you’re able to calculate the mean of every column in the dataframe at once.
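Calling .mean() on the dataframe gives one value per column rather than a single number. If you need one overall figure, there are at least two reasonable definitions, and they differ when columns have different numbers of missing values. The sketch below is one way to compute both; using NumPy's nanmean on the underlying array is a choice made here for illustration:

```python
import numpy as np
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Mean of the three column means: each column counts equally
mean_of_means = df.mean().mean()

# Mean of all 11 non-missing cells: each value counts equally
overall = np.nanmean(df.to_numpy())

print(round(mean_of_means, 2))  # 1758.33
print(round(overall, 2))        # 1754.55
```

The two results differ because Melissa's column contributes only three values.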

Include NAs in Calculating Pandas Mean

One important thing to note is that, by default, missing values are excluded when calculating means. Pandas skips a missing value entirely, rather than treating it as a 0.

If you wanted to calculate the mean by including missing values, you could first replace them with a value, such as 0, using the Pandas .fillna() method. Check out my tutorial here to learn more.

Let’s calculate the mean both excluding and including the missing value in Melissa’s column:

>>> print(df['Melissa'].mean())
>>> print(df['Melissa'].fillna(0).mean())

1800.0
1350.0
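The .mean() method also accepts a skipna= parameter, which defaults to True. Setting it to False does not count the missing value as 0; instead, the result itself becomes NaN whenever any value is missing. A short sketch on the same sample data:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

skipped = df['Melissa'].mean()                 # default: skipna=True
not_skipped = df['Melissa'].mean(skipna=False) # NaN propagates to the result
filled = df['Melissa'].fillna(0).mean()        # missing value counted as 0

print(skipped)      # 1800.0
print(not_skipped)  # nan
print(filled)       # 1350.0
```

This makes skipna=False useful as a check for missing data, while .fillna() is the way to go when a missing value should count as a specific number.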

Use Pandas Describe to Calculate Means

Finally, let’s use the Pandas .describe() method to calculate the mean (as well as some other helpful statistics). To learn more about the Pandas .describe() method, check out my tutorial here.

Let’s see how we can get the mean and some other helpful statistics:

>>> print(df.describe())

              Carl         Jane      Melissa
count     4.000000     4.000000     3.000000
mean   2150.000000  1325.000000  1800.000000
std     994.987437   386.221008   866.025404
min    1000.000000   800.000000   800.000000
25%    1675.000000  1175.000000  1550.000000
50%    2100.000000  1400.000000  2300.000000
75%    2575.000000  1550.000000  2300.000000
max    3400.000000  1700.000000  2300.000000

If you only wanted to return the mean, you could simply use the .loc accessor to access the data:

>>> print(df.describe().loc['mean'])

Carl       2150.0
Jane       1325.0
Melissa    1800.0
Name: mean, dtype: float64
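If you only need a few statistics, computing the full .describe() table may be more than necessary. As an alternative that this post has not covered so far, the .agg() method can be passed a list of statistic names. The sketch below assumes you want the mean and the median; the variable name summary is only illustrative:

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# One row per requested statistic, one column per person
summary = df.agg(['mean', 'median'])

# Rows are accessed by label, just like with describe()
print(summary.loc['mean'])
print(summary.loc['median'])
```

The mean row matches the .describe() output above, and the median row matches its 50% row.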

Conclusion

In this post, you learned how to calculate the Pandas mean, using the .mean() method. You learned how to calculate a mean based on a column, a row, multiple columns, and the entire dataframe. Additionally, you learned how to calculate the mean by including missing values.

To learn more about the Pandas .mean() method, check out the official documentation here.

Nik Piepenbreier
