In this post, you’ll learn how to calculate the Pandas mean (average) for one column, multiple columns, or an entire dataframe. You’ll also learn how to skip `na` values or include them in your calculation.

## Loading a Sample Dataframe

If you want a sample dataframe to follow along with, load the sample dataframe below. The data represents people’s salaries over a period of four years:

    import pandas as pd

    df = pd.DataFrame.from_dict(
        {
            'Year': [2018, 2019, 2020, 2021],
            'Carl': [1000, 2300, 1900, 3400],
            'Jane': [1500, 1700, 1300, 800],
            'Melissa': [800, 2300, None, 2300]
        }
    ).set_index('Year')

    print(df)

This returns the following dataframe:

          Carl  Jane  Melissa
    Year
    2018  1000  1500    800.0
    2019  2300  1700   2300.0
    2020  1900  1300      NaN
    2021  3400   800   2300.0

## Pandas Mean on a Single Column

It’s very easy to calculate a mean for a single column. We can simply call the `.mean()` method on a single column and it returns the mean of that column.

For example, let’s calculate the average salary Carl had over the years:

    >>> carl = df['Carl'].mean()
    >>> print(carl)
    2150.0

We can see here that Carl’s average salary over the four years has been `2150`.
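As a quick sanity check, the sketch below compares `.mean()` against the sum of the column divided by its count. It rebuilds only Carl's column from the sample data, and the variable names `manual_mean` and `builtin_mean` are just illustrative:

```python
import pandas as pd

# Rebuild Carl's column from the sample dataframe
df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
    }
).set_index('Year')

# The mean is the sum of the values divided by the number of values
manual_mean = df['Carl'].sum() / df['Carl'].count()
builtin_mean = df['Carl'].mean()

print(manual_mean)   # 2150.0
print(builtin_mean)  # 2150.0
```

Both approaches should agree. Note that `.count()` only counts non-missing values, which matters once a column contains `NaN`.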

## Pandas Mean on a Row

Now, say you wanted to calculate the average for a dataframe row. You can do this either by selecting the row and calling `.mean()` on it, or by modifying the `axis=` parameter.

Let’s say we wanted to return the average for everyone’s salaries for the year 2018. We can access the 2018 row data by using `.loc` (which you can learn more about by checking out my tutorial here).

YOUTUBE: https://www.youtube.com/watch?v=VIa1ETYnFuc

    >>> year_2018 = df.loc[2018, :].mean()
    >>> print(year_2018)
    1100.0

Alternatively, you could return the mean for every row. You can do this by leaving out the row selection and setting the `axis=` parameter to `1`.

Let’s give this a shot:

    row_averages = df.mean(axis=1)
    print(row_averages)

This returns the following series:

    Year
    2018    1100.000000
    2019    2100.000000
    2020    1600.000000
    2021    2166.666667
    dtype: float64
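If you want to keep the row averages alongside the original data, one option is to attach them as a new column. The sketch below is a minimal example. It uses `axis='columns'`, which pandas accepts as an alias for `axis=1`, and the column name `Average` is a hypothetical choice for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# axis='columns' is an alias for axis=1: average across the columns of each row
row_averages = df.mean(axis='columns')

# .assign() returns a new dataframe, leaving df unchanged;
# 'Average' is a hypothetical column name
with_average = df.assign(Average=row_averages)
print(with_average)
```

Because missing values are skipped by default, the 2020 average is based only on Carl's and Jane's salaries.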

## Pandas Average on Multiple Columns

If you wanted to calculate the average of multiple columns, you can simply select those columns and call the `.mean()` method on the selection.

In the example below, we return the average salaries for Carl and Jane. Note that you need to use double square brackets in order to properly select the data:

    averages = df[['Carl', 'Jane']].mean()
    print(averages)

This returns the following:

    Carl    2150.0
    Jane    1325.0
    dtype: float64

## Pandas Mean on Entire Dataframe

Finally, if you wanted to return the mean for *every* column in a Pandas dataframe, you can simply apply the `.mean()` method to the entire dataframe.

Let’s give this a shot by writing the code below:

    >>> entire_dataframe = df.mean()
    >>> print(entire_dataframe)
    Carl       2150.0
    Jane       1325.0
    Melissa    1800.0
    dtype: float64

Now you’re able to calculate the mean for the entire dataframe.
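The sample dataframe only contains numeric columns. If your own dataframe also holds text columns, one way to handle this is the `numeric_only=True` argument of `.mean()`. The sketch below uses a hypothetical dataframe, `df_mixed`, with made-up column names, purely for illustration:

```python
import pandas as pd

# Hypothetical dataframe mixing a text column with numeric columns
df_mixed = pd.DataFrame(
    {
        'Name': ['Carl', 'Jane', 'Melissa'],
        'Salary_2018': [1000, 1500, 800],
        'Salary_2019': [2300, 1700, 2300],
    }
)

# numeric_only=True restricts the calculation to numeric columns,
# so the 'Name' column is left out of the result
numeric_means = df_mixed.mean(numeric_only=True)
print(numeric_means)
```

Being explicit here is a reasonable habit, since how `.mean()` treats non-numeric columns without this argument has differed between pandas versions.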

## Include NAs in Calculating Pandas Mean

One important thing to note is that, by default, missing values are excluded when calculating means. Pandas treats a missing value as missing, rather than as a 0, so it counts toward neither the sum nor the number of values.

If you wanted to calculate the mean by *including* missing values, you could first replace them with a value such as 0 using the Pandas `.fillna()` method. Check out my tutorial here to learn more:

Let’s calculate the mean with both including and excluding the missing value in Melissa’s column:

    >>> print(df['Melissa'].mean())
    1800.0
    >>> print(df['Melissa'].fillna(0).mean())
    1350.0
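The `.mean()` method also has a `skipna=` parameter, which controls whether missing values are skipped at all. The sketch below rebuilds Melissa's column as a standalone series and compares the three behaviours. The variable names are illustrative only:

```python
import math

import pandas as pd

# Melissa's salaries from the sample data, with 2020 missing
melissa = pd.Series(
    [800, 2300, None, 2300],
    index=[2018, 2019, 2020, 2021],
    name='Melissa',
)

skipped = melissa.mean()                 # default skipna=True: NaN is ignored
propagated = melissa.mean(skipna=False)  # NaN is kept, so the result is NaN
filled = melissa.fillna(0).mean()        # NaN is replaced by 0 first

print(skipped)                 # 1800.0
print(math.isnan(propagated))  # True
print(filled)                  # 1350.0
```

Setting `skipna=False` can be handy when you would rather have a missing result than a mean that silently leaves out some values.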

## Use Pandas Describe to Calculate Means

Finally, let’s use the Pandas `.describe()` method to calculate the mean (as well as some other helpful statistics). To learn more about the Pandas `.describe()` method, check out my tutorial here.

Let’s see how we can get the mean and some other helpful statistics:

    >>> print(df.describe())
                  Carl         Jane      Melissa
    count     4.000000     4.000000     3.000000
    mean   2150.000000  1325.000000  1800.000000
    std     994.987437   386.221008   866.025404
    min    1000.000000   800.000000   800.000000
    25%    1675.000000  1175.000000  1550.000000
    50%    2100.000000  1400.000000  2300.000000
    75%    2575.000000  1550.000000  2300.000000
    max    3400.000000  1700.000000  2300.000000

If you only wanted to return the mean, you could simply use the `.loc` accessor to access the data:

    >>> print(df.describe().loc['mean'])
    Carl       2150.0
    Jane       1325.0
    Melissa    1800.0
    Name: mean, dtype: float64
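If you only need a couple of statistics rather than the full summary, another option is the `.agg()` method, which accepts a list of function names. The sketch below is one possible approach. The choice of `'mean'` and `'median'` and the variable name `summary` are just for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Year': [2018, 2019, 2020, 2021],
        'Carl': [1000, 2300, 1900, 3400],
        'Jane': [1500, 1700, 1300, 800],
        'Melissa': [800, 2300, None, 2300],
    }
).set_index('Year')

# Request only the statistics of interest; each becomes a row in the result
summary = df.agg(['mean', 'median'])
print(summary)

# Select the row of means, just as with .describe()
print(summary.loc['mean'])
```

The median row should match the `50%` row that `.describe()` returns.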

## Conclusion

In this post, you learned how to calculate the Pandas mean using the `.mean()` method. You learned how to calculate a mean based on a column, a row, multiple columns, and the entire dataframe. Additionally, you learned how to include missing values in the calculation.

To learn more about the Pandas `.mean()` method, check out the official documentation here.