The standard deviation is a measure of how spread out the values in a data set are. **In Python, the standard deviation can be calculated in many ways, the easiest of which is to use either the `statistics` module's `stdev()` function or NumPy's `np.std()` function.**

In this tutorial, you'll learn what the standard deviation is, how to calculate it using built-in functions, and how to use Python to calculate the statistic from scratch!

## What is Standard Deviation?

**Standard deviation is a helpful way to measure how “spread out” values in a data set are.**

But how do you interpret a standard deviation?

**A small standard deviation means that most of the numbers are close to the mean (average) value. However, a large standard deviation means that the values are further away from the mean.**

The mean alone cannot describe a data set. Two data sets could have the same average value but be entirely different in terms of how those values are distributed. This is where the standard deviation is important.

The standard deviation formula looks like this:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$

Let’s break this down a bit:

- σ (“sigma”) is the symbol for standard deviation
- Σ is a fun way of writing “sum of”
- x_{i} represents every value in the data set
- μ is the mean (average) value in the data set
- n is the sample size

## Why is the Standard Deviation Important?

As explained above, **standard deviation is a key measure of how spread out the values in a data set are**. A small standard deviation happens when data points are fairly close to the mean. However, a large standard deviation happens when values are less clustered around the mean.

A data set can have the same mean as another data set, but be very different. Let’s take a look at this with an example:

- Data set #1 = [1,1,1,1,1,1,1,1,2,10]
- Data set #2 = [2,2,2,2,2,2,2,2,2,2]

Both of these datasets have the same average value (2), but are actually very different.

We’ll get back to these examples later when we calculate standard deviation to illustrate this point.
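As a quick preview, the two data sets can be compared with a few lines of code. This is only a sketch: it assumes the `statistics.stdev()` function that is introduced in the next section, and the variable names are chosen here for illustration.

```python
import statistics

data_set_1 = [1, 1, 1, 1, 1, 1, 1, 1, 2, 10]
data_set_2 = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]

for name, data in [("Data set #1", data_set_1), ("Data set #2", data_set_2)]:
    mean = statistics.mean(data)
    spread = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean}, standard deviation = {spread:.2f}")
```

Both means are 2, but the first data set has a standard deviation of roughly 2.83 while the second has a standard deviation of 0, since every value sits exactly on the mean.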

## How to Calculate Standard Deviation in Python?

**The easiest way to calculate standard deviation in Python is to use either the statistics module or the NumPy library.**

### Using the Statistics Module

The statistics module has a built-in function called `stdev()`, which follows the syntax below:

`standard_deviation = statistics.stdev(data, xbar=None)`

- `data` is a sequence of at least two numeric data points
- `xbar` is an optional parameter: the mean of the data set, if you have already calculated it. If it is omitted, the mean is calculated automatically

Let’s try this with an example:

```
import statistics
sample = [1,2,3,4,5,5,5,5,10]
standard_deviation = statistics.stdev(sample)
print(standard_deviation)
# Returns 2.55
```
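The optional `xbar` argument can be illustrated with the same sample. The sketch below passes in a mean that was computed beforehand and, for comparison, calls `statistics.pstdev()`, which divides by n instead of n - 1 (the population standard deviation). The variable names are only illustrative.

```python
import statistics

sample = [1, 2, 3, 4, 5, 5, 5, 5, 10]

# Reuse a mean that has already been computed instead of recalculating it
sample_mean = statistics.mean(sample)
with_xbar = statistics.stdev(sample, xbar=sample_mean)

# pstdev() divides by n rather than n - 1 (population standard deviation)
population = statistics.pstdev(sample)

print(round(with_xbar, 2))
print(round(population, 2))
```

The sample value matches the 2.55 from the example above, while the population value is slightly smaller (about 2.41) because of the larger divisor.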

### Using NumPy to Calculate Standard Deviation

NumPy has a function named `np.std()`, which calculates the standard deviation of a list or array of values. The function uses the following syntax:

```
np.std(
    data,    # The data to use (a list or array)
    ddof=1   # Delta degrees of freedom: the divisor is n - ddof
)
```

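The function takes two main parameters:

- `data` is the sample of data
- `ddof` is the "delta degrees of freedom": the divisor used in the calculation is n - ddof. It defaults to 0, which gives the population standard deviation. We apply 1, since we are calculating the standard deviation for a sample (rather than an entire population)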

Now, let’s try this with an example:

```
import numpy as np
sample = [1,2,3,4,5,5,5,5,10]
standard_deviation = np.std(sample, ddof=1)
print(standard_deviation)
# Returns 2.55
```
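To see why `ddof` matters, the sketch below runs `np.std()` on the same sample twice: once with the default `ddof=0` and once with `ddof=1`. The variable names are only illustrative.

```python
import numpy as np

sample = [1, 2, 3, 4, 5, 5, 5, 5, 10]

population_std = np.std(sample)       # default ddof=0: divide by n
sample_std = np.std(sample, ddof=1)   # ddof=1: divide by n - 1

print(round(population_std, 2))
print(round(sample_std, 2))
```

Leaving out `ddof=1` gives a smaller result (about 2.41 instead of 2.55), which is a common reason for NumPy and the statistics module appearing to disagree.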

In the next section, you’ll learn how to calculate a standard deviation for a list.

## Calculate Standard Deviation for List

To calculate the standard deviation for a list that holds values of a sample, we can use either method we explored above. For this example, let's use NumPy:

```
import numpy as np
sample_list = [10,30,43,23,67,49,78,98]
standard_deviation = np.std(sample_list, ddof=1)
print(standard_deviation)
# Returns 29.65
```

In the example above, we pass a list of values into the `np.std()` function. NumPy implicitly converts the list to an array and then calculates the standard deviation, so no manual conversion is needed.
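The implicit conversion can also be written out explicitly. The sketch below converts the list with `np.asarray()` and calls the array's own `.std()` method, which should give the same result as passing the list directly.

```python
import numpy as np

sample_list = [10, 30, 43, 23, 67, 49, 78, 98]

# Explicit conversion, roughly what np.std() does with a list behind the scenes
sample_array = np.asarray(sample_list)
from_array = sample_array.std(ddof=1)

# Passing the list directly
from_list = np.std(sample_list, ddof=1)

print(np.isclose(from_array, from_list))
```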

## Calculate Standard Deviation for Dictionary Values

To calculate the standard deviation for dictionary values in Python, you need to let Python know you only want the values of that dictionary.

For the example below, we'll be working with people's heights in centimetres and calculating the standard deviation:

```
import numpy as np
sample_dictionary = {
'John': 170,
'Meaghan': 155,
'Kate': 160,
'Peter': 185,
'Jane': 145
}
standard_deviation = np.std(list(sample_dictionary.values()), ddof=1)
print(standard_deviation)
# Returns 15.25
```

This is very similar to the list example, except that we use the `list()` function to turn the dictionary values into a list. This can be very helpful when working with data extracted from an API, where data are often stored in the JSON format.
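As a rough sketch of the API case, the example below parses a JSON string into a dictionary and then calculates the standard deviation of its values. The `response_text` payload is hypothetical and simply stands in for the body of an API response.

```python
import json
import statistics

# Hypothetical JSON payload, standing in for an API response body
response_text = '{"John": 170, "Meaghan": 155, "Kate": 160, "Peter": 185, "Jane": 145}'

# json.loads() turns the JSON object into a regular Python dictionary
heights = json.loads(response_text)

standard_deviation = statistics.stdev(list(heights.values()))
print(round(standard_deviation, 2))
```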

## Pandas Standard Deviation

If you are working with Pandas, you may be wondering if Pandas has a function for standard deviations. **Pandas lets you calculate a standard deviation for either a series, or even an entire Pandas DataFrame.**

The standard syntax looks like this:

```
df.std(
    axis=0,
    skipna=True,
    ddof=1,
    numeric_only=False
)
```

Let's explore these parameters:

- `axis` is either 0 for index (one result per column, the default) or 1 for columns (one result per row)
- `skipna` is used to include/exclude null/NA values in the calculation
- `ddof` defaults to 1 as the formula is used for samples
- `numeric_only` includes only numeric values in the calculation

Older versions of Pandas also accepted a `level` parameter, which told Pandas which level of a multi-index axis to use. It was removed in Pandas 2.0.
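The effect of `skipna` and `ddof` is easiest to see on a small Series with a missing value. The heights below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative heights with one missing value
heights = pd.Series([170, 155, np.nan, 185, 145])

default_std = heights.std()                   # missing value is skipped by default
strict_std = heights.std(skipna=False)        # missing value propagates to the result
population_std = heights.std(ddof=0)          # population formula (divide by n)

print(default_std)
print(strict_std)
print(round(population_std, 2))
```

With the missing value skipped, the sample standard deviation of the remaining four heights is 17.5. With `skipna=False`, the result is `NaN`.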

Let's try this out with an example, using people's heights and weights:

```
import pandas as pd
dataframe_dictionary = {
'Name': ['John', 'Meaghan', 'Kate', 'Peter', 'Jane'],
'Height': [170,155,160,185,145],
'Weight': [160, 120, 125, 200, 135]
}
df = pd.DataFrame(data = dataframe_dictionary)
# numeric_only=True skips the non-numeric 'Name' column
standard_deviation = df.std(numeric_only=True)
print(standard_deviation)
# Returns
# Height 15.247951
# Weight 32.901368
```

If you wanted to return the standard deviation for only one column, say `'Height'`, you could write:

```
import pandas as pd
dataframe_dictionary = {
'Name': ['John', 'Meaghan', 'Kate', 'Peter', 'Jane'],
'Height': [170,155,160,185,145],
'Weight': [160, 120, 125, 200, 135]
}
df = pd.DataFrame(data = dataframe_dictionary)
standard_deviation = df['Height'].std()
print(standard_deviation)
# Returns 15.247951
```

You can learn more about the Pandas `DataFrame.std()` method by checking out the official Pandas documentation.

## Python Standard Deviation From Scratch

For our final example, let's build the standard deviation from scratch, to see what is really going on.

To begin, let’s take another look at the formula:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$

In the code below, the steps needed are broken out:

```
import math

sample_list = [170, 155, 160, 185, 145]

# Need: (1) mean value, (2) difference between each value and mean, squared, (3) sample size

# Finding the mean value
sums = 0
for value in sample_list:
    sums += value
mean = sums / len(sample_list)

# Finding the sum of squared differences between each value and the mean
difference_squared = 0
for value in sample_list:
    difference_squared += (value - mean) ** 2

# Dividing by (n - 1) and taking the square root
standard_deviation = math.sqrt(difference_squared / (len(sample_list) - 1))
print(standard_deviation)
# Returns 15.25
```
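The same steps can be wrapped in a small reusable function and checked against the statistics module. This is a minimal sketch: `sample_std` is a name chosen here for illustration, not a built-in.

```python
import math
import statistics


def sample_std(values):
    """Return the sample standard deviation (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    squared_diffs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_diffs / (n - 1))


sample_list = [170, 155, 160, 185, 145]

# Compare the from-scratch result with the built-in implementation
print(math.isclose(sample_std(sample_list), statistics.stdev(sample_list)))
```

Comparing against `statistics.stdev()` is a cheap sanity check that the hand-written formula uses the same n - 1 divisor.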

## Conclusion

In this post, we learned all about the standard deviation. We started off by learning what it is, how it's calculated, and why it's significant. Then, we learned how to calculate the standard deviation in Python, using the statistics module, NumPy, and finally Pandas. We closed the tutorial off by demonstrating how the standard deviation can be calculated from scratch using basic Python!

