
How to Normalize NumPy Arrays


In this tutorial, you’ll learn how to normalize NumPy arrays, including multi-dimensional arrays. Normalization is an important skill for any data analyst or data scientist. Normalizing a vector means scaling it so that its magnitude (its norm) is equal to 1, which turns it into a unit vector. This is a common preprocessing step in machine learning. It can be especially helpful when working with distance-based models, such as the K-Nearest Neighbor algorithm, where features on larger scales would otherwise dominate the distance calculation.

By the end of this tutorial, you’ll have learned:

  • How to use NumPy functions to normalize an array
  • How to normalize multi-dimensional arrays in NumPy

How to Use NumPy to Normalize a Vector

In order to normalize a vector in NumPy, we can use the np.linalg.norm() function, which returns the vector’s norm value. We can then use the norm value to divide each value in the array to get the normalized array.
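Before working with random data, it can help to see the idea on a vector we can check by hand. The short sketch below uses a made-up vector, [3, 4], whose L2 norm is 5. The variable names are chosen for illustration only:

```python
# A worked example with a vector that is easy to check by hand
import numpy as np

vector = np.array([3.0, 4.0])

# sqrt(3**2 + 4**2) = sqrt(25) = 5
norm_value = np.linalg.norm(vector)
print(norm_value)

# Returns: 5.0

# Dividing each value by the norm gives the unit vector
unit_vector = vector / norm_value
print(unit_vector)

# Returns: [0.6 0.8]
```

The resulting vector points in the same direction as the original, but its length is now 1.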

To follow along, we can generate an array of random values using the np.random.rand() function. By first setting a random seed with np.random.seed(), we can reproduce our results:

# Generating a Random Array
import numpy as np

np.random.seed(123)
arr = np.random.rand(10)

print(arr)

# Returns:
# [0.69646919 0.28613933 0.22685145 0.55131477 0.71946897 0.42310646
#  0.9807642  0.68482974 0.4809319  0.39211752]

The first step is to calculate the vector’s norm. For a one-dimensional array, np.linalg.norm() returns the L2 (Euclidean) norm by default, which is the square root of the sum of the squared values. Let’s see what this operation looks like in Python:

# Calculating a Vector Norm with NumPy
import numpy as np

# Generate an Array
np.random.seed(123)
arr = np.random.rand(10)

# Calculate the vector norm
vector_norm = np.linalg.norm(arr)
print(vector_norm)

# Returns: 1.8533621078442797

In the code above, we calculated the vector norm. Because NumPy operations happen element-wise, we can divide the entire array by this value to get the normalized vector:

# Normalizing a NumPy Vector
import numpy as np

np.random.seed(123)
arr = np.random.rand(10)

normalized_vector = arr / np.linalg.norm(arr)
print(normalized_vector)

# Returns:
# [0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
#  0.5291811  0.36950671 0.2594916  0.21157092]
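One way to confirm that the result really is a unit vector is to calculate its norm again, which should come out to 1 (within floating-point tolerance). The sketch below also wraps the division in a small helper, safe_normalize(), that guards against dividing by zero. The helper and its eps parameter are hypothetical names used only for this illustration, and returning an all-zero vector is just one possible way to handle that case:

```python
# Checking the result and guarding against zero vectors
import numpy as np

def safe_normalize(vector, eps=1e-12):
    """Hypothetical helper: return vector / norm, or zeros if the norm is ~0."""
    norm = np.linalg.norm(vector)
    if norm < eps:
        # A zero vector has no direction, so we return zeros instead of NaNs
        return np.zeros_like(vector, dtype=float)
    return vector / norm

np.random.seed(123)
arr = np.random.rand(10)

normalized_vector = safe_normalize(arr)

# The norm of a normalized vector should be 1
print(np.isclose(np.linalg.norm(normalized_vector), 1.0))

# Returns: True

print(safe_normalize(np.zeros(3)))

# Returns: [0. 0. 0.]
```

Without a guard like this, dividing a zero vector by its norm produces nan values along with a runtime warning.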

Normalize a NumPy Array Using Sklearn

When working on machine learning projects, you may already be using Scikit-learn (sklearn). The library’s preprocessing module comes with a normalize() function that allows you to normalize NumPy arrays. The function allows your code to be a bit more explicit than the method shown above.

Let’s see how we can use the normalize() function from Scikit-learn to normalize an array:

# Normalize a NumPy Array with Scikit-learn
import numpy as np
from sklearn.preprocessing import normalize
np.random.seed(123)
arr = np.random.rand(10)

print(normalize([arr]))

# Returns:
# [[0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
#   0.5291811  0.36950671 0.2594916  0.21157092]]

We can see that this method returned the same values as above, wrapped in an extra dimension. It’s important to note here that the function expects a two-dimensional array, in which each row is a sample. Because of this, we turned the vector into a single-row array by nesting it in a list.
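Nesting the array in a list is one way to get a single-row, two-dimensional input. Another common option is the reshape() method. The sketch below uses only NumPy to show that both approaches produce an array of the same shape and values. The variable names nested and reshaped are chosen for illustration:

```python
# Two ways to turn a vector into a single-row 2-D array
import numpy as np

np.random.seed(123)
arr = np.random.rand(10)

# Option 1: nest the array in a list
nested = np.array([arr])

# Option 2: reshape into 1 row and as many columns as needed
reshaped = arr.reshape(1, -1)

print(nested.shape, reshaped.shape)

# Returns: (1, 10) (1, 10)

print(np.array_equal(nested, reshaped))

# Returns: True
```

Either form should work as input to normalize(), since both describe one sample with ten features.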

Normalize 2-Dimensional NumPy Arrays Using Sklearn

In this section, you’ll learn how to normalize a 2-dimensional array. We can create a reproducible array using the same np.random.rand() function and then reshaping the result into two dimensions. Let’s see how we can do this using the reshape() method.

# Creating a 2-Dimensional NumPy Array
import numpy as np
from sklearn.preprocessing import normalize
np.random.seed(123)
arr = np.random.rand(20).reshape(2, 10)
print(arr)

# Returns:
# [[0.69646919 0.28613933 0.22685145 0.55131477 0.71946897 0.42310646
#   0.9807642  0.68482974 0.4809319  0.39211752]
#  [0.34317802 0.72904971 0.43857224 0.0596779  0.39804426 0.73799541
#   0.18249173 0.17545176 0.53155137 0.53182759]]

Now that we have our array created, we can pass the array into the normalize() function from sklearn. By default, the function uses the L2 norm and normalizes each row (each sample) independently:

# Normalize a 2-Dimensional Array in NumPy
import numpy as np
from sklearn.preprocessing import normalize
np.random.seed(123)
arr = np.random.rand(20).reshape(2, 10)

print(normalize(arr))

# Returns:
# [[0.37578689 0.15438933 0.12239996 0.29746738 0.38819665 0.22829131
#   0.5291811  0.36950671 0.2594916  0.21157092]
#  [0.23254994 0.49403067 0.29719255 0.04043992 0.26972931 0.5000926
#   0.12366305 0.11889251 0.3601986  0.36038577]]
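Notice that the first row matches the normalized vector from earlier, because each row is scaled by its own norm. As a rough NumPy-only sketch of the same row-wise idea, we can calculate one norm per row with the axis and keepdims parameters of np.linalg.norm() and divide by the result. The variable names below are illustrative, and this is not meant to reproduce everything the sklearn function does:

```python
# Row-wise normalization of a 2-D array using only NumPy
import numpy as np

np.random.seed(123)
arr = np.random.rand(20).reshape(2, 10)

# One norm per row; keepdims=True keeps the shape (2, 1) so it broadcasts
row_norms = np.linalg.norm(arr, axis=1, keepdims=True)
normalized_rows = arr / row_norms

# Every row should now have a norm of 1
print(np.allclose(np.linalg.norm(normalized_rows, axis=1), 1.0))

# Returns: True
```

Changing axis=1 to axis=0 would calculate one norm per column instead, which normalizes each feature rather than each sample.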

Conclusion

In this tutorial, you learned how to normalize a NumPy array. Normalizing arrays allows you to more easily compare arrays of different scales. You first learned how to normalize an array using only NumPy. Then, you learned how to use Scikit-learn to make your code more explicit. Finally, you learned how to use Scikit-learn to normalize multi-dimensional arrays.
