
Convert a Pandas DataFrame to a NumPy Array

Pandas makes it simple to convert a DataFrame to a NumPy array. Being able to move data between the Pandas and NumPy formats is a versatile skill for any data analyst or data scientist. The Pandas .to_numpy() method handles this conversion through a modern, flexible API.

By the end of this tutorial, you’ll have learned:

  • How to use the .to_numpy() method to convert a Pandas DataFrame to a NumPy array
  • How to set the data types when converting to a NumPy array
  • How to work with missing values when converting a DataFrame to a NumPy array

Understanding the Pandas to_numpy Method

The Pandas to_numpy method provides the most convenient and Pythonic way to convert a Pandas DataFrame or Series to a NumPy array. The method provides three different parameters, all of which have default arguments. This means that you can run the method without needing to pass in additional information.

Let’s take a look at the make-up of the Pandas .to_numpy() method:

# Understanding the Pandas .to_numpy() Method
import pandas as pd
df = pd.DataFrame()
# Signature: DataFrame.to_numpy(dtype=None, copy=False, na_value=<no default>)
df.to_numpy(dtype=None, copy=False)

The table below breaks down the parameters of the method as well as the default arguments of each parameter:

Parameter | Description | Default Argument | Accepted Value
--------- | ----------- | ---------------- | --------------
dtype= | The data type passed to the array | None | string or data type
copy= | Whether to return a value that is not a view on another array | False | bool
na_value= | The value to use for missing values | No default | any
The parameters and default arguments of the Pandas to_numpy() method
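
The copy= parameter is the least obvious of the three, so a minimal sketch may help. With the default copy=False, whether the result shares memory with the DataFrame depends on the column data types and the Pandas version, so the example below only relies on the guarantee that copy=True returns an independent array. The variable name independent is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# copy=True guarantees an array that does not share memory with the DataFrame
independent = df.to_numpy(copy=True)

# Modifying the array leaves the DataFrame untouched
independent[0, 0] = 100.0

print(independent[0, 0])  # 100.0
print(df.loc[0, 'A'])     # 1.0
```

If you plan to modify the array in place and want to be sure the original DataFrame is unaffected, passing copy=True is the safer choice.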

Now that you have a strong understanding of the Pandas .to_numpy() method, let’s explore how to convert a Pandas DataFrame to a NumPy array.

Converting a Pandas DataFrame to a NumPy Array

The Pandas .to_numpy() method provides the simplest and most direct way to convert Pandas DataFrames to NumPy arrays. While you previously had to access the .values attribute to get the underlying array, the method now provides a simple and explicit way to convert DataFrames into NumPy arrays.

Let’s load a Pandas DataFrame of floats and convert it to a NumPy array:

# Convert a Pandas DataFrame to a NumPy Array
import pandas as pd
df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [4.5, 5.6, 6.7]
})

df_to_numpy = df.to_numpy()
print(df_to_numpy)

# Returns:
# [[1.2 4.5]
#  [2.3 5.6]
#  [3.4 6.7]]

Let’s break down what we did in the code block above:

  1. We created a Pandas DataFrame containing floating point values
  2. We then created a new variable, df_to_numpy, which is the result of applying the Pandas .to_numpy() method to the DataFrame
  3. We printed the newly created NumPy array
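
To confirm what the steps above produce, the short sketch below inspects the result. It rebuilds the same DataFrame, checks the type, shape, and data type of the array, and compares it against the older .values attribute. The variable name arr is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [4.5, 5.6, 6.7]
})

arr = df.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.shape)   # (3, 2): three rows, two columns
print(arr.dtype)   # float64

# The older .values attribute holds the same data
print(np.array_equal(arr, df.values))  # True
```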

In the next section, we’ll explore how to work with heterogeneous DataFrames.

Handling Data Types When Converting a Pandas DataFrame to a NumPy Array

One of the notable characteristics of Pandas DataFrames is that they can contain heterogeneous data. However, this is not true for NumPy arrays, which hold a single data type. Because of this, applying the Pandas .to_numpy() method to a heterogeneous DataFrame will result in the lowest common data type, meaning the most specific type that can represent every value. This means that in a DataFrame with an integer column and a floating point column, the resulting data type will be a floating point.

Let’s see what this looks like:

# Coercing Data Types When Converting a DataFrame to a NumPy Array
import pandas as pd
df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [1, 2, 3]
})

df_to_numpy = df.to_numpy()
print(df_to_numpy)

# Returns:
# [[1.2 1. ]
#  [2.3 2. ]
#  [3.4 3. ]]
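
One way to see the coercion directly is to compare the column data types with the data type of the resulting array. The sketch below uses the same mixed DataFrame alongside an integer-only one; the names mixed and ints_only are illustrative.

```python
import pandas as pd

mixed = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [1, 2, 3]
})
ints_only = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Each column keeps its own data type inside the DataFrame
print(mixed.dtypes)

# The array has a single data type that can hold every value
mixed_arr = mixed.to_numpy()
print(mixed_arr.dtype)  # float64

# When every column is an integer, no upcasting to float is needed
ints_arr = ints_only.to_numpy()
print(ints_arr.dtype.kind)  # i (an integer type)
```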

We can also specify the data type by making use of the dtype= parameter. Let’s coerce all of our values to be integers. Note that the decimal portion of each float is dropped rather than rounded:

# Specify Data Types When Converting a DataFrame to a NumPy array
import pandas as pd
df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [1, 2, 3]
})

df_to_numpy = df.to_numpy(dtype='int')
print(df_to_numpy)

# Returns:
# [[1 1]
#  [2 2]
#  [3 3]]
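
Because the integer conversion drops the decimal portion, values such as 1.7 become 1. If rounding is what you want, one option (an addition to the approach shown above, not part of the .to_numpy() method itself) is to call the DataFrame .round() method before converting. The sketch below compares the two results:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.7, 2.2, 3.9],
    'B': [4.6, 5.1, 6.8]
})

# Converting directly drops the decimal portion
truncated = df.to_numpy(dtype='int')
print(truncated)
# [[1 4]
#  [2 5]
#  [3 6]]

# Rounding first gives the nearest integers instead
rounded = df.round().to_numpy(dtype='int')
print(rounded)
# [[2 5]
#  [2 5]
#  [4 7]]
```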

In the next section, you’ll learn how to handle missing data when converting a Pandas DataFrame.

Handling Missing Data When Converting a Pandas DataFrame to a NumPy Array

When Pandas converts a DataFrame with missing values to a NumPy array, it represents those missing values according to the data type of the array. For floating point data, for example, missing values become NaN.

However, you can also coerce missing values to be represented using a custom value. This can be done using the na_value= parameter, which will adapt the argument value to the data type of the array.

Let’s see what this looks like by passing in the value of -99 into the na_value= parameter:

# Specify Missing Values When Converting DataFrames to Arrays
import pandas as pd
df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [4.5, 5.6, None]
})

df_to_numpy = df.to_numpy(na_value=-99)
print(df_to_numpy)

# Returns:
# [[  1.2   4.5]
#  [  2.3   5.6]
#  [  3.4 -99. ]]
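
To make the difference between the default behavior and a custom na_value= concrete, the sketch below counts the NaN values in each result. It uses the same DataFrame as above with 0.0 as the replacement; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1.2, 2.3, 3.4],
    'B': [4.5, 5.6, None]
})

# By default, the missing value is kept as NaN in the float array
default_arr = df.to_numpy()
print(np.isnan(default_arr).sum())  # 1

# With na_value=, the missing value is replaced during conversion
filled_arr = df.to_numpy(na_value=0.0)
print(np.isnan(filled_arr).sum())   # 0
print(filled_arr[2, 1])             # 0.0
```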

In the next section, you’ll learn how to convert only some Pandas DataFrame columns.

Convert Only Some Pandas DataFrame Columns to a NumPy Array

So far, you have learned how to convert an entire Pandas DataFrame to a NumPy array. However, there may be times when you want to convert only some columns. We can do this by selecting only specific columns and applying the method to that selection.

Let’s create a new DataFrame and select only the first two columns:

# Convert Select Pandas Columns to a NumPy Array
import pandas as pd
df = pd.DataFrame({
    'A': [1,2,3],
    'B': [4,5,6],
    'C': [7,8,9]
})

df_to_numpy = df[['A', 'B']].to_numpy()
print(df_to_numpy)

# Returns:
# [[1 4]
#  [2 5]
#  [3 6]]
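
Columns can also be selected by position rather than by name. As a small sketch of that alternative, the example below uses .iloc to take the first two columns of the same DataFrame, which can be handy when the column names are not known in advance:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Select all rows and the first two columns by position
first_two = df.iloc[:, :2].to_numpy()
print(first_two)
# [[1 4]
#  [2 5]
#  [3 6]]
```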

We can see that only those columns were converted. In the following section, you’ll learn how to convert a Pandas Series to a NumPy array.

Convert a Single Pandas DataFrame Column (Series) to a NumPy Array

So far, we have been using the Pandas .to_numpy() method on DataFrames. However, it also works as a Series method. This allows us to convert a single Pandas Series object into a NumPy array. The process works in the same way as the methods illustrated above. Let’s convert a single column into a NumPy array:

# Convert a Pandas Series to a NumPy Array
import pandas as pd
df = pd.DataFrame({
    'A': [1,2,3],
    'B': [4,5,6],
    'C': [7,8,9]
})

df_to_numpy = df['A'].to_numpy()
print(df_to_numpy)

# Returns:
# [1 2 3]
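
One detail worth checking is the shape of the result. Selecting a column with single brackets gives a Series and therefore a one-dimensional array, while double brackets give a one-column DataFrame and a two-dimensional array. The sketch below shows the difference; the names one_d and two_d are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Single brackets return a Series, which converts to a 1-D array
one_d = df['A'].to_numpy()
print(one_d.shape)  # (3,)

# Double brackets return a DataFrame, which converts to a 2-D array
two_d = df[['A']].to_numpy()
print(two_d.shape)  # (3, 1)
```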

Frequently Asked Questions

How do you convert a Pandas DataFrame to a NumPy array?

To convert a Pandas DataFrame to a NumPy array, you can apply the .to_numpy() method, which will return a NumPy array. The method can be customized to specify data types and missing values in the resulting array.

Does Pandas to_numpy() work with non-numeric data?

The Pandas to_numpy() method allows for non-numeric data. The method will coerce all values into the object data type.
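
As a small illustration, the sketch below converts a DataFrame that mixes text and numbers. The column names and values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ann', 'Bob'],
    'age': [31, 45]
})

arr = df.to_numpy()

# Text and numbers share no common numeric type, so the array uses object
print(arr.dtype)  # object
print(arr)
```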

Conclusion

In this post, you learned how to convert a Pandas DataFrame into a NumPy array using the .to_numpy() method. You first learned how the method functions by exploring its parameters and default arguments. Then, you learned how to use the method by applying it to DataFrames, including working with different data types and missing values. Finally, you learned how to convert only some columns or a single column to a NumPy array.
