Pandas: Convert Column Values to Strings

In this tutorial, you’ll learn how to use Python’s Pandas library to convert a column’s values to a string data type. You will learn how to convert Pandas integers and floats into strings. You’ll also learn how strings have evolved in Pandas, and the advantages of using the Pandas string dtype. You’ll learn four different ways to convert a Pandas column to strings and how to convert every Pandas dataframe column to a string.

The Quick Answer: Use df['column'].astype('string')

Loading a Sample Dataframe

To follow along with the tutorial, feel free to load the sample dataframe provided below. It contains three columns: one that will load as strings and two that will load as integers.

We’ll first load the dataframe, then print its first five records using the .head() method.

Let’s get started:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

print('df head:')
print(df.head())

This returns the following information:

df head:
    Name  Age  Income
0    Nik   30   70000
1   Jane   31   72000
2   Matt   29   83000
3   Kate   33   90000
4  Clark   43  870000

Let’s start the tutorial off by learning a little bit about how Pandas handles string data.

What is the String Datatype in Pandas?

To explore how Pandas handles string data, we can use the .info() method, which will print out information on the dataframe, including the datatypes for each column.

Let’s take a look at what the data types are:

df.info()

# Returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      int64
#  2   Income  5 non-null      int64
# dtypes: int64(2), object(1)
# memory usage: 248.0+ bytes

We can see here that by default, Pandas will store strings using the object datatype. The object data type is used for strings and for mixed data types, so the dtype alone doesn’t tell you whether a column holds only strings.

Beginning in version 1.0, Pandas has had a dedicated string datatype (StringDtype, which you request with the alias 'string'). While this datatype didn’t offer any explicit memory or speed improvements at the time of writing, the development team behind Pandas has indicated that these may come in the future.

Because of this, this tutorial will use the string datatype throughout. If you’re using a version lower than 1.0, please replace 'string' with 'str' in all instances.
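To make the difference concrete, here is a minimal sketch that builds the same values twice: once letting Pandas infer the datatype, and once requesting the string datatype explicitly. The variable names are made up for this example, and the inferred datatype that gets printed may differ depending on your Pandas version.

```python
import pandas as pd

names = ['Nik', 'Jane', 'Matt']

# Let Pandas infer the dtype (object in the versions covered by this tutorial)
inferred = pd.Series(names)

# Explicitly request the dedicated string dtype (Pandas 1.0+)
explicit = pd.Series(names, dtype='string')

print(inferred.dtype)
print(explicit.dtype)

# An isinstance() check works regardless of how the dtype is displayed
print(isinstance(explicit.dtype, pd.StringDtype))
```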

Let’s get started by using the preferred method for using Pandas to convert a column to a string.

Convert a Pandas Dataframe Column Values to String using astype

Pandas comes with a column (series) method, .astype(), which allows us to re-cast a column into a different data type.

Many tutorials will only tell you to pass in 'str' as the argument. While this holds true for versions of Pandas lower than 1.0, if you’re using 1.0 or later, pass in 'string' instead.

Doing this will ensure that you are using the string datatype, rather than the object datatype. This means your code should benefit from any improvements the Pandas team makes to the string datatype in the future.

Let’s take a look at how we can convert a Pandas column to strings, using the .astype() method:

df['Age'] = df['Age'].astype('string')
df.info()

This returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      string
 2   Income  5 non-null      int64
dtypes: int64(1), object(1), string(1)
memory usage: 248.0+ bytes

We can see that our Age column, which was previously stored as int64, is now stored as the string datatype.

In the next section, you’ll learn how to use the .map() method to convert a Pandas column’s values to strings.
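Once a column holds strings, it behaves like text rather than numbers. The short sketch below, which uses a trimmed-down version of the sample dataframe, shows two consequences: the .str accessor becomes available, and + concatenates instead of adding. The variable names padded and labels are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 31, 29, 33, 43]})
df['Age'] = df['Age'].astype('string')

# String methods are available through the .str accessor
padded = df['Age'].str.zfill(3)      # left-pad with zeros to width 3

# The + operator now concatenates text instead of adding numbers
labels = df['Age'] + ' years'

print(padded.tolist())
print(labels.tolist())
```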

Convert a Pandas Dataframe Column Values to String using map

Similar to the .astype() Pandas series method, you can use the .map() method to convert a Pandas column to strings.

Let’s take a look at what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].map(str)
df.info()

This returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      object
 2   Income  5 non-null      int64
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes

We can see here that the .map() method doesn’t give us the string datatype. Because str returns ordinary Python strings, the data are saved in the object datatype. For that reason, I would not recommend this approach if you’re using Pandas 1.0 or later.
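Another practical difference shows up when a column contains missing values. The sketch below is an illustration on a made-up series (not the tutorial’s dataframe): .map(str) turns a missing value into the literal text 'nan', while .astype('string') keeps it marked as missing.

```python
import pandas as pd

# A column with a missing value; Pandas stores it as float64 with NaN
ages = pd.Series([30, None, 29])

mapped = ages.map(str)               # NaN becomes the literal text 'nan'
converted = ages.astype('string')    # NaN stays missing (shown as <NA>)

print(mapped.tolist())
print(converted.isna().tolist())
```

If downstream code relies on .isna() or .dropna() to find gaps, the 'nan' text produced by .map(str) will slip through unnoticed.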

In the next section, you’ll learn how to use the .apply() method to convert a Pandas column’s data to strings.

Convert a Pandas Dataframe Column Values to String using apply

Similar to the method above, we can also use the .apply() method to convert a Pandas column’s values to strings. This comes with the same limitation: the result is stored using the object datatype rather than the string datatype.

Let’s see what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].apply(str)
df.info()

This returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      object
 2   Income  5 non-null      int64
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
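Where .apply() can be handy is when the conversion needs extra arguments. The following is a minimal sketch, assuming a hypothetical helper named label_age that is not part of Pandas; keyword arguments passed to .apply() are forwarded to that function, and the result can be re-cast afterwards if the string datatype is wanted.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 31, 29, 33, 43]})

def label_age(age, suffix):
    """Hypothetical helper: turn a number into text with a suffix."""
    return f'{age}{suffix}'

# Extra keyword arguments given to .apply() are forwarded to the function
labels = df['Age'].apply(label_age, suffix=' yrs')

# Re-cast afterwards if the dedicated string dtype is wanted
labels = labels.astype('string')

print(labels.tolist())
```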

In the next section, you’ll learn how to use .values.astype() to convert a dataframe column’s values to strings.

Convert a Pandas Dataframe Column Values to String using values.astype

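Finally, we can also access the column’s underlying NumPy array through the .values attribute and call NumPy’s .astype() method on it to convert the values into strings. As with .map() and .apply(), the result is stored using the object datatype once it’s assigned back to the dataframe.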

Let’s see what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].values.astype(str)
df.info()

This returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      object
 2   Income  5 non-null      int64
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
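To see what happens before the values are assigned back to the dataframe, the sketch below inspects the intermediate result. It uses a one-column version of the sample dataframe, and as_text is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 31, 29, 33, 43]})

# .values hands back the underlying NumPy array, so this astype() is NumPy's
as_text = df['Age'].values.astype(str)

print(type(as_text).__name__)   # ndarray
print(as_text.dtype.kind)       # 'U' means fixed-width Unicode text
print(as_text.tolist())
```

Because the conversion happens in NumPy, Pandas only sees the finished array of text when it’s assigned to the column.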

In the next section, you’ll learn how to use .applymap() to convert all columns in a Pandas dataframe to strings.

Convert All Pandas Dataframe Columns to String Using Applymap

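In this final section, you’ll learn how to use the .applymap() method to convert all Pandas dataframe columns to strings. Note that .applymap() was deprecated in Pandas 2.1 in favor of the dataframe .map() method, so newer versions will show a warning when you run this code.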

Let’s take a look at what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

df = df.applymap(str)
df.info()

This returns:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      object
 2   Income  5 non-null      object
dtypes: object(3)
memory usage: 248.0+ bytes

If we instead wanted every column to use the new string datatype, we could loop over the columns and call .astype('string') on each one:

import pandas as pd

df = pd.DataFrame({
    'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income':[70000, 72000, 83000, 90000, 870000]
})

for col in df.columns:
    df[col] = df[col].astype('string')

df.info()

This returns the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      string
 1   Age     5 non-null      string
 2   Income  5 non-null      string
dtypes: string(3)
memory usage: 248.0 bytes
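The loop isn’t strictly required, since dataframes have their own .astype() method. As a sketch of that alternative, the example below converts every column in one call, and then only some columns by passing a dictionary that maps column names to datatypes. The names all_strings and some_strings are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Convert every column at once
all_strings = df.astype('string')

# Convert only selected columns by passing a {column: dtype} mapping
some_strings = df.astype({'Age': 'string', 'Income': 'string'})

print(all_strings.dtypes)
print(some_strings.dtypes)
```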

Conclusion

In this tutorial, you learned how to use Python Pandas to convert a column’s values to strings. You learned the differences between the different ways in which Pandas stores strings. You also learned four different ways to convert the values to string types. Finally, you learned how to convert all dataframe columns to string types in one go.

To learn more about how Pandas intends to handle strings, check out the official Pandas documentation on working with text data.