In this tutorial, you’ll learn how to use Python’s Pandas library to convert a column’s values to a string data type. You will learn how to convert Pandas integers and floats into strings. You’ll also learn how strings have evolved in Pandas, and the advantages of using the Pandas string dtype. You’ll learn four different ways to convert a Pandas column to strings and how to convert every Pandas dataframe column to a string.
The Quick Answer: Use df['column'].astype('string')
Loading a Sample Dataframe
To follow along with the tutorial, feel free to load the same dataframe provided below. It contains three columns: one that will load as strings and two that will load as integers.
We’ll first load the dataframe, then print its first five records using the .head() method.
Let’s get started:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
print('df head:')
print(df.head())
This returns the following information:
# df head:
#     Name  Age  Income
# 0    Nik   30   70000
# 1   Jane   31   72000
# 2   Matt   29   83000
# 3   Kate   33   90000
# 4  Clark   43  870000
Let’s start the tutorial off by learning a little bit about how Pandas handles string data.
What is the String Datatype in Pandas?
To explore how Pandas handles string data, we can use the .info() method, which prints information about the dataframe, including the datatype of each column.
Let’s take a look at what the data types are:
df.info()
# Returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      int64
#  2   Income  5 non-null      int64
# dtypes: int64(2), object(1)
# memory usage: 248.0+ bytes
We can see here that, by default, Pandas stores strings using the object datatype. The object datatype is used for strings and for mixed data types, so it isn’t particularly explicit: the dtype alone doesn’t tell you whether a column holds only strings.
Beginning in version 1.0, Pandas has had a dedicated string datatype. While this datatype currently doesn’t offer any explicit memory or speed improvements, the development team behind Pandas has indicated that this will change in the future.
Because of this, this tutorial uses the string datatype throughout. If you’re using a version lower than 1.0, replace 'string' with str in all instances.
Let’s start with the preferred way to convert a Pandas column to strings.
Convert Pandas Dataframe Column Values to String using astype
Pandas comes with a column (series) method, .astype(), which allows us to re-cast a column into a different data type.
Many tutorials will tell you to pass in 'str' as the argument. That is the right choice for versions of Pandas lower than 1.0, but if you’re using 1.0 or later, pass in 'string' instead.
Doing this ensures that you are using the string datatype rather than the object datatype, so your code can benefit from any improvements Pandas makes to that datatype in the future.
Let’s take a look at how we can convert a Pandas column to strings, using the .astype() method:
df['Age'] = df['Age'].astype('string')
df.info()
This returns the following:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      string
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(1), string(1)
# memory usage: 248.0+ bytes
We can see that our Age column, which was previously stored as int64, is now stored as the string datatype.
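If you need to convert more than one column, .astype() also accepts a dictionary that maps column names to target dtypes. A minimal sketch using the same sample dataframe; columns not listed in the dictionary (here, Name) are left as they are.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Map each column name to the dtype it should be cast to
df = df.astype({'Age': 'string', 'Income': 'string'})
print(df.dtypes)
```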
In the next section, you’ll learn how to use the .map() method to convert a Pandas column’s values to strings.
Convert Pandas Dataframe Column Values to String using map
Similar to the .astype() Pandas series method, you can use the .map() method to convert a Pandas column to strings. Passing in the built-in str function applies str() to each value in the column.
Let’s take a look at what this looks like:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
df['Age'] = df['Age'].map(str)
df.info()
This returns the following:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
We can see here that the .map() method doesn’t produce the string datatype. Because .map(str) returns plain Python strings, Pandas stores the results using the object datatype. For this reason, I would not recommend this approach if you’re using Pandas 1.0 or later.
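Where .map() does earn its place is when you want to control how each value is formatted as text. A minimal sketch, assuming a thousands-separator format is what you want for Income (an illustrative choice): pass a formatting function to .map(), then chain .astype('string') so the result uses the string datatype rather than object.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Format each income with a thousands separator, then cast to the string dtype
df['Income'] = df['Income'].map('{:,}'.format).astype('string')
print(df['Income'])
```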
In the next section, you’ll learn how to use the .apply() method to convert a Pandas column’s values to strings.
Convert Pandas Dataframe Column Values to String using apply
Similar to the method above, we can also use the .apply() method to convert a Pandas column’s values to strings. This comes with the same limitation: the result is stored using the object datatype, not the string datatype.
Let’s see what this looks like:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
df['Age'] = df['Age'].apply(str)
df.info()
This returns the following:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
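The same approach works for float columns, which the sample dataframe doesn’t include. Calling str() on a float keeps Python’s default representation (2.0 becomes '2.0'), so if you want a fixed number of decimal places you can pass a formatting function to .apply() instead. A minimal sketch with a hypothetical prices Series:

```python
import pandas as pd

# Hypothetical float column for illustration
prices = pd.Series([1.5, 2.0, 19.999])

# str() keeps Python's default float representation
default_text = prices.apply(str)

# A format spec rounds every value to two decimal places,
# then .astype('string') switches from object to the string dtype
two_decimals = prices.apply(lambda x: f'{x:.2f}').astype('string')

print(default_text.tolist())
print(two_decimals.tolist())
```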
In the next section, you’ll learn how to use the .values.astype() method to convert a dataframe column’s values to strings.
Convert Pandas Dataframe Column Values to String using values.astype
Finally, we can access a column’s underlying NumPy array with .values and call its .astype() method to convert the values directly into strings.
Let’s see what this looks like:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
df['Age'] = df['Age'].values.astype(str)
df.info()
This returns the following:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
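Because .values returns a NumPy array, the .astype(str) call above is NumPy’s method: it produces a fixed-width Unicode array, which Pandas then stores back in the dataframe (as object, per the output above). The Pandas documentation recommends .to_numpy() over .values, so a minimal sketch of the same conversion could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# to_numpy() returns the column's data as a NumPy array, like .values
ages = df['Age'].to_numpy().astype(str)

# NumPy uses a fixed-width Unicode dtype sized to the longest value
print(ages.dtype)

df['Age'] = ages
```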
In the next section, you’ll learn how to use .applymap() to convert all columns in a Pandas dataframe to strings.
Convert All Pandas Dataframe Columns to String Using Applymap
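In this final section, you’ll learn how to use the .applymap() method to convert all Pandas dataframe columns to strings. Note that in Pandas 2.1 and later, .applymap() is deprecated in favor of the equivalent DataFrame.map() method.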
Let’s take a look at what this looks like:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
df = df.applymap(str)
df.info()
This returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      object
# dtypes: object(3)
# memory usage: 248.0+ bytes
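In Pandas 2.1, .applymap() was deprecated and DataFrame.map() was added as its replacement. If your code needs to run on both older and newer versions, a minimal sketch could check which method is available before calling it:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# DataFrame.map() exists from pandas 2.1; fall back to applymap() before that
if hasattr(pd.DataFrame, 'map'):
    df = df.map(str)
else:
    df = df.applymap(str)

print(df.dtypes)
```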
If, instead, we wanted to convert the columns to the newer string datatype, we could loop over each column and call .astype('string'). This would look like this:
import pandas as pd
df = pd.DataFrame({
'Name':['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
'Age': [30, 31, 29, 33, 43],
'Income':[70000, 72000, 83000, 90000, 870000]
})
for col in df.columns:
    # Cast each column to the dedicated string dtype
    df[col] = df[col].astype('string')
df.info()
This returns the following:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      string
#  1   Age     5 non-null      string
#  2   Income  5 non-null      string
# dtypes: string(3)
# memory usage: 248.0 bytes
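The loop isn’t strictly necessary: DataFrame.astype() also accepts a single dtype, which it applies to every column. A minimal sketch that should give the same string dtypes as the loop:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Passing one dtype casts every column in a single call
df = df.astype('string')
df.info()
```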
Conclusion
In this tutorial, you learned how to use Python Pandas to convert a column’s values to strings. You learned how the object and string datatypes differ in the way Pandas stores strings. You also learned four different ways to convert the values to string types. Finally, you learned how to convert all dataframe columns to string types.
To learn more about how Pandas intends to handle strings, check out this API documentation here.