In this post, you’ll learn how to use the Pandas melt function. The pd.melt()
function is used to reshape a Pandas from wide to long format. What this means is that one or more columns are used as identifiers and all other columns are used as values. In short, this allows you to “unpivot” your DataFrame.
By the end of this tutorial, you’ll have learned:
- What unpivoting, or melting, your data means
- How to understand and use the Pandas melt function
- How to use the Pandas melt function to melt a single or multiple variables
Why Use Pandas Melt to Unpivot Data?
This is helpful if you’re given data in a wide format, such as report you find online or you may have been given by a colleague. This data is easy to understand, but it’s harder to reshape into some other form of analysis. For example, if you wanted to create a pivot table of your data, then having melted data can be very helpful.
Loading a Sample Dataframe
Let’s begin by loading a sample dataframe in wide-format.:
import pandas as pd
df = pd.DataFrame(
{'name': ['John', 'Mel', 'Peter', 'Frank', 'Nik', 'Eva'],
'age': [27, 32, 35, 31, 29, 30],
'education': ['master', 'doctor', 'MBA', 'bachelor', 'MBA', 'bachelor'],
'salary': [100000,120000, 85000, 75000, 95000, 80000]})
print(df.head())
This generates our dataset in a wide-format. The printout shows:
name age education salary
0 John 27 master 100000
1 Mel 32 doctor 120000
2 Peter 35 MBA 85000
3 Frank 31 bachelor 75000
4 Nik 29 MBA 95000
What is the Pandas Melt Syntax?
The general format for the melt function looks like this:
pandas.melt(
frame,
id_vars=None,
value_vars=None,
var_name=None,
value_name='value',
col_level=None
)
Let’s take a look at all of these in a bit more detail:
frame=
the data frame we’re looking to meltid_vars=
the column(s) we’re using as identifying variablesvalue_vars=
the columns we’re looking to unpivot. If this isn’t specified, any column not in id_vars is used.var_name
= the name used for the variable column.value_name=
the name used for the value column. Will default to values.col_level=
if columns are a multi-index, use this to melt.
How to Unpivot Your Data Using the Pandas Melt Function
Let’s take a look at how we can use the Pandas melt function to unpivot the dataset.
From what we learned earlier, we need to reassign the dataframe:
melted = pd.melt(
df,
id_vars = 'name',
var_name = 'Attribute',
value_name = 'Value')
print(melted.head())
Notice that we didn’t include the value_vars. Since we wanted to unpivot all of our data, Pandas was able to assign all the columns.
This is the same as writing:
melted = pd.melt(
df,
id_vars = 'name',
value_vars = ['age', 'education', 'salary'],
var_name = 'Attribute',
value_name = 'Value'
)
print(melted.head())
Both of these return the following output:
name Attribute Value
0 John age 27
1 Mel age 32
2 Peter age 35
3 Frank age 31
4 Nik age 29
Conclusion
Using just a few steps in Python, we were able to unpivot data using Pandas and the melt function to transform a cross-tab style, wide dataset into one that’s become much more useful for further analysis.
Additional Resources
To learn more about related topics, check out the tutorials below:
- Pivot Tables in Pandas with Python
- Combine Data in Pandas with merge, join, and concat
- Introduction to Pandas for Data Science
- Find out more about the melt function by checking out its official documentation.
Pingback: Pivot Tables in Pandas with Python for Python and Pandas • datagy