Unpivot Your Data with Pandas and Python Melt Function • datagy

In this tutorial, you’ll learn how to use the Pandas melt function to reshape wide data into a long data format. If you’re working with data in Python, reshaping and transforming your data is an essential skill, and that’s where Pandas melt() comes into play. Cleaning your data and transforming it is an essential data science and data engineering skill.

This guide offers a clear and engaging introduction to this powerful function, packed with examples that cover everything from the basics to advanced usage.

By the end of this tutorial, you’ll have learned the following:

What the Pandas melt() function is and its purpose in reshaping data
How to set up and use the pandas melt function, with a step-by-step walkthrough of its syntax and parameters
The distinction between identifier variables, value variables, and measured variables, and how to choose the right combination for your needs
How to customize column names in the melted DataFrame using id_vars and value_vars
Advanced melting techniques such as working with MultiIndex DataFrames and integrating melt with other pandas operations

Want to learn more about transforming your Pandas DataFrames? Check out these resources below: Data Cleaning and Preparation in Pandas and Python, Transforming Pandas Columns with map and apply, Combine Data in Pandas with merge, join, and concat.

Table of Contents

Understanding the Pandas melt() Function

The Pandas melt() function is used to convert a wide-format DataFrame into a long-format DataFrame.

This means that we’re taking an organized, pivot-style DataFrame where multiple columns represent different variables. We are then converting it into one where data are organized into a key-value structure (with variables in one column and their values in another).

Let’s dive in and examine the syntax for the Pandas melt function.

# Understanding the Pandas melt() Function
import pandas as pd

pd.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name="value", col_level=None, ignore_index=True,)

We can see that the function offers quite a few different parameters, for which all but one have default arguments provided. The table below breaks down the parameters, their default arguments, a brief description and accepted values.

Parameter	Default Argument	Description	Accepted Values
frame	–	The input DataFrame to be melted	A Pandas DataFrame
id_vars	None	Identifier variables that will remain unchanged in the resulting DataFrame	Scalar/list of Scalars/column index (integer)
value_vars	None	Variables to transform into long format	Scalar/list of Scalars/column index (integer)
var_name	None	Custom name for the resulting ‘variable’ column	A string representing the new column name
value_name	“value”	Custom name for the resulting ‘value’ column	A string representing the new column name
col_level	None	For MultiIndex DataFrames, the level of column to be melted	An integer/string/column index
ignore_index	True	Adds a new index instead of keeping the previous index structure	Boolean: True/False

The parameters of the Pandas melt() function

By default, when id_vars, value_vars, and col_level are not provided, Pandas assumes all non-indentifier columns are value_vars and it melts the entire DataFrame.

Let’s now dive into how we can use the Pandas melt() function by going through different examples.

Loading a Sample Pandas DataFrame

For this tutorial, we’ll be working with a simple Pandas DataFrame that we use in the Pandas pivot table tutorial. It’s been cut down a bit, in order to make understanding the function a little simpler.

Let’s take a look at how we can load it and print the first few records. The dataset contains information on different purchases of clothing types.

# Loading a Sample Pandas DataFrame
import pandas as pd
df = pd.DataFrame({'Date': ['2020-07-11','2020-09-23','2020-04-02','2020-02-28','2020-03-19'],
 'Region': ['East', 'North', 'South', 'East', 'West'],
 'Type': ['Children', 'Children', 'Women', 'Children', 'Women'],
 'Units': [18.0, 14.0, 17.0, 26.0, 3.0],
 'Sales': [306.0, 448.0, 425.0, 832.0, 33.0]})

print(df)

# Returns:
#          Date Region      Type  Units  Sales
# 0  2020-07-11   East  Children   18.0  306.0
# 1  2020-09-23  North  Children   14.0  448.0
# 2  2020-04-02  South     Women   17.0  425.0
# 3  2020-02-28   East  Children   26.0  832.0
# 4  2020-03-19   West     Women    3.0   33.0

Now that you have a good sense of the data, let’s dive into using the function.

Using the Pandas melt() Function to Unpivot a DataFrame

In this section, we’ll focus on how to use the id_vars parameter of the pandas melt function to unpivot a DataFrame.

When it comes to melting a DataFrame, id_vars refers to the identifier variables, i.e., the columns in your DataFrame that you want to keep unchanged, while the remaining columns are pivoted from wide to long format. These identifier variables provide context to the resulting melted DataFrame.

Let’s see how we can use Pandas to melt a DataFrame:

# Melting a DataFrame with id_vars
melted = pd.melt(
    frame=df, 
    id_vars='Region'
)
print(melted)

# Returns:
#    Region variable       value
# 0    East     Date  2020-07-11
# 1   North     Date  2020-09-23
# 2   South     Date  2020-04-02
# 3    East     Date  2020-02-28
# 4    West     Date  2020-03-19
# ...
# 15   East    Sales       306.0
# 16  North    Sales       448.0
# 17  South    Sales       425.0
# 18   East    Sales       832.0
# 19   West    Sales        33.0

In the code block above, we specified that we wanted to use id_vars='Region'. By specifying id_vars='Region', we are telling pandas to keep the ‘Region’ column unchanged while melting the rest of the columns in the DataFrame.

Similarly, the id_vars= parameter accepts multiple column labels. Let’s explore this in the next section.

Using Multiple Identifier Variables in Pandas melt()

We can also melt a DataFrame using multiple variables. In order to use multiple columns in the Pandas melt function’s id_vars=, we can pass in a list of column labels. This means that multiple columns will remain unchanged in the resulting DataFrame.

Let’s see how we can melt the DataFrame, keeping the Region and Type columns the same, by passing in a list of labels:

# Adding a Second id_vars
melted = pd.melt(
    frame=df,
    id_vars=['Region', 'Type']
)
print(melted)

# Returns:
#    Region      Type variable       value
# 0    East  Children     Date  2020-07-11
# 1   North  Children     Date  2020-09-23
# 2   South     Women     Date  2020-04-02
# 3    East  Children     Date  2020-02-28
# 4    West     Women     Date  2020-03-19
# 5    East  Children    Units        18.0
# 6   North  Children    Units        14.0
# 7   South     Women    Units        17.0
# 8    East  Children    Units        26.0
# 9    West     Women    Units         3.0
# 10   East  Children    Sales       306.0
# 11  North  Children    Sales       448.0
# 12  South     Women    Sales       425.0
# 13   East  Children    Sales       832.0
# 14   West     Women    Sales        33.0

We can see in the resulting DataFrame above that the Region and Type columns have remained unchanged, but the other columns have been unpivoted.

Using Value Variables in the Pandas melt() Function

By default, Pandas will use all remaining columns in the value_vars= parameter, meaning that all columns will be unpivoted. If we wanted to only melt (or unpivot) a single column, we could pass a column label into the value_vars= parameter.

Let’s see how we can unpivot only the Sales column, broken out by the Region:

# Melting a DataFrame with value_vars
melted = pd.melt(
    frame=df, 
    id_vars='Region', 
    value_vars='Sales'
)
print(melted)

# Returns:
#   Region variable  value
# 0   East    Sales  306.0
# 1  North    Sales  448.0
# 2  South    Sales  425.0
# 3   East    Sales  832.0
# 4   West    Sales   33.0

In the example above, we kept the Region column as it was and only melted the Sales data. This means that we returned a single record for each region’s sales values.

Similarly, we can include multiple columns in the values field. Let’s see what this looks like in the next section.

Using Multiple Value Variables in Pandas melt()

By passing multiple column labels into the value_vars= parameter of the Pandas melt() function, you can un-pivot multiple columns. This will return key-value pairs for the two columns, broken out by the id_vars= column(s).

Let’s take a look at what this looks like:

# Adding a Second Value Var
melted = pd.melt(
    frame=df, 
    id_vars='Region', 
    value_vars=['Sales', 'Units']
)
print(melted)

# Returns:
#   Region variable  value
# 0   East    Sales  306.0
# 1  North    Sales  448.0
# 2  South    Sales  425.0
# 3   East    Sales  832.0
# 4   West    Sales   33.0
# 5   East    Units   18.0
# 6  North    Units   14.0
# 7  South    Units   17.0
# 8   East    Units   26.0
# 9   West    Units    3.0

We can see that the Region column has remained consistent. Two columns (variable and value) have been added, representing the key-value pairs for the Sales and Units columns.

In the following section, you’ll learn how to customize the names of the value and identifier variables using the Pandas melt() function.

Customizing the Names of Value and Identifier Variables in Pandas melt()

By default, Pandas will assign the column names of variable and value when melting a DataFrame. We can override this behavior by using the var_name= and value_name= parameters.

Both of these parameters accept a single string as their input. Let’s see how we can customize these, by passing in Category and Amount as arguments.

# Naming id_vars and value_vars
melted = pd.melt(
    frame=df, 
    id_vars='Region', 
    value_vars='Sales', 
    var_name='Category', 
    value_name='Amount'
)
print(melted)

# Returns:
#   Region Category  Amount
# 0   East    Sales   306.0
# 1  North    Sales   448.0
# 2  South    Sales   425.0
# 3   East    Sales   832.0
# 4   West    Sales    33.0

In the resulting DataFrame above, we added var_name='Category and value_name='Amount' . This resulted in passing cleaner, more understandable names to our columns.

Retaining the Original DataFrame Index in Pandas melt()

By default, Pandas will override the former index. You can override this behavior by modifying the ignore_index= parameter. By default, this is set to True, but if we change it to False, the original index will be retained. Let’s see what this looks like:

# Keeping the Original Index
melted = pd.melt(
    frame=df, 
    id_vars='Region', 
    value_vars=['Sales', 'Units'], 
    ignore_index=False
)
print(melted)

# Returns:
#   Region variable  value
# 0   East    Sales  306.0
# 1  North    Sales  448.0
# 2  South    Sales  425.0
# 3   East    Sales  832.0
# 4   West    Sales   33.0
# 0   East    Units   18.0
# 1  North    Units   14.0
# 2  South    Units   17.0
# 3   East    Units   26.0
# 4   West    Units    3.0

In the example above, we modified the ignore_index= parameter to False. This means that the original index was retained (which explains why we’re seeing the indices of 0 through 4 repeated).

Using col_level in the Pandas melt Function with Multi-Index Columns

The col_level parameter comes in handy when working with DataFrames containing MultiIndex columns. It helps you determine which level of the column index should be used when melting the DataFrame. Let’s explore this with a simple example:

# Working with col_level in Pandas melt
import pandas as pd

# Creating a sample DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples(
    [('Sales', '2019'), ('Sales', '2020'), ('Profit', '2019'), ('Profit', '2020')],
    names=['Category', 'Year']
)
data = [
    [200, 250, 100, 120],
    [300, 340, 150, 160],
    [150, 180, 80, 90],
    [250, 310, 120, 140],
]

index = ['North', 'South', 'East', 'West']
df = pd.DataFrame(data, index=index, columns=columns)
print("Original DataFrame:")
print(df)

# Melting a DataFrame with col_level
melted = pd.melt(
    frame=df,
    col_level='Year',
    id_vars=None  # In this case, we are not using id_vars
)

print(melted)

# Returns:
# Melted DataFrame:
#    Year  variable  value
# 0  2019       NaN    200
# 1  2019       NaN    300
# 2  2019       NaN    150
# 3  2019       NaN    250
# 4  2020       NaN    250
# ...

In this example, we have a DataFrame with MultiIndex columns, where ‘Category’ (Level 0) represents ‘Sales’ and ‘Profit’, and ‘Year’ (Level 1) represents ‘2019’ and ‘2020’. When we set col_level='Year', the pandas melt function uses the ‘Year’ level of the MultiIndex as the ‘variable’ column in the resulting melted DataFrame.

In summary, the col_level parameter allows you to manage the melting process in DataFrames with MultiIndex columns, giving you more control over how your data is restructured when using the pandas melt function.

Conclusion

Congratulations on reaching the end of this comprehensive tutorial on the Pandas melt function! We’ve covered a wide array of concepts, tools, and examples that explored the many features of this powerful data manipulation tool. To recap, here’s what we’ve learned:

The purpose and functionality of the Pandas melt function in reshaping data from wide to long format
How to set up and use the Pandas melt function effectively, delving into the syntax and parameters of the melt function
Discerning the difference between identifier variables, value variables, and measured variables, and why choosing the right combination of id_vars and value_vars is critical
Customizing column names in the melted DataFrame using id_vars, value_vars, var_name, and value_name parameters
Advanced melting practices, such as working with MultiIndex DataFrames and integrating the melt function with other Pandas operations

With the knowledge gained from this tutorial, you are now well-equipped to tackle complex data manipulation tasks and reshape data for improved handling and analysis. Keep exploring further resources and practice your skills to become an expert in data restructuring using pandas. Remember, the key to mastering any tool is consistent practice and experimentation. Happy data melting!

Find out more about the melt function by checking out its official documentation.

Unpivot Your Data with the Pandas Melt Function

Understanding the Pandas melt() Function

Loading a Sample Pandas DataFrame

Using the Pandas melt() Function to Unpivot a DataFrame

Using Multiple Identifier Variables in Pandas melt()

Using Value Variables in the Pandas melt() Function

Using Multiple Value Variables in Pandas melt()

Customizing the Names of Value and Identifier Variables in Pandas melt()

Retaining the Original DataFrame Index in Pandas melt()

Using col_level in the Pandas melt Function with Multi-Index Columns

Conclusion

Nik Piepenbreier

1 thought on “Unpivot Your Data with the Pandas Melt Function”

Leave a Reply Cancel reply

Unpivot Your Data with the Pandas Melt Function

Understanding the Pandas melt() Function

Loading a Sample Pandas DataFrame

Using the Pandas melt() Function to Unpivot a DataFrame

Using Multiple Identifier Variables in Pandas melt()

Using Value Variables in the Pandas melt() Function

Using Multiple Value Variables in Pandas melt()

Customizing the Names of Value and Identifier Variables in Pandas melt()

Retaining the Original DataFrame Index in Pandas melt()

Using col_level in the Pandas melt Function with Multi-Index Columns

Conclusion

Nik Piepenbreier

1 thought on “Unpivot Your Data with the Pandas Melt Function”

Leave a Reply Cancel reply

Thank you!