Exploring the Pandas Style API

  • Save

Pandas is the quintessential tool for data analysis in Python, but it’s not always the easiest to make data look presentable. For that, many analysts still turn to Excel to add data styles (such as currencies) or conditional formatting before sharing the data with our broader audiences. In this post, we’ll explore how to take these features that are commonplace in Excel and demonstrate how to take these on using the Pandas Style API!

Why would we want to style data? Our end goal should be to make the data easier for our readers to understand while maintaining the usability of the underlying data available in the dataframe. For example, 10% may be easier to understand than the value 0.10, but the proportion of 0.10 is more usable for further analysis.

Table of Contents: The Pandas Style API

What is the Pandas Style API?

Pandas developed the styling API in 2019 and it’s gone through active development since then. The API returns a new Styler object, which has useful methods to apply formatting and styling to dataframes. The end styling is accomplished with CSS, through style-functions that are applied to scalars, series, or entire dataframes, via attribute:value pairs.

The Styler objects has two key methods:

  1. Styler.applymap – which applies styles element-wise
  2. Styler.apply – which applies styles column- / row- / dataframe-wise

Let’s get started by loading our data first.

Loading Our Sample Dataset

We’ll use the same dataset that’s available in our pivot table tutorial and we’ll use some of the steps we outlined there. If you’re not familiar with Pivot Tables in Pandas, we recommend checking out our tutorial.

import pandas as pd

df = pd.read_excel('https://github.com/datagy/pivot_table_pandas/raw/master/sample_pivot.xlsx', parse_dates=['Date'])

print(df.head())
Out:
        Date 	        Region 	Type 	                Units 	Sales
0 	2020-07-11 	East 	Children's Clothing 	18 	306
1 	2020-09-23 	North 	Children's Clothing 	14 	448
2 	2020-04-02 	South 	Women's Clothing 	17 	425
3 	2020-02-28 	East 	Children's Clothing 	26 	832
4 	2020-03-19 	West 	Women's Clothing 	3 	33

We can see that we have a number of sales, providing information on Region, Type, # of Units Sold and the total Sales Cost.

Let’s create a pivot table out of this, following our tutorial:

pivot = pd.pivot_table(df, index = ['Region', 'Type'], values = 'Sales', aggfunc = 'sum')

print(pivot)
Output:
                            Sales
Region Type                      
East   Children's Clothing  45849
       Men's Clothing       51685
       Women's Clothing     70229
North  Children's Clothing  37306
       Men's Clothing       39975
       Women's Clothing     61419
South  Children's Clothing  18570
       Men's Clothing       18542
       Women's Clothing     22203
West   Children's Clothing  20182
       Men's Clothing       19077
       Women's Clothing     22217

Now that we have our data loaded and stored in a dataframe called pivot we can start styling our data in Pandas.

Are you enjoying our content? Consider following us on social media! Follow us on LinkedIn, Twitter, or Instagram!

Data Type Labels to Pandas

In our dataframe pivot, the columns Sales represents the total number of sales in dollars. This isn’t immediately clear to the reader, however, as there is no dollar sign and the thousand values aren’t separated by commas. Let’s explore how to do this:

pivot.style.format({'Sales':'${0:,.0f}'})
Output:
                            Sales
Region Type                      
East   Children's Clothing  $45,849
       Men's Clothing       $51,685
       Women's Clothing     $70,229
North  Children's Clothing  $37,306
       Men's Clothing       $39,975
       Women's Clothing     $61,419
South  Children's Clothing  $18,570
       Men's Clothing       $18,542
       Women's Clothing     $22,203
West   Children's Clothing  $20,182
       Men's Clothing       $19,077
       Women's Clothing     $22,217

We can see that the data is immediately easier to understand!

String formats can be applied in different ways. Some other examples include:

  • Float with 2 decimal places: {:.2f}
  • Pad numbers with zeroes: {:0>2d}
  • Percent with 2 decimal places: {:.2%}

To learn more about these, check out this excellent tutorial by Real Python.

If we wanted to pass formatting in for multiple columns, it might be easier to define a dictionary that can be passed onto the styling function. For example, we could write a dictionary like below:

format_dictionary = {
   'column1':'format1', 
   'column2':'format2'
   }

Which could then be passed onto an object like below:

pivot.style.format(format_dictionary)

Adding Conditional Formatting

Conditional formatting is a great tool easily available in Excel. It allows us to easily identify values based on their content. It’s equally easy in Pandas, but hidden away a little bit. We’ll show just how easy it is to achieve conditional formatting in Pandas.

For example, if we wanted to highlight any number of sales that exceed $50,000 (say, they were eligible for a bonus after that point). We can do this using the applymap method. Before we begin, we’ll define a function we can pass onto the applymap method.

def highlight_fifty(val):
    color = 'red' if val > 50000 else 'black'
    return 'color: %s' % color

We can now pass this function into the applymap method:

pivot_highlight = pivot.style.applymap(highlight_fifty)
pivot_highlight

This returns the following dataframe:

Pandas Style - Highlight Values
  • Save

We can also chain the data styling with our conditional formatting:

pivot_highlight = pivot.style.applymap(highlight_fifty).format({'Sales':'${0:,.0f}'})
pivot_highlight
Pandas Style - Highlight in Red
  • Save

Making Chained Methods Easier to Read

Chaining methods is an incredibly useful feature in Python, but it’s not always the easiest to read. We can split the chain across multiple lines by using the \ character, as shown below:

pivot_highlight = pivot.style.format({'Sales':'${0:,.0f}'})\
   .highlight_max(color='green')\
   .highlight_min(color='red')
pivot_highlight

Now, say we wanted to highlight the maximum and minimum values, we can achieve this with another Styler object. While we could accomplish this using functions and the applymap method, Pandas thankfully has methods built-in directly to highlight the maximum and minimum values. In the example below, we provide named-colors, but you can also provide hex values to be more specific.

pivot.style.format({'Sales':'${0:,.0f}'}) \
   .highlight_max(color='green') \
   .highlight_min(color='red')
Pandas Style - Adding Min/Max Values
  • Save

Adding Color Scales to Pandas

Sometimes we will want to identify the values within a column relative to one another. This is where color scales come into play. We can accomplish this quite easy as a style method using the background_gradient method. Let’s give this a shot:

pivot.style.background_gradient(cmap='Greens')

This returns a dataframe like below:

Pandas Style - Adding Color Scales to a Dataframe
  • Save

You can also use different cmaps. To learn more about cmaps, check out this Matplotlib guide.

Check out some other Python tutorials on datagy, including our guide to For Loops and our complete Overview of SQLite for Python.

Limiting Columns for Formatting

Let’s now generate a pivot table that has multiple columns of values:

pivot2 = pd.pivot_table(df, index = ['Region', 'Type'], values = 'Sales', aggfunc = ['sum','count'])

This creates a pivot table that looks like this:

 		                sum 	count
		                Sales 	Sales
Region 	Type 		
East 	Children's Clothing 	45849 	113
        Men's   Clothing 	51685 	122
        Women's Clothing 	70229 	176
North 	Children's Clothing 	37306 	85
        Men's   Clothing 	39975 	89
        Women's Clothing 	61419 	142
South 	Children's Clothing 	18570 	45
        Men's   Clothing 	18542 	39
        Women's Clothing 	22203 	53
West 	Children's Clothing 	20182 	42
        Men's   Clothing 	19077 	41
        Women's Clothing 	22217 	53

Now, let’s apply the background_gradient method:

pivot2.style.background_gradient(cmap='Greens')

This returns the following dataframe:

Adding Color Bars
  • Save

If we wanted to limit this to only one column, we can use the subset parameter, as shown below:

pivot2.style.background_gradient(subset=['sum'], cmap='Greens')
Adding Color Bars to Select Columns
  • Save

Adding Color Bars to Pandas

Another illustrative way to add context to the size of a value in a column is to add color bars. This is an incredibly easy way to provide visuals that are also easy to print out. We can accomplish this using Python by using the code below:

pivot.style.format({'Sales':'${0:,.0f}'}).bar(color='Green')

This returns the following dataframe:

Adding Color Bars
  • Save

Color bars allow us to see the scale more easily. We can also use the align=center parameter, to have the bars show on the left if values are negative and on the right if they are positive.

How to Reuse Styles in Pandas

After you’ve spent some time creating a style you really like, you may want to reuse it. Thankfully, Pandas makes it easy without having to duplicate the code you meticulously created. You use the .use method on the Style object of another datagram. For example, if we have two dataframes, style1 and style 2, we can reuse the style of style1 by using the following:

style1.style.format({'Sales':'${0:,.0f}'}).bar(color='Green')
style2.use(style1.export())

Hiding Index or Columns

Since we’re talking about getting data ready for displaying, let’s talk about another piece that Excel makes quite easy: hiding columns. We can accomplish this in Pandas using styler objects as well. If we wanted to hide the index, we could write:

df.head().style.hide_index()
Date 	                Region 	Type 	                Units 	        Sales
2020-07-11 00:00:00 	East 	Children's Clothing 	18.000000 	306
2020-09-23 00:00:00 	North 	Children's Clothing 	14.000000 	448
2020-04-02 00:00:00 	South 	Women's Clothing 	17.000000 	425
2020-02-28 00:00:00 	East 	Children's Clothing 	26.000000 	832
2020-03-19 00:00:00 	West 	Women's Clothing 	3.000000 	33

Similarly, if we wanted to hide a column, we could write:

df.head().style.hide_columns(['Units'])
        Date 	                Region 	Type 	                Sales
0 	2020-07-11 00:00:00 	East 	Children's Clothing 	306
1 	2020-09-23 00:00:00 	North 	Children's Clothing 	448
2 	2020-04-02 00:00:00 	South 	Women's Clothing 	425
3 	2020-02-28 00:00:00 	East 	Children's Clothing 	832
4 	2020-03-19 00:00:00 	West 	Women's Clothing 	33

Exporting Styled Dataframes to Excel

I mentioned earlier in the article that the Style API is Pandas is still experimental. We can’t export all of these methods currently, but can currently export background-color and color. For example, if we wanted to export the following dataframe:

pivot.style.bar(color='Green')

We could use the .to_excel method to extract our styled dataframe to an Excel workbook:

pivot.style.background_gradient(cmap='Greens').to_excel('Styled_Dataframe.xlsx', engine='openpyxl')

Maybe Just Use Excel?

Finally, there may just be instances where taking your data to Excel could end up being more efficient? In that case, you can just use the df.to_clipboard() method to copy your entire dataframe to your clipboard!

df.to_clipboard()

Conclusion: Exploring the Pandas Style API

In this post, we learned how to style a Pandas dataframe using the Pandas Style API. We learned how to add data type styles, conditional formatting, color scales and color bars. Similar to the styles found in Excel, Pandas makes it easy to apply styling to dataframes. This allows us to better represent data and find trends within the data visually.

Did We Mention We Have an eBook?

Cover of Introduction to Python for Data Science
  • Save

Want to learn Python for Data Science? Check out my ebook for as little as $10!

2 thoughts on “Exploring the Pandas Style API”

  1. Hi Nik,

    Thanks for sharing your knwoledge about pandas! 🙂
    I was wondering: do you know how to to set color to the header of your dataframe?
    As well, do you know how to display properly the columns of your dataframe when you save it with to_excel? (I mean you can see clearly the data inside a column when you open your file with excel)

    Thanks

    1. Hi there! Thanks so much for your comment!

      To answer your first question, you’ll need to run the following code in your Jupyter notebook:
      from IPython.display import HTML

      Then, create a styles list like below:
      styles = [dict(select=’th’, props=[(“color”, “blue”)])
      html = df.style.set_table_styles(styles)
      html

      This would change the color of the headers to blue. But it’s a bit roundabout and not really intuitive.

      To answer your second question: only some of the styles can currently be exported to Excel. This includes the following:background-color, border-style, border-width, border-color, color, font-family, font-style, font-weight, text-align, text-decoration, vertical-align, white-space: nowrap. So it’s certainly a bit limited.

      You might want to consider a package for styling Excel files after they’re created. I cover this in a bit of detail in a post on Towards Data Science, which you can find here: https://towardsdatascience.com/automate-excel-reporting-with-python-233dd61fb0f2

Leave a Reply

Your email address will not be published. Required fields are marked *