Understanding and Using Data Classes in Python

Python data classes make it much easier to create simple classes (commonly used to store data) by generating many boilerplate methods for you. They were introduced in Python 3.7 (back in 2018!) and are available through the dataclasses module in the standard library.

In this tutorial, you’ll learn how to:

  • Create data classes in Python
  • Understand when and why to use data classes
  • Use default fields in data classes
  • Make data classes immutable
  • Use inheritance in data classes

Let’s dive in!

What are Python’s Data Classes?

Classes are blueprints for objects that store data (attributes) and functionality (methods). Regular classes in Python tend to be functionality-oriented. For example, you might create classes for managing database connections (where the functionality could be connecting to, querying, and closing a database connection) or a calculator (which would focus on performing calculations).

Python data classes, however, are focused on the data that they store, as the name implies. By creating boilerplate methods for you, data classes allow you to focus on creating clean, efficient, and (sometimes) immutable data structures.

Python data classes create the following default methods for you:

  • __init__() for initializing a class,
  • __repr__() for providing a string representation of a class, and
  • __eq__() for checking for equality

You might be thinking, “That isn’t a lot. Why bother?” Let’s explore how data classes can save you a ton of headaches and probably some carpal tunnel.

Creating Data Classes (Starting with Normal Classes)

Now, let’s take a look at how we can create a Python class and then recreate it as a data class.

# Creating a simple class
class Employee:
    def __init__(
            self, 
            name: str, 
            location: str, 
            year: int
            ) -> None:
        
        self.name = name
        self.location = location
        self.year = year

In the code cell above, we created a simple class with three attributes to capture employee information:

  • The name of the employee,
  • The location of the employee, and
  • The year the person started at the job.

Now, let’s create one of these Employee objects. Note that this is called instantiating the class.

# Instantiating our Employee class
Nik = Employee(name='Nik', location='Toronto', year=2019)

Great! That seems to have worked well. Let’s see what happens when we try to print our object:

# Printing our object
print(Nik)

# Returns:
# <__main__.Employee object at 0x104ed77d0>

We can see that while this code didn’t fail, it didn’t really represent the Nik object very well. To get a more useful representation, we can define the __repr__() method in our class. Let’s see how this changes things:

# Creating a simple class with a __repr__ method
class Employee:
    def __init__(
            self, 
            name: str, 
            location: str, 
            year: int
            ) -> None:
        
        self.name = name
        self.location = location
        self.year = year

    def __repr__(self):
        return f'Employee(name={self.name}, location={self.location}, year={self.year})'
    
Nik = Employee(name='Nik', location='Toronto', year=2019)
print(Nik)

# Returns:
# Employee(name=Nik, location=Toronto, year=2019)

We can see that we’re now able to print a much better representation of our object. However, we needed to write an extra method (__repr__()) to accomplish this.

Creating an Actual Data Class

This is where Python’s data classes come into play: they remove the need to write boilerplate methods for expected functionality.

Let’s see how we can recreate our class as a Python data class:

# Creating our first data class
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    location: str
    year: int

Creating our data class begins by importing the dataclass decorator from the dataclasses module. Because this module has been part of the standard library since Python 3.7, there’s nothing further to install.

We can then create a class using the class keyword the way that we normally would. The main thing you’ll change right away is including the @dataclass decorator on the line above the class keyword.

One of the big changes you’ll notice immediately is that you skip creating the __init__() method. Instead, you list the attributes as fields in the class body. Each field requires a type annotation, since a name without an annotation isn’t treated as a field. While the annotations are required, they’re not enforced at runtime. So, technically, creating an Employee object with Employee(1, 2, 3) will work.
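To make the point about unenforced annotations concrete, here is a minimal sketch that reuses the Employee data class from above and passes in three integers. The variable name odd_employee is only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    location: str
    year: int

# The annotations are hints only: no type checking happens at runtime
odd_employee = Employee(1, 2, 3)
print(odd_employee)

# Returns:
# Employee(name=1, location=2, year=3)
```

If you need the types to be checked, you would have to add that validation yourself or rely on a static type checker.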

Let’s now check out what it looks like when we create the same object and print it out:

# Instantiating our Data Class
Nik = Employee(name='Nik', location='Toronto', year=2019)
print(Nik)

# Returns:
# Employee(name='Nik', location='Toronto', year=2019)

We can see that, without defining a __repr__() method, the object printed out in a much nicer manner.

We can now access fields of the data class the same way we would access attributes of a normal class. By using the .field_name syntax, we’re able to access different fields and methods, as shown below:

# Access fields of a data class
print(Nik.location)

# Returns:
# Toronto

Let’s explore some of the other methods that the data class creates.

Boilerplate Methods Created by Data Classes

One of the primary things that make data classes useful is their ability to create simple, useful methods directly. In the previous section, we saw how the data class decorator creates the __repr__() method. Let’s take a look at some of the other methods the decorator handles for you:

  • __init__() – Initializes the data class by automatically setting up constructor arguments corresponding to the defined attributes.
  • __repr__() – Provides a readable string representation of the object, useful for debugging.
  • __eq__() – Allows comparison of objects by checking if their attribute values are equal.
  • __lt__(), __le__(), __gt__(), __ge__() (Optional with order=True) – These methods allow for ordering comparisons (less than, greater than, etc.) if specified.
  • __hash__() (Optional with frozen=True or unsafe_hash=True) – Generates a hash method to allow instances to be used in hash-based collections like sets or as dictionary keys.

We can see here that by default, data classes handle creating three methods for you. However, using optional parameters, you can actually create another five!

Let’s take a look at some of these. First, we’ll explore the __eq__() method, which is used to check if the attributes of objects are equal. Let’s create another object, Nik2, with the same attributes and compare the two:

# Checking for equality
Nik2 = Employee(name='Nik', location='Toronto', year=2019)
print(Nik == Nik2)

# Returns:
# True

We’ll cover the remaining methods in later sections as we explore customizing the dataclass decorator. For now, let’s explore how to add default fields to our class.

Adding Default Fields to Data Classes

Similar to regular Python classes, you can assign default values to fields in data classes. Rather than doing this in the __init__() method, you do this while listing out the fields in your class. Let’s take a look at how we can set a default value for the year field:

# Adding a Default Field to a Data Class
@dataclass
class Employee:
    name: str
    location: str
    year: int = 2019

We can see now that the year field has a default value. Let’s create another object and see how it works:

# Using our data class with a default field
Katie = Employee(name='Katie', location='London')
print(Katie)

# Returns:
# Employee(name='Katie', location='London', year=2019)

When we created another object, we didn’t need to provide a value for the year attribute.

When default values don’t work in data classes

There are two main situations in which default values don’t work in data classes:

  1. When a non-default value follows default values, and
  2. When mutable objects are used as default values

Non-Default Values Following Default Ones Throw a TypeError

Let’s check out both of these exceptions, starting with improper ordering. Let’s recreate our class and change the spot we include a default value:

# Non-Default values following default ones throw a TypeError
@dataclass
class Employee:
    name: str
    year: int = 2019
    location: str

# Throws:
# TypeError: non-default argument 'location' follows default argument

In the code cell above, a TypeError was thrown when we changed the ordering. This one is easier to solve: simply move the fields with default values after the fields without them.

Mutable Defaults Throw a ValueError

In some cases, you may want to set a field to be a mutable data type, such as a list. Say we wanted to track the years that a person had been at a company, rather than the year they joined. It might seem like we can simply assign a list as the default value. However, in Python, a default value like this is created once, meaning every instance of the class would share the same list. This is, of course, not something we really want, so data classes refuse the definition altogether. Let’s see what this looks like:

# Mutable default fields throw a ValueError
@dataclass
class Employee:
    name: str
    location: str
    years: list = []

# Throws:
# ValueError: mutable default <class 'list'> for field years is not allowed: use default_factory

In order to fix this error, we need to use the field() function from the dataclasses module. This allows us to pass in a default factory: a callable, such as list, that is called to build a fresh default value for each new object.

# Using mutable types as default values
from dataclasses import field

@dataclass
class Employee:
    name: str
    location: str
    years: list[int] = field(default_factory=list)

Now that we have recreated our class, let’s create a new object that uses this mutable type.

Nik = Employee(name='Nik', location='Toronto')
print(Nik)

# Returns:
# Employee(name='Nik', location='Toronto', years=[])
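To confirm that the default factory really gives each object its own list, here is a small sketch that creates two employees and appends a year to only one of them. The lowercase variable names are just for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    location: str
    years: list[int] = field(default_factory=list)

nik = Employee(name='Nik', location='Toronto')
katie = Employee(name='Katie', location='London')

# Appending to one employee's list leaves the other untouched
nik.years.append(2019)
print(nik.years)
print(katie.years)

# Returns:
# [2019]
# []
```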

Phew! Now that we have a good understanding of default fields, let’s dive into another important topic: adding methods to data classes.

Methods in Data Classes

While data classes are primarily data-oriented, that doesn’t mean that they can’t have methods. Adding methods works much the same way in data classes as it does in normal Python classes. Let’s take a look at how we can create a method on our data class. We’ll create a method that prints out some information about the employee. We’ll return to a previous version of the class for simplicity:

# Adding a method to a data class
@dataclass
class Employee:
    name: str
    location: str
    year: int

    def display_info(self) -> None:
        print(f'{self.name} has been with the company since {self.year} and lives in {self.location}.')

Nik = Employee(name='Nik', location='Toronto', year=2019)
Nik.display_info()

# Returns:
# Nik has been with the company since 2019 and lives in Toronto.

This was a pretty simple example. However, it demonstrates how you’re able to access attributes of the object within a method. Similarly, you could use methods to modify attributes.

Say that we had a field in our data class that tracked the number of years an employee has been at the company. We can add a method that increases that number.

# Adding a method to a data class
@dataclass
class Employee:
    name: str
    location: str
    year: int
    num_years: int

    def add_years(self, years: int) -> None:
        self.num_years += years

Nik = Employee(name='Nik', location='Toronto', year=2019, num_years=0)

print(f'{Nik.name} has been at the company for {Nik.num_years} years.')
Nik.add_years(1)
print(f'{Nik.name} has been at the company for {Nik.num_years} years.')

# Returns:
# Nik has been at the company for 0 years.
# Nik has been at the company for 1 years.

In the code block above, we implemented a new method in our class. The method accepts a number of years to add to an employee’s tenure. We then create our object, print out a statement about their tenure, increase the number of years, and then print it out again.

Customizing How Objects are Printed

While data classes automatically generate a __repr__() method, you can customize how the class represents the object. This can be done by using the field() function when defining a field. This allows us to exclude an attribute from the output when the object is printed.

Let’s take a look at what this looks like:

# Removing an object's attribute from being printed
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    location: str
    year: int = field(repr=False)

Nik = Employee(name='Nik', location='Toronto', year=2019)
print(Nik)

# Returns:
# Employee(name='Nik', location='Toronto')

We can see that by passing repr=False to the field() function, the __repr__() method no longer includes that field in its representation.

But, what if you wanted to change the representation altogether? Data classes don’t create the __str__() method, so we can define it ourselves. Let’s see what this looks like:

# Defining a custom __str__ method
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    location: str
    year: int = field(repr=False)

    def __str__(self) -> str:
        return f'{self.name} started in {self.year} and lives in {self.location}.'

Nik = Employee(name='Nik', location='Toronto', year=2019)
print(Nik)

# Returns:
# Nik started in 2019 and lives in Toronto.

We can see how simple it was to create a __str__() method. This method will run by default when we use the print() function to print an object.
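It can help to see that defining __str__() doesn’t replace the generated __repr__(). The sketch below reuses the class from above and calls both; the lowercase variable name is just for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    location: str
    year: int = field(repr=False)

    def __str__(self) -> str:
        return f'{self.name} started in {self.year} and lives in {self.location}.'

nik = Employee(name='Nik', location='Toronto', year=2019)

print(str(nik))    # uses our custom __str__
print(repr(nik))   # uses the generated __repr__
print([nik])       # containers display their items with __repr__

# Returns:
# Nik started in 2019 and lives in Toronto.
# Employee(name='Nik', location='Toronto')
# [Employee(name='Nik', location='Toronto')]
```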

Comparisons in Data Classes

Being able to compare two different objects is incredibly powerful. We already looked at object equality using the == operator. However, we can also use data classes to automatically implement ordering comparisons. The __lt__(), __le__(), __gt__(), and __ge__() methods allow for these comparisons (less than, greater than, etc.) if specified.

Rather than creating individual methods for each of these, as you would with a regular class, data classes simplify the implementation of these methods.

Let’s modify our data class slightly to better illustrate this:

# Using comparisons in data classes
@dataclass
class Employee:
    name: str
    num_years: int
    age: int

Nik = Employee(name='Nik', num_years=5, age=35)
Katie = Employee(name='Katie', num_years=4, age=34)

In our new data class, we have three fields. Two of these fields are numeric: one represents the number of years someone has been at the company, while the other represents the employee’s age.

Let’s take a look at what happens when we try to compare these two objects:

# Comparing two objects
print(Nik > Katie)

# Throws:
# TypeError: '>' not supported between instances of 'Employee' and 'Employee'

We get a TypeError back, indicating that the > operator is not supported between instances of this class.

By default, data classes won’t implement the __ge__, __le__, __gt__, and __lt__ methods. In order to generate them, we need to pass order=True into the @dataclass decorator.

# Using comparisons in data classes
@dataclass(order=True)
class Employee:
    name: str
    num_years: int
    age: int

Nik = Employee(name='Nik', num_years=3, age=35)
Katie = Employee(name='Katie', num_years=4, age=34)

print(Nik > Katie)

# Returns:
# True

By simply adding in order=True, we are now able to compare different objects! But, what attributes does it compare?

Python compares the objects as if they were tuples of their fields, in the order in which the fields are defined. The first field is name, which is a string. Strings are compared lexicographically, meaning that if the first characters are the same, the comparison moves to the second and so on. Each character is compared by its Unicode code point, which can be looked up using the ord() function.
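As a quick check of this behaviour, the sketch below looks up the code points with ord() and compares the two employees both as plain tuples (via astuple()) and as data class objects. The lowercase variable names are just for illustration.

```python
from dataclasses import dataclass, astuple

@dataclass(order=True)
class Employee:
    name: str
    num_years: int
    age: int

nik = Employee(name='Nik', num_years=3, age=35)
katie = Employee(name='Katie', num_years=4, age=34)

# Code points of the first letter of each name
print(ord('N'), ord('K'))

# Ordering behaves like comparing tuples of the fields, in definition order
print(astuple(nik) > astuple(katie))
print(nik > katie)

# Returns:
# 78 75
# True
# True
```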

Because 'N' is greater than 'K' (78 versus 75), the comparison returns True. However, it doesn’t make much sense to compare people’s names. We could simply move the name field to the end, but it would still be used to break ties when all other fields are equal. It’s better to tell Python to ignore the field for comparison.

In order to do this, we need to use the field function again to specify not to include a field in comparisons:

# Ignoring a field for comparison
@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    num_years: int
    age: int

Nik = Employee(name='Nik', num_years=3, age=35)
Katie = Employee(name='Katie', num_years=4, age=34)

print(Nik > Katie)

# Returns:
# False

If the num_years field had been the same for both employees, then the comparison would have moved to the next field, age.
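Here is a short sketch of that tie-breaking behaviour, using the same class with both employees given the same num_years. It also shows a side effect worth knowing about: compare=False removes the field from equality checks too. The name 'Someone else' is made up for the example.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    num_years: int
    age: int

nik = Employee(name='Nik', num_years=4, age=35)
katie = Employee(name='Katie', num_years=4, age=34)

# num_years ties at 4, so age (35 versus 34) decides the result
print(nik > katie)

# compare=False also removes name from equality checks
print(nik == Employee(name='Someone else', num_years=4, age=35))

# Returns:
# True
# True
```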

Making Immutable Data Classes

Another great feature of data classes is the ability to make them immutable, or, rather, to freeze them. What this means is that once an object is created, its attributes can’t be modified.

Doing this is as simple as using the frozen=True parameter when decorating a class. Let’s see what this looks like:

# Creating a frozen data class
@dataclass(frozen=True)
class Employee:
    name: str
    num_years: int
    age: int

Let’s now create a new Employee object and see what happens when we try to modify it:

# Creating a frozen object
Nik = Employee(name='Nik', num_years=1, age=35)
Nik.num_years = 2

# Throws:
# FrozenInstanceError: cannot assign to field 'num_years'

We can see that when we try to modify an attribute of a frozen object, Python throws a FrozenInstanceError, indicating that we can’t assign a value to a field.

The way that this works is that when we set frozen=True in the decorator call, Python automatically adds __setattr__() and __delattr__() methods to the class, both of which raise this error.
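The sketch below shows both of those blocked operations. It also uses the replace() function from the dataclasses module, which isn’t covered above, as one way to get an updated copy of a frozen object. The variable names are just for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Employee:
    name: str
    num_years: int
    age: int

nik = Employee(name='Nik', num_years=1, age=35)

# Both assignment and deletion are blocked on a frozen instance
try:
    nik.num_years = 2
except FrozenInstanceError:
    print('Assignment blocked')

try:
    del nik.age
except FrozenInstanceError:
    print('Deletion blocked')

# replace() builds a new object instead of mutating the original
updated_nik = replace(nik, num_years=2)
print(nik)
print(updated_nik)

# Returns:
# Assignment blocked
# Deletion blocked
# Employee(name='Nik', num_years=1, age=35)
# Employee(name='Nik', num_years=2, age=35)
```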

Inheritance with Data Classes

Inheritance is used when you want to extend the functionality (including attributes and methods) of an existing data class by inheriting its fields and methods into a new class. For example, we can create a more specialized version of a data class which reuses its structure and behaviour.

Let’s imagine for a moment that instead of creating an Employee class, we had created a Person class instead:

# Creating a data class
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

This Person class is fairly generic: it provides information about a person, regardless of whether they are an employee. However, say we want to use these fields in our more specific Employee class. We can inherit the fields (and methods, if we had any) using class inheritance.

This works much in the same way as inheritance does in normal Python classes. Let’s take a look:

# Inheriting Data Classes
@dataclass
class Employee(Person):
    location: str
    year_joined: int

Now when we go to create a new Employee object, we can use all of the fields available in the Person class as well. Let’s take a look:

# Creating an object using inheritance
Nik = Employee(name='Nik', age=35, location='Toronto', year_joined=2019)
print(Nik)

# Returns:
# Employee(name='Nik', age=35, location='Toronto', year_joined=2019)

We can see that when we created our object, we were able to pass in fields that were initially defined in the Person class.

Something important to note is that the fields are created in the order in which they are defined, starting with the base class. This means that the rules you have learned about data classes so far must be followed. For example, if a default field exists in the base class, all subclass fields must also have default values. Let’s take a look at this a bit more:

# Creating a data class
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int = 35

@dataclass
class Employee(Person):
    location: str
    year_joined: int

Nik = Employee(name='Nik', age=35, location='Toronto', year_joined=2019)

# Throws (as soon as the Employee class is defined):
# TypeError: non-default argument 'location' follows default argument

As you’re defining data classes that will serve as base classes, be mindful of this limitation!

Post Init Processing with __post_init__

Data classes also allow you to process values after initialization using the __post_init__() method. As the name implies, it runs after the object is initialized, at the end of the generated __init__() method. This is helpful for a number of reasons:

  • When you need to validate or transform field values after they are assigned.
  • When you need to compute additional attributes based on initial input values.

Data Classes Help Validate Fields

Let’s take a look at how data classes can help you validate fields entered by users. For example, if we wanted to make sure that the year an employee started was 2019 or later, we could use the following:

# Using __post_init__ to validate fields
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

    def __post_init__(self):
        if self.year_joined < 2019:
            raise ValueError('Year joined must be on or after 2019.')
        
Evan = Employee(name='Evan', age=40, year_joined=2017)

# Raises:
# ValueError: Year joined must be on or after 2019.

We can see that the class throws a ValueError when we try to pass in a value less than 2019. This can be helpful for validating important fields to make sure that they’re as expected.

Similarly, while data classes don’t enforce type hints, you could use this validation to enforce them yourself. Let’s modify our method to make sure that the value is an integer:

# Using __post_init__ to validate fields
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

    def __post_init__(self):
        # Check the type first so the comparison below can't fail on a string
        if not isinstance(self.year_joined, int):
            raise TypeError('year_joined must be an integer')
        if self.year_joined < 2019:
            raise ValueError('Year joined must be on or after 2019.')
        
Evan = Employee(name='Evan', age=40, year_joined='2019')

# Raises:
# TypeError: year_joined must be an integer

This type of validation can ensure that your code performs as it’s expected to.
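If you wanted to check every field rather than just one, a rough sketch could loop over the fields() function from the dataclasses module. This assumes each annotation is a plain class such as str or int; it won’t work for annotations like list[int] or for string annotations (for example, when from __future__ import annotations is used).

```python
from dataclasses import dataclass, fields

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

    def __post_init__(self) -> None:
        # Check every field against its annotation; this only works
        # when each annotation is a plain class such as str or int
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f'{f.name} must be of type {f.type.__name__}, '
                    f'got {type(value).__name__}'
                )

try:
    Employee(name='Evan', age=40, year_joined='2019')
except TypeError as error:
    print(error)

# Returns:
# year_joined must be of type int, got str
```

Keep in mind that isinstance() accepts subclasses, so a bool would pass a check against int.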

Modifying Field Values After Initialization

One of the other great things that you can do with post initialization functions is modify field values that depend on other values. Say we wanted to calculate how many years an employee has been at the company, based on the year that they joined. We could do this using the __post_init__() method, as shown below:

# Using __post_init__ to modify fields
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int
    inaugural_employee: bool = field(init=False)
    num_years: int = field(init=False)

    def __post_init__(self):
        self.inaugural_employee = self.year_joined == 2019
        self.num_years = date.today().year - self.year_joined

Nik = Employee(name='Nik', age=35, year_joined=2019)
print(Nik)

# Returns (when run in 2024):
# Employee(name='Nik', age=35, year_joined=2019, inaugural_employee=True, num_years=5)

In the code block above, we implemented two new features:

  1. We used the field() function in our field declaration to specify that the field should not be initialized, by using the init=False argument, and
  2. We used the __post_init__() method to calculate the number of years based on today’s year.

This allows us to create fields that are derivative of other fields, without forcing the end user into thinking about it or duplicating entry.

Now, while this approach works, it also creates a fairly severe issue for the num_years attribute. Because __post_init__() only runs when the object is instantiated, the value doesn’t update if the year_joined attribute is changed later.

We can check this by creating an object and then updating the attribute.

Nik = Employee(name='Nik', age=35, year_joined=2019)
print(Nik)
Nik.year_joined = 2020
print(Nik)

# Returns (when run in 2024):
# Employee(name='Nik', age=35, year_joined=2019, inaugural_employee=True, num_years=5)
# Employee(name='Nik', age=35, year_joined=2020, inaugural_employee=True, num_years=5)

We can see from the code block above that the num_years attribute didn’t update. This is, of course, a fairly big problem.

Using a @property decorator for dynamic attributes

We can fix this easily by using the @property decorator to compute the value each time it’s accessed. Let’s take a look at what this looks like:

# Using @property for a dynamic attribute
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int
    inaugural_employee: bool = field(init=False)

    def __post_init__(self):
        self.inaugural_employee = self.year_joined == 2019

    @property
    def num_years(self):
        return date.today().year - self.year_joined

Nik = Employee(name='Nik', age=35, year_joined=2019)
print(Nik)

# Returns:
# Employee(name='Nik', age=35, year_joined=2019, inaugural_employee=True)

While this property won’t, by default, show in the __repr__() method, we can check that the value updates by accessing the property:

Nik = Employee(name='Nik', age=35, year_joined=2019)
print(Nik.num_years)
Nik.year_joined = 2020
print(Nik.num_years)

# Returns (when run in 2024):
# 5
# 4

Customizing the data class str method

Now that we’ve solved that issue, let’s change how the object is printed to include our dynamic property. For this, we can use the __str__() method to replicate a print out that we would have expected:

# Customizing the print out of a data class
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int
    inaugural_employee: bool = field(init=False)

    def __post_init__(self):
        self.inaugural_employee = self.year_joined == 2019

    @property
    def num_years(self):
        return date.today().year - self.year_joined
    
    def __str__(self):
        return f"Employee(name={self.name}, age={self.age}, year_joined={self.year_joined}, inaugural_employee={self.inaugural_employee}, num_years={self.num_years})"

Nik = Employee(name='Nik', age=35, year_joined=2019)
print(Nik)

# Returns (when run in 2024):
# Employee(name=Nik, age=35, year_joined=2019, inaugural_employee=True, num_years=5)

We can see that as our data classes become more advanced, we need to make some concessions about the complexity of the code itself, too. Let’s now dive into another feature: keyword-only data classes.

Keyword Only Data Classes (Python 3.10+)

One of the things we’ve done so far is use keyword arguments for all of our data class instantiations. For example, in the following data class:

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

We have been instantiating our objects using:

Nik = Employee(name='Nik', age=35, year_joined=2019)

While this works, we could simplify our code by using:

Nik = Employee('Nik', 35, 2019)

This creates an equivalent object, this time using positional arguments.

However, there may be times when you want your code to be more explicit. In Python 3.10, an optional argument was introduced to data classes to allow for keyword-only data classes. By default, Python will set the kw_only= argument to False. However, let’s see what happens when we set it to True:

# Using keyword-only arguments (Python 3.10+)
from dataclasses import dataclass

@dataclass(kw_only=True)
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee('Nik', 35, 2019)

# Raises:
# TypeError: Employee.__init__() takes 1 positional argument but 4 were given   

In the code block above, we implemented our data class using a keyword only specification. Then, when we tried to create our object using only positional arguments, the instantiation failed.

Using Match Arguments in Data Classes (Python 3.10+)

The match-case statement was introduced in Python 3.10. One of the great things about it is that it works with data classes out of the box. This allows you to destructure a data class and act on its values conditionally.

Let’s take a look at an example of what this looks like:

# Using match-case arguments in data classes
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee(name='Nik', age=35, year_joined=2019)
Katie = Employee(name='Katie', age=34, year_joined=2020)

def describe_employee(employee: Employee):
    match employee:
        case Employee(name=name, age=age, year_joined=2019):
            return f'{name} is {age} and is eligible for a promotion.'
        case Employee(name=name, age=age, year_joined=year_joined):
            return f'{name} is {age} and is not eligible for a promotion.'

print(describe_employee(Nik))
print(describe_employee(Katie))

# Returns:
# Nik is 35 and is eligible for a promotion.
# Katie is 34 and is not eligible for a promotion.

In the code block above, we used our original data class and created two employees, Nik and Katie. We then created a function that takes an Employee as its only argument. The function uses a match statement to match the employee, where each case uses the employee’s information. It’s important to note here that in the first case, we specified that name and age can be anything, but year_joined has to be equal to 2019. If that’s the case, then we return that the employee is eligible for a promotion.

The second case statement is a catch-all for any other situation and will indicate that the employee is not eligible for a promotion.

Using Slots for Data Classes (Python 3.10+)

Normally, Python stores an object’s attributes in a per-instance dictionary, available through the __dict__ attribute, which allows us to add, change, and remove attributes at run-time. While this flexibility is helpful, it comes at (potentially) a lot of memory overhead. Slots, on the other hand, replace these dictionaries with a fixed structure, saving memory. Since Python 3.10, the @dataclass decorator can generate slots for you!

If you’re creating many objects, the use of slots can be helpful and save a lot of time. Let’s explore this with an example.

# Creating a data class with slots
from dataclasses import dataclass

@dataclass(slots=True)
class Employee_slots:
    name: str
    age: int
    year_joined: int

We can see how easy it is to instruct Python to create a data class with slots. The only modification we needed to make was to pass slots=True into the decorator function.

Let’s now see how much time we can save:

%%timeit
no_slots = [Employee(name='Nik', age=35, year_joined=i+1) for i in range(50000000)]
for employee in no_slots:
    _ = employee.year_joined

# Returns:
# 14.1 s ± 1.11 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

And now let’s try it with slots:

%%timeit
slots = [Employee_slots(name='Nik', age=35, year_joined=i+1) for i in range(50000000)]
for employee in slots:
    _ = employee.year_joined

# Returns:
# 11.6 s ± 394 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

We can see from the two code blocks above that we achieved a time savings of roughly 18% by using slots (11.6 s versus 14.1 s). This is an important feature to keep in mind as you program data classes that may require many instances in production environments.

Limitations of slots with data classes

There are some key limitations that we need to be aware of when working with slots:

  • Attempting to create an additional attribute will raise an AttributeError, since the space for each attribute is already defined and fixed
  • When working with inheritance, the parent should use slots as well (otherwise instances still carry a __dict__), and a child shouldn’t redeclare attributes already slotted in the parent
  • You may run into complications with code that relies on an instance’s __dict__, such as some JSON or pickle serialization approaches

Now that we’ve covered off the new features in Python 3.10, let’s dive into hashing data classes.

Hashing Data Classes

Hashing a data class allows your objects to be used as keys in dictionaries or added to sets. Put simply, a hash is a numeric value generated by a hash function that is used to compare and store data in hash-based data structures such as dictionaries and sets.

For an object to be hashable, it must:

  • Have a valid __hash__() method.
  • Be immutable (its hash value must not change during its lifetime).

Data classes allow you to easily make an object hashable by setting the frozen=True parameter when defining the class. Together with the default eq=True, this generates the __hash__() method for you.

Let’s see what this looks like:

# Using an unfrozen data class as a dictionary key
from dataclasses import dataclass

@dataclass()
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee(name='Nik', age=35, year_joined=2019)
employees = {Nik: True}

# Raises:
# TypeError: unhashable type: 'Employee'

In order to resolve this error, we need to freeze the data class. Let’s see how this changes things:

# Using a frozen data class as a dictionary key
from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee(name='Nik', age=35, year_joined=2019)
employees = {Nik: True}
print(employees)

# Returns: 
# {Employee(name='Nik', age=35, year_joined=2019): True}

We can see now that the object can be used as a dictionary key, as it’s now hashable.

Using unsafe hashing in data classes

Python provides a way to hash mutable data classes by using the unsafe_hash=True parameter when defining the class. This requires special consideration and is best used when the class is logically immutable but can technically still be mutated.

Let’s take a look at how this works:

# Using unsafe hashing
from dataclasses import dataclass

@dataclass(unsafe_hash=True, frozen=False)
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee(name='Nik', age=35, year_joined=2019)
employees = {Nik: True}
print(employees)

# Returns: 
# {Employee(name='Nik', age=35, year_joined=2019): True}

We can see that, despite the class not being frozen, we were able to hash the object. This is because we forced the class to implement a __hash__() method.
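To see why this is called unsafe, consider what happens when a key is mutated after it has been added to a dictionary. In the sketch below, lookups are made with freshly created, equal-looking objects; after the mutation, neither the old nor the new values find the entry, even though it’s still stored.

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Employee:
    name: str
    age: int
    year_joined: int

nik = Employee(name='Nik', age=35, year_joined=2019)
employees = {nik: True}

# Look up the entry using an equal, separately created object
print(Employee(name='Nik', age=35, year_joined=2019) in employees)

# Mutating a field changes the hash, but the dictionary still
# stores the entry under the hash computed at insertion time
nik.age = 36

print(Employee(name='Nik', age=35, year_joined=2019) in employees)
print(Employee(name='Nik', age=36, year_joined=2019) in employees)
print(len(employees))

# Returns:
# True
# False
# False
# 1
```

This is the reason the option is best reserved for classes that you treat as immutable in practice.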

Converting Data Classes to Dictionaries or Tuples

The dataclasses module provides two helpful functions for converting data classes to other data types:

  1. asdict is used to convert data classes to dictionaries, and
  2. astuple is used to convert data classes to tuples.

Let’s see what converting a data class to a dictionary and tuple looks like:

# Converting a data class to a dictionary and a tuple
from dataclasses import dataclass, asdict, astuple

@dataclass
class Employee:
    name: str
    age: int
    year_joined: int

Nik = Employee(name='Nik', age=35, year_joined=2019)
as_dict = asdict(Nik)
as_tuple = astuple(Nik)
print(f'Dictionary: {as_dict}')
print(f'Tuple: {as_tuple}')

# Returns:
# Dictionary: {'name': 'Nik', 'age': 35, 'year_joined': 2019}
# Tuple: ('Nik', 35, 2019)

In the code block above, we imported both of these functions. We then passed our object into the functions to convert the values.
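One detail worth knowing is that asdict() works recursively. The sketch below uses a hypothetical Address data class (not part of the examples above) nested inside Employee, and then passes the resulting dictionary to the json module.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Address:
    city: str
    country: str

@dataclass
class Employee:
    name: str
    address: Address
    years: list[int] = field(default_factory=list)

nik = Employee(
    name='Nik',
    address=Address(city='Toronto', country='Canada'),
    years=[2019, 2020],
)

# asdict() recurses into nested data classes and lists
as_dict = asdict(nik)
print(as_dict)

# The resulting plain dictionary can be serialized with the json module
print(json.dumps(as_dict))

# Returns:
# {'name': 'Nik', 'address': {'city': 'Toronto', 'country': 'Canada'}, 'years': [2019, 2020]}
# {"name": "Nik", "address": {"city": "Toronto", "country": "Canada"}, "years": [2019, 2020]}
```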

Conclusion

In this tutorial, we explored data classes, from the fundamentals of creating them to fine-tuning their representations, managing default fields, incorporating methods, handling comparisons, enforcing immutability, and harnessing the power of inheritance. Furthermore, we explored features added in Python 3.10, including keyword-only arguments, match-case statements, and slots, as well as hashing data classes and converting them to dictionaries and tuples.

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
