Python Object-Oriented Programming (OOP) for Data Science

In this tutorial, you’ll learn about Python object-oriented programming (OOP) and how it relates to data science. OOP is a way of structuring programs that bundles related properties and behaviors into objects. The concept can be hard to grasp at first, so this tutorial aims to explain it in an easy-to-follow manner.

By the end of this tutorial, you’ll have learned:

  • What object-oriented programming is and when to use it
  • How to create classes in Python to create new objects
  • How to work with object attributes and methods to define and modify objects
  • How to work with class inheritance and polymorphism to modify and extend your Python classes

Introduction to Object-Oriented Programming

Object-oriented programming (or OOP) refers to a programming paradigm that’s based on the concept of, well, objects. In this paradigm, objects can contain both data and code. These objects can also have attributes (properties) and methods (behaviors).

So, in short, objects have properties and behaviors. Let’s think of an object representing a person, for example. The person has many different properties, such as a name, age, hair color. They can also do things, such as walk or greet people. Object-oriented programming models real-world entities in the form of organized code that can have properties and can do things.

Object-oriented programming is focused on the following concepts:

  • Encapsulation: Encapsulation bundles an object’s data (attributes) and the methods that work on that data into a single unit, and controls how that data is accessed from outside the object. This lets you distinguish public attributes and methods, which are meant for people using your code, from private ones, which are internal details. In Python, this distinction is a naming convention (a leading underscore) rather than something the language enforces.
  • Abstraction: Abstraction builds on encapsulation. It allows you to expose only high-level mechanisms for doing certain things while hiding away (or “abstracting”) the complex mechanisms behind them. For example, your car abstracts all the mechanics behind starting the engine: you simply press a button or turn a key.
  • Inheritance: Inheritance allows you to create similar objects that share a base set of properties (attributes) and abilities (methods). This allows you to create one type of object that can be used as a base for many other objects without needing to repeat your code. For example, the base of a car can be used to create objects such as trucks, SUVs, and camper vans.
  • Polymorphism: The concept of polymorphism builds on the concept of inheritance. While it can be helpful to define child objects, these child objects may operate slightly differently. Polymorphism allows a child object to define its own version of a method it inherited. Following the example of cars and trucks, the method to turn on a truck may be slightly different from the method to turn on a car.
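As a rough illustration of encapsulation and abstraction, the sketch below uses a hypothetical Car class (class syntax is covered later in this tutorial). The names Car, start(), _check_battery(), and _engine_running are made up for this example. The leading underscore is only a convention that signals “internal”; Python doesn’t prevent access to it.

```python
class Car:
    """Hypothetical class used only to illustrate the concepts above."""

    def __init__(self):
        # Leading underscore: "internal" by convention, not enforced by Python
        self._engine_running = False

    def _check_battery(self):
        # Internal detail that users of the class don't need to call directly
        return True

    def start(self):
        # Public method: hides (abstracts) the steps needed to start the car
        if self._check_battery():
            self._engine_running = True
        return self._engine_running


car = Car()
print(car.start())
```

Someone using Car only needs to know about start(); the battery check stays an internal detail.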

Why Object-Oriented Programming Matters for Data Science

In many cases in data science, the paradigm of procedural programming will be sufficient. This structures a program almost like a recipe or an instruction manual. In this case, each line of code is executed in order. So, what’s in it for you to learn object-oriented programming when all you want to do is data analysis?

There are two main reasons for doing this:

  1. Objects are everywhere in Python (in fact, everything is an object)
  2. Code organization: As your programs grow, the complexity grows too. Object-oriented programming allows you to organize your code, making it easier to test, debug, and expand.

In the early stages of working with Python for data science, it may seem like the concepts behind object-oriented programming don’t make sense or don’t apply. But don’t despair!

Learning to think in the paradigm of object-oriented programming allows you to better understand how many Python applications work. For example, a Pandas DataFrame is a complex object that has many attributes and methods. Simply knowing that a DataFrame is an object allows you to understand why the DataFrame has certain methods, while, say, a Python list doesn’t.
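To see the “everything is an object” idea without installing anything, here is a small sketch that uses only built-in types rather than a DataFrame. The variable name numbers is just an example.

```python
numbers = [3, 1, 2]

# Integers, lists, and even functions are all objects
print(isinstance(3, object))        # True
print(isinstance(numbers, object))  # True
print(isinstance(len, object))      # True

# A list has methods that make sense for a list...
numbers.sort()
print(numbers)                      # [1, 2, 3]

# ...but not methods that belong to other types, such as str.upper()
print(hasattr(numbers, 'sort'))     # True
print(hasattr(numbers, 'upper'))    # False
```

The same reasoning applies to a DataFrame: its methods are the ones defined by its class.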

As a data scientist, you won’t always need to use object-oriented programming. Don’t fall into the trap that everything needs to follow an OOP paradigm. Later in the tutorial, you’ll learn some excellent use cases for when you should be aiming to use object-oriented programming.

Why Create Objects in Python?

In many cases, Python provides you with the ability to define simple concepts using built-in data types such as lists and dictionaries. For example, you could create a list to hold information about students in your class. Let’s create a few lists to contain this information:

# Creating lists to store data
nik = ['Nik', 33, 'datagy.io', 'Toronto']
kate = ['Kate', 33, 'government', 'Toronto']
evan = ['Evan', 40, 'teaching', 'London']

While this approach works, it has a number of issues:

  1. You need to remember the position of each item.
  2. When an item is missing, the positions shift, so index [3] may point to a different element or not exist at all.
  3. It’s not clear what each item represents.

While we could turn these lists into dictionaries (or a defaultdict), some issues remain. To make these dictionaries do something, you need to define separate functions. Nothing ties those functions to the data they’re meant for, so they can be called with anything else in your program. This can confuse readers of your code and lead to unintended consequences.
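The sketch below shows what the dictionary approach might look like. The have_birthday() helper and the city dictionary are hypothetical and exist only to illustrate the problem.

```python
# Dictionaries make the fields explicit...
nik = {'name': 'Nik', 'age': 33, 'company': 'datagy.io', 'city': 'Toronto'}

# ...but behavior has to live in a separate function (hypothetical helper)
def have_birthday(person):
    person['age'] += 1

have_birthday(nik)
print(nik['age'])  # 34

# Nothing ties the function to "person" dictionaries, so misuse only fails at runtime
city = {'name': 'Toronto'}
try:
    have_birthday(city)
except KeyError as error:
    print('Missing key:', error)
```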

All of these issues can be resolved by creating a class, which can hold different pieces of information, but can also contain functions. In the next section, you’ll learn how to create your first class in Python!

Objects and Classes in Python

In Python, you can define a class using the class keyword. The keyword is followed by the name of the class and a colon. That, in itself, is enough to define a class! Let’s create a new class following our convention above:

# Creating your first class
class Person:
    pass

By convention, Python class names are written in CapWords (also called PascalCase): each word begins with a capital letter and there are no separators between words. A multi-word class would be named, for example, DataLoader.

Let’s break down a few pieces of terminology before moving on to something more detailed:

  1. A class is the blueprint for creating objects. For example, the class Person describes what every person object will have, such as a name, an age, and functions, but it doesn’t hold any specific person’s data. It only contains the instructions for creating these objects.
  2. An instance is the object that is built when a class is instantiated (that is, called). If we were to create a Person object, this object would contain information and state about that specific person.
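To make the blueprint and instance distinction concrete, here is a minimal sketch that uses the empty Person class from above. The variable names first and second are just examples.

```python
class Person:
    pass

# Calling the class creates a new instance each time
first = Person()
second = Person()

print(type(first) is Person)      # True
print(isinstance(first, Person))  # True: first is an instance of Person
print(first is second)            # False: two separate objects
```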

Let’s learn how we can expand on our class by giving it some properties. This is done using what’s called the __init__ method, also known as the constructor method.

Python init: The Constructor Method

Right now, our Person class doesn’t contain any information nor does it do anything. Creating this class, as it is right now, doesn’t actually accomplish anything. Let’s see how we can expand on this by setting the initial attributes of a person.

# Adding details to the Person class
class Person:
    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company

Our class Person now has three attributes, name, age, and company. When we create a new person, we can pass these parameters in to make our person a bit more interesting!

# Creating our first Person instance
Nik = Person('Nik', 33, 'datagy.io')

At first glance, the __init__() function looks a little odd. It’s commonly called the constructor method: Python calls it automatically when you create a new instance, and it sets the initial state of that object.

The __init__() function can take any number of parameters, but the first parameter must always refer to the instance itself. By convention, it’s named self.

Understanding self in Python Object-Oriented Programming

The self parameter is used to represent that instance of a class. self lets Python know that the attributes or methods should be applied to that object (and only that object). In essence, the self parameter binds the given arguments to the attributes of that instance.

When you created the first object above, the arguments that were passed in were assigned to attributes on self. So, for example:

  • self.name was assigned the argument of the name parameter
  • self.company was assigned the argument of the company parameter

So, self points to the instance of that class. In Python, self enables objects to access their own attributes and methods, and it’s what keeps each instance’s data separate from every other instance’s.
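A short sketch of what this means in practice: two instances of the same Person class each keep their own attribute values. The lowercase variable names nik and kate are chosen for this example.

```python
class Person:
    def __init__(self, name, age, company):
        # self is the instance currently being initialized
        self.name = name
        self.age = age
        self.company = company

nik = Person('Nik', 33, 'datagy.io')
kate = Person('Kate', 33, 'government')

# Each instance keeps its own copy of the attributes
print(nik.name, nik.company)    # Nik datagy.io
print(kate.name, kate.company)  # Kate government

# Changing one instance leaves the other untouched
kate.age = 34
print(nik.age)                  # 33
```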

Python Class and Instance Attributes

In this section, you’ll learn more about attributes in Python classes. In fact, there are two main types of attributes contained in Python classes. There are class attributes and instance attributes. Let’s load some code again and take a look at the difference:

# Looking at the difference between class and instance attributes
class Person:
    # Class attribute
    type = 'Human'

    def __init__(self, name, age, company):
        # Instance attributes
        self.name = name
        self.age = age
        self.company = company

Nik = Person('Nik', 33, 'datagy.io')

In the example above we have declared four attributes. Three of these attributes are instance attributes while only one is a class attribute. So what’s the difference? The table below breaks down some of the key differences between class and instance attributes:

Description        | Instance Attribute                      | Class Attribute
Created            | Inside the __init__() function          | Before the __init__() function
References self    | Is specified using self.attribute_name  | Doesn’t reference self
Generic / Specific | Specific to a class instance            | Generic for all instances, unless modified
Differences between class and instance attributes

So, when should you use one over the other?

  • If you want an attribute to be the same for every instance of your class, such as the type attribute in the Person class, then use a class attribute. This prevents you from needing to pass it in as a value each time you create an instance.
  • If you want an attribute to be specific to an object, then use an instance attribute. This lets you customize the object to meet your needs.
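The following sketch shows how the two kinds of attributes behave when they’re modified. It uses a simplified Person class that takes only a name, which is an assumption made to keep the example short.

```python
class Person:
    type = 'Human'  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name  # instance attribute


nik = Person('Nik')
kate = Person('Kate')

print(nik.type, kate.type)    # Human Human

# Assigning through an instance creates an instance attribute
# that shadows the class attribute for that object only
nik.type = 'Author'
print(nik.type, kate.type)    # Author Human

# Changing the class attribute affects every instance that hasn't shadowed it
Person.type = 'Homo sapiens'
print(nik.type, kate.type)    # Author Homo sapiens
```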

Accessing Object Attributes in Python

Now that you have an understanding of Python object attributes, let’s see how you can access these attributes. Python object attributes can be accessed using dot notation. This is the same syntax you already use to call methods on built-in objects, such as my_list.append(), and it works just like you’d expect.

class Person:
    # Class attribute
    type = 'Human'

    def __init__(self, name, age, company):
        # Instance attributes
        self.name = name
        self.age = age
        self.company = company

Nik = Person('Nik', 33, 'datagy.io')

This object now has four attributes. If you’re working in an IDE such as VS Code, these attributes are even available through IntelliSense autocompletion. This saves you some typing and helps make sure your attribute names are spelled correctly.

Let’s see how we can print out the attribute for company in our object Nik.

# Printing an object's attribute
print(Nik.company)

# Returns: datagy.io

We can even modify these attributes by assigning a new value to them directly. Say I had a birthday and wanted to update my age. I could simply write:

# Modifying an object's attribute
Nik.age = 34
print(Nik.age)

# Returns: 34

The state of that object is maintained and can be modified. This is one of the perks of using object-oriented programming – you can run your program while your program’s data is maintained in a helpful, easy to understand manner.
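If you want to inspect an object’s current state, the built-in vars() and getattr() functions can help. The sketch below assumes the three-attribute Person class from earlier; the city attribute is deliberately one that doesn’t exist.

```python
class Person:
    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company

nik = Person('Nik', 33, 'datagy.io')

# vars() returns the instance's attributes as a dictionary
print(vars(nik))  # {'name': 'Nik', 'age': 33, 'company': 'datagy.io'}

# getattr() looks an attribute up by name, with an optional default
print(getattr(nik, 'city', 'unknown'))  # unknown

# Accessing an attribute that doesn't exist raises AttributeError
try:
    nik.city
except AttributeError as error:
    print(error)
```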

Python Functions and Methods

So far, the objects you’ve created have contained information, but they don’t actually do anything. In Python, we use functions to create reusable actions. In object-oriented programming, functions also exist. However, methods refer to functions that are defined in a class and called on its objects.

The takeaway here is: While a regular function can be called from anywhere, a method is normally called on an instance of the class that defines it. Because of this, every method is a function but not every function is a method.

Let’s define our first object method! We’ll create a method that allows our object to greet someone using their name:

# Writing your first object method
class Person:
    type = 'Human'

    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company
    
    def greet(self):
        print('Hi there! My name is ', self.name)

Nik = Person('Nik', 33, 'datagy.io')
Nik.greet()

# Returns: Hi there! My name is  Nik

Defining an object method is nearly the same as creating a regular function. There are two key differences:

  1. The function is defined inside the class
  2. The first parameter is required and refers to the instance. By convention, it’s named self

Remember, self points to that instance of the object. Because of this, the method can access its attributes. However, even if the function doesn’t require any attributes from self, the argument is required.
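To see why self is required, it can help to know that calling a method on an instance is equivalent to calling the function on the class and passing the instance in yourself. This sketch uses a simplified Person whose greet() returns the greeting instead of printing it, which is a change made only for this example.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return 'Hi there! My name is ' + self.name

nik = Person('Nik')

# These two calls are equivalent: Python passes nik as self in the first one
print(nik.greet())         # Hi there! My name is Nik
print(Person.greet(nik))   # Hi there! My name is Nik
```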

Let’s now create a method that modifies the object itself. Let’s create a method that allows our object to have a birthday. This will increase the age by one, using the augmented assignment operator (+=).

# Adding a birthday method to our class
class Person:
    type = 'Human'

    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company
    
    def greet(self):
        print('Hi there! My name is ', self.name)

    def have_birthday(self):
        self.age += 1

Nik = Person('Nik', 33, 'datagy.io')
print(Nik.age)          # Returns: 33
Nik.have_birthday()
print(Nik.age)          # Returns: 34

While our have_birthday() method is quite straightforward, it drives home the point of object-oriented programming. We are able to abstract away the mechanics of what the program does behind an easy-to-understand method.

Class Inheritance in Python

In this section, you’ll learn about an important concept related to Python object-oriented programming: inheritance. Inheritance is a process by which a class takes on the attributes and methods of another class. However, the class can also have its own attributes and methods.

In the case of inheritance, the original class is referred to as the parent class, while the class that inherits is referred to as the child class.

What’s special about child classes in Python is that:

  • They inherit all attributes and methods from the parent class
  • They can define their own attributes and methods, and
  • They can override the attributes and methods of the parent class

Let’s see how we can leverage the concept of inheritance to create a new class: Employee. Each Employee will have the same attributes and methods as a Person, but will also have access to some of its own:

# Creating your first sub-class
class Employee(Person):
    def __init__(self, name, age, company, employee_number, income):
        super().__init__(name, age, company)
        self.employee_number = employee_number
        self.income = income

    def do_work(self):
        print("Working hard!")

kate = Employee('Kate', 33, 'government', 12345, 90000)

Now that we’ve created Employee as a subclass of Person, we can create Employee objects. These objects have access to the attributes and methods of Person, as well as any additional attributes or methods that Employee defines. There are a few things to note here:

  • super().__init__() is called on the first line of the subclass’s __init__() function. This runs the parent class’s __init__(), which sets the name, age, and company attributes without you needing to repeat that code.
  • The call to super().__init__() is passed the arguments that the parent class’s __init__() expects
  • We didn’t need to repeat any methods of the original class, but can still access them.
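You can check the relationship between the two classes with the built-in issubclass() and isinstance() functions. The sketch below repeats the Person and Employee definitions (without their extra methods) so that it runs on its own.

```python
class Person:
    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company

class Employee(Person):
    def __init__(self, name, age, company, employee_number, income):
        # Run Person.__init__() to set name, age, and company
        super().__init__(name, age, company)
        self.employee_number = employee_number
        self.income = income

kate = Employee('Kate', 33, 'government', 12345, 90000)

print(issubclass(Employee, Person))  # True
print(isinstance(kate, Person))      # True: an Employee is also a Person
print(kate.name, kate.income)        # Kate 90000

# The order in which Python looks up attributes and methods
print([cls.__name__ for cls in Employee.__mro__])  # ['Employee', 'Person', 'object']
```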

Let’s see how we can access a parent class method:

# Accessing a parent class method
kate = Employee('Kate', 33, 'government', 12345, 90000)
kate.greet()

# Returns: Hi there! My name is  Kate

In the example above, while the class Employee doesn’t explicitly define the greet() method, it has access to it by the power of inheritance! You can also force inherited classes to conform to a certain behavior using abstract base classes.
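As a minimal sketch of the abstract base class idea, the example below uses Python’s standard abc module. The Greeter and FormalGreeter classes are hypothetical and aren’t part of the tutorial’s Person hierarchy.

```python
from abc import ABC, abstractmethod

class Greeter(ABC):
    @abstractmethod
    def greet(self):
        """Subclasses must provide their own greeting."""

class FormalGreeter(Greeter):
    def greet(self):
        return 'Welcome! How may I help you?'

print(FormalGreeter().greet())

# The abstract class itself can't be instantiated
try:
    Greeter()
except TypeError as error:
    print('TypeError:', error)
```

Any subclass of Greeter that doesn’t define greet() would raise the same TypeError when you try to create an instance of it.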

Polymorphism in Python Classes

Now, let’s say you wanted to ensure that your employees used a more formal greeting. The parent class, Person, you defined earlier already has a greet() method. Let’s see how you can modify the behavior of the child class, Employee, to have its own unique greeting.

In Python, polymorphism is as simple as defining a method with the same name in the child class. This allows you to override any parent method without needing to worry about any overhead. Let’s see how we can implement this:

# Polymorphism in Python classes
class Employee(Person):
    def __init__(self, name, age, company, employee_number, income):
        super().__init__(name, age, company)
        self.employee_number = employee_number
        self.income = income

    def greet(self):
        print('Welcome! How may I help you?')

    def do_work(self):
        print("Working hard!")

kate = Employee('Kate', 33, 'government', 12345, 90000)
kate.greet()

# Returns: Welcome! How may I help you?

Here, you defined a new method greet() that behaves differently than the method of the same name in the parent class. This process is known as method overriding. Polymorphism means that the same method call, greet(), works on both Person and Employee objects, with each object responding in its own way.
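Here is a small sketch of polymorphism in action: the same greet() call is made on a mixed list of objects. It uses simplified classes that take only a name and return their greeting rather than printing it, and the Employee version extends the parent’s greeting with super().greet(). These are assumptions made for illustration, not the tutorial’s exact classes.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f'Hi there! My name is {self.name}'

class Employee(Person):
    def greet(self):
        # Reuse the parent's greeting, then extend it
        return super().greet() + '. How may I help you?'

people = [Person('Nik'), Employee('Kate')]

# The calling code doesn't need to know which class each object belongs to
greetings = [person.greet() for person in people]
for greeting in greetings:
    print(greeting)
```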

When Should You Use Object-Oriented Programming?

So, you’ve learned quite a lot about object-oriented programming in Python. You may still be wondering, “How does this apply to learning data science in Python?” In many cases, procedural programming may be enough for you at the beginning. There are two key reasons why you’ll want to learn object-oriented programming even if you’re primarily relying on procedural programming.

Python objects are everywhere, literally. Everything in Python is an object (whether it be an integer or even a function). Understanding how object methods and attributes work allows you to better understand how Python itself works. Objects are a critical part of data science libraries. Understanding, for example, that a DataFrame is an object opens up the understanding of how DataFrame methods can work to manipulate your data.

Object-oriented programming in Python allows you to organize your code. Not every project requires you to use object-oriented programming. But once your program gains complexity and / or users, it may be helpful to start thinking of object-oriented programming. Similar to how functions allow you to organize and abstract code, objects do as well.

Exercises

It’s time to check your understanding. Give the exercises below a shot. If you need help or want to check your solution, a sample solution follows each question.

Develop an Employee method that gives the employee a 10% raise.

class Employee(Person):
    def __init__(self, name, age, company, employee_number, income):
        super().__init__(name, age, company)
        self.employee_number = employee_number
        self.income = income

    def get_raise(self):
        self.income *= 1.1

Python methods can also take arguments. Modify the greet() method in the Person class to pass in a person’s name to personalize the greeting.

class Person:
    type = 'Human'

    def __init__(self, name, age, company):
        self.name = name
        self.age = age
        self.company = company
    
    def greet(self, greeting_name):
        # An f-string avoids the stray spaces that print() adds between arguments
        print(f'Hi there, {greeting_name}! My name is {self.name}')

What does super() do?

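The super() function returns a proxy object that lets a child class call methods defined in its parent class. Calling super().__init__() inside a child class’s __init__() function runs the parent’s initializer, so the child instance gets the parent’s instance attributes without repeating that code.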

Conclusion and Recap

In this tutorial, you learned how to use object-oriented programming in Python and how it relates to the realm of data science. The section below provides a quick recap of Python object-oriented programming:

  • Object-oriented programming is related to four main concepts: encapsulation, abstraction, inheritance, and polymorphism
  • Everything in Python is an object – understanding OOP allows you to better understand the concepts behind data science libraries
  • OOP allows you to continue working with procedural programming, but in a more structured way
  • The __init__() method allows you to pass in instance attributes. Class attributes are defined outside of the constructor method
  • Object attributes can be accessed and modified using dot notation, such as Nik.age
  • Methods are functions defined in a class, which are called on instances of that class (or of any child class)
  • Class inheritance allows you to reuse the code of parent classes while adding unique attributes and methods to a child class
  • Polymorphism allows a child class to override methods or attributes defined in a parent class, so the same call can behave differently depending on the object

Nik Piepenbreier

Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
