Python: Get a File’s Extension (Windows, Mac, and Linux)

In this tutorial, you’ll learn how to use Python to get a file extension. You’ll accomplish this using both the pathlib library and the os.path module.

Being able to work with files easily is one of Python’s greatest strengths. You could, for example, use the glob library to iterate over the files in a folder. When you do this, knowing each file’s extension may drive further decisions. Because of this, knowing how to get a file’s extension is an important skill! Let’s get started learning how to use Python to get a file’s extension, in Windows, Mac, and Linux!

The Quick Answer: Use Pathlib

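In short: pass the path to pathlib.Path and read its .suffix property, or call os.path.splitext() and take the second item of the tuple it returns. Here is a minimal side-by-side sketch of both approaches, using the same example path as the sections below:

```python
import os.path
import pathlib

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

# pathlib: the .suffix property holds the final extension, including the dot
pathlib_extension = pathlib.Path(file_path).suffix

# os.path: splitext() returns a (root, extension) tuple; keep the second item
os_extension = os.path.splitext(file_path)[1]

print(pathlib_extension)  # .xlsx
print(os_extension)       # .xlsx
```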

Using Python Pathlib to Get a File’s Extension

The Python pathlib library makes it incredibly easy to work with and manipulate paths. Because of this, it makes perfect sense that the library would have a way of accessing a file’s extension.

The pathlib library comes with a class named Path, which we use to create path-based objects. When we load our file’s path into a Path object, we can access specific parts of the path, such as the file’s name or its extension, through the object’s built-in properties.

Let’s see how we can use the pathlib library in Python to get a file’s extension:

# Get a file's extension using pathlib
import pathlib
file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

extension = pathlib.Path(file_path).suffix
print(extension)

# Returns: .xlsx

We can see here that we passed a file’s path into the Path class, creating a Path object. From this object we can access different attributes, including the .suffix attribute, which holds the file’s extension. We assigned it to a variable named extension and printed it, getting .xlsx back.
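A Path object exposes more than just .suffix. The sketch below prints a few related properties for the same file, plus .suffixes, which lists every extension on a name such as backup.tar.gz (that archive path is a made-up example):

```python
import pathlib

spreadsheet = pathlib.Path("/Users/datagy/Desktop/Important Spreadsheet.xlsx")
print(spreadsheet.name)    # Important Spreadsheet.xlsx  (file name with extension)
print(spreadsheet.stem)    # Important Spreadsheet       (file name without extension)
print(spreadsheet.suffix)  # .xlsx                       (final extension only)

# Hypothetical file with two extensions
archive = pathlib.Path("/Users/datagy/Desktop/backup.tar.gz")
print(archive.suffix)      # .gz
print(archive.suffixes)    # ['.tar', '.gz']
```

Because .suffix only returns the last extension, check .suffixes when you need to recognize compound extensions like .tar.gz.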

This method works the same way on Mac and Linux computers. On Windows, pathlib works too, but Windows file paths use backslashes as separators, and backslashes need special care when you write a path as a Python string.

Because of this, when using Windows, create your file path as a “raw” string. But how do you do this? Simply prefix your string with an r, like this: r'some string'. This tells Python not to treat the backslashes as escape characters.
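Here is a minimal sketch of the raw-string approach. It uses pathlib.PureWindowsPath, which parses Windows-style paths without touching the file system, so the snippet runs on any operating system; the C:\ paths are made-up examples. On an actual Windows machine, pathlib.Path gives the same .suffix.

```python
import pathlib

# The r prefix keeps each backslash as a literal character
windows_path = r"C:\Users\datagy\Desktop\Important Spreadsheet.xlsx"

# PureWindowsPath understands backslash separators on any OS
extension = pathlib.PureWindowsPath(windows_path).suffix
print(extension)  # .xlsx

# Without the r prefix, "\n" and "\r" in this hypothetical path become
# a newline and a carriage return instead of separators
broken_path = "C:\new_folder\report.xlsx"
print("\n" in broken_path)  # True
```

A plain (non-raw) string such as "C:\Users\datagy\Desktop\Important Spreadsheet.xlsx" won’t even run: \U starts a Unicode escape, so Python raises a SyntaxError.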

Now that we’ve taken a look at how to use pathlib in Python to get a file extension, let’s explore how we can do the same using the os.path module.

Want to learn more? Check out my in-depth tutorial and video on Towards Data Science to learn how to use the pathlib library to automatically rename files in Python!

Using os.path in Python to Get a File’s Extension

The os.path module allows us to easily work with, well, our operating system! It lets us use file paths in different ways, including getting a file’s extension.

The os.path module has a helpful function, splitext(), which splits a file path into two parts: the root (everything before the extension) and the extension itself. Thankfully, splitext() is a smart function that knows how to separate out file extensions, rather than simply splitting a string on a dot.
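To see why that matters, compare splitext() with a naive str.split(".") on a path whose folder name contains a dot and whose file has no extension (the path is a made-up example):

```python
import os.path

# Hypothetical path: the folder contains a dot, the file itself has no extension
file_path = "/Users/datagy/my.folder/README"

# Naive approach: split on every dot and keep the last piece
naive_extension = file_path.split(".")[-1]
print(naive_extension)  # folder/README  (wrong)

# splitext() only looks for a dot after the last path separator
root, extension = os.path.splitext(file_path)
print(repr(extension))  # ''  (correctly reports no extension)
```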

Let’s take a look at how we can use the splitext() function to get a file’s extension:

# Get a file's extension using os.path
import os.path
file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

extension = os.path.splitext(file_path)[-1]
print(extension)

# Returns: .xlsx

Let’s take a look at what we’ve done here:

  1. We import os.path. Rather than writing from os import path, we use this form of import so that the name path stays free for our own variables.
  2. We load our file_path variable. Remember: if you’re on Windows, make your file path a raw string by prefixing an r before the opening quotation mark.
  3. We apply the splitext() function to the file path and then access the last item of the tuple it returns.

The splitext() function returns a tuple: the first item is the root (the full path without its extension), and the second is the extension. Because of this, if we only want a file’s extension, we can just access the tuple’s last item.
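Because splitext() returns a two-item tuple, you can also unpack it into two names in one step. The sketch below does that and shows how splitext() treats a few edge cases: only the last extension is split off, and a name that starts with a dot (a hidden file on Mac and Linux) is not considered to have an extension. The extra file names are hypothetical.

```python
import os.path

file_path = "/Users/datagy/Desktop/Important Spreadsheet.xlsx"

# Unpack the (root, extension) tuple directly
root, extension = os.path.splitext(file_path)
print(root)       # /Users/datagy/Desktop/Important Spreadsheet
print(extension)  # .xlsx

# Edge cases (hypothetical file names)
print(os.path.splitext("backup.tar.gz"))  # ('backup.tar', '.gz')
print(os.path.splitext(".bashrc"))        # ('.bashrc', '')
print(os.path.splitext("Makefile"))       # ('Makefile', '')
```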

How to Use a File’s Extension in Python

Now that you’ve learned two different ways to use Python to get a file’s extension, how can you apply this?

One handy use is to act only on, say, Excel files. If you’re writing a for-loop, you could first check whether a file is an Excel file and only then load it into a Pandas DataFrame. This approach lets you skip files that Pandas can’t read as spreadsheets.

Let’s see how to do this in Python and Pandas:

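# Load only Excel files into a single DataFrame, using pathlib to check each extension
import pathlib
import pandas as pd

file_paths = ["/Users/datagy/Desktop/Important Spreadsheet.xlsx", "/Users/datagy/Desktop/A Random Document.docx"]

frames = []
for file in file_paths:
    # Lowercase the suffix so that ".XLSX" matches too
    if pathlib.Path(file).suffix.lower() in ('.xls', '.xlsx'):
        frames.append(pd.read_excel(file))

# DataFrame.append() was removed in pandas 2.0, so combine the frames with pd.concat()
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()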
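The same check also works when the file names aren’t hard-coded. Below is a minimal sketch, assuming you want the Excel files sitting directly inside a single folder; files_with_extensions() is a hypothetical helper, and the demo builds a throwaway folder with tempfile so it runs anywhere:

```python
import pathlib
import tempfile


def files_with_extensions(folder, extensions):
    """Return the files directly inside folder whose extension is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in pathlib.Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


with tempfile.TemporaryDirectory() as folder:
    # Create some empty demo files with different extensions
    for name in ["Important Spreadsheet.xlsx", "Old Report.XLS", "A Random Document.docx"]:
        (pathlib.Path(folder) / name).touch()

    excel_files = files_with_extensions(folder, (".xls", ".xlsx"))
    excel_names = [path.name for path in excel_files]
    print(excel_names)  # ['Important Spreadsheet.xlsx', 'Old Report.XLS']
```

Lowercasing the suffix means Old Report.XLS is matched as well. Each returned path can be passed straight to pd.read_excel(), which accepts path objects as well as strings.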

Now that you’ve seen a practical example, check out my other Pandas tutorials here, including how to calculate an average in Pandas and how to add days to a Pandas column.

Conclusion

In this post, you learned how to use Python to get a file’s extension. You learned how to do this using both the pathlib library and the os.path module’s splitext() function. You also learned how to handle Windows, Mac, and Linux paths, so that your code can run across systems.

To learn more about the splitext() function, check out the official documentation here.
