Use Python to List Files in a Directory (Folder) with os and glob


In this post, you’ll learn several ways to list the files in a directory using Python’s os and glob modules.

When working with data, you may need to combine several files or pull data from the most recent one.

Sample Data to Follow Along With

To follow along, you can download the files provided here.

Each of the files contains sales data for a different month, as shown below:

[Image: sample folder structure]

Save the files to a single folder and make a note of the path to that folder.

Importing OS and Glob Libraries

Both os and glob are part of Python’s standard library, so there is nothing extra to install. To import them, simply write:

import os
import glob

Use os’s listdir Function to Return all Files in a Directory

The os.listdir function returns a list of the names of all files and directories in a folder. It returns names only, not full paths, and it does not look inside subdirectories.

To use this, simply pass the directory as an argument.

To follow along, load the sample files into a single directory. Then pass the path to the folder Files to the listdir function:

# Placeholder: replace FILE_PATH with the path to your folder
file_path = "FILE_PATH"

files = os.listdir(file_path)

print(files)
# Returns (the order is not guaranteed)
# ['November.xlsx', 'October.xlsx', 'Other Files']
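Because listdir mixes files and folders in one list, you may want to keep only the files. The sketch below is one way to do that with os.scandir. The helper name list_files_only and the temporary demo folder are assumptions made for illustration; they are not part of the sample data.

```python
import os
import tempfile


def list_files_only(directory):
    """Return the names of regular files in `directory`, sorted alphabetically."""
    names = []
    # os.scandir yields DirEntry objects that can report whether they are files
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                names.append(entry.name)
    return sorted(names)


# Build a throwaway folder that mimics the sample data (illustrative names)
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("November.xlsx", "October.xlsx"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "Other Files"))

    everything = sorted(os.listdir(demo_dir))
    only_files = list_files_only(demo_dir)

print(everything)  # ['November.xlsx', 'October.xlsx', 'Other Files']
print(only_files)  # ['November.xlsx', 'October.xlsx']
```

Sorting the results is optional, but it makes the output predictable, since listdir does not promise any particular order.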

Use os’s Walk Function to Return All Files in a Directory and all Sub-directories

If you want to list all the files in a directory and all of its subdirectories, you can use the os.walk function.

This function takes a little more work to use. The code below collects the full path of every file it finds:

files_list = []

for root, directories, files in os.walk(file_path):
    for name in files:
        files_list.append(os.path.join(root, name))

print(files_list)

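The walk function returns a generator that yields one three-item tuple for each folder it visits: the folder’s path (root), a list of its subdirectories (directories), and a list of its files (files). The for loop unpacks each tuple and joins the folder path to every file name.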
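To see what os.walk produces at each step, it can help to print the tuples themselves. This is a small sketch on a temporary folder; the file notes.txt inside Other Files is a made-up example, since the contents of that folder are not shown in the sample data.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # Hypothetical layout: two files at the top level, one inside a subfolder
    os.mkdir(os.path.join(demo_dir, "Other Files"))
    for rel in ("November.xlsx", "October.xlsx",
                os.path.join("Other Files", "notes.txt")):
        open(os.path.join(demo_dir, rel), "w").close()

    walked = []
    for root, directories, files in os.walk(demo_dir):
        # Record each folder relative to the starting point, plus its contents
        rel_root = os.path.relpath(root, demo_dir)
        walked.append((rel_root, sorted(directories), sorted(files)))

for row in walked:
    print(row)
# ('.', ['Other Files'], ['November.xlsx', 'October.xlsx'])
# ('Other Files', [], ['notes.txt'])
```

The starting folder comes first (shown here as '.'), followed by each subfolder in turn.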

Use Glob to List all Files in a Directory

The glob function works much like os.listdir, but it takes a pattern, so you can return everything in a folder or only the names that match.

A major benefit of glob is that each result includes the directory portion of the pattern you passed in, so the results can be handed straight to functions that open files. This can be especially helpful for data-related work.

For example, to return everything in a directory, use the asterisk (*):

file_list = glob.glob("FILE_PATH/*")
print(file_list)

This returns all files and folders in that directory, except those whose names begin with a dot, which the asterisk does not match.
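The differences between the two approaches are easiest to see side by side. The following sketch uses a temporary folder with one ordinary file and one dot-file; both file names are assumptions chosen for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("October.xlsx", ".hidden.txt"):
        open(os.path.join(demo_dir, name), "w").close()

    # listdir returns bare names and includes the dot-file
    from_listdir = sorted(os.listdir(demo_dir))

    # glob returns paths that start with the folder given in the pattern
    from_glob = glob.glob(os.path.join(demo_dir, "*"))
    glob_names = sorted(os.path.basename(path) for path in from_glob)
    glob_has_folder = all(os.path.dirname(path) == demo_dir for path in from_glob)

print(from_listdir)     # ['.hidden.txt', 'October.xlsx']
print(glob_names)       # ['October.xlsx']
print(glob_has_folder)  # True
```

If you only need the file name from a glob result, os.path.basename strips the folder portion, as shown above.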

Use Glob to Return all Files of a File Type in a Directory

Similar to the example above, you can also return only files matching a certain condition. For example, if you want to only return Excel files, you could write:

file_list = glob.glob("FILE_PATH/*.xlsx")

This returns only the paths that end in .xlsx.

This is useful when a folder mixes several file types and you only need one of them.
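A pattern like this also makes it straightforward to pick out the most recently modified file of a given type. Here is a minimal sketch, assuming that "latest" means the newest modification time; the helper name latest_file and the timestamps set on the demo files are illustrative only.

```python
import glob
import os
import tempfile


def latest_file(pattern):
    """Return the most recently modified path matching `pattern`, or None."""
    matches = glob.glob(pattern)
    if not matches:
        return None
    # os.path.getmtime gives the last-modified time as seconds since the epoch
    return max(matches, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as demo_dir:
    # Create two empty stand-in files and give them explicit modification times
    for name, mtime in (("October.xlsx", 1_000_000_000),
                        ("November.xlsx", 1_100_000_000)):
        path = os.path.join(demo_dir, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))

    newest = latest_file(os.path.join(demo_dir, "*.xlsx"))
    newest_name = os.path.basename(newest)
    no_match = latest_file(os.path.join(demo_dir, "*.csv"))

print(newest_name)  # November.xlsx
print(no_match)     # None
```

Returning None when nothing matches avoids the ValueError that max raises on an empty list.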

Combine Data with Pandas and Glob

Using Pandas and glob, it’s easy to combine multiple Excel files into a single dataframe.

For example, if you wanted to combine the November and October files from the sample files, this could be done easily with Glob and Pandas:

import glob
import pandas as pd

file_list = glob.glob("FILE_PATH/*.xlsx")

files = []

for filename in file_list:
    df = pd.read_excel(filename)
    files.append(df)

frame = pd.concat(files, axis=0, ignore_index=True)

print(frame)

This returns a single, combined dataframe containing the rows from every Excel file in the folder.

Conclusion

In this post, you learned how to list all files in a folder using the os listdir function, how to traverse folders in a directory and get all file names, and how to use glob to get specific file paths. Finally, you learned how to combine Excel files into a single dataframe using glob and Pandas.
