In this tutorial, you’ll learn how to use Python to remove punctuation from a string. You’ll learn how to strip punctuation from a Python string using the str.translate() method, the str.replace() method, the popular regular expression library re, and, finally, for-loops.
Being able to work with and manipulate strings is an essential skill for any Pythonista. Strings you find on the internet or in your files will often require quite a bit of cleanup before you can analyze them. One of the tasks you’ll often encounter is removing punctuation from a string.
The Quick Answer: Use .translate() for the fastest performance
Use Python to Remove Punctuation from a String with Translate
One of the easiest ways to remove punctuation from a string in Python is to use the str.translate() method. The translate() method takes a translation table, which we’ll build using the str.maketrans() method.
Let’s take a look at how we can use the .translate() method to remove punctuation from a string in Python. In order to do this, we’ll import the built-in string module, which comes bundled with a punctuation attribute.
import string
a_string = '!hi. wh?at is the weat[h]er lik?e.'
new_string = a_string.translate(str.maketrans('', '', string.punctuation))
print(new_string)
# Returns: hi what is the weather like
The str.maketrans() method here takes three arguments. The first two are empty strings, so no characters are swapped for others, and the third is a string of the punctuation characters we want to remove. Each character in that third argument is mapped to None in the resulting table, which tells .translate() to delete it.
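To make the translation table less mysterious, here is a minimal sketch that builds one for just two characters and inspects it. The sample string and variable names are illustrative only.

```python
# Build a translation table that deletes only '!' and '?'.
table = str.maketrans('', '', '!?')

# The table is a dict mapping Unicode code points to None (meaning "delete").
print(table)
# Returns: {33: None, 63: None}

cleaned = 'wh?at!'.translate(table)
print(cleaned)
# Returns: what
```

Passing string.punctuation as the third argument simply produces a larger table of the same shape.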
Want to learn more? If you want to learn how to use the translate method (and others!) to remove a character from a string in Python, check out my in-depth tutorial here.
What is Python’s string.punctuation?
Python comes with a built-in module, string, which includes an attribute, string.punctuation, that holds the ASCII punctuation characters as a single string. Because the module is part of the standard library, you don’t need to install it.
In case you’re curious about what punctuation is included in string.punctuation, let’s have a quick look:
print(string.punctuation)
# Returns: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
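As a quick check of what that constant does and doesn’t cover, the sketch below counts its characters and tests a few for membership. The curly quote and en dash are illustrative picks of non-ASCII punctuation, which string.punctuation leaves out.

```python
import string

print(len(string.punctuation))
# Returns: 32

ascii_comma_included = ',' in string.punctuation
curly_quote_included = '\u201c' in string.punctuation  # left curly double quote
en_dash_included = '\u2013' in string.punctuation      # en dash

print(ascii_comma_included, curly_quote_included, en_dash_included)
# Returns: True False False
```

If your text contains characters like these, you would need to add them to the set of characters you remove.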
Use Python to Strip Punctuation from a String with Regular Expressions (regex)
The Python regular expression library, re, feels like it can do just about anything – including stripping punctuation from a string!
Regular expressions are great because they come with a number of helpful character classes that allow us to select different types of characters. For example, [\w\s] matches any word character or whitespace character. We can select the opposite of this (i.e., anything that isn’t a word character or whitespace) by placing the ^ character at the start of the brackets: [^\w\s]. In our case, that selects punctuation.
Let’s see how we can use regex to remove punctuation in Python:
import re
a_string = '!hi. wh?at is the weat[h]er lik?e.'
new_string = re.sub(r'[^\w\s]', '', a_string)
print(new_string)
# Returns: hi what is the weather like
This approach looks for anything that isn’t a word character (a letter, digit, or underscore) or whitespace, and replaces it with an empty string, thereby removing it.
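One consequence is worth seeing in code: because \w treats the underscore as a word character, this pattern leaves underscores in place. The sketch below shows that behavior and one possible variation that removes underscores too. The sample string is made up for illustration.

```python
import re

sample = 'snake_case, ok!'

# [^\w\s] keeps underscores, because \w treats '_' as a word character.
keeps_underscore = re.sub(r'[^\w\s]', '', sample)
print(keeps_underscore)
# Returns: snake_case ok

# Adding '|_' to the pattern removes underscores as well.
drops_underscore = re.sub(r'[^\w\s]|_', '', sample)
print(drops_underscore)
# Returns: snakecase ok
```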
Use Python to Remove Punctuation from a String with str.replace
The str.replace() method makes easy work of replacing a single character. For example, if you wanted to only replace a single punctuation character, this would be a simple, straightforward solution.
Let’s say you only wanted to remove the ! character from our string. We could use the str.replace() method to accomplish this. Let’s take a look at how:
a_string = '!hi. wh?at is the weat[h]er lik?e.'
new_string = a_string.replace('!', '')
print(new_string)
# Returns: hi. wh?at is the weat[h]er lik?e.
What we’ve done here is call the .replace() method on our string. The first parameter is the substring to replace, which in this case is our ! character. The second parameter is what to replace it with, which in this case is an empty string.
In the next example, you’ll learn how to use a for-loop to remove all punctuation from a string.
Use Python to Strip Punctuation from a String using a for-loop
In the previous section of the tutorial, you learned how to use the str.replace() method to remove a single punctuation character. In this section, we’ll repeat this example, but use a for-loop to be able to remove every punctuation character.
Let’s see how we can do this in Python:
import string
a_string = '!hi. wh?at is the weat[h]er lik?e.'
for character in string.punctuation:
    a_string = a_string.replace(character, '')
print(a_string)
# Returns: hi what is the weather like
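One thing to note is that we’re writing over our original string. On each pass, the result is assigned back to a_string, so every removal builds on the previous ones. If we assigned the result to a different variable instead, each pass would start again from the untouched original, and only the last character’s removal would survive.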
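To see why the reassignment matters, here is a small sketch comparing the two approaches side by side. The shortened sample string and the names new_string and cleaned are illustrative.

```python
import string

a_string = '!hi. wh?at'

# Assigning to a different name restarts from the original string every time,
# so only the last character in string.punctuation ('~') is ever removed.
for character in string.punctuation:
    new_string = a_string.replace(character, '')
print(new_string)
# Returns: !hi. wh?at

# Reassigning the same name lets each removal build on the previous one.
cleaned = a_string
for character in string.punctuation:
    cleaned = cleaned.replace(character, '')
print(cleaned)
# Returns: hi what
```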
Now that you’ve learned a number of methods, let’s see which of these methods is the fastest.
What Is the Fastest Way to Strip Punctuation from a Python String?
This comparison covers the three approaches that remove all punctuation: str.translate(), regular expressions, and a for-loop with str.replace().
For this test, we created a string that’s over 1,000,000,000 characters long and removed all punctuation from it with each method.
Let’s take a look at the results:
| Method | Time Taken |
|---|---|
| str.translate() | 2.35 seconds |
| regular expressions | 88.8 seconds |
| for loop with str.replace() | 20.6 seconds |
The str.translate() method is the fastest way to remove punctuation from a string in Python. In this test, it was roughly 38 times faster than regular expressions and about 9 times faster than the for-loop!
Of course, speed isn’t everything, but code that significantly slows down your program will often lead to a poorer user experience.
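If you’d like to try a comparison like this yourself, the sketch below times the three approaches with the standard timeit module. It is not the benchmark behind the numbers above: the test string is far smaller, the helper function names are made up, and the regex is built from string.punctuation (rather than [^\w\s]) so that all three functions remove exactly the same characters. Your timings will vary by machine and Python version.

```python
import re
import string
import timeit

# Hypothetical test data: far smaller than the string used for the table above.
text = '!hi. wh?at is the weat[h]er lik?e.' * 10_000

TABLE = str.maketrans('', '', string.punctuation)
# Escape string.punctuation so the regex targets the same characters as the others.
PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')

def with_translate(s):
    return s.translate(TABLE)

def with_regex(s):
    return PATTERN.sub('', s)

def with_loop(s):
    for character in string.punctuation:
        s = s.replace(character, '')
    return s

# Time 10 runs of each approach on the same input.
for func in (with_translate, with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(text), number=10)
    print(f'{func.__name__}: {seconds:.4f} seconds')
```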
Frequently Asked Questions
Python comes with a built-in module, string, that exposes the common ASCII punctuation characters through the string.punctuation attribute. Included are: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
The easiest way to replace punctuation with a space in Python is to use the .translate() method with the string.punctuation attribute. Simply write: a_string.translate(str.maketrans(dict.fromkeys(string.punctuation, ' '))).
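As a runnable sketch of replacing punctuation with spaces through a translation table, the example below maps each punctuation character to a space and then, optionally, collapses the repeated spaces that can result. The sample string and variable names are illustrative.

```python
import string

a_string = 'wait...what?no,really!'

# Map every punctuation character to a single space.
table = str.maketrans(dict.fromkeys(string.punctuation, ' '))
spaced = a_string.translate(table)
print(repr(spaced))
# Returns: 'wait   what no really '

# Optionally collapse the runs of spaces this can leave behind.
collapsed = ' '.join(spaced.split())
print(collapsed)
# Returns: wait what no really
```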
Conclusion
In this post, you learned how to strip punctuation from a Python string. You learned how to do this using the str.translate() method, as well as regular expressions. You also learned how to do this with the .replace() method as well as with a for-loop. Finally, you learned which of these methods is the fastest.
To learn more about the str.translate() method, check out the official documentation here.
Sure, translate() is faster..
But if you want something that doesn’t need libraries and can be done in 1 line:
outString = "".join(ch for ch in String if ch not in ",.?!'")