This post summarizes how to delete various kinds of whitespace from a Python string, such as half-width spaces, full-width spaces, tabs, and line breaks.
Environment: macOS Catalina 10.15.4, Python 3.8.0
Use str.split()
# \u3000 is a full-width space
text = "a\u3000 b\t\nc\r\n"
# With no arguments, split() splits on any run of whitespace, including line breaks
text = ''.join(text.split())  # 'abc'
Use str.splitlines()
text = "a\u3000 b\t\nc\r\n"
# splitlines() splits only at line boundaries, so spaces and tabs remain
text = ''.join(text.splitlines())  # 'a\u3000 b\tc'
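As a quick check of how the two methods differ, the sketch below runs both on the same string. The variable names `no_whitespace` and `no_line_breaks` are only for illustration. As far as I can tell, `str.split()` with no arguments drops every kind of whitespace, while `str.splitlines()` drops only the line breaks and leaves the full-width space, half-width space, and tab in place.

```python
text = "a\u3000 b\t\nc\r\n"

# split() with no arguments treats any run of whitespace as a separator
no_whitespace = ''.join(text.split())

# splitlines() breaks the string only at line boundaries
no_line_breaks = ''.join(text.splitlines())

print(repr(no_whitespace))   # 'abc'
print(repr(no_line_breaks))  # 'a\u3000 b\tc'
```

So `str.split()` is the one to use when all whitespace should go, and `str.splitlines()` when only line breaks should go.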
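Use str.translate()
text = "a\u3000 b\t\nc\r\n"
# Map each character to delete to an empty string
table = str.maketrans({
    '\u3000': '',
    ' ': '',
    '\t': ''
})
# Line breaks are not in the table, so they remain
text = text.translate(table)  # 'ab\nc\r\n'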
If there are many characters you want to delete, it is easier to build the argument of str.maketrans() with a dict comprehension.
text = "a\u3000 b\t\nc\r\nd\x0ce\x0bf"
table = str.maketrans({
    v: '' for v in '\u3000 \x0c\x0b\t'  # or ['\u3000', ' ', '\x0c', '\x0b', '\t']
})
text = text.translate(table)  # 'ab\nc\r\ndef'
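When every listed character is simply deleted, the same table can, as far as I know, also be built with the three-argument form of `str.maketrans()`, whose third argument is a string of characters to map to `None`. The sketch below wraps this in a small helper. The name `remove_chars` and its default argument are assumptions made for illustration and are not part of the original code.

```python
def remove_chars(text, chars="\u3000 \x0c\x0b\t"):
    """Return text with every character in chars deleted (hypothetical helper)."""
    # The third argument of str.maketrans() lists the characters to delete
    table = str.maketrans("", "", chars)
    return text.translate(table)


text = "a\u3000 b\t\nc\r\nd\x0ce\x0bf"
result = remove_chars(text)
print(repr(result))  # 'ab\nc\r\ndef'
```

Line breaks are not in `chars`, so they stay in the result, just as with the dict comprehension.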
I received some advice in the comments on how to do this with regular expressions, so I'll summarize it below. Thank you for the comments.
import re
# Delete line breaks, tabs, spaces, and other whitespace all at once
text = "a\u3000\n\n b\t\nc\r\nd\x0ce\x0b\rf\r\n"
text = re.sub(r"\s", "", text)
# Delete only line-break characters (\r and \n) all at once
text = "a\u3000\n\n b\t\nc\r\nd\x0ce\x0b\rf\r\n"
text = re.sub(r"[\r\n]", "", text)
# Delete only specific whitespace (for example, full-width spaces, half-width spaces, and tabs) while keeping line breaks
text = "a\u3000\n\n b\t\nc\r\nd\x0ce\x0b\rf\r\n"
text = re.sub(r"[\u3000 \t]", "", text)
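The three patterns can be compared side by side on one input. The sketch below is only an illustration: the names `SAMPLE`, `strip_all`, `strip_breaks`, `strip_spaces`, and `ascii_only` are hypothetical. It also shows a point worth keeping in mind: for `str` patterns, `\s` matches the full-width space by default, but with the `re.ASCII` flag it matches only ASCII whitespace, so the full-width space is left behind.

```python
import re

SAMPLE = "a\u3000\n\n b\t\nc\r\nd\x0ce\x0b\rf\r\n"

# All whitespace, including \u3000, form feed (\x0c), and vertical tab (\x0b)
strip_all = re.sub(r"\s", "", SAMPLE)

# Only the line-break characters \r and \n
strip_breaks = re.sub(r"[\r\n]", "", SAMPLE)

# Only full-width space, half-width space, and tab
strip_spaces = re.sub(r"[\u3000 \t]", "", SAMPLE)

# With re.ASCII, \s no longer matches the full-width space
ascii_only = re.sub(r"\s", "", SAMPLE, flags=re.ASCII)

print(repr(strip_all))     # 'abcdef'
print(repr(strip_breaks))  # 'a\u3000 b\tcd\x0ce\x0bf'
print(repr(strip_spaces))  # 'a\n\nb\nc\r\nd\x0ce\x0b\rf\r\n'
print(repr(ascii_only))    # 'a\u3000bcdef'
```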