When a received text file mixes LF, CR, and CRLF and you need to replace only the CRLF
I'm leaving this as a memo because googling did not turn up an answer.
Environment: Python 3.7
The second line ends with CRLF (^M is the CR). I want to replace it with another string.
$ cat -v crlf.test
1 2 3
4 5 6^M
7 8 9
If you open it as a normal text file in Python and try the replacement, nothing is replaced:
with open('crlf.test') as f:
    for line in f:
        line = line.replace('\r\n', 'XXX')  # Replace CRLF
        print(line)
# 1 2 3
#
# 4 5 6
#
# 7 8 9
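Apparently Python kindly removes the CR. In text mode with the default newline=None, open() uses universal newlines, so '\r\n' (and a lone '\r') is translated to '\n' before your code sees the line, and there is no '\r\n' left to match. Go and C# running on CentOS seemed to behave the same way when I tried them, so at first I suspected it depends on the OS environment, but in Python's case it is the documented behavior of open().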
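The CR removal can be checked directly. Below is a minimal sketch, assuming a throwaway file created with tempfile; the sample bytes, the file name crlf_sample.txt, and the helper read_lines are illustrative and not part of the original setup. It compares the default text mode with newline='', which the open() documentation describes as leaving line endings untranslated on read.

```python
import os
import tempfile

# Sample data mirroring the memo: LF, CRLF, LF (an assumption for illustration).
SAMPLE = b"1 2 3\n4 5 6\r\n7 8 9\n"


def read_lines(path, newline=None):
    """Return the lines of `path` read in text mode with the given newline setting."""
    with open(path, newline=newline, encoding="utf-8") as f:
        return list(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf_sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(SAMPLE)

    translated = read_lines(path)                # default: universal newlines
    untranslated = read_lines(path, newline="")  # no translation on read

print(translated)    # ['1 2 3\n', '4 5 6\n', '7 8 9\n']
print(untranslated)  # ['1 2 3\n', '4 5 6\r\n', '7 8 9\n']
```

So if you would rather stay in text mode, passing newline='' is one way to keep the '\r\n' visible to replace().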
If you open the file in binary mode, nothing is converted behind your back, so I tried that. Because the data you read is a bytes object, you need to decode it to a str before replacing. I had assumed that, being binary, line breaks could only be identified after decoding, but it turns out to be easy: iterating over a binary file treats the LF byte as the line break and returns one line at a time.
with open('crlf.test', 'rb') as f:
    for line in f:
        # bytes -> str (decode() defaults to UTF-8), then replace CRLF
        line = line.decode().replace('\r\n', 'XXX')
        print(line)
# 1 2 3
#
# 4 5 6XXX
# 7 8 9
The CRLF is now XXX, so the line break after "4 5 6" is gone.
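One thing to keep in mind with the binary-mode loop is that it splits on LF only, so a lone CR does not end a line. The following is a small sketch, assuming io.BytesIO as an in-memory stand-in for a file opened with 'rb' and assuming UTF-8 data; the sample bytes are made up. It also shows the replacement done at the bytes level before decoding, which is an alternative ordering, not what the loop above does.

```python
import io

# In-memory stand-in for a file opened with 'rb' (sample bytes are an assumption).
raw = io.BytesIO(b"1 2 3\r4 5 6\r\n7 8 9\n")

# Binary iteration splits on b"\n" only; the lone CR stays inside the first line.
binary_lines = list(raw)

# Replace at the bytes level first, then decode (UTF-8 assumed).
replaced = [line.replace(b"\r\n", b"XXX").decode("utf-8") for line in binary_lines]

print(binary_lines)  # [b'1 2 3\r4 5 6\r\n', b'7 8 9\n']
print(replaced)      # ['1 2 3\r4 5 6XXX', '7 8 9\n']
```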
What happens if only CR is replaced?
with open('crlf.test', 'rb') as f:
    for line in f:
        line = line.decode().replace('\r', 'XXX')
        print(line)
# 1 2 3
#
# 4 5 6XXX
#
# 7 8 9
Only the CR is replaced, and the LF part of the CRLF remains, so the line break is kept.
Summary: when dealing with text files that mix line-ending styles, open the file in binary mode.
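Putting the binary-mode approach together, a whole file can be converted in one pass. This is a rough sketch only: the function name replace_crlf, its parameters, the file names, and the sample bytes are all hypothetical, and it reads the entire file into memory, so it assumes the file is small enough for that.

```python
import os
import tempfile


def replace_crlf(src_path, dst_path, replacement=b"XXX"):
    """Copy src_path to dst_path, replacing every CRLF pair with `replacement`.

    Lone LF and lone CR bytes are left untouched. The whole file is read
    into memory, so this only suits files that fit in RAM.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data.replace(b"\r\n", replacement))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")    # hypothetical input file
    dst = os.path.join(tmp, "out.txt")   # hypothetical output file
    with open(src, "wb") as f:
        f.write(b"1 2 3\n4 5 6\r\n7 8 9\r")  # LF, CRLF, lone CR
    replace_crlf(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)  # b'1 2 3\n4 5 6XXX7 8 9\r'
```

Working on bytes throughout avoids any dependence on the file's text encoding, as long as the replacement itself is given as bytes.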