I ran into a few pitfalls when reading a gzip-compressed text file, so I've summarized them here.
gzip.open opens files in binary mode by default ("r" means "rb"), so the code below reads each line as a bytes object.
import gzip

# "r" is treated as "rb", so each line is a bytes object.
with gzip.open("test.txt.gz", "r") as fi:
    for line in fi:
        print(line)
To read it as text, open the file in 'rt' mode.
import gzip

# "rt" opens the file in text mode, so each line is a str.
with gzip.open("test.txt.gz", "rt") as fi:
    for line in fi:
        print(line)
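The difference between the two modes can be checked with a small round trip. This is only a minimal sketch: the file name sample.txt.gz and the helper read_lines are made up for illustration, and the content is plain ASCII so the result does not depend on the default encoding.

```python
import gzip
import os
import tempfile


def read_lines(path, mode):
    """Return all lines of a gzip file opened with the given mode (hypothetical helper)."""
    with gzip.open(path, mode) as fi:
        return list(fi)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt.gz")
    # Write two ASCII lines; newline="" disables newline translation on write.
    with gzip.open(path, "wt", encoding="utf_8", newline="") as fo:
        fo.write("hello\nworld\n")

    binary_lines = read_lines(path, "r")   # same as "rb": bytes objects
    text_lines = read_lines(path, "rt")    # text mode: str objects

print(binary_lines)  # [b'hello\n', b'world\n']
print(text_lines)    # ['hello\n', 'world\n']
```

In binary mode the lines come back with the b'' prefix, which is the symptom to look for when a file has been opened in the wrong mode.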
Even if you have a default encoding configured elsewhere, it is ignored here, so you need to specify the encoding explicitly when opening the file. In the end, the file can be read as text with the following code.
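import gzip

# Pass encoding by keyword: the third positional argument of gzip.open
# is compresslevel, not encoding.
with gzip.open("test.txt.gz", "rt", encoding="utf_8") as fi:
    for line in fi:
        print(line)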
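To check how the encoding argument behaves, here is a small round-trip sketch. The file name sample_utf8.txt.gz and the sample text are assumptions made for illustration. It also prints the first three parameter names of gzip.open, which shows why the encoding is best passed by keyword rather than as the third positional argument.

```python
import gzip
import inspect
import os
import tempfile

# First three parameters of gzip.open; encoding comes after compresslevel.
first_params = list(inspect.signature(gzip.open).parameters)[:3]
print(first_params)  # ['filename', 'mode', 'compresslevel']

text = "こんにちは\n"  # sample non-ASCII content (made up for this sketch)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_utf8.txt.gz")
    with gzip.open(path, "wt", encoding="utf_8", newline="") as fo:
        fo.write(text)

    # Binary mode returns the raw UTF-8 bytes.
    with gzip.open(path, "rb") as fi:
        raw = fi.read()

    # Text mode with an explicit encoding decodes them back to str.
    with gzip.open(path, "rt", encoding="utf_8") as fi:
        decoded = fi.read()

print(raw == text.encode("utf_8"))  # True
print(decoded == text)              # True
```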
The same probably applies to other compressed file formats, but I haven't tried them.
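For reference, the standard library's bz2 and lzma modules also provide an open() function that accepts a mode string and an encoding keyword, so the same pattern would presumably look like the sketch below. The file names and the sample text are assumptions for illustration, and this is not something verified against real-world files.

```python
import bz2
import gzip
import lzma
import os
import tempfile

text = "line 1\nline 2\n"
results = {}

with tempfile.TemporaryDirectory() as tmp:
    # Each module's open() takes the mode and an encoding keyword.
    for name, opener in (("gz", gzip.open), ("bz2", bz2.open), ("xz", lzma.open)):
        path = os.path.join(tmp, "sample.txt." + name)
        with opener(path, "wt", encoding="utf_8", newline="") as fo:
            fo.write(text)
        with opener(path, "rt", encoding="utf_8") as fi:
            results[name] = fi.read()

print(results)
```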