Conclusion first

Encoding to use when reading UTF-8 with BOM in Python

Specify **'utf_8_sig'**.

Example of reading a file: io.open(filename, "r", encoding="utf_8_sig")

Converting from the str type (UTF-8) to the unicode type (Python 2): uni_string = unicode(str_string, 'utf_8_sig')
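The `unicode()` built-in used above exists only in Python 2. In Python 3 the same conversion is done with `bytes.decode`. The following is a minimal sketch; the sample bytes in `raw` are a made-up value for illustration, not data from the original article.

```python
import codecs

# Hypothetical sample: UTF-8 bytes preceded by a BOM (EF BB BF).
raw = codecs.BOM_UTF8 + "名前,値".encode("utf-8")

# Plain 'utf_8' keeps the BOM as a leading U+FEFF character.
with_bom = raw.decode("utf_8")

# 'utf_8_sig' strips the BOM while decoding.
without_bom = raw.decode("utf_8_sig")
```

Comparing `with_bom` and `without_bom` shows the only difference is the leading U+FEFF character.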

Introduction

I got a little stuck reading UTF-8 in Python, so I'm writing this down so I don't forget.

What is BOM

A UTF-8 file may begin with a BOM (byte order mark). It is a signature identifying the encoding as UTF-8, and it occupies the first 3 bytes of the file: 'EF BB BF'.

The trouble is that UTF-8 files exist both with and without a BOM.

Windows Notepad and Excel add a BOM to UTF-8. Linux and Mac basically seem to handle UTF-8 without a BOM.
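As a rough illustration of the 3-byte signature, the standard library exposes it as `codecs.BOM_UTF8`. The helper name `has_utf8_bom` below is hypothetical and only meant as a sketch of how one might check for it.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)
```

For a file, one could pass the first few bytes read in binary mode, for example `open(path, "rb").read(3)`.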

Handle files with UTF-8 BOM in Python

This time I wanted to load a CSV file edited in Excel, so I had to take the BOM into account.

As it turned out, this is covered in the documentation.

Official documentation: UTF-8 with BOM mark

If you set the encoding to 'utf_8_sig', a BOM is skipped when one is present. If there is no BOM, the file is read as plain UTF-8.
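The behavior described above can be checked with a small experiment. This is only a sketch: the file names and the `read_text` helper are hypothetical, and the files are created in a temporary directory so that nothing else is touched.

```python
import codecs
import io
import os
import tempfile


def read_text(path, encoding):
    """Read a whole file as text with the given encoding."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Create one file with a BOM and one without (hypothetical names).
tmp_dir = tempfile.mkdtemp()
bom_path = os.path.join(tmp_dir, "with_bom.csv")
plain_path = os.path.join(tmp_dir, "without_bom.csv")

with open(bom_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"id,name\n")
with open(plain_path, "wb") as f:
    f.write(b"id,name\n")

# 'utf_8_sig' gives the same text for both files.
sig_from_bom = read_text(bom_path, "utf_8_sig")
sig_from_plain = read_text(plain_path, "utf_8_sig")

# Plain 'utf_8' leaves the BOM in the text as U+FEFF.
utf8_from_bom = read_text(bom_path, "utf_8")
```

With plain 'utf_8', the leftover U+FEFF ends up glued to the first field, which is the kind of problem that is easy to miss when printing the data.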

Sample program

`ImportCSV.py`



import io

# 'utf_8_sig' skips a leading BOM if present; files without a BOM are read as plain UTF-8.
with io.open('sample.csv', 'rt', encoding='utf_8_sig') as f:
    print(f.readlines())
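Since the goal was to load a CSV edited in Excel, the same encoding can be combined with the standard `csv` module. The sketch below is an assumption about how that might look, not the original author's code: `load_csv_rows` is a hypothetical helper, and the sample file is generated in a temporary directory to stand in for an Excel export.

```python
import codecs
import csv
import io
import os
import tempfile


def load_csv_rows(path):
    """Read a CSV file that may or may not start with a UTF-8 BOM."""
    # newline="" is what the csv module documentation recommends for csv.reader.
    with io.open(path, "r", encoding="utf_8_sig", newline="") as f:
        return list(csv.reader(f))


# Hypothetical stand-in for a CSV saved by Excel: BOM plus CRLF line endings.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "id,name\r\n1,山田\r\n".encode("utf-8"))

rows = load_csv_rows(sample_path)
```

Here the first header cell comes back as `id` with no stray U+FEFF in front of it.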

In closing

Character encodings are an easy place to get stuck in Python, but if you handle the encoding properly when converting to the unicode type, you will not have to worry about it afterwards.
