Points to note when reading Excel-exported CSV files with pandas

If Python is going to read the file, I would prefer it to be encoded in UTF-8. However, the side that outputs the data often has its own constraints, so in many cases the receiving side has to deal with the encoding when reading.

CSV files output by Excel on Windows are commonly described as Shift JIS. So, with pandas, it is tempting to write:

import pandas as pd
dataset1 = pd.read_csv("hogehoge.csv",encoding="shift_jis")

You might think this is enough, but be careful: some files cannot be read properly this way.

`test.csv`


Yamada,1000
Sato,2000
Yamamoto,3000

This file can be read without problems.

`test2.csv`


1,Yamada,1000
2,髙橋,2000
3,黒﨑,3000

This file, however, raises the following error:

UnicodeDecodeError: 'shift_jis' codec can't decode byte 0xfb in position 0: illegal multibyte sequence

This happens because test2.csv contains Windows extension characters, such as the hashigodaka "髙" (the "ladder" form of taka) and the tachisaki "﨑" (the "standing" form of saki). To read such characters, the encoding must be cp932.

encoding='cp932'
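The difference can be checked without any CSV file. The following is a minimal sketch using only the standard library. The helper name `decodes_with` is made up for illustration, and it assumes that Excel stores these characters as their cp932 byte sequences.

```python
# The two extension characters named in the article.
extension_chars = ["髙", "﨑"]


def decodes_with(data: bytes, encoding: str) -> bool:
    """Return True if `data` can be decoded with `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


results = {}
for ch in extension_chars:
    # Assumption: this mirrors how the character is stored in an Excel CSV.
    raw = ch.encode("cp932")
    # Tuple of (readable as shift_jis, readable as cp932).
    results[ch] = (decodes_with(raw, "shift_jis"), decodes_with(raw, "cp932"))
    print(ch, raw.hex(), results[ch])
```

Both characters should come out as undecodable with `shift_jis` and decodable with `cp932`, which matches the error shown above.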

Because of cases like this, rather than assuming that shift_jis is fine just because the file came from Windows, it is safer to read with cp932 from the start and avoid unnecessary trouble.

import pandas as pd
dataset1 = pd.read_csv("hogehoge.csv",encoding="cp932")
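To try this without preparing a file, here is a rough sketch that builds the CSV in memory. The byte string is a hypothetical stand-in for test2.csv, and the column names `id`, `name`, and `amount` are assumptions added for illustration, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical stand-in for test2.csv, encoded as cp932
# (assumed to be what Excel on Japanese Windows writes).
csv_bytes = "1,山田,1000\n2,髙橋,2000\n3,黒﨑,3000\n".encode("cp932")

# Reading as shift_jis is expected to fail on the extension characters.
try:
    pd.read_csv(io.BytesIO(csv_bytes), encoding="shift_jis", header=None)
    shift_jis_ok = True
except UnicodeDecodeError as exc:
    shift_jis_ok = False
    print("shift_jis failed:", exc)

# Reading as cp932 handles both the ordinary and the extension characters.
df = pd.read_csv(
    io.BytesIO(csv_bytes),
    encoding="cp932",
    header=None,
    names=["id", "name", "amount"],  # illustrative column names
)
print(df)
```

Since cp932 also covers the characters that plain Shift JIS can represent, using it for files like test.csv should not cause any loss.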

The following site was very helpful: "Let's sort out the differences between Shift_JIS and Windows-31J (MS932)" http://weblabo.oscasierra.net/shift_jis-windows31j/