There are two types: coded character set and character encoding method.
Character and code point mapping
--Example: unicode, ascii --Default unicode in python3
>>> hex(ord("Ah"))
'0x3042' #unicode"Ah"Code point
Character operation method, implementation method
--Example: utf-8, shift-jis, euc-jp -"A" becomes a different byte sequence when encoded with utf-8 and utf-16
--The literal of 'ah'
is a unicode string
--When converting to bytes 'Ah'.encode ('utf-8')
--When converting bytes back to unicode, bytes_moji.decode ('utf-8')
--The function is different when writing to a file with unicode and when writing with byte and reading.
-Handling of character codes in Python
--In python2, byte conversion is done by default ascii --When trying to convert a Japanese string to a file, an error occurs when trying to encode to byte with the default ascii --must be specified to encode with utf-8
Recommended Posts