Python 2 has two string types, `str` and `unicode`. Normally you should use the `unicode` type.
A `str` is, I think, more accurately described as a byte string than as a text string.
aiueo = 'あいうえお'
# aiueo is now of type str
len(aiueo)
# The result depends on the encoding of the source file:
# 15 if the file is saved as utf-8, 10 if it is saved as shift_jis
The `unicode` type stores characters internally as UCS-2 (or UCS-4). To handle characters outside the UCS-2 range as single characters, Python must be built with UCS-4 support, which is chosen when compiling Python.
aiueo = u'あいうえお'
# aiueo is now of type unicode
len(aiueo)
# Always 5, in any environment
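Which of the two internal representations an interpreter uses can be checked at runtime. The following is a minimal sketch, assuming `sys.maxunicode` is an acceptable indicator (it is 0xFFFF on a UCS-2 build and 0x10FFFF on a UCS-4 build); the variable names are made up for illustration.

```python
import sys

# sys.maxunicode is the largest code point one string element can hold:
# 0xFFFF on a narrow (UCS-2) build, 0x10FFFF on a wide (UCS-4) build.
is_wide_build = sys.maxunicode > 0xFFFF

# U+1F600 lies outside the UCS-2 range.
emoji = u'\U0001F600'

# A narrow build stores it as a surrogate pair, so len() reports 2 there.
expected_length = 1 if is_wide_build else 2

print(is_wide_build)
print(len(emoji) == expected_length)
```

Python 3.3 and later always behave like a wide build, so the distinction only matters on Python 2 and early Python 3 releases.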
You can convert a `str` to the `unicode` type by calling the `decode` method of the `str` type. Conversely, you can convert a `unicode` to the `str` type by calling the `encode` method of the `unicode` type.
# -*- coding: utf-8 -*-
# (the declaration above must match the encoding the file is saved in)
aiueo = u'あいうえお'
aiueo_utf8 = aiueo.encode('utf-8')
aiueo_shiftjis = aiueo.encode('shift_jis')
print isinstance(aiueo_utf8, str) # True
print isinstance(aiueo_shiftjis, str) # True
print len(aiueo_utf8) # 15
print len(aiueo_shiftjis) # 10
print len(aiueo_utf8.decode('utf-8')) # 5
print len(aiueo_shiftjis.decode('shift_jis')) # 5
Passing the wrong encoding to the `decode` method raises a `UnicodeDecodeError` when the bytes are not valid in that encoding.
aiueo_shiftjis.decode('utf-8') # raises UnicodeDecodeError
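If the encoding of incoming bytes is uncertain, the error can be caught or suppressed. The sketch below is one possible way to handle it, not the only one; it writes the five hiragana characters as `\u` escapes so that it does not depend on the file encoding, and the names `decode_failed`, `replaced` and `mojibake` are made up for illustration.

```python
# Five hiragana characters, written as escapes to avoid file-encoding issues.
aiueo = u'\u3042\u3044\u3046\u3048\u304a'
aiueo_shiftjis = aiueo.encode('shift_jis')

# Catch the error instead of letting it propagate.
try:
    aiueo_shiftjis.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
replaced = aiueo_shiftjis.decode('utf-8', 'replace')

# A wrong encoding does not always raise: latin-1 accepts every byte value,
# so this succeeds but yields 10 unrelated characters (mojibake).
mojibake = aiueo_shiftjis.decode('latin-1')

print(decode_failed)
print(len(mojibake))
```

The last case is why a successful `decode` call alone does not prove that the encoding was right.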
Python3
Apparently because this was confusing, Python 3 renamed `str` to `bytes` and `unicode` to `str`.
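As a rough illustration of the renamed types, here is how the earlier encode/decode round trip might look on Python 3. It assumes the source file is saved as UTF-8, which is the Python 3 default.

```python
# Python 3: a plain string literal is already Unicode text (type str).
aiueo = 'あいうえお'

# encode() turns text into bytes; decode() turns bytes back into text.
aiueo_utf8 = aiueo.encode('utf-8')
aiueo_shiftjis = aiueo.encode('shift_jis')

print(type(aiueo).__name__)       # str
print(type(aiueo_utf8).__name__)  # bytes
print(len(aiueo), len(aiueo_utf8), len(aiueo_shiftjis))  # 5 15 10
print(aiueo_utf8.decode('utf-8') == aiueo)  # True
```

Unlike Python 2, Python 3 never converts between `bytes` and `str` implicitly, so mixing the two raises a `TypeError` rather than silently guessing an encoding.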