Python2 has two string types, `str` and `unicode`. Normally you should use the `unicode` type.
I think it is more accurate to regard `str` as a byte string than as a text string.
aiueo = 'あいうえお'
# Here aiueo is of type str
len(aiueo)
# The result depends on the encoding of the source file.
# For example, it is 15 for utf-8 and 10 for shift_jis.
The `unicode` type stores characters as UCS-2 (or UCS-4). To use characters in the UCS-4 range, you need to enable it when compiling Python (for example with the `--enable-unicode=ucs4` configure option).
aiueo = u'あいうえお'
# Here aiueo is of type unicode
len(aiueo)
# This is 5 in any environment
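Whether a given interpreter was built with UCS-2 or UCS-4 can be checked at runtime. The following is a minimal sketch using `sys.maxunicode` from the standard library. The variable name `build` is only for illustration. Note that on Python 3.3 and later the value is always `0x10FFFF`, so the distinction only matters on older interpreters.

```python
import sys

# sys.maxunicode is the largest code point a unicode string can hold.
# 0xFFFF means a narrow (UCS-2) build, 0x10FFFF a wide (UCS-4) build.
if sys.maxunicode == 0xFFFF:
    build = 'UCS-2 (narrow)'
else:
    build = 'UCS-4 (wide)'

print(build)
```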
You can convert a `str` to the `unicode` type by calling its `decode` method. Conversely, you can convert a `unicode` to the `str` type by calling its `encode` method.
aiueo = u'あいうえお'
aiueo_utf8 = aiueo.encode('utf-8')
aiueo_shiftjis = aiueo.encode('shift_jis')
print isinstance(aiueo_utf8, str) # True
print isinstance(aiueo_shiftjis, str) # True
print len(aiueo_utf8) # 15
print len(aiueo_shiftjis) # 10
print len(aiueo_utf8.decode('utf-8')) # 5
print len(aiueo_shiftjis.decode('shift_jis')) # 5
Passing an incorrect encoding to the `decode` method raises a `UnicodeDecodeError`.
aiueo_shiftjis.decode('utf-8')  # raises UnicodeDecodeError
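When the encoding of incoming bytes is not certain, the error can be caught, or undecodable bytes can be replaced instead of raising. The following is a rough sketch rather than a recommended policy. The names `text`, `sjis_bytes`, `failed`, and `replaced` are illustrative, and the code is written so that it should run on both Python2 and Python3.

```python
# -*- coding: utf-8 -*-
text = u'あいうえお'
sjis_bytes = text.encode('shift_jis')

# Decoding with the wrong codec raises UnicodeDecodeError, which can be caught.
try:
    sjis_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True

# The 'replace' error handler substitutes U+FFFD for undecodable bytes
# instead of raising an exception.
replaced = sjis_bytes.decode('utf-8', 'replace')

print(failed)
print(u'\ufffd' in replaced)
```

Replacing bytes loses the original text, so this is only suitable when a lossy result is acceptable, such as in log output.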
Python3
Apparently because this was confusing, in Python3 `str` was renamed to `bytes` and `unicode` was renamed to `str`.
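The same round trip can be sketched under Python3 as follows, assuming a Python3 interpreter and a UTF-8 source file (the Python3 default). The variable names `text` and `utf8_bytes` are illustrative.

```python
# Python3: a plain string literal is text (str), and encode() yields bytes.
text = 'あいうえお'
utf8_bytes = text.encode('utf-8')

print(type(text).__name__)                 # str
print(type(utf8_bytes).__name__)           # bytes
print(len(text))                           # 5 (characters)
print(len(utf8_bytes))                     # 15 (bytes)
print(utf8_bytes.decode('utf-8') == text)  # True
```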