Character encoding | Encoding of 'あ' | len('あ') |
---|---|---|
unicode | \u3042 | 1 |
euc-jp | \xa4\xa2 | 2 |
utf-8 | \xe3\x81\x82 | 3 |

Conversion | Code | Return value |
---|---|---|
euc-jp → unicode | unicode('\xa4\xa2','euc-jp') | u'\u3042' |
euc-jp → unicode | '\xa4\xa2'.decode('euc-jp') | u'\u3042' |
utf-8 → unicode | unicode('\xe3\x81\x82','utf-8') | u'\u3042' |
utf-8 → unicode | '\xe3\x81\x82'.decode('utf-8') | u'\u3042' |
unicode → euc-jp | u'\u3042'.encode('euc-jp') | '\xa4\xa2' |
unicode → utf-8 | u'\u3042'.encode('utf-8') | '\xe3\x81\x82' |
utf-8 → unicode → euc-jp | unicode('\xe3\x81\x82','utf-8').encode('euc-jp') | '\xa4\xa2' |
utf-8 → unicode → euc-jp | '\xe3\x81\x82'.decode('utf-8').encode('euc-jp') | '\xa4\xa2' |
euc-jp → unicode → utf-8 | unicode('\xa4\xa2','euc-jp').encode('utf-8') | '\xe3\x81\x82' |
euc-jp → unicode → utf-8 | '\xa4\xa2'.decode('euc-jp').encode('utf-8') | '\xe3\x81\x82' |
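
The conversions in the table can be checked with a short script. This is a minimal sketch, not part of the original post: the names `EUC_JP_A`, `UTF8_A`, `text`, `lengths`, `euc_to_utf8`, and `utf8_to_euc` are made up for illustration, and the `b''` / `u''` prefixes are an assumption added so that the same lines should behave the same on Python 2.6+ and Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Hypothetical names; b'' marks byte strings and u'' marks Unicode strings.
EUC_JP_A = b'\xa4\xa2'      # 'あ' encoded as EUC-JP (2 bytes)
UTF8_A = b'\xe3\x81\x82'    # 'あ' encoded as UTF-8 (3 bytes)

# Byte string -> Unicode string
text = EUC_JP_A.decode('euc-jp')

# Lengths of the Unicode, EUC-JP, and UTF-8 forms, as in the first table
lengths = (len(text), len(EUC_JP_A), len(UTF8_A))

# Converting between two byte encodings always goes through Unicode
euc_to_utf8 = EUC_JP_A.decode('euc-jp').encode('utf-8')
utf8_to_euc = UTF8_A.decode('utf-8').encode('euc-jp')

print(repr(text), lengths)
```

There is no direct EUC-JP to UTF-8 call in this sketch; each conversion decodes to Unicode first and then encodes to the target encoding, which mirrors the last four rows of the table.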
Unicode
For a Unicode string
>>> string=u'あ'
>>> string
u'\u3042'
EUC-JP -> Unicode
When the encoding is EUC-JP
>>> string='あ'
>>> string
'\xa4\xa2'
>>> len(string)
2
Incorrect
>>> unicode(string)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)
Correct
>>> unicode(string,'euc-jp')
u'\u3042'
UTF-8 -> Unicode
When the encoding is UTF-8
>>> string='あ'
>>> string
'\xe3\x81\x82'
>>> len(string)
3
Incorrect
>>> unicode(string)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
Correct
>>> unicode(string,'utf-8')
u'\u3042'

Conversion type | Function or method |
---|---|
Byte string → Unicode string | unicode([byte string], [encoding], [errors='strict']) |
Byte string → Unicode string | [byte string].decode([encoding], [errors='strict']) |
Unicode string → byte string | [Unicode string].encode([encoding], [errors='strict']) |
errors
unicode | encode | decode | errors | Behavior |
---|---|---|---|---|
○ | ○ | ○ | strict | Raise UnicodeDecodeError (UnicodeEncodeError when encoding) |
○ | ○ | ○ | replace | Substitute U+FFFD 'REPLACEMENT CHARACTER' when decoding ('?' when encoding) |
○ | ○ | ○ | ignore | Drop the offending bytes or characters from the result |
× | ○ | × | xmlcharrefreplace | Replace unencodable characters with XML character references |
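
The effect of each `errors` value can be seen on a deliberately broken input. The following is a rough sketch under stated assumptions: the byte string `bad` (valid UTF-8 for 'あ' followed by the invalid byte `0xff`) and every variable name are invented for illustration, and `errors` is passed positionally so the code should also run on older Python 2 releases.

```python
# -*- coding: utf-8 -*-
# Hypothetical input: UTF-8 for 'あ' followed by one byte that is never valid UTF-8.
bad = b'\xe3\x81\x82\xff'

# strict (the default): decoding raises UnicodeDecodeError
try:
    bad.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# replace: the invalid byte becomes U+FFFD
replaced = bad.decode('utf-8', 'replace')

# ignore: the invalid byte is silently dropped
ignored = bad.decode('utf-8', 'ignore')

# When encoding, 'あ' cannot be represented in ASCII
a = u'\u3042'
enc_replace = a.encode('ascii', 'replace')            # '?' is substituted
enc_ignore = a.encode('ascii', 'ignore')              # the character is dropped
enc_xml = a.encode('ascii', 'xmlcharrefreplace')      # numeric character reference

print(strict_failed, repr(replaced), repr(ignored), enc_xml)
```

Note that `ignore` loses data without any warning, so `strict` or `replace` is usually the safer choice when the input encoding is uncertain.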