7.1.1 Unicode
--The basic unit of computer memory is the **byte**, which uses 8 bits to represent 256 distinct values.
--Bit: one binary digit (0 or 1). A 4-bit number has 4 binary digits and an 8-bit number has 8, so they can represent 2 to the 4th power (16) and 2 to the 8th power (256) values, respectively.
--Byte: 1 byte is 8 bits. A byte is easier to read when written as a two-digit hexadecimal number.
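To check the bit and byte arithmetic above for myself, here is a minimal sketch. The helper name `values_for_bits` is my own and does not come from the book.

```python
def values_for_bits(n_bits):
    """Return how many distinct values n_bits binary digits can represent."""
    return 2 ** n_bits

# 4 bits and 8 bits, as in the notes above
for n_bits in (4, 8):
    print(n_bits, "bits ->", values_for_bits(n_bits), "values")

# The largest value that fits in one byte, in decimal, binary and hexadecimal
largest = values_for_bits(8) - 1
print(largest, bin(largest), hex(largest))

# A bytes object stores one integer in range(256) per element;
# hex() shows each byte as exactly two hexadecimal digits
data = bytes([0, 65, 255])
print(data.hex())
```

Two hexadecimal digits cover exactly the range 0x00 to 0xff, which is why one byte is usually written that way.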
--Unicode is an evolving international standard that aims to define the characters of all the world's languages, as well as symbols from mathematics and other fields.
--A Python 3 string is a Unicode string, not a byte string.
--If you know a character's Unicode ID (code point) or name, you can use it in a Python string.
--Python's unicodedata module provides functions that convert in both directions.
--lookup(): takes a name (case-insensitive) and returns the Unicode character.
--name(): takes a Unicode character and returns its uppercase name.
>>> def unicode_test(v):
...     import unicodedata
...     # Get the name of the character
...     name = unicodedata.name(v)
...     # Look the character up again from its name
...     v2 = unicodedata.lookup(name)
...     print("v=%s,name=%s,v2=%s" % (v, name, v2))
... 
>>> unicode_test("A")
v=A,name=LATIN CAPITAL LETTER A,v2=A
# An ASCII punctuation character
>>> unicode_test("$")
v=$,name=DOLLAR SIGN,v2=$
# Unicode currency symbols
>>> unicode_test("\u00a2")
v=¢,name=CENT SIGN,v2=¢
>>> unicode_test("\u20ac")
v=€,name=EURO SIGN,v2=€
# Displaying other kinds of symbols
>>> unicode_test("\u2603")
v=☃,name=SNOWMAN,v2=☃
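Two details of `lookup()` and `name()` seemed worth trying out: `lookup()` ignores the case of the name, and `name()` accepts an optional second argument that is returned when the character has no name (without it, a `ValueError` is raised). The sketch below assumes a hypothetical helper called `safe_name`, which is my own and not part of the unicodedata module.

```python
import unicodedata

# lookup() is case-insensitive, so both spellings find the same character
lower = unicodedata.lookup("snowman")
upper = unicodedata.lookup("SNOWMAN")
print(lower == upper == "\u2603")

def safe_name(ch, default="<unnamed>"):
    """Return the Unicode name of ch, or default if it has none.

    Hypothetical helper; it only wraps unicodedata.name(ch, default).
    """
    return unicodedata.name(ch, default)

print(safe_name("\u2603"))  # a character with a name
print(safe_name("\n"))      # a control character, which has no name
```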
>>> place = "cafe"
>>> place
'cafe'
>>> import unicodedata
>>> unicodedata.name('\u00e9')
'LATIN SMALL LETTER E WITH ACUTE'
>>> unicodedata.lookup('LATIN SMALL LETTER E WITH ACUTE')
'é'
# Specify a character by its code point
>>> place = "caf\u00e9"
>>> place
'café'
# Specify a character by its name
>>> place = "caf\N{LATIN SMALL LETTER E WITH ACUTE}"
>>> place
'café'
>>> u="\N{LATIN SMALL LETTER U WITH DIAERESIS}"
>>> u
'ü'
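As a small check that these ways of writing a character really produce the same string, here is a sketch comparing the `\u` escape, the `\N{...}` escape, and the built-in `chr()`. The variable names are mine. Note that `\u` takes exactly 4 hexadecimal digits, while `\U` takes 8 and is needed for code points above U+FFFF.

```python
import unicodedata

# The same word written three ways
by_code = "caf\u00e9"                                   # 4-digit code point escape
by_name = "caf\N{LATIN SMALL LETTER E WITH ACUTE}"      # name escape
by_chr = "caf" + chr(0xE9)                              # built-in chr() with the code point
print(by_code == by_name == by_chr)

# ord() goes the other way: from a character to its code point
print(hex(ord("\u00e9")))

# Code points above U+FFFF need the 8-digit \U escape
ghost = "\U0001f47b"
print(hex(ord(ghost)), unicodedata.name(ghost))
```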
# len() counts Unicode characters, not bytes
>>> len("&")
1
>>> len("\U0001f47b")
1
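To see the difference between characters and bytes, the sketch below compares `len()` of a string with `len()` of the same string converted to bytes. It assumes UTF-8 as the encoding, which these notes have not covered yet, and the helper `char_and_byte_counts` is a name I made up.

```python
def char_and_byte_counts(text, encoding="utf-8"):
    """Return (number of Unicode characters, number of bytes once encoded).

    Hypothetical helper; UTF-8 is an assumption, other encodings give other byte counts.
    """
    return len(text), len(text.encode(encoding))

for text in ("&", "caf\u00e9", "\u2603", "\U0001f47b"):
    chars, nbytes = char_and_byte_counts(text)
    print(repr(text), "characters:", chars, "bytes:", nbytes)
```

With UTF-8, "&" takes 1 byte, "é" takes 2, the snowman takes 3, and the ghost takes 4, yet `len()` on the string counts each of them as one character.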
There are quite a few terms here that I am not familiar with, such as encoding, decoding, and byte strings. I will look into each of them.
"Introduction to Python3 by Bill Lubanovic (published by O'Reilly Japan)"