When I extract text from a PDF, it sometimes contains strange characters, and searching the extracted text for a string I know is there finds no match.
High height Mida Mt.
These are actually different characters. They looked exactly the same on the console, so it took me a while to figure out why the search failed.
I no longer remember how I found it, but this article helped: https://qiita.com/korkewriya/items/e747253b715f41febfc4
The fix is to apply Unicode NFKC normalization to the extracted string:
from unicodedata import normalize

# Assumes result already holds the string extracted from the PDF
result = normalize('NFKC', result)  # Unicode normalization (NFKC)