Problems faced

When I extract text from a PDF, the result sometimes contains strange characters. Searching the extracted text for a string that should match then finds nothing.

Try copying the following string and pasting it into Notepad!

High height Mida Mt.

Doesn't it look different? On the console it looked exactly the same, so it took me a while to figure out why the search didn't work.

The cause seems to be Unicode

I've forgotten how I found it, but I referred to this article: https://qiita.com/korkewriya/items/e747253b715f41febfc4
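To see this kind of mismatch directly, it can help to print the code point and Unicode name of each character. The following is only a minimal sketch: it assumes the odd characters are Kangxi radicals (a common case in text extracted from Japanese PDFs, but not confirmed for this one), and `describe`, `extracted`, and `typed` are hypothetical names made up for the illustration.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Assumption: the PDF produced the Kangxi radical U+2F2D, which renders
# almost identically to the ordinary ideograph U+5C71.
extracted = "\u2f2d"  # as it might come out of the PDF
typed = "\u5c71"      # as typed on a keyboard

print(describe(extracted))
print(describe(typed))
print(extracted == typed)  # the two strings are not equal
```

Two strings that look the same on screen but report different code points here will never match in a plain string search.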

Solved with the following code


# Assumes `result` already holds the string extracted from the PDF
from unicodedata import normalize

# NFKC replaces compatibility characters with their standard equivalents
result = normalize('NFKC', result)
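
The effect of the normalization can be checked on a small example. This is a sketch under stated assumptions, not the exact data from the PDF: the sample text uses the Kangxi radicals U+2FBC and U+2F2D as stand-ins for whatever the PDF actually produced, and `keyword`, `found_before`, and `found_after` are hypothetical names.

```python
from unicodedata import normalize

# Assumption: the extracted text contains Kangxi radicals (U+2FBC, U+2F2D)
# where the ordinary ideographs U+9AD8 and U+5C71 were expected.
result = "\u2fbc\u2f2d"
keyword = "\u9ad8\u5c71"

found_before = keyword in result      # search on the raw extracted text
result = normalize('NFKC', result)    # Unicode normalization
found_after = keyword in result       # search on the normalized text

print(found_before, found_after)
```

Note that NFKC also folds other compatibility characters, such as full-width letters and ligatures, so it may change more of the text than just the characters that caused the problem.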
