I had trouble because the strings extracted from a PDF were strange

The problem

When I extract a string from a PDF, it sometimes contains strange characters. Searching the extracted text for a string that should match then finds nothing.

Copy and paste the following string into Notepad!

High height Mida Mt.

Don't the characters differ? They looked exactly the same on the console, so it took me a while to figure out why the search failed.

The cause seems to be Unicode

I no longer remember how I found it, but I referred to this article: https://qiita.com/korkewriya/items/e747253b715f41febfc4
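To see this kind of difference for yourself, it helps to print the code point of each character instead of trusting how it looks. The following is a minimal sketch, assuming the look-alikes are Unicode compatibility characters such as Kangxi radicals; `code_points` is a hypothetical helper name and the sample strings are made up for illustration, not taken from my PDF.

```python
import unicodedata

def code_points(text):
    """Return the code point of each character as a 'U+XXXX' string."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Hypothetical samples: a Kangxi radical and the ordinary ideograph it resembles.
from_pdf = "\u2f2d"
typed = "\u5c71"

print(from_pdf == typed)  # the two strings are not equal
for ch in from_pdf + typed:
    # unicodedata.name() accepts a default for characters without a name
    print(code_points(ch)[0], unicodedata.name(ch, "<unnamed>"))
```

Two strings that look identical on screen but print different code points will never match in a plain search.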

Solved with the following code


# Assumes that result already contains the string extracted from the PDF
from unicodedata import normalize
result = normalize('NFKC', result)  # Unicode normalization (NFKC)
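As a rough sketch of how the same fix can be applied to searching, the snippet below normalizes both the extracted text and the search term before comparing them. `contains_normalized` is a hypothetical helper, and the sample text assumes the PDF produced Kangxi radicals in place of ordinary ideographs.

```python
from unicodedata import normalize

def contains_normalized(haystack, needle):
    """Substring check after applying NFKC normalization to both sides."""
    return normalize("NFKC", needle) in normalize("NFKC", haystack)

# Hypothetical extraction result made of two Kangxi radicals,
# searched with the ordinary ideographs a keyboard would produce.
extracted = "\u2fbc\u2f2d"
query = "\u9ad8\u5c71"

print(query in extracted)                     # plain search
print(contains_normalized(extracted, query))  # search after normalization
```

Note that NFKC also folds other compatibility characters, for example full-width letters and ligatures such as "ﬁ", so it is best used for matching rather than when the original characters must be preserved.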
