For example, when handling the string "レポート" ("report") in Unicode, the character "ポ" is sometimes stored as two separate code points: a base character followed by a voiced / semi-voiced sound mark (dakuten / handakuten), as in "ホ + ゜".
join_dakuten.py
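import unicodedata
# Example input: "ホ" followed by the combining semi-voiced sound mark (U+309A)
unistr = 'レホ\u309aート'
# The form name is 'NFKC' ('NKFC' raises ValueError); normalize() returns a new string
unistr = unicodedata.normalize('NFKC', unistr)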
If you pass the string through this function, "ホ + ゜" is composed into the single character "ポ" and can be handled as such.
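As a quick check of the behavior described above, here is a minimal sketch. The wrapper name `join_dakuten` is hypothetical (borrowed from the file name), and the sample strings are written as escapes so the exact code points are unambiguous. It assumes the mark is the combining form (U+3099 / U+309A) or the half-width form (U+FF9E / U+FF9F).

```python
import unicodedata


def join_dakuten(text: str) -> str:
    """Hypothetical helper: compose kana with a following (han)dakuten mark."""
    return unicodedata.normalize("NFKC", text)


# "レ", "ホ", combining handakuten (U+309A), "ー", "ト"
decomposed = "\u30ec\u30db\u309a\u30fc\u30c8"
composed = join_dakuten(decomposed)

print(len(decomposed), len(composed))        # 5 4
print(composed == "\u30ec\u30dd\u30fc\u30c8")  # True: now "レポート" with a single "ポ"

# Half-width katakana "ﾎ" + half-width handakuten "ﾟ" is also folded into "ポ"
print(join_dakuten("\uff8e\uff9f") == "\u30dd")  # True
```

Note that NFKC also folds other compatibility characters (for example, full-width Latin letters become ASCII). If only combining marks need to be joined and everything else should stay untouched, `'NFC'` is likely sufficient.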