Purpose of this article

Create a function that normalizes text in which half-width and full-width characters are mixed inconsistently.

Preparation

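Prepare strings holding the characters before and after conversion. The two strings in each pair must list their characters in the same order and have the same length.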

abc_half = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
abc_full = "ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ"

digit_half = "0123456789"
digit_full = "０１２３４５６７８９"

katakana_half = "ｱｲｳｴｵｶｷｸｹｺｻｼｽｾｿﾀﾁﾂﾃﾄﾅﾆﾇﾈﾉﾊﾋﾌﾍﾎﾏﾐﾑﾒﾓﾔﾕﾖﾗﾘﾙﾚﾛﾜｦﾝ"
katakana_full = "アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヲン"

punc_half = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
punc_full = "！＂＃＄％＆＇（）＊＋，－．／：；＜＝＞？＠［＼］＾＿｀｛｜｝～"
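
The full-width ASCII variants (U+FF01 to U+FF5E) sit at a fixed offset of 0xFEE0 from their half-width counterparts (U+0021 to U+007E), so the hand-typed tables above can be checked, or generated, programmatically. The following is a minimal sketch; the helper name to_full_width and the constant FULL_WIDTH_OFFSET are hypothetical and not part of the original code.

```python
import string

# Offset between half-width ASCII (U+0021..U+007E) and the
# full-width forms (U+FF01..U+FF5E).
FULL_WIDTH_OFFSET = 0xFEE0


def to_full_width(half):
    """Hypothetical helper: shift each printable ASCII character to its full-width form."""
    return "".join(chr(ord(ch) + FULL_WIDTH_OFFSET) for ch in half)


abc_half = string.ascii_lowercase + string.ascii_uppercase
digit_half = string.digits
punc_half = string.punctuation

abc_full = to_full_width(abc_half)
digit_full = to_full_width(digit_half)
punc_full = to_full_width(punc_half)

# Paired tables must have equal lengths, otherwise str.maketrans raises ValueError.
assert len(abc_half) == len(abc_full)
assert len(digit_half) == len(digit_full)
assert len(punc_half) == len(punc_full)
print(digit_full)
```

This only covers the ASCII range. Half-width katakana do not follow a fixed offset, which is why they are listed by hand.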

Half-width katakana write voiced and semi-voiced sounds (dakuten and handakuten) as two characters, a base kana followed by a separate mark, so create a separate conversion table for them.


tmp01 = "ｶﾞｷﾞｸﾞｹﾞｺﾞｻﾞｼﾞｽﾞｾﾞｿﾞﾀﾞﾁﾞﾂﾞﾃﾞﾄﾞﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞﾊﾟﾋﾟﾌﾟﾍﾟﾎﾟ"
tmp02 = "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ"

transtable02 = {}
for i in range(len(tmp02)):
    be = tmp01[i*2:i*2+2]
    af = tmp02[i]
    transtable02[be] = af
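
As a cross-check on the hand-built table, the standard library's unicodedata.normalize with the NFKC form composes a half-width kana followed by a half-width voiced or semi-voiced mark into a single full-width character. The sketch below assumes tmp01 and tmp02 hold the 25 half-width pairs and their full-width equivalents; the zip-based construction and the name mismatches are illustrative, not the article's own code.

```python
import unicodedata

tmp01 = "ｶﾞｷﾞｸﾞｹﾞｺﾞｻﾞｼﾞｽﾞｾﾞｿﾞﾀﾞﾁﾞﾂﾞﾃﾞﾄﾞﾊﾞﾋﾞﾌﾞﾍﾞﾎﾞﾊﾟﾋﾟﾌﾟﾍﾟﾎﾟ"
tmp02 = "ガギグゲゴザジズゼゾダヂヅデドバビブベボパピプペポ"

# Split tmp01 into two-character sequences and pair each with its
# single full-width character.
pairs = [tmp01[i:i + 2] for i in range(0, len(tmp01), 2)]
transtable02 = dict(zip(pairs, tmp02))

# Collect any entry that disagrees with NFKC normalization.
mismatches = {
    be: af
    for be, af in transtable02.items()
    if unicodedata.normalize("NFKC", be) != af
}
print(len(transtable02), mismatches)
```

An empty mismatches dictionary suggests the two strings are aligned. Note that NFKC also rewrites many other compatibility characters, so it is used here only as a check rather than as a replacement for the explicit tables.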

Inside the function clean_text, transtable01 = str.maketrans(before, after) builds a translation table, and text = text.translate(transtable01) applies it.


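def clean_text(text):
    text = str(text).replace("\u3000", " ")  # Full-width space to half-width

    # Handle the two-character voiced/semi-voiced sequences first.
    # str.translate only maps single characters, so transtable02
    # (keyed by two-character strings) is applied with str.replace.
    for be, af in transtable02.items():
        text = text.replace(be, af)

    before = abc_full + digit_full + katakana_half + punc_full
    after = abc_half + digit_half + katakana_full + punc_half

    # One-to-one character mapping for everything else.
    transtable01 = str.maketrans(before, after)
    text = text.translate(transtable01)

    return text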
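
To illustrate how the two kinds of table behave: str.translate looks characters up one code point at a time, so a table built by str.maketrans works for one-to-one mappings, while a dictionary keyed by two-character strings is never matched. A small sketch with made-up sample data (the variable names are illustrative only):

```python
# One-to-one mapping: str.maketrans converts each character to a code point key.
single = str.maketrans("ＡＢ１", "AB1")
converted = "ＡＢ１".translate(single)

# A plain dict with a two-character key is silently ignored by str.translate,
# because lookups are done with integer code points.
multi = {"ｶﾞ": "ガ"}
unchanged = "ｶﾞ".translate(multi)

# Multi-character sequences need str.replace (or a regular expression) instead.
replaced = "ｶﾞ".replace("ｶﾞ", "ガ")

print(converted, unchanged == "ｶﾞ", replaced)
```

For this reason, two-character sequences are best replaced before the single-character table is applied; otherwise the base kana is converted on its own and the voiced mark is left behind.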

How to use


text = "Memo Nara Rirure,-. / :; qrgegozajizezodaji"
clean_text(text)

>>>Memo Yayuyora Rirure+,-./:qr Gegozajizuzezodaji

That's all!

Afterword

Japanese has other notational variations as well, such as okurigana and kanji numerals, so I hope to add support for more of them.

Reference

[Full-width ⇔ half-width] Recommended library for adjusting Japanese writing fluctuations in Python

[python] Create a list of various character types
