Full-width and half-width processing of CSV data in Python

How to normalize data in which katakana, symbols, letters and numbers appear in a mix of full-width and half-width forms.

Reference:

https://qiita.com/shakechi/items/d12641d6cad01479785f

Doing this by hand is tedious, so I wrote a function that applies the full-width to half-width conversion column by column to a CSV loaded with pandas. Just put the column names in the columns = [] list and you are done.

What it does: converts katakana, symbols (spaces, etc.), letters and numbers to half-width.
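To make "converting to half-width" concrete, here is a minimal stdlib-only sketch for the ASCII part (letters, digits, symbols and the space). It is not how jaconv is implemented, and the names FULL_TO_HALF and ascii_to_half_width are made up for this illustration. It relies on the fact that full-width ASCII characters (U+FF01 to U+FF5E) sit exactly 0xFEE0 above their half-width counterparts. Katakana has no such fixed offset, so it is left untouched here.

```python
# Stdlib-only sketch; this is NOT jaconv's implementation.
# Full-width ASCII (U+FF01..U+FF5E) sits exactly 0xFEE0 above its
# half-width counterpart (U+0021..U+007E). The ideographic space
# U+3000 is outside that range, so it is mapped separately.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20


def ascii_to_half_width(text: str) -> str:
    """Convert full-width letters, digits, symbols and spaces to half-width."""
    return text.translate(FULL_TO_HALF)


print(ascii_to_half_width("ＡＢＣ　１２３！"))  # ABC 123!
```

For katakana a lookup table is needed instead of an offset, which is one reason to use a library such as jaconv.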


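# Install jaconv beforehand: run `pip install jaconv` in a terminal.
import jaconv
import pandas as pd

# Load the CSV with pandas. "data.csv" is a placeholder; use your own file path.
df = pd.read_csv("data.csv")


def shori(column):
    """Convert full-width katakana, ASCII and digits in df[column] to half-width."""
    values = df[column].values.tolist()
    new_values = []

    for value in values:
        # z2h expects a str, so leave non-string cells (e.g. NaN) untouched.
        if isinstance(value, str):
            value = jaconv.z2h(value, digit=True, ascii=True, kana=True)
        new_values.append(value)

    df[column] = new_values

    return df[column]


# Put the names of the columns you want to process in this list.
columns = []

# Loop over the columns.
for column in columns:
    shori(column)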
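The same "apply a converter to the listed columns" idea can also be tried without pandas or jaconv installed. The following is only a sketch under stated assumptions: SAMPLE_CSV is made-up data standing in for a real file, convert_columns is a hypothetical helper, and the converter handles digits only. In real use you would pass something like jaconv.z2h as the converter.

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a real file.
SAMPLE_CSV = "name,tel\n山田,０３１２３４\n佐藤,０９０５６７\n"

# Stand-in converter table (digits only), used instead of jaconv.z2h.
DIGITS = str.maketrans("０１２３４５６７８９", "0123456789")


def convert_columns(rows, columns, convert):
    """Apply `convert` to the string cells of the named columns, row by row."""
    for row in rows:
        for column in columns:
            if isinstance(row.get(column), str):
                row[column] = convert(row[column])
    return rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
convert_columns(rows, ["tel"], lambda s: s.translate(DIGITS))
print(rows[0]["tel"])  # 031234
```

Columns that are not listed (here, name) are left exactly as they were read.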
