Convert DataFrame column names from Japanese to English using Googletrans

Introduction

When working with data frames in pandas, Japanese column names can be inconvenient. Renaming them by hand is tedious, so this article uses googletrans to make the conversion easier.

Environment and version

About Googletrans

Googletrans is a Python library for using the Google Translate API. On Google Colaboratory, you can install it with the following command.

!pip install googletrans==4.0.0-rc1

Note: As of January 12, installing without specifying a version installs googletrans 3.0.0, which does not work well. Reference: https://qiita.com/_yushuu/items/83c51e29771530646659
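To confirm which version actually ended up in the environment, a small check like the following may help. This is only a sketch using the standard library; the helper name installed_version is made up for illustration and is not part of googletrans.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed googletrans version, or None when it is not installed.
print(installed_version("googletrans"))
```

If the printed version is not the one pinned above, reinstalling with the explicit version is probably the quickest fix.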

How to use googletrans

from googletrans import Translator

translator = Translator()
text = 'こんにちは'
# Translate the text into English and print the result.
print(translator.translate(text, dest='en').text)

Output result

Hello

The default for dest is English, so dest='en' can be omitted. Although it is beside the point of this article, you can translate into other languages by changing dest.

print(translator.translate(text, dest='fr').text)

Output result

Bonjour
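The dest examples above can be generalized into a small helper that translates one text into several languages at once. This is a minimal sketch, not part of googletrans itself: the function name translate_to_languages is hypothetical, and fixing src to 'ja' instead of relying on automatic detection is an assumption that suits this article's use case.

```python
def translate_to_languages(translator, text, dests, src="ja"):
    """Translate `text` into every language code listed in `dests`.

    `translator` is any object with a googletrans-style
    translate(text, dest=..., src=...) method whose result has a .text attribute.
    """
    return {
        dest: translator.translate(text, dest=dest, src=src).text
        for dest in dests
    }


# Usage with googletrans (needs network access):
#   from googletrans import Translator
#   translate_to_languages(Translator(), 'こんにちは', ['en', 'fr'])
```

Passing the translator in as an argument keeps the helper easy to try out with a stand-in object when the translation service is unavailable.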

Convert from Japanese column to English column

Now for the main topic: converting Japanese column names to English. First, prepare a data frame.

Data frame preparation

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  columns=['Customer ID', 'Store ID', 'Quantity', 'price', 'Store area'])
df.head()

image.png

The data frame is ready. If it is a Japanese column, it is troublesome such as an error occurs when training with lightGBM. Let's convert it to English.

English conversion using Googletrans

eng_columns = {}
columns = df.columns
translator = Translator()

for column in columns:
    eng_columns[column] = translator.translate(column).text

print(eng_columns)

Output result

{'Customer ID': 'Customer ID', 'Store ID': 'Store ID', 'Quantity': 'Quantity', 'price': 'price', 'Store area': 'Store area'}

The names were converted to English successfully. However, they still contain spaces, which are awkward to work with, so let's also replace the spaces with underscores.

eng_columns = {}
columns = df.columns
translator = Translator()

for column in columns:
    eng_column = translator.translate(column).text
    eng_column = eng_column.replace(' ', '_')
    eng_columns[column] = eng_column

df.rename(columns=eng_columns, inplace=True)


The column names are now in English: Customer_ID, Store_ID, Quantity, price, and Store_area.
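The replacement above handles spaces only. Translated names can also contain other symbols, and two different Japanese names may translate to the same English word. The following sketch, under those assumptions, cleans each name and keeps the results unique. The function names to_identifier and build_rename_map are hypothetical, and lowercasing the names is a stylistic choice that differs from the result shown above.

```python
import re


def to_identifier(name):
    """Turn a translated column name into a lowercase snake_case identifier."""
    # Collapse every run of characters other than ASCII letters and digits into one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()
    # Fall back to a placeholder when nothing usable is left.
    return cleaned or "column"


def build_rename_map(columns, translate):
    """Map each original column name to a unique, sanitized English name.

    `translate` is any callable taking a str and returning a str, for example
    lambda s: translator.translate(s).text with a googletrans Translator.
    """
    mapping = {}
    used = set()
    for column in columns:
        base = to_identifier(translate(column))
        candidate = base
        suffix = 2
        # Append a numeric suffix when two columns end up with the same name.
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        mapping[column] = candidate
    return mapping
```

The resulting dictionary can be passed to df.rename(columns=...) in the same way as eng_columns.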
