[Natural language processing / NLP] How to easily perform back translation by machine translation in Python

Have you ever wanted to use **back translation** in Python for **data augmentation**, for example in an NLP competition?

For example, the 1st place solution in Kaggle's Toxic Comment Classification Challenge used this technique. https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/discussion/52557

In this article, I explain how to easily perform back translation using **machine translation** in Python.

Figure: Example of back translation by machine translation. Source: https://amitness.com/2020/05/data-augmentation-for-nlp/

Overview

With **googletrans**, you can easily back-translate text without an API key.

Install googletrans

The environment assumes Python 3.

$ pip install googletrans

Back translation program

from googletrans import Translator

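def BackTranslation(text, original_lang, via_lang):
    translator = Translator()
    # First hop: original language -> intermediate (via) language
    via_text = translator.translate(text, src=original_lang, dest=via_lang).text
    # Second hop: intermediate language -> back to the original language
    return translator.translate(via_text, src=via_lang, dest=original_lang).text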

The text argument is the original text, original_lang is the language of that text, and via_lang is the intermediate language to translate through.

For the language codes that can be specified for original_lang and via_lang, refer to the googletrans documentation. https://py-googletrans.readthedocs.io/en/latest/
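To make the two hops explicit, here is a minimal sketch of the same round trip with the translator passed in as a parameter. The names back_translate, RoundTrip, translate_fn and toy_translate are hypothetical and not part of googletrans; the toy lookup table only stands in for a real machine translation call so that the example runs offline.

```python
from typing import Callable, NamedTuple

# Hypothetical signature: translate_fn(text, src, dest) -> translated text.
TranslateFn = Callable[[str, str, str], str]


class RoundTrip(NamedTuple):
    intermediate: str  # text in via_lang
    result: str        # text translated back into original_lang


def back_translate(text: str, original_lang: str, via_lang: str,
                   translate_fn: TranslateFn) -> RoundTrip:
    """Translate original_lang -> via_lang -> original_lang."""
    intermediate = translate_fn(text, original_lang, via_lang)
    result = translate_fn(intermediate, via_lang, original_lang)
    return RoundTrip(intermediate, result)


# Toy stand-in translator (NOT a real MT system): a lookup table keyed by
# (text, src, dest), used only so the sketch runs without network access.
_TOY_TABLE = {
    ("good morning", "en", "fr"): "bonjour",
    ("bonjour", "fr", "en"): "hello",
}


def toy_translate(text: str, src: str, dest: str) -> str:
    return _TOY_TABLE[(text, src, dest)]


round_trip = back_translate("good morning", "en", "fr", toy_translate)
print(round_trip.intermediate)  # bonjour
print(round_trip.result)        # hello
```

With googletrans, translate_fn could presumably be a small wrapper such as lambda text, src, dest: Translator().translate(text, src=src, dest=dest).text, assuming the synchronous Translator API used in this article.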

Example of use

Let's back-translate the English sentence "The destiny of man is in his own soul." via Japanese.

text = "The destiny of man is in his own soul."
BackTranslation(text, "en", "ja")

The return value (the result of back translation) is as follows.

Back translation result

'The fate of man lies in his own soul.'

The intermediate Japanese translation (rendered here in English) is as follows.

Intermediate language

Human destiny lies in his own soul.
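For data augmentation, one common pattern is to route the same sentence through several intermediate languages and keep only the variants that actually differ from the input. The following is a rough sketch under that assumption; augment and fake_back_translate are hypothetical names, and apart from the "ja" entry (taken from the example above) the stub results are made up purely so the code runs offline.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: back_translate_fn(text, original_lang, via_lang) -> str
BackTranslateFn = Callable[[str, str, str], str]


def augment(text: str, original_lang: str, via_langs: Iterable[str],
            back_translate_fn: BackTranslateFn) -> List[str]:
    """Return distinct paraphrases of text, one attempt per intermediate language."""
    seen = {text}  # skip round trips that merely reproduce the input
    variants = []
    for via_lang in via_langs:
        candidate = back_translate_fn(text, original_lang, via_lang)
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants


# Stub standing in for a real back translation call (illustrative values only).
_FAKE_RESULTS = {
    "ja": "The fate of man lies in his own soul.",
    "fr": "The destiny of man is in his own soul.",  # unchanged, filtered out
    "de": "The fate of man lies in his own soul.",   # duplicate, filtered out
}


def fake_back_translate(text: str, original_lang: str, via_lang: str) -> str:
    return _FAKE_RESULTS[via_lang]


text = "The destiny of man is in his own soul."
print(augment(text, "en", ["ja", "fr", "de"], fake_back_translate))
# ['The fate of man lies in his own soul.']
```

Because BackTranslation takes its arguments in the same order (text, original_lang, via_lang), it could be passed directly as back_translate_fn, for example augment(text, "en", ["ja", "fr"], BackTranslation).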

References

A Visual Survey of Data Augmentation in NLP https://amitness.com/2020/05/data-augmentation-for-nlp/

Googletrans: Free and Unlimited Google translate API for Python https://py-googletrans.readthedocs.io/en/latest/

Is reverse translation an alchemist of machine translation? http://deeplearning.hatenablog.com/entry/back_translation
