I'm mentally worn out and wanted some easy approval, so here is a Python script for inflating (augmenting) text data by round-trip translation with Google Translate, which I used in the NLP competition SIGNATE Student Cup 2020 that I recently took part in. There are already plenty of similar articles, so none of this is new.
I couldn't find a really handy dataset for the demo, so I decided to use this one from Kaggle: Wikipedia Movie Plots.
For now, I'll introduce a script that translates English sentences into Japanese and then translates them back into English.
from googletrans import Translator

def retranslator(text, lang):
    '''After translating from English to another language, translate again to English and aim to inflate the data
    '''
    translator = Translator()
    translated = translator.translate(text, src='en', dest=lang).text
    retranslated = translator.translate(translated, src=lang, dest='en').text
    return translated, retranslated
Like this.
To explain it really simply, text is the string you want to translate, src is the language code of the original language, and dest is the language code of the translation destination.
For Google Translate's language codes, refer to the Language Support page and choose the one you like.
Incidentally, translation quality is likely to be better for relatively major languages, so when the goal is inflating data I think it is safer to simply pick a major language. In fact, in competitions it seems common to pick French, German, Spanish, Japanese, Chinese, and so on for retranslation.
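As a rough sketch of retranslating through several major languages at once, the snippet below round-trips one sentence through a list of pivot languages and keeps only the distinct results. The names backtranslate_variants and fake_translate are hypothetical and not from the original script, and the translator is passed in as a plain function so the example runs offline with canned strings instead of calling Google Translate.

```python
from typing import Callable, List

def backtranslate_variants(
    text: str,
    pivots: List[str],
    translate: Callable[[str, str, str], str],
) -> List[str]:
    """Round-trip `text` through each pivot language and return distinct results.

    `translate(text, src, dest)` is a hypothetical callable supplied by the caller.
    """
    variants = []
    seen = {text}  # a round trip that returns the input adds nothing new
    for lang in pivots:
        pivot_text = translate(text, "en", lang)
        back = translate(pivot_text, lang, "en")
        if back not in seen:
            seen.add(back)
            variants.append(back)
    return variants

# Canned translations standing in for a real translator (demonstration only).
_FAKE = {
    ("The moon smiles.", "ja"): "月が微笑む。",
    ("月が微笑む。", "en"): "The moon is smiling.",
    ("The moon smiles.", "fr"): "La lune sourit.",
    ("La lune sourit.", "en"): "The moon smiles.",
    ("The moon smiles.", "de"): "Der Mond lächelt.",
    ("Der Mond lächelt.", "en"): "The moon is smiling.",
}

def fake_translate(text: str, src: str, dest: str) -> str:
    """Offline stub: look up a canned translation by (text, destination)."""
    return _FAKE[(text, dest)]

print(backtranslate_variants("The moon smiles.", ["ja", "fr", "de"], fake_translate))
```

With the real library, the same function could presumably be given a small wrapper around translator.translate(text, src=..., dest=...).text, the call used in retranslator above. Dropping duplicates matters because different pivot languages often come back with the same English sentence.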
Execution code
import pandas as pd
from googletrans import Translator

data = pd.read_csv('./wiki_movie_plots_deduped.csv')

def retranslator(text, lang):
    '''After translating from English to another language, translate again to English and aim to inflate the data
    '''
    translator = Translator()
    translated = translator.translate(text, src='en', dest=lang).text
    retranslated = translator.translate(translated, src=lang, dest='en').text
    return translated, retranslated

for i in range(5):
    row = data.iloc[i]
    translated, retranslated = retranslator(row['Plot'], 'ja')
    result = {
        'Original': row['Plot'],
        'translated': translated,
        'retranslated': retranslated
    }
    for key, val in result.items():
        print(key)
        print(val)
    print('')
output
Original A bartender is working at a saloon, serving drinks to customers. After he fills a stereotypically Irish man's bucket with beer, Carrie Nation and her followers burst inside. They assault the Irish man, pulling his hat over his eyes and then dumping the beer over his head. The group then begin wrecking the bar, smashing the fixtures, mirrors, and breaking the cash register. The bartender then sprays seltzer water in Nation's face before a group of policemen appear and order everybody to leave.[1]
translated A bartender works in the salon and serves drinks to customers. Carrie Nation and her followers jumped in after he filled a typical Irish bucket with beer. They attacked the Irish, pulled his hat over his eyes, and then dumped the beer over his head. After that, the group destroys the bars, the equipment, the mirrors, and the cashiers begin to break. The bartender then sprays Selzer water on Nation's face, and then a group of police officers appear and order everyone to leave. [1]
retranslated A bartender works at the salon and serves drinks to customers. Carry Nation and her followers plunge into him after he filled a typical Irish bucket with beer. They attacked the Irish, pulled his hat over his eyes, and then threw the beer over his head. After that, the group destroys the bar, destroys equipment, mirrors, and begins to destroy the cash register. The bartender then sprays Seltzer water on Nation's face, then a group of policemen appears and orders everyone to leave. [1]
Original The moon, painted with a smiling face hangs over a park at night. A young couple walking past a fence learn on a railing and look up. The moon smiles. They embrace, and the moon's smile gets bigger. They then sit down on a bench by a tree. The moon's view is blocked, causing him to frown. In the last scene, the man fans the woman with his hat because the moon has left the sky and is perched over her shoulder to see everything better.
translated The moon drawn with a smile hangs down in the park at night. A young couple walking over the fence learns about railings and looks up. The moon smiles. They hug and the moon smiles bigger. Then they sat on a bench by the tree. The view of the moon was obstructed and he frowned. In the final scene, the moon leaves the sky and everything is clearly visible over the shoulder, so the man wears a hat and incites the woman.
retranslated The moon drawn with a smile hangs in the park at night. A young couple walking over the fence learns about the handrail and looks up. The moon smiles. They hug and make the moon smile bigger. Then they sat on a bench by the tree. The moon's view was blocked and he frowned. In the last scene, the man leaves the sky and sees everything over his shoulder, so men wear hats to incite women.
Original The film, just over a minute long, is composed of two shots. In the first, a girl sits at the base of an altar or tomb, her face hidden from the camera. At the center of the altar, a viewing portal displays the portraits of three U.S. Presidents—Abraham Lincoln, James A. Garfield, and William McKinley—each victims of assassination. In the second shot, which runs just over eight seconds long, an assassin kneels feet of Lady Justice.
translated This movie is a little over a minute long and consists of two shots. Initially, the girl sits at the foot of an altar or tomb, with her face hidden from the camera. The viewing portal in the center of the altar shows portraits of the three victims of the assassination, Abraham Lincoln, James A. Garfield, and William McKinley. The second shot takes just over 8 seconds and kneels down on the goddess of justice.
retranslated This movie is a little over a minute and consists of two shots. Initially, the girl sits at the base of the altar or grave, with her face hidden from the camera. A viewing portal in the center of the altar shows portraits of three US presidents, Abraham Lincoln, James A. Garfield and William McKinley, who are victims of assassination. The second shot is just over 8 seconds and kneels on the feet of the goddess of justice.
Original Lasting just 61 seconds and consisting of two shots, the first shot is set in a wood during winter. The actor representing then vice-president Theodore Roosevelt enthusiastically hurries down a hillside towards a tree in the foreground. He falls once, but rights himself and cocks his rifle. Two other men, bearing signs reading "His Photographer" and "His Press Agent" respectively, follow him into the shot; the photographer sets up his camera. "Teddy" aims his rifle upward at the tree and fells what appears to be a common house cat, which he then proceeds to stab. "Teddy" holds his prize aloft, and the press agent takes notes. The second shot is taken in a slightly different part of the wood, on a path. "Teddy" rides the path on his horse towards the camera and out to the left of the shot, followed closely by the press agent and photographer, still dutifully holding their signs.
translated It consists of two shots in just 61 seconds, and during the winter, the first shot is taken in the woods. The actor representing Theodore Roosevelt, then Vice President, is enthusiastically rushing down the hillside towards the tree in the foreground. He collapses once, but gives himself rights and shoots his rifle. The other two men chase after him, labeled "his photographer" and "his reporter's agent," respectively. The photographer sets up the camera. "Teddy" points the rifle at a tree, defeats and pierces what looks like a normal domestic cat. "Teddy" holds his award high, and reporters take notes. The second shot is taken on a path in a slightly different part of the forest. "Teddy" heads his horse's path towards the camera and out to the left of the shot, followed closely by the press agent and photographer, still holding the sign faithfully.
retranslated Consisting of two shots of only 61 seconds, during the winter the first shot is taken in the woods. At the time, the actor, who represented Vice President Theodore Roosevelt, enthusiastically rushed down the hill toward the trees in front. He falls once, but empowers himself and shoots his rifle. Two other men chase him, marking them with "his photographer" and "his reporter agent" respectively. The cameraman sets up the camera. The "teddy" points its rifle at a tree, defeats and sticks what looks like a normal domestic cat. "Teddy" has raised his award high and reporters take notes. The second shot is taken on a path in a slightly different part of the forest. "Teddy" heads his horse towards the camera and out to the left of the shot, closely followed by the press agent and the photographer, still faithfully holding the autograph.
Original The earliest known adaptation of the classic fairytale, this films shows Jack trading his cow for the beans, his mother forcing him to drop them in the front yard, and beig forced upstairs. As he sleeps, Jack is visited by a fairy who shows him glimpses of what will await him when he ascends the bean stalk. In this version, Jack is the son of a deposed king. When Jack wakes up, he finds the beanstalk has grown and he climbs to the top where he enters the giant's home. The giant finds Jack, who narrowly escapes. The giant chases Jack down the bean stalk, but Jack is able to cut it down before the giant can get to safety. He falls and is killed as Jack celebrates. The fairy then reveals that Jack may return home as a prince.
translated The earliest known adaptation of classic fairy tales, this movie forces Jack to exchange his cows for beans, his mother forces him to drop them in the vestibule, and upstairs. Shows a forced bean. When he is asleep, Jack is visited by a fairy. The fairy gives a glimpse of what lies ahead as he climbs the bean stalk. In this version, Jack is the son of the abdicated king. When Jack wakes up, he discovers a bean tree growing and he climbs to the top of the giant's house. The giant finds Jack to escape slightly. The giant chases Jack on the bean stalk, but Jack can cut it off before the giant is safe. When Jack celebrates, he falls and is killed. The fairy reveals that Jack is going home as a prince.
retranslated The earliest known adaptation of the classic fairy tale, this film shows Jack exchanging his cows for beans, his mother forcing him to drop them in the front yard, and upstairs. Shows forced beag. When he is asleep, Jack is visited by fairies. The fairy gives a glimpse of what he is waiting for when he climbs the bean stalk. In this version, Jack is the son of the deposed King. When Jack wakes up, he finds a bean tree growing and he climbs to the top of the giant's house. The giant finds Jack who escapes slightly. The giant chases Jack for the bean stalk, but Jack can chop it off before the giant is safe. When Jack celebrates, he falls and is killed. The fairy reveals that Jack will return home as a prince.
The output isn't very easy to read, but I don't have the mental energy to care about such details, so please forgive me.
Could you tell which movie each plot belongs to from the translated text? If you're curious, please look up the titles yourself in the Kaggle dataset.
Some parts of the Japanese translation made me think "is that really Japanese?", but the retranslated one is n
Now you can use a technique often seen in NLP competitions: inflating data by expressing sentences with the same meaning in slightly different wording. The drawback is that it depends on the quality of the translation, but I think it is a relatively easy and reasonably effective method, so please give it a try.
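Since the result depends on translation quality, one possible safeguard is to drop retranslations that came back unchanged (no new wording) or that drifted too far from the original. The sketch below is only an illustration using the standard library's difflib. The function names and the 0.4 and 0.98 thresholds are hypothetical values chosen for the example, not something tuned or used in the competition.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def keep_retranslation(original: str, retranslated: str,
                       low: float = 0.4, high: float = 0.98) -> bool:
    """Keep a retranslation only if it is neither a near copy nor unrelated.

    `low` and `high` are hypothetical thresholds; adjust them for your data.
    """
    score = similarity(original.lower(), retranslated.lower())
    return low <= score <= high

original = "The moon smiles."
candidates = [
    "The moon smiles.",      # identical, adds no variation
    "The moon is smiling.",  # same meaning, different wording
    "xyz",                   # unrelated text, likely a bad translation
]
kept = [c for c in candidates if keep_retranslation(original, c)]
print(kept)
```

Character overlap is a crude proxy for "same meaning", so a check like this can catch obvious failures but will not catch subtle mistranslations.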
Recently (well, about a week ago), I took part in SIGNATE Student Cup 2020, and it wore down my mental strength. Here is my participation write-up (never forgetting to advertise): [SIGNATE Student Cup 2020 [Prediction Division] Participation (pop-ketle version)](https://pop-ketle.hatenablog.com/entry/2020/08/28/130451)
So, the series I'm writing in parts, Let's make an app that can search similar images with Python and Flask, has been updated up to Part2, and I'd like you to wait a while for the rest. Honestly, I'm still wondering how to develop the app next and whether I should properly research and write an explanation of Flask. Between worrying about that next step and some other things I have to do, I don't have much time to write articles right now. (I wrote this article in an hour because I wanted an easy feeling of having done my best.) Goodbye for a while, everyone, and please take good care of your mental strength.