Using MeCab to translate harsh sentences into "slow" speech

When I'm surfing the net, I flinch at the insults that fly around as "masakari" (axes) and "Acht-Acht" (88 mm guns). When such stray bullets come our way, how can we protect our mental health?

One solution is to defuse sentences that are bad for your mental health by converting them into "slow" speech.

**Slow translation** http://needtec.sakura.ne.jp/yukkuri_translator/

Suppose someone tells you, "Don't just make shit videos, this de morons." If it is converted to "Yeah, don't make it, this Dote-san," you won't get angry.

Here, we use MeCab for morphological analysis and convert sentences that are bad for your mental health into "slow" speech, which removes the discomfort and even makes them heartwarming.

Source code

`yukkuri_translator.py`


```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2: byte strings are used throughout.
import MeCab
import jctconv
import sys
import codecs
reload(sys)
sys.setdefaultencoding('utf-8')
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)


# Word replacement table: keys are matched against MeCab's surface forms.
converter = {
    'eat' : 'Mush Mush',
    'eat' : 'Squint',
    'sleep' : 'Suyasu',
    'sleep' : 'Suyasu',
    'Sleep' : 'Suyasuyashi',
    'Sleep' : 'Suyasuyashi',
    'shit' : 'Yes Yes',
    'Stool' : 'Yes Yes',
    'Flight' : 'Yes Yes',
    'urine' : 'Shishi',
    'Piss' : 'Shishi',
    'Sun' : 'Sun',
    'Sanctions' : 'At all',
    'Confectionery' : 'Fair',
    'candy' : 'Fair',
    'sugar' : 'Fair',
    'juice' : 'Fair',
    'Coordination' : 'Coordination',
    'pregnancy' : 'Ninshin'
}


class MarisaTranslator:
    def __init__(self, user_dic):
        # "-u" loads a user dictionary on top of the system dictionary
        self.mecab = MeCab.Tagger("-u " + user_dic)

    def _check_san(self, n):
        """
        Decide whether to append "san" (さん) after node n.
        """
        f = n.feature.split(',')
        if f[0] == '名詞':  # noun
            if f[1] == '固有名詞' or f[1] == '一般':  # proper noun / general noun
                if n.next:
                    # Check the next word
                    nf = n.next.feature.split(',')
                    if nf[0] in ['名詞', '助動詞']:  # noun / auxiliary verb
                        # If a noun follows, do not add "san" here
                        return False
                    else:
                        if n.surface.endswith('さん'):  # already ends with "san"
                            return False
                        if n.surface == 'さん' or n.surface == '様':  # the honorific itself
                            return False
                        return True
                else:
                    return True
        return False

    def _check_separator(self, n):
        """
        Decide whether to append "、" after node n.
        """
        f = n.feature.split(',')
        if f[0] == '助詞':  # particle
            if n.next:
                # Check the next word
                nf = n.next.feature.split(',')
                if nf[0] in ['記号', '助詞']:  # symbol / particle
                    return False
                return True
        return False

    def _get_gobi(self, n):
        """
        Return the sentence-ending replacement for node n, or None.
        """
        if n.next:
            f_next = n.next.feature.split(',')
            if n.next.surface == '、':
                return None
            # Only the last word before the end of the sentence or a symbol
            if f_next[0] == 'BOS/EOS' or f_next[0] == '記号':
                f = n.feature.split(',')
                # particle / noun / symbol / interjection
                if f[0] in ['助詞', '名詞', '記号', '感動詞']:
                    return None
                # imperative (e) / continuative form
                if f[5] in ['命令ｅ', '連用形']:
                    return None
                if n.surface in ['だ']:
                    return 'なのぜ'
                else:
                    return n.surface + 'のぜ'
        return None

    def translate(self, src):
        n = self.mecab.parseToNode(src)
        text = ''
        while n:
            f = n.feature.split(',')
            if n.surface in converter:
                # Registered words are replaced outright
                text += converter[n.surface]
            elif len(f) > 8:
                gobi = self._get_gobi(n)
                if gobi is not None:
                    text += gobi
                elif f[8] != '*':
                    # Field 8 (0-based) holds the katakana pronunciation
                    text += f[8]
                else:
                    text += n.surface
            else:
                # Unknown words have fewer fields; keep the surface form
                text += n.surface
            if self._check_san(n):
                text += 'さん'
            elif self._check_separator(n):
                text += '、'
            n = n.next
        # Convert all katakana to hiragana
        return jctconv.kata2hira(text.decode('utf-8')).encode('utf-8')
```

Example of using the above class:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from yukkuri_translator import MarisaTranslator


if __name__ == "__main__":
    # yukkuri.dic is the user dictionary passed to MeCab with "-u"
    t = MarisaTranslator('yukkuri.dic')
    print t.translate("Don't just make shit videos, this de morons.")
```

Description

Conversion to hiragana

Converting every character to hiragana makes each line read as if it came from a bean-paste brain.

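To do this, first run a morphological analysis with MeCab, which gives you the reading of each word in the feature at index 8 (counting from 0). Strictly speaking, with the IPADIC dictionary index 8 is the pronunciation and index 7 is the reading, which is why a particle such as は comes out as わ. Because this field is in katakana, jctconv is used to convert everything to hiragana.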
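The following is a minimal sketch of this step in Python 3, without MeCab or jctconv. It assumes IPADIC-style feature strings in which index 8 is the pronunciation; the hand-written tokens stand in for MeCab output, and `kata2hira` here is a hypothetical stand-in for `jctconv.kata2hira` that handles only the basic katakana block by shifting each code point down by 0x60.

```python
# Sketch of the "pronunciation -> hiragana" step (Python 3, no MeCab needed).
# Assumes IPADIC-style features: 9 comma-separated fields, index 8 = pronunciation.

KATAKANA_FIRST = 0x30A1  # ァ
KATAKANA_LAST = 0x30F6   # ヶ
HIRAGANA_OFFSET = 0x60   # ァ (U+30A1) minus ぁ (U+3041)


def kata2hira(text: str) -> str:
    """Hypothetical stand-in for jctconv.kata2hira: basic katakana block only."""
    return "".join(
        chr(ord(ch) - HIRAGANA_OFFSET) if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST else ch
        for ch in text
    )


def pronunciation_of(surface: str, feature: str) -> str:
    """Return field 8 when MeCab provides it, otherwise the surface form."""
    fields = feature.split(",")
    if len(fields) > 8 and fields[8] != "*":
        return fields[8]
    return surface  # unknown words get only 7 fields in IPADIC


# Hand-written (surface, feature) pairs standing in for MeCab output.
tokens = [
    ("私", "名詞,代名詞,一般,*,*,*,私,ワタシ,ワタシ"),
    ("は", "助詞,係助詞,*,*,*,*,は,ハ,ワ"),
    ("ゆっくり", "副詞,一般,*,*,*,*,ゆっくり,ユックリ,ユックリ"),
]
print(kata2hira("".join(pronunciation_of(s, f) for s, f in tokens)))  # わたしわゆっくり
```

Characters outside the katakana range, such as the long-vowel mark ー, pass through unchanged.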

Words may occasionally be misread, but that is **by design**: it's just bean paste, after all.

Add "、" where appropriate, since hiragana alone is hard to read

Because of the "slow" conversion, the output is almost entirely hiragana. To improve readability, a Japanese comma "、" is inserted after every particle unless the next token is a symbol or another particle. See `_check_separator` for the exact condition.
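A minimal sketch of the separator rule, assuming tokens are given as hypothetical `(surface, pos)` pairs with IPADIC part-of-speech names rather than real MeCab nodes:

```python
from typing import List, Tuple

# (surface, part of speech) with IPADIC top-level tag names; stands in for a MeCab node.
Token = Tuple[str, str]


def needs_separator(tokens: List[Token], i: int) -> bool:
    """Add '、' after a particle unless a symbol or another particle follows."""
    if tokens[i][1] != "助詞":  # not a particle
        return False
    # Past the last token MeCab's next node is EOS ("BOS/EOS"), which is not excluded.
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "BOS/EOS"
    return next_pos not in ("記号", "助詞")  # symbol, particle


def join_with_separators(tokens: List[Token]) -> str:
    out = []
    for i, (surface, _) in enumerate(tokens):
        out.append(surface)
        if needs_separator(tokens, i):
            out.append("、")
    return "".join(out)


tokens = [("わたし", "名詞"), ("は", "助詞"), ("ゆっくり", "副詞"), ("する", "動詞"), ("。", "記号")]
print(join_with_separators(tokens))  # わたしは、ゆっくりする。
```

Checking the following token keeps runs of particles such as まで + に together instead of splitting them with a comma.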

Add "san" after nouns

Appending "san" (さん) to nouns conveys the slow feel. There are exclusions: "san" is added only to proper and general nouns, and not when the next token is a noun or an auxiliary verb, or when the word already ends in "san". See `_check_san` for details.
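A minimal sketch of the "san" rule under the same kind of assumption: hypothetical `(surface, pos, pos_detail)` tuples with IPADIC tag names stand in for MeCab nodes. It covers the noun-type, next-token, and `endswith('さん')` checks, not every check in `_check_san`.

```python
from typing import List, Tuple

# (surface, pos, pos_detail1) with IPADIC tag names; stands in for a MeCab node.
Token = Tuple[str, str, str]


def needs_san(tokens: List[Token], i: int) -> bool:
    """Append 'さん' to proper/general nouns that end a noun phrase."""
    surface, pos, detail = tokens[i]
    if pos != "名詞" or detail not in ("固有名詞", "一般"):
        return False  # only proper nouns and general nouns qualify
    next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else "BOS/EOS"
    if next_pos in ("名詞", "助動詞"):
        return False  # a noun or an auxiliary verb (e.g. だ) follows
    return not surface.endswith("さん")  # avoid "さんさん"


def add_san(tokens: List[Token]) -> str:
    return "".join(
        surface + ("さん" if needs_san(tokens, i) else "")
        for i, (surface, _, _) in enumerate(tokens)
    )


tokens = [
    ("れいむ", "名詞", "固有名詞"), ("と", "助詞", "並立助詞"),
    ("まりさ", "名詞", "固有名詞"), ("は", "助詞", "係助詞"),
    ("友達", "名詞", "一般"), ("だ", "助動詞", "*"),
]
print(add_san(tokens))  # れいむさんとまりささんは友達だ
```

Here 友達 gets no "san" because the auxiliary verb だ follows it.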

Add "noze" at the end of sentences

Slow Marisa's speech has a characteristic ending: sentences very often end in "noze" or "nanoze", so I reproduced that.

For example, the sentence

Managing payments and spending is a matter of course

becomes

It's natural to manage spending and spending.

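The ending is added only to the last word before the end of the sentence or a symbol other than "、", and is skipped after particles, nouns, symbols, interjections, and imperative or continuative forms. See `_get_gobi` for details.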
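A minimal sketch of the ending rule on hypothetical `(surface, pos, conjugation_form)` tuples with IPADIC tag names. Which words are replaced wholesale by なのぜ is an assumption here (`NANOZE_WORDS` holds only the copula だ); every other qualifying word gets のぜ appended.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str, str]  # (surface, pos, conjugation form), IPADIC tag names

NO_ENDING_POS = ("助詞", "名詞", "記号", "感動詞")  # particle, noun, symbol, interjection
NO_ENDING_FORMS = ("命令ｅ", "連用形")  # imperative (e), continuative
NANOZE_WORDS = ("だ",)  # hypothetical: words replaced by "なのぜ" outright


def sentence_ending(tokens: List[Token], i: int) -> Optional[str]:
    """Return the replacement for tokens[i] if it ends a sentence, else None."""
    surface, pos, form = tokens[i]
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ("", "BOS/EOS", "*")
    if nxt[0] == "、" or nxt[1] not in ("BOS/EOS", "記号"):
        return None  # not the last word before the end of the sentence
    if pos in NO_ENDING_POS or form in NO_ENDING_FORMS:
        return None
    if surface in NANOZE_WORDS:
        return "なのぜ"
    return surface + "のぜ"


def apply_endings(tokens: List[Token]) -> str:
    out = []
    for i, (surface, _, _) in enumerate(tokens):
        ending = sentence_ending(tokens, i)
        out.append(ending if ending is not None else surface)
    return "".join(out)


print(apply_endings([("あたりまえ", "名詞", "*"), ("だ", "助動詞", "基本形"), ("。", "記号", "*")]))  # あたりまえなのぜ。
print(apply_endings([("ゆっくり", "副詞", "*"), ("する", "動詞", "基本形")]))  # ゆっくりするのぜ
```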

Replace words

Some words are replaced outright. For example, the dirty word "feces" is replaced with "yes yes" to put the reader's mind at ease. The replacements are the entries registered in the `converter` dictionary; a token whose surface form is a key is replaced before any other conversion is applied.
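A minimal sketch of the lookup order in `translate`, assuming a hypothetical one-entry table keyed by a Japanese surface form and hand-written IPADIC-style features: the converter is consulted first, then the pronunciation field, then the raw surface. The sentence-ending step is omitted for brevity.

```python
from typing import Dict


def convert_token(surface: str, feature: str, converter: Dict[str, str]) -> str:
    """Lookup order used by translate(): converter, then field 8, then surface."""
    if surface in converter:
        return converter[surface]  # a registered word is replaced outright
    fields = feature.split(",")
    if len(fields) > 8 and fields[8] != "*":
        return fields[8]  # katakana pronunciation
    return surface  # unknown word: keep it as-is


CONVERTER = {"糞": "うんうん"}  # hypothetical one-entry replacement table

tokens = [
    ("糞", "名詞,一般,*,*,*,*,糞,クソ,クソ"),
    ("動画", "名詞,一般,*,*,*,*,動画,ドウガ,ドーガ"),
]
print("".join(convert_token(s, f, CONVERTER) for s, f in tokens))  # うんうんドーガ
```

The result may still contain katakana at this point; `translate` converts everything to hiragana only at the very end.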

Summary

Using MeCab's morphological analysis, I confirmed that sentences that are bad for your mental health can be disguised as cute, "slow" speech.

Applying the same technique should also make it possible to translate text into the styles of "Slow Reimu", "Slow Youmu", and "Yaruo".

The web application and its source code are linked below.

**Slow translation** http://needtec.sakura.ne.jp/yukkuri_translator/

Source code: https://github.com/mima3/yukkuri_translator

That's all.