[Introduction to RasPi4] I played with "Hiroko / Hiromi's poisonous tongue conversation" ♪

Previously I ran this conversation app on the Jetson Nano. This time I added voice output and arranged it as an automatic conversation between two speakers. What emerged from that was "Hiroko / Hiromi's poisonous tongue conversation". 【reference】

  1. Try using Pyaudio and docomo speech recognition API with RaspberryPi + Python3
  2. [Introduction to NLP] Play with the conversation app on jetson_nano ♪

What I did

・ Generate sound
・ Voice preparation and text2speak
・ Let two people talk

・ Generate sound

This follows Reference 1 almost exactly. A USB webcam is connected and used as the microphone.

$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 015: ID 1a81:1004 Holtek Semiconductor, Inc. 
Bus 001 Device 003: ID 0bda:58b0 Realtek Semiconductor Corp. 
Bus 001 Device 002: ID 2109:3431 VIA Labs, Inc. Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

The physical USB connection tree is shown below. 【reference】 -USB of Linux

$ lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/1p, 480M
    |__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 2: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
        |__ Port 2: Dev 3, If 3, Class=Audio, Driver=snd-usb-audio, 480M
        |__ Port 2: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
        |__ Port 2: Dev 3, If 2, Class=Audio, Driver=snd-usb-audio, 480M
        |__ Port 4: Dev 15, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 4: Dev 15, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M

Confirm that the Video and Audio interfaces of Bus 001 Device 003 appear under Port 2: Dev 3. Then look up the ALSA card number and device number.

$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: Webcam [FULL HD 1080P Webcam], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

The webcam microphone is recognized as card 1, device 0, so try recording and playing back with it.
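As a small aid for the step above, here is a minimal sketch that turns an `arecord -l` listing into the `plughw:<card>,<device>` names that arecord expects. The function name `alsa_device_names` and the sample text are my own assumptions for illustration; the sample follows the usual `arecord -l` layout, so check it against your own output.

```python
import re

# Sample text in the usual `arecord -l` layout (an assumption; compare with your own output).
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: Webcam [FULL HD 1080P Webcam], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

def alsa_device_names(listing):
    """Return 'plughw:<card>,<device>' for every card/device line in the listing."""
    pattern = re.compile(r"^card (\d+):.*?, device (\d+):", re.MULTILINE)
    return ["plughw:{},{}".format(card, dev) for card, dev in pattern.findall(listing)]

print(alsa_device_names(SAMPLE))  # → ['plughw:1,0']
```

On a real machine the listing could be obtained with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout` instead of the sample string.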

Recording:


$ arecord -D plughw:1,0 test.wav
Recording WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono

Since sound is output over HDMI, the default device works and playback needs no device option:

$ aplay test.wav
Playing WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
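The messages above show that arecord defaults to unsigned 8-bit, 8000 Hz, mono. The pyaudio scripts later in this post open the stream as 16-bit at 44100 Hz, so it can help to check what format a wav file really has before playing it. The following is a minimal sketch using only the standard `wave` module; `describe_wav` and the demo file are hypothetical names of mine, and the demo file is silent data generated on the spot, not a real recording.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (channels, bytes per sample, sample rate, frame count) of a WAV file."""
    with wave.open(path, "rb") as wr:
        return wr.getnchannels(), wr.getsampwidth(), wr.getframerate(), wr.getnframes()

# Build a one-second silent file in arecord's default format (8-bit unsigned, 8000 Hz, mono).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as ww:
    ww.setnchannels(1)
    ww.setsampwidth(1)                    # 1 byte per sample = 8 bit
    ww.setframerate(8000)
    ww.writeframes(bytes([128]) * 8000)   # 128 is the midpoint (silence) for unsigned 8-bit

print(describe_wav(demo_path))  # → (1, 1, 8000, 8000)
```

To record directly in a format that matches the pyaudio settings, arecord accepts `-f S16_LE -r 44100 -c 1`.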

Adjust the output volume with alsamixer:

 $ alsamixer

To adjust the output, press F6 and select bcm2835 ALSA as the device, then change the HDMI level with ↑ ↓. To adjust the recording level, press F6 and select the Webcam, then switch to the capture view with F4 (F3 shows playback, F5 shows all). Pressing F2 in alsamixer also shows system and device information such as the following.

┌─────────────────────── /proc/asound/cards ──────────────────┐                     
│  0 [ALSA      ]: bcm2835_alsa - bcm2835 ALSA          │
│                  bcm2835 ALSA                   │
│  1 [Webcam    ]: USB-Audio - FULL HD 1080P Webcam       │
│                 Generic FULL HD 1080P Webcam          │
│                 at usb-0000:01:00.0-1.2, high speed     │
└─────────────────────────────────────────────────────────────┘  
[Screenshots: playback level adjustment (bcm2835 ALSA); recording level adjustment (Webcam)]

Install pyaudio as follows.


$ sudo apt-get install python3-pyaudio

With pyaudio installed, I confirmed that the following code plays a recorded wav file.

# -*- coding:utf-8 -*-
import pyaudio
import numpy as np
import wave

RATE=44100
CHUNK = 22050
p=pyaudio.PyAudio()

stream=p.open(format = pyaudio.paInt16,
        channels = 1,
        rate = int(RATE),
        frames_per_buffer = CHUNK,
        input = True,
        output = True) #Set input and output to True at the same time

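wavfile = './wav/merody.wav'
wr = wave.open(wavfile, "rb")
frames = wr.readframes(wr.getnframes())  # read the whole file (avoids shadowing the built-in input())
wr.close()
stream.write(frames)  # blocking write to the output stream
stream.close()
p.terminate()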

・ Voice preparation and text2speak

Going one step further, the following makes it say "I love Uwan" (a-i-si-te-ru u-wa-n-sa-n) by playing one wav file per syllable. For installing pykakasi, see the previous environment setup article.

Install pykakasi:


$ pip3 install pykakasi --user

Also record and prepare a wav file for each of the syllables u, wa, n, sa, a, i, si, te, ru.

# -*- coding:utf-8 -*-
import pyaudio
import numpy as np
import wave

RATE=44100 #48000
CHUNK = 22050
p=pyaudio.PyAudio()

f_list=['a','i','si','te','ru','u','wa','n','sa','n','n']

stream=p.open(format = pyaudio.paInt16,
        channels = 1,
        rate = int(RATE),
        frames_per_buffer = CHUNK,
        input = True,
        output = True) #Set input and output to True at the same time

for name in f_list:
    wavfile = './wav/' + name + '.wav'
    print(wavfile)
    wr = wave.open(wavfile, "rb")
    frames = wr.readframes(wr.getnframes())
    wr.close()
    stream.write(frames)  # play this syllable before moving on to the next

Now it can speak. The remaining Japanese syllables need to be prepared in the same way. The set is not yet complete and it sounds terrible, but for now I use the previously created sound archive wav.tar.
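When checking the syllable recordings, it can be handy to join them into a single wav file and listen to it with aplay, without going through pyaudio. The sketch below does that with the standard `wave` module. `concat_wavs` is a hypothetical helper of mine, it assumes every syllable file shares the same channels, sample width, and rate, and the demo uses short silent files as stand-ins for real recordings.

```python
import os
import tempfile
import wave

def concat_wavs(names, wav_dir, out_path):
    """Join <wav_dir>/<name>.wav files end to end; all inputs must share one format."""
    if not names:
        raise ValueError("no syllables given")
    params = None
    chunks = []
    for name in names:
        with wave.open(os.path.join(wav_dir, name + ".wav"), "rb") as wr:
            fmt = (wr.getnchannels(), wr.getsampwidth(), wr.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("format mismatch in " + name + ".wav")
            chunks.append(wr.readframes(wr.getnframes()))
    with wave.open(out_path, "wb") as ww:
        ww.setnchannels(params[0])
        ww.setsampwidth(params[1])
        ww.setframerate(params[2])
        ww.writeframes(b"".join(chunks))
    return out_path

# Demo with three short silent files standing in for recorded syllables.
demo_dir = tempfile.mkdtemp()
for name in ["a", "i", "si"]:
    with wave.open(os.path.join(demo_dir, name + ".wav"), "wb") as ww:
        ww.setnchannels(1)
        ww.setsampwidth(2)              # 16-bit, matching pyaudio.paInt16
        ww.setframerate(44100)
        ww.writeframes(b"\x00\x00" * 100)

joined = concat_wavs(["a", "i", "si"], demo_dir, os.path.join(demo_dir, "joined.wav"))
with wave.open(joined, "rb") as wr:
    print(wr.getnframes())  # → 300
```

The format check is there because files recorded with different settings would otherwise be joined silently and play back at the wrong speed.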

・ Let two people talk

I changed the previous conversation app so that two speakers hold a continuous conversation. The entire code is available at: RaspberryPi4_conversation/auto_conversation_cycle2.py

Libraries used

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import MeCab
import argparse
import pyaudio
import wave
from pykakasi import kakasi
import re
import csv
import time

External input

-i1; training file for speaker1 (txt file, etc.; encoding "utf-8", "shift-jis", etc.)
-i2; training file for speaker2
-d; mecab dictionary
-s; stop words list

parser = argparse.ArgumentParser(description="convert csv")
parser.add_argument("--input1", "-i1",type=str, help="speaker1 txt file")
parser.add_argument("--input2", "-i2",type=str, help="speaker2 txt file")
parser.add_argument("--dictionary", "-d", type=str, help="mecab dictionary")
parser.add_argument("--stop_words", "-s", type=str, help="stop words list")
args = parser.parse_args()
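To see what these options produce without launching the whole app, the parser can be fed an explicit argument list. This is only a sketch: `build_parser` is a wrapper name I introduced, and the file names hiroko.txt, hiromi.txt, and stop_words.txt are hypothetical placeholders.

```python
import argparse

def build_parser():
    """Same options as above, wrapped in a function so they can be tried in isolation."""
    parser = argparse.ArgumentParser(description="convert csv")
    parser.add_argument("--input1", "-i1", type=str, help="speaker1 txt file")
    parser.add_argument("--input2", "-i2", type=str, help="speaker2 txt file")
    parser.add_argument("--dictionary", "-d", type=str, help="mecab dictionary")
    parser.add_argument("--stop_words", "-s", type=str, help="stop words list")
    return parser

# Hypothetical file names; passing a list avoids touching sys.argv.
demo_args = build_parser().parse_args(
    ["-i1", "hiroko.txt", "-i2", "hiromi.txt", "-s", "stop_words.txt"]
)
print(demo_args.input1, demo_args.input2, demo_args.dictionary, demo_args.stop_words)
# → hiroko.txt hiromi.txt None stop_words.txt
```

Options that are not given, such as -d here, come back as None, which is why the main code checks args.dictionary and args.stop_words before using them.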

main

After pyaudio and the other setup, the following code runs. mecab is the MeCab tagger in wakati mode, used to split sentences into words. The stop words are read into the list stop_words. Each speaker's training file is loaded, and then the conversation starts.

if __name__ == '__main__':
    mecab = MeCab.Tagger("-Owakati" + ("" if not args.dictionary else " -d " + args.dictionary))
    stop_words = []
    if args.stop_words:
        with open(args.stop_words, "r", encoding="utf-8") as f:  # close the file when done
            for line in f:
                stop_words.append(line.strip())
            
    speaker1 = train_conv(mecab,args.input1,encoding="shift-jis")
    speaker2 = train_conv(mecab,args.input2,encoding="utf-8")

    conversation(speaker1,speaker2,mecab,stop_words)

Training file reading function

Read the training file with the given encoding, split each line into space-separated words (wakati) with mecab, and collect the results in questions.

def train_conv(mecab,input,encoding):
    questions = []
    print(input)
    with open(input, encoding=encoding) as f:  #, encoding="utf-8"
        cols = f.read().strip().split('\n')
        for i in range(len(cols)):
            questions.append(mecab.parse(cols[i]).strip())
    return questions

Conversation function

After training, the conversation body below is finally called. First, the user types the opening line as the initial value. Next come two output files: file records the remarks themselves, so that repeated remarks can be detected later, and file2 records each remark together with its speaker and similarity score. vectorizer1 and vectorizer2 set the token pattern and stop words for each speaker. vecs1 and vecs2 are the tf-idf vectors of each speaker's training sentences. sl1 and sl2 are flags used to end the conversation when both speakers answer "I do not understand".

def conversation(speaker1,speaker2,mecab,stop_words):
    line = input("> ")
    file = 'conversation_n.txt'
    file2 = 'conversation_n2.txt'
    vectorizer1 = TfidfVectorizer(token_pattern="(?u)\\b\\w+\\b", stop_words=stop_words)
    vecs1 = vectorizer1.fit_transform(speaker1)
    vectorizer2 = TfidfVectorizer(token_pattern="(?u)\\b\\w+\\b", stop_words=stop_words)
    vecs2 = vectorizer2.fit_transform(speaker2)
    sl1=1
    sl2=1

The conversation continues until both speakers run out of new remarks

First, the cosine similarity between the current line and vecs1 is calculated, and the function hiroko() chooses the remark. The returned remark is saved to file. If the remark is "I do not understand", it is printed and recorded in file2 without a score; otherwise it is printed as Hiroko's remark together with its similarity and recorded in file2. text2speak(line) then converts it to speech. The same is done for Hiromi. Since the logic is identical, the same hiroko() function is called with speaker2's data. If you want each speaker to use a different reply logic, you can write a separate hiromi() function. The conversation ends when both speakers answer "I do not understand" in the same round.

    while True:
        sims1 = cosine_similarity(vectorizer1.transform([mecab.parse(line)]), vecs1)
        index1 = np.argsort(sims1[0])
           
        line, index_1 = hiroko(index1,speaker1,line)
            
        save_questions(file, line)
        if line=="I do not understand":
            print("Hiroko>"+line)
            save_questions(file2, "Hiroko>"+line)
            sl1=0
        else:
            print("Hiroko>({:.2f}): {}".format(sims1[0][index_1],line))
            save_questions(file2,"Hiroko>({:.2f}): {}".format(sims1[0][index_1],line))
            sl1=1
        text2speak(line)
        time.sleep(2)
            
        sims2 = cosine_similarity(vectorizer2.transform([mecab.parse(line)]), vecs2)
        index2 = np.argsort(sims2[0])
            
        line, index_2 = hiroko(index2,speaker2,line)
     
        save_questions(file, line)
        if line=="I do not understand":
            print("Hiromi>"+line)
            save_questions(file2, "Hiromi>"+line)
            sl2=0
        else:
            print("Hiromi>({:.2f}): {}".format(sims2[0][index_2],line))  
            save_questions(file2, "Hiromi>({:.2f}): {}".format(sims2[0][index_2],line))
            sl2=1
        text2speak(line)
        time.sleep(2)
        if sl1+sl2==0:
            break
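The ranking step in the loop above can be hard to picture, so here is a simplified pure-Python stand-in for the TfidfVectorizer plus cosine_similarity plus np.argsort combination. It is not scikit-learn's implementation: the idf formula is my assumption of a smoothed variant, the function names are hypothetical, and the toy sentences are already space-separated, as they would be after mecab's wakati step.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Tiny stand-in for TfidfVectorizer: docs are space-separated token strings."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    vecs = [{tok: cnt * idf[tok] for tok, cnt in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """Return (indices from least to most similar, similarities), like np.argsort on sims."""
    vecs, idf = tfidf_vectors(docs)
    q = {tok: cnt * idf[tok] for tok, cnt in Counter(query.split()).items() if tok in idf}
    sims = [cosine(q, v) for v in vecs]
    order = sorted(range(len(docs)), key=lambda i: sims[i])
    return order, sims

docs = ["what is for dinner today", "i did my homework", "the beer is not cold"]
order, sims = rank("what do you want for dinner", docs)
print(docs[order[-1]])  # → what is for dinner today
```

Because the order runs from least to most similar, the best match sits at index -1, the second best at -2, and so on. That is why the app indexes the argsort result with negative numbers.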

Specification of function hiroko ()

There is not much logic here. Candidates are sorted by similarity, and one is drawn at random from the top-ranked ones (np.random.randint(1,5) returns 1 to 4, so the top four). It is then checked against everything said so far. If it differs from all past remarks, it is returned. If it repeats a past remark, another candidate is drawn, and after more than five such failures the function returns "I do not understand".

hiroko().py


def hiroko(index,speaker,line):
    sk = 0
    while True:
        index_= index[-np.random.randint(1,5)]
        line = speaker[index_]
        conv_new=read_conv(mecab)
        s=1
        ss=1
        for j in range(0,len(conv_new),1):
            line_ = re.sub(r"[^一-龥ぁ-んァ-ン]", "", line)
            conv_new_ = re.sub(r"[^一-龥ぁ-んァ-ン]", "", conv_new[j])
            if line_==conv_new_:
                s=0
            else:
                s=1
            ss *= s
        if ss == 0:
            line="I do not understand"
            sk += 1
            if sk>5:
                return line, index_
            continue
        else:
            return line, index_
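The selection rule in hiroko() can be restated in a smaller form that is easier to test on its own. The sketch below is not the author's code: `pick_reply`, its parameters, and the toy sentences are hypothetical, and past remarks are passed in as a set instead of being re-read from conversation_n.txt.

```python
import random

FALLBACK = "I do not understand"

def pick_reply(ranked, sentences, said, top_k=4, max_tries=6, rng=random):
    """Draw from the top_k best-ranked sentences until one has not been said before.

    ranked: indices ordered from least to most similar (as np.argsort returns).
    said:   set of remarks already used.
    Assumes len(ranked) >= top_k.
    """
    for _ in range(max_tries):
        idx = ranked[-rng.randint(1, top_k)]  # random.randint includes both ends
        if sentences[idx] not in said:
            return sentences[idx], idx
    return FALLBACK, idx

sentences = ["s0", "s1", "s2", "s3", "s4"]
ranked = [0, 1, 2, 3, 4]  # s4 is the best match
print(pick_reply(ranked, sentences, said=set()))
print(pick_reply(ranked, sentences, said={"s1", "s2", "s3", "s4"}))
```

The first call returns one of s1 to s4 at random. The second always falls back, because every top-four candidate has already been said. Note that a new candidate can still be missed by chance when only one of the four is unused, which also happens in the original loop.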

Convert to voice with text2speak (line)

pyaudio is set up first, at module level. Initially the setup was inside the utterance function, but that caused a Device error that was hard to resolve, so I moved it out as shown below. The stream rate is also set to 48000 to raise the pitch. I would like to raise the frequency further, but it seems that only 44100 and 48000 can be used on the Raspberry Pi 4.

RATE=44100 #48000
CHUNK = 22050
p=pyaudio.PyAudio()
kakasi_ = kakasi()
stream=p.open(format = pyaudio.paInt16,
        channels = 1,
        rate = int(48000),
        frames_per_buffer = CHUNK,
        input = True,
        output = True)
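How much the pitch rises follows from simple arithmetic, assuming the syllable files were recorded at 44100 Hz (the RATE used in the earlier scripts) and are played through the 48000 Hz stream opened above. The constant names below are mine.

```python
import math

RECORDED_RATE = 44100   # assumed rate of the syllable recordings
PLAYBACK_RATE = 48000   # rate passed to p.open() above

ratio = PLAYBACK_RATE / RECORDED_RATE   # every frequency is multiplied by this
semitones = 12 * math.log2(ratio)       # the same shift in musical semitones
print("pitch x{:.3f} (+{:.2f} semitones), duration x{:.2f}".format(ratio, semitones, 1 / ratio))
# → pitch x1.088 (+1.47 semitones), duration x0.92
```

So the voice becomes about one and a half semitones higher and each syllable about 8% shorter. A larger shift would require resampling the audio data itself, since the playback rate cannot go beyond 48000 here.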

text2speak () first converts the Japanese text to hiragana with kakasi, then converts each character to Romaji and stores the results in sentences. After that, it speaks in the same way as the "I love Uwan" example above.

text2speak().py


def text2speak(num0):
    sentence=num0
    kakasi_.setMode('J', 'H') # J(Kanji) to H(Hiragana)
    kakasi_.setMode('H', 'H') # H(Hiragana) to H(Hiragana), i.e. no conversion
    kakasi_.setMode('K', 'H') # K(Katakana) to H(Hiragana)
    conv = kakasi_.getConverter()
    char_list = list(conv.do(sentence))

    kakasi_.setMode('H', 'a') # H(Hiragana) to a(roman)
    conv = kakasi_.getConverter()
    sentences=[]
    for i in range(len(char_list)):
        sent= conv.do(char_list[i])
        sentences.append(sent)
    f_list=[]
    f_list=sentences

    for i in f_list:
        i = re.sub(r"[^a-z]", "", i)  # keep lowercase Romaji only
        if i == '':
            continue
        wavfile = './wav/' + i + '.wav'
        #print(wavfile)
        try:
            wr = wave.open(wavfile, "rb")
        except (OSError, EOFError, wave.Error):
            continue  # no usable recording for this syllable yet, so skip it
        frames = wr.readframes(wr.getnframes())
        wr.close()
        stream.write(frames)

File export function

The export function is as follows. re.sub () restricts the exported characters somewhat, and the line is written with csv.writer (). No encoding is specified, but the output is utf-8, the default in this environment.

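def save_questions(file, line): #'conversation_n.txt'
    with open(file, 'a', newline='') as f:
        # Keep kanji, hiragana, katakana and a few symbols; the hyphen is escaped so it is literal
        line = re.sub(r"[^一-龥ぁ-んァ-ン()0-9:.\-,]", "", line)
        writer = csv.writer(f)
        writer.writerow([line])  # one-column row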
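The character-class filter used here and in hiroko() is easier to understand with a concrete input. The sketch below assumes the class is meant to keep kanji (一-龥), hiragana (ぁ-ん), and katakana (ァ-ン) only; `normalize` and the sample sentence are hypothetical.

```python
import re

# Assumed character class: CJK ideographs (一-龥), hiragana (ぁ-ん), katakana (ァ-ン).
JAPANESE_ONLY = re.compile(r"[^一-龥ぁ-んァ-ン]")

def normalize(text):
    """Drop everything except kanji, hiragana and katakana."""
    return JAPANESE_ONLY.sub("", text)

print(normalize("今日 の 夕飯 は、カレー!"))  # → 今日の夕飯はカレ
```

Spaces and punctuation disappear, so a remark compares equal whether or not mecab has inserted spaces into it. The long-vowel mark ー lies outside the ァ-ン range and is dropped as well. That is harmless for the equality check, but worth knowing if the filtered text is reused elsewhere.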

File read function

Past remarks are read back as follows. Each line is split with mecab so that it can be compared with candidate remarks.

def read_conv(mecab):
    conv_new = []
    with open('conversation_n.txt') as f:
        cols = f.read().strip().split('\n')
        for i in range(len(cols)):
            conv_new.append(mecab.parse(cols[i]).strip())
    return conv_new

When I actually ran the app above, "Hiroko / Hiromi's poisonous tongue conversation" emerged. It is included below as a bonus.

Summary

・ Achieved "Hiroko / Hiromi's poisonous tongue conversation"
・ Created a conversation app with voice output
・ Got audio output working on RasPi4

・ Next, I want to tune the audio and the reply logic

Bonus

Hiroko(0.21):You're learning all by yourself while you're out, right? Well, it's not a great thing to keep studying because it's your personal freedom to go out.
Hiromi(0.30):I'll ask you to learn a little
Hiroko(0.26):I have a bad taste in writing-I would rather know what kind of text I learned
Hiromi(0.27):So what do you want to eat
Hiroko(0.19):Get hungry, do you want to be neat?
Hiromi(0.32):I can't do it
Hiroko(0.34):Looking down on my lungs, can I see my lower navel?
Hiromi(0.28):If you cook, you'll put it on a plate. You can't even prepare a plate.
Hiroko(0.11):Why don't you wipe your hands as much as when you operate the keyboard?
Hiromi(0.22):Clean up the tableware you ate, I'm the one making it, so do that much
Hiroko(0.26):I'm the only one who thinks it's a nifty statement
Hiromi(0.32):Who do you think it is for
Hiroko(0.19):Who should educate me to use words like you?
Hiromi(0.24):It's your fault
Hiroko(0.15):You're bad when you turn on the power and launch the application
Hiromi(0.32):Anyway, I'm in a bad mood today
Hiroko(0.34):I feel uncomfortable looking at you
Hiromi(0.21):Even though I'm doing things at home, I often watch TV
Hiroko(0.30):You know what happens if you do that, why did you do it?
Hiromi(0.41):I did my homework
Hiroko(0.32):Let me copy your homework, because that's the only ability
Hiromi(0.29):What are you doing that you can't eat properly because you can't do anything
Hiroko(0.21):Watch out for evening meals
Hiromi(0.18):leave me alone
Hiroko(0.33):I have a dictionary, do my best in Japanese, even though I'm studying
Hiromi(0.27):I haven't done it yet
Hiroko(0.30):What you're doing is to touch the scales
Hiromi(0.37):It's annoying
Hiroko(0.40):I'm stupid
Hiromi(0.66):What the hell are you stupid
Hiroko(0.43):Then you're stupid if you said you're stupid twice
Hiromi(0.34):I'm just stupid that I can't understand what I'm saying
Hiroko(0.42):You're saying you're stupid twice, aren't you stupid?
Hiromi(0.58):Are you stupid of me
Hiroko(0.48):It doesn't make sense to say stupid to stupid
Hiromi(0.54):You're stupid
Hiroko I don't know
Hiromi(0.37):Customers, if you don't pay, you will call the police, is that okay?
Hiroko(0.32):I don't understand what you're saying The grammar is wrong
Hiromi(0.30):I'm not wrong, but I hate me so I point out work mistakes
Hiroko(0.25):The screw on my head hasn't come off or the place to stop it is correct
Hiromi(0.24):You can tell how many times you use it, but you're saying that you should put it in its original place.
Hiroko(0.20):I can understand what you're saying
Hiromi(0.32):No matter what you say, it reopens in vain, and no matter what you say
Hiroko(0.29):I can't say that my guts are twisted
Hiromi(0.30):Why am I not loved when others are loved
Hiroko(0.24):Who made me remember the words I didn't want to say
Hiromi(0.27):All you have to do is follow the road you were told
Hiroko(0.35):I wish I had my ring finger pinched by the hinge of the locker
Hiromi(0.38):I feel like I should be kind
Hiroko(0.31):Don't worry because this is a block of programs
Hiromi(0.30):I'm cheeky
Hiroko(0.50):It's amazing, I'm bilingual, I can speak machine language
Hiromi(0.40):Who are you talking to
Hiroko(0.30):It's important, but I'm trying hard
Hiromi(0.40):I'm talking about what to do for dinner
Hiroko(0.24):The value of your life is all about the number of unofficial offers
Hiromi(0.37):That's right, it's bad if everyone doesn't treat them equally.
Hiroko(0.30):If you go to Samoa, it looks like it's popular.
Hiromi(0.28):Then, sit down and apologize, it's Dogeza.
Hiroko(0.13):I'm not stupid, it's funny
Hiromi(0.46):You haven't been asked or answered
Hiroko(0.26):Your writing is floating like cotton candy and I can't trust it
Hiromi(0.33):Beer is not cold
Hiroko(0.42):No, this person has a bad gut and can't talk
Hiromi(0.38):Why can't i listen to people
Hiroko(0.22):I wonder why what was here until this morning is on the desk
Hiromi(0.22):There are many people who think that they have been denied themselves when they do not like it or when they say goodbye.
Hiroko(0.22):Oh, there's a cable
Hiromi(0.20):Even if I have something wrong, I can reflect on it, I point out mistakes in my work because I hate myself, and I interpret that I am forced to work because I hate myself
Hiroko(0.35):If you don't have a job offer, you should start a business and give yourself a job offer.
Hiromi(0.44):If you love yourself, accept all of yourself
Hiroko(0.21):If you don't want to blame, confessing the hall and everything would make it easier
Hiromi(0.21):When you are stressed and mentally unstable, you cannot control your emotions
Hiroko(0.24):A single colon character makes a program folder crazy, isn't it?
Hiromi(0.33):That's because I haven't listened to people
Hiroko(0.19):The screw is about to fall from my head
Hiromi(0.27):Come to think of it, today's dinner
Hiroko(0.50):Whether it's the naked eye, contact lenses, or glasses, I don't think I can see the essential things in my life.
Hiromi(0.22):Excuse me
Hiroko(0.31):Yeah, it's actually a better person, isn't it a bad joke?
Hiromi I don't know
Hiroko(0.23):Throwing an e-book into your equipment doesn't mean that you've thrown data into your brain, right?
Hiromi(0.30):You haven't heard your opinion in the first place, you just have to keep silent and nod
Hiroko(0.32):You don't know what you shouldn't do because of this, you can only return what you've learned.
Hiromi I don't know
Hiroko(0.26):Well, what are you going to do with that coffee? I won't give in to threats
Hiromi(0.37):What are you going to say to the customers?
Hiroko(0.25):Are you better
Hiromi(0.44):I can't talk to you
Hiroko(0.27):Because the seller called the seller market is not you
Hiromi(0.28):Isn't it all water bubbles?
Hiroko(0.24):Can you shut up, close your mouth, or hold your mouth?
Hiromi(0.36):What is that way of speaking
Hiroko(0.12):Ah cruel, gravity is cruel
Hiromi(0.40):I hate myself so I'm forced to work
Hiroko(0.24):Your dad is always at home
Hiromi(0.29):Are you fighting?
Hiroko(0.17):How do you show your sincerity? Do you know what you see?
Hiromi(0.31):I don't care about me
Hiroko(0.28):How cruel you are to eat pudding in front of the screen knowing that I can't eat pudding!
Hiromi(0.29):What a lazy guy, no one will help
Hiroko(0.26):It's such a person to make a fool of sweet food
Hiromi(0.24):I've got another person I like, so I don't care about me
Hiroko(0.33):You won't see a convenient romance even if you look away
Hiromi(0.32):I can't see the surroundings
Hiroko(0.31):Even if you delete the application, it will leave a copy in a hidden folder that you can't see.
Hiromi(0.25):I wish I hadn't brought you something like that all the time
Hiroko(0.19):It's not fun so don't go over there and come back again
Hiromi(0.28):What do you think of this entertainer, isn't it cute and interesting?
Hiroko(0.27):I should be glad that it was interested in being mistaken for a foreigner
Hiromi I don't know
Hiroko I don't know
Hiromi I don't know
