Introduction

As a college student with a free spring break, I had been casually studying crawling and scraping, and I decided to build a Twitter bot because I wanted to make something with what I had learned. With Friend B's cooperation, I created a bot that analyzes Friend B's tweets and generates tweets that sound like Friend B.

Development environment

Ubuntu 18.04.4 LTS running in a VirtualBox virtual machine on Windows 10

What I used

- Markov chains
- MeCab
- tweepy
- MySQL
- VPS (Virtual Private Server)

Here is a brief explanation of each.

Markov chains

I started by asking myself how you make generated sentences sound like a particular person, and how automatic sentence generation works in the first place. Looking into it, I came across Markov chains.

The article below explains Markov chains in an easy-to-understand way, and I also referred to its Python code. After reading it, you will probably think: **Markov chains are amazing!**

[[Python] Generate sentences with an N-th order Markov chain](https://qiita.com/k-jimon/items/f02fae75e853a9c02127)
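As a rough sketch of the idea (my own illustration, not the code from the article above), an order-N Markov chain records which word follows each run of N consecutive words, then generates a sentence by repeatedly sampling one of those followers. It assumes the tweets have already been split into word lists, for example with MeCab; `build_model`, `generate`, and the `<BOS>`/`<EOS>` markers are hypothetical names.

```python
import random
from collections import defaultdict

# Hypothetical markers for the start and end of a sentence.
BOS, EOS = "<BOS>", "<EOS>"


def build_model(sentences, order=2):
    """Map each tuple of `order` consecutive words to the words that followed it."""
    model = defaultdict(list)
    for words in sentences:
        padded = [BOS] * order + list(words) + [EOS]
        for i in range(len(padded) - order):
            state = tuple(padded[i:i + order])
            model[state].append(padded[i + order])
    return model


def generate(model, order=2, max_words=50, rng=random):
    """Walk the chain from the start state until EOS or max_words is reached."""
    state = (BOS,) * order
    output = []
    for _ in range(max_words):
        candidates = model.get(state)  # .get does not insert missing keys
        if not candidates:
            break
        # A follower seen k times appears k times in the list, so it is k times as likely.
        word = rng.choice(candidates)
        if word == EOS:
            break
        output.append(word)
        state = state[1:] + (word,)
    return output


if __name__ == "__main__":
    corpus = [
        ["今日", "は", "いい", "天気", "だ"],
        ["今日", "は", "眠い"],
    ]
    model = build_model(corpus, order=2)
    # Japanese text has no spaces between words, so join without a separator.
    print("".join(generate(model, order=2, rng=random.Random(0))))
```

Because frequent continuations are stored more often, the output tends to reuse the writer's habitual phrasing. A larger `order` gives more faithful but less varied sentences.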

MeCab

I use MeCab to analyze the retrieved tweets. The basic flow of sentence generation is to run morphological analysis on the collected tweets with MeCab, build a list of words, and arrange the words according to a Markov chain.

For a detailed explanation of MeCab, see the following article.

[Technical explanation] What is morphological analysis? From MeCab installation procedure to execution example in Python
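MeCab's default output puts one token per line, as the surface form, a tab, and then the dictionary's features, and ends each sentence with an `EOS` line. A minimal sketch of turning that into the word list the Markov chain needs, assuming the mecab-python3 bindings (`import MeCab`) with a dictionary installed; `to_word_list` is a hypothetical helper name.

```python
def to_word_list(raw):
    """Extract surface forms (the words as written) from MeCab's default output.

    Each token line is "surface<TAB>features" and the sentence ends with "EOS".
    Only the first tab-separated field is used, so the exact feature layout
    of the installed dictionary does not matter here.
    """
    words = []
    for line in raw.splitlines():
        if not line or line == "EOS":
            continue
        words.append(line.split("\t", 1)[0])
    return words


if __name__ == "__main__":
    try:
        import MeCab  # provided by the mecab-python3 package
    except ImportError:
        print("mecab-python3 is not installed")
    else:
        tagger = MeCab.Tagger()
        print(to_word_list(tagger.parse("今日はいい天気だ")))
```

If only the words are needed, `MeCab.Tagger("-Owakati").parse(text).split()` gives the same kind of list directly, since `-Owakati` outputs space-separated words.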

tweepy

To operate Twitter automatically, I use the official Twitter API, which lets you operate an account programmatically.

tweepy, a Python library, makes it convenient to call the Twitter API from Python. I learned how to use it from the following book, together with searching online.

[Python Crawling & Scraping: Practical Development Guide for Data Collection and Analysis](https://www.amazon.co.jp/Python%E3%82%AF%E3%83%AD%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0-%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0-%E3%83%87%E3%83%BC%E3%82%BF%E5%8F%8E%E9%9B%86%E3%83%BB%E8%A7%A3%E6%9E%90%E3%81%AE%E3%81%9F%E3%82%81%E3%81%AE%E5%AE%9F%E8%B7%B5%E9%96%8B%E7%99%BA%E3%82%AC%E3%82%A4%E3%83%89-%E5%8A%A0%E8%97%A4-%E8%80%95%E5%A4%AA/dp/4774183679/ref=tmm_other_meta_binding_title_0?_encoding=UTF8&qid=&sr=)
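A minimal sketch of the tweepy calls involved, assuming tweepy's `API` class for the v1.1 endpoints (the style in use around the time of writing) and your own app's credentials. Twitter (now X) has since changed its API access tiers, so these endpoints may not be available on every plan. `make_api`, `to_record`, `fetch_recent_tweets`, and `post_tweet` are hypothetical helper names.

```python
def make_api(consumer_key, consumer_secret, access_token, access_token_secret):
    """Create an authenticated tweepy API object (OAuth 1.0a user context)."""
    import tweepy  # imported here so the helpers below work without tweepy installed

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth)


def to_record(status):
    """Pick out the fields worth storing from a tweepy Status object."""
    # With tweet_mode="extended" the untruncated text is in .full_text, not .text.
    text = getattr(status, "full_text", None) or status.text
    return {"id": status.id, "text": text, "created_at": status.created_at}


def fetch_recent_tweets(api, screen_name, count=200):
    """Fetch up to `count` recent tweets of one user as plain dicts."""
    statuses = api.user_timeline(
        screen_name=screen_name, count=count, tweet_mode="extended"
    )
    return [to_record(s) for s in statuses]


def post_tweet(api, text):
    """Post a generated tweet from the bot's account."""
    return api.update_status(text)
```

Keeping `to_record` separate from the network calls means the collector passes only plain dicts to the database layer, which also makes it easy to test with stand-in objects.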

MySQL

I also wanted to study databases, so I decided to use MySQL to store the data retrieved through the Twitter API. Using MySQL also has the advantage of making the application easier to extend later. In the prototype I stored the data in a text file, but processing seemed a little slow.

To learn more about MySQL, see the link below; it should give you a good picture of what it is.

https://www.atoone.co.jp/column/10114/
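For the storage side, here is a minimal sketch with one substitution: it uses Python's built-in `sqlite3` instead of MySQL so it runs without a database server, and the `tweets` table layout is my own assumption. With MySQL you would connect through a driver such as PyMySQL, use `%s` placeholders, write `INSERT IGNORE` instead of `INSERT OR IGNORE`, and declare the ID column as `BIGINT`.

```python
import sqlite3

# Hypothetical schema; the article does not show its actual table layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id         INTEGER PRIMARY KEY,  -- tweet ID, so re-fetched tweets are skipped
    text       TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


def save_tweets(conn, records):
    """Insert tweet records, ignoring ones that are already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text, created_at) VALUES (?, ?, ?)",
        [(r["id"], r["text"], str(r["created_at"])) for r in records],
    )
    conn.commit()


def load_texts(conn):
    """Read all stored tweet texts, oldest first."""
    rows = conn.execute("SELECT text FROM tweets ORDER BY created_at")
    return [text for (text,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_tweets(conn, [{"id": 1, "text": "おはよう", "created_at": "2020-03-01 08:00:00"}])
    print(load_texts(conn))
```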

VPS

I rented a VPS because I needed a server to run the bot. You can think of it as being able to use a computer that is always running.

Setting up the environment required knowledge of Linux, which was quite hard for a server beginner like me, but I managed to install the necessary software, such as MySQL and MeCab.

I rented it easily from ConoHa VPS, and I plan to use it as a web server soon as well. At 880 yen a month, it is a cheap way to play around with a server.

ConoHa VPS

Bot app configuration

Now for the main topic. The overall configuration looks like the figure.

The role of each script is as follows.

twitter_collector.py (data collection script)

- Use tweepy, a Python library, to fetch tweet data through the Twitter API
- Extract the necessary fields (text, posting date and time, etc.) from the fetched data and save them to the database
- Use the Streaming API to add new tweets to the database as they are posted

bot.py (the bot itself)

- Read the tweet text from the database and format it so that it is easy to process
- Run morphological analysis on the text with MeCab to model the tendencies in Friend B's tweets
- Generate Friend B-like tweets from the resulting model
- Post the generated tweets through the Twitter API (tweepy)
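For the first step of bot.py, here is a minimal sketch of what formatting the stored text might involve. The specific rules (dropping retweets, URLs, and @mentions, then collapsing whitespace) are my assumptions, since the article does not say what formatting it applies; `clean_tweet` and `prepare_corpus` are hypothetical names.

```python
import re

# Hypothetical cleaning rules: the article only says the text is "formatted".
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[A-Za-z0-9_]+")  # Twitter handles use ASCII letters, digits, "_"


def clean_tweet(text):
    """Remove parts that would otherwise be copied verbatim into generated tweets."""
    if text.startswith("RT @"):
        return ""  # retweets are someone else's words, so drop them entirely
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())  # collapse leftover spaces and newlines


def prepare_corpus(texts):
    """Clean every tweet and drop the ones that end up empty."""
    return [t for t in (clean_tweet(x) for x in texts) if t]
```

The cleaned strings are then what gets passed to MeCab for morphological analysis.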

I think separating the part that collects data from the bot that tweets makes the system easier to manage.

Bot automation

I set up CentOS 8 on the server and built the environment by installing the necessary tools, such as MeCab, MySQL, and Python.

The bot runs by having systemd automatically execute a shell script that launches the Python scripts. The following article explains systemd in an easy-to-understand way.

Automatic startup by systemd

Making it more like Friend B

Simply generating sentences with a Markov chain already produced quite interesting tweets, which was fun, but I want the bot to be even more like Friend B, so I am currently working on the following two points.

- Examine the distribution of Friend B's intervals between tweets for each time of day, and have the bot tweet at matching intervals.
- Examine the distribution of the number of characters in Friend B's tweets, and choose the number (or range) of characters for each generated tweet based on it.

Both are implemented by collecting these values from Friend B's actual tweets into lists and randomly selecting values from them.
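A minimal sketch of this sampling idea, assuming the posting times are available as `datetime` objects and the texts as strings (for example, loaded from the database). `intervals_by_hour`, `next_interval`, `pick_length`, and the 60-minute fallback for hours with no data are hypothetical choices, not the article's code.

```python
import random
from collections import defaultdict
from datetime import datetime


def intervals_by_hour(timestamps):
    """Group gaps (in minutes) between consecutive tweets by the hour of the earlier tweet."""
    ordered = sorted(timestamps)
    table = defaultdict(list)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = (nxt - prev).total_seconds() / 60
        table[prev.hour].append(gap)
    return table


def next_interval(table, hour, rng=random, default=60.0):
    """Pick a waiting time by sampling one of the gaps observed at this hour."""
    observed = table.get(hour)
    return rng.choice(observed) if observed else default


def pick_length(texts, rng=random):
    """Pick a target tweet length by sampling one of Friend B's actual lengths."""
    return rng.choice([len(t) for t in texts])


if __name__ == "__main__":
    times = [
        datetime(2020, 3, 1, 8, 0),
        datetime(2020, 3, 1, 8, 45),
        datetime(2020, 3, 1, 21, 10),
    ]
    table = intervals_by_hour(times)
    print(next_interval(table, 8), pick_length(["おはよう", "眠い"]))
```

Sampling from the observed values, rather than fitting a distribution, keeps the bot's habits close to the real ones. One way to use the sampled length is to regenerate Markov sentences until one lands near the target, and the sampled interval can be passed to `time.sleep` (converted to seconds) before posting.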

Impressions

As expected, the longer the sentence, the less sense it makes, but short tweets sometimes sound like something Friend B really said. And because the words are randomly selected from what Friend B has tweeted in the past, some combinations produced tweets that made me laugh.

If the bot used exactly the same icon, it might be impossible to tell the bot from Friend B, so I would like to try a Turing test.

Friend B, thank you for your cooperation.

In conclusion

I wondered whether I could do this with machine learning, but I'll leave that for next time... Spring break is over, and classes are online.

Thank you for reading to the end. If you have any advice, I would appreciate it if you could comment.

[CentOS] [Python] Created a Twitter bot that generates friend-like tweets using Markov chains