It has been widely discussed that GPT-2, published by OpenAI, can automatically generate natural-sounding text.
A recent Cornell University survey found that about 70% of people who read GPT-2-generated text mistook it for a New York Times article.
The full model, with 1,558 million (1558M) parameters, has now been released.
In reality, however, how this AI can be used effectively remains largely unexplored. So I developed and released "Mockers", an online tool that lets anyone use GPT-2 easily, in the hope of providing an opportunity to think about how GPT-2 might be applied.
If you want to get a feel for what GPT-2 can do, try the Mockers generation tool. https://mockers.io/generator
The tool is in English; see here for how to use it. https://doc.mockers.io/archives/1966/ https://doc.mockers.io/archives/1987/
In this article, I share the results of a fine-tuning experiment using Mockers.
Fine-tuning means taking a model that has already been trained, giving it additional data, and training it further at low cost to produce a new model. The resulting model learns the context and style of the given text and generates sentences in the same vein. Mockers does more than just let you try GPT-2: it also supports fine-tuning and automatic posting.
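As a rough illustration of what fine-tuning looks like in code, here is a minimal sketch using the Hugging Face transformers library. This is an assumption for illustration only: Mockers' internal implementation is not public, and the file name `tweets.txt` and all hyperparameters below are placeholders.

```python
# Minimal GPT-2 fine-tuning sketch with Hugging Face transformers.
# NOTE: illustrative only; this is not Mockers' actual implementation.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast, TextDataset,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # the 124M checkpoint; larger ones need far more GPU memory
tokenizer = GPT2TokenizerFast.from_pretrained(base)
model = GPT2LMHeadModel.from_pretrained(base)

# "tweets.txt" is a hypothetical plain-text file of collected tweets.
train_dataset = TextDataset(tokenizer=tokenizer,
                            file_path="tweets.txt",
                            block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model("gpt2-finetuned")
tokenizer.save_pretrained("gpt2-finetuned")
```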
By using this mechanism, use cases such as the following become possible.

- You could build a media site that imitates an existing curated media outlet without infringing its copyright, parasitically picking up its spillover page views.
- You could build a bot that continuously impersonates a given Twitter account.
As a demo for this article, I experimented with fine-tuning GPT-2: I used Mockers to fine-tune on President Trump's tweets and create a fake President Trump bot.
You can always see the latest mock of President Trump here, too. https://mockers.io/timeline
First, access the following page. https://mockers.io/login

Logging in is required for fine-tuning. Sign up, or use your Google account.
After logging in, you will immediately be prompted to create a model, so press "Go to creation screen".
When the new model dialog appears, enter any "model name" and set the "model type" to "custom model (Twitter)". This produces a model fine-tuned on a Twitter account. Enter the target Twitter account in "Target account to mock (input)".
Currently, it takes up to two hours to generate a model. Once the model is generated, text is generated automatically on a regular schedule, and you can also register an account that tweets the generated text. To do this, you need to apply for Twitter API access in advance; see the article below for how to apply. https://qiita.com/kngsym2018/items/2524d21455aac111cdee
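For reference, posting generated text through the Twitter API typically looks something like the sketch below, here using the tweepy library. The credentials are placeholders issued through the API application above; Mockers performs this step internally, so this is only a hedged illustration of the general approach.

```python
# Hypothetical auto-posting sketch with tweepy (Mockers does this for you).
import tweepy

# Placeholder credentials, issued after the Twitter API application.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

generated_text = "..."  # output of the fine-tuned GPT-2 model
# Twitter rejects tweets over 280 characters, so cut the text off forcibly
# (a limitation discussed later in this article).
api.update_status(generated_text[:280])
```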
The Twitter account entered in "Synchronize (input) target account" acts as a trigger: each tweet from that account prompts the model to generate text on a related topic.

This is how the "impersonation (Mock)" is realized.
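In code, conditional generation from such a trigger tweet might look like the following sketch, again with the transformers library and the hypothetical `gpt2-finetuned` model saved earlier; the prompt and sampling parameters are assumptions, not Mockers' actual settings.

```python
# Minimal conditional-generation sketch: use the triggering tweet as a prompt
# so the output stays on a related topic. Illustrative only.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-finetuned")
model = GPT2LMHeadModel.from_pretrained("gpt2-finetuned")

trigger_tweet = "The economy is doing better than ever"  # hypothetical trigger
inputs = tokenizer(trigger_tweet, return_tensors="pt")
outputs = model.generate(**inputs,
                         max_length=60,  # controlled in tokens, not characters
                         do_sample=True,
                         top_k=40,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```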
The following is content actually tweeted by the fake Trump. These are not necessarily lines he would be likely to say, but sometimes the bot says something that fits his position, and even when it does not, you can see that the text is generated around topics related to his past tweets.
University of Alberta Human Kinetics (SHK) University is the premier facility for conducting high energy, high impact research in science, physical sciences and engineering.
The Democratic candidate was not even a Democrat, and the Republican candidate was a strong and capable conservative. Both of them were the people I strongly supported. Our country is excellent for both.
Since 2015, the first major US military operations to tackle IS militants in Syria and Iraq have deployed at least 2,500 additional ground forces, but the group's most extreme.
Some limitations also became clear through this experiment.

- GPT-2 can control output length at the token (word) level but not the character count, so it cannot be tuned for media with strict character limits like Twitter. As a result, text that exceeds Twitter's 280-character limit has to be cut off forcibly.
- The GPU memory required for fine-tuning the 774M and larger models is beyond what generally available GPUs can handle. Even my own GeForce GTX 1080 Ti and an AWS P3 instance (Tesla V100) failed due to insufficient memory. (Ordinary inference is still possible.)
Text-generation technology based on large-scale unsupervised learning is expected to keep improving in accuracy and, at the same time, to shift into the phase of real-world use. I hope that this article and Mockers can contribute to the development and social implementation of natural-language AI.
P.S. Don't forget Hillary.
It's an argument I heard from Senate Democrats before the election, and Democrats said this could happen if this was a Republican, Republican Republican, or other party.
https://mockers.io