[RUBY] Automatically generate introductory text for AV works with DMM API, MeCab, and Markov chains

Published e-book "Artificial Intelligence Pornography"

The text generated in this article is now available as an e-book, "Artificial Intelligence Porn: Erotic texts written by computers". It contains 100 passages of about 200 to 400 characters each. Take a look if you are interested!

Automatically generate

I was impressed by this article: "[Evangelion] Try to automatically generate Asuka-like lines with Deep Learning". I wanted to try something similar myself, but automatic generation with deep learning has a high barrier to entry, so I decided to start with Markov chains.

That's it! AV!

Automatic generation requires data. I don't know much about anime, so imitating the Evangelion article would not excite me. What would get me excited? ... The introductory texts for AV works came to mind immediately.

Advantages of AV introduction

- Each introduction is about 120 characters long, so there is probably plenty of data
- Compared with titles, the generated sentences should show more variation
- The sentences are enthusiastic and written to excite people
- It looks fun because the text is not dry
- The data can be obtained with the DMM API
- Ver3.0 was released in March 2016
- It has presumably been improved, so development should be easier

Register the DMM API

So I got started right away. Please refer to the usage guide: https://affiliate.dmm.com/api/guide/

- Register as a DMM affiliate
- Get an affiliate ID issued
- Get an API ID

Get the work introduction

I used the article below as a reference. It targets an older version of the API, so the code has to be modified for ver3.0. http://akms.hateblo.jp/entry/2013/05/24/234703

Code

I fetched the introductory texts of 1,000 works and wrote them to ero.txt.

dmm.rb


# -*- coding: utf-8 -*-

require 'open-uri'
require 'rexml/document'

def getURL(offsetNum)
	url = "https://api.dmm.com/affiliate/v3/ItemList?"
	queries = []

	params = {
	  "api_id"       => 'YOUR_API_ID',
	  "affiliate_id" => 'YOUR_AFFILIATE_ID',
	  "site"         => 'DMM.R18',
	  "service"      => 'digital',
	  "floor"        => 'videoa',
	  "sort"         => 'rank',
	  "offset"       => offsetNum,
	  "hits"         => 100,
	  "output"       => "xml"
	}

	params.each_pair do |key,value|
	  queries.push("#{key}=#{value}")
	end

	url += queries.join("&")

	return url

end

#Open a text file for writing
File.open("ero.txt", "w") do |file|

	#Fetch 10 pages of 100 items each; the offset is 1-based (1, 101, ..., 901)
	10.times do |page|
		url = getURL(page * 100 + 1)
		res = URI.open(url)
		REXML::Document.new(res).elements.each("xml/result/items/item") do |element|
			#Write a work introduction
			file.puts element.elements['comment'].text
		end
	end
end
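
The query string above is joined by hand, which works as long as no value contains reserved characters. As a minimal sketch, assuming the same parameters as dmm.rb, the URL could also be built with the standard library's `URI.encode_www_form`, which percent-encodes each value. The helper name `build_item_list_url` and the `offsets` list are hypothetical and are not part of the original script.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): builds the ItemList URL
# and lets URI.encode_www_form percent-encode every key and value.
def build_item_list_url(offset, hits: 100)
  params = {
    "api_id"       => "YOUR_API_ID",
    "affiliate_id" => "YOUR_AFFILIATE_ID",
    "site"         => "DMM.R18",
    "service"      => "digital",
    "floor"        => "videoa",
    "sort"         => "rank",
    "offset"       => offset,
    "hits"         => hits,
    "output"       => "xml"
  }
  "https://api.dmm.com/affiliate/v3/ItemList?" + URI.encode_www_form(params)
end

# Offsets for 10 pages of 100 items, assuming the API counts from 1.
offsets = (0...10).map { |page| page * 100 + 1 }
puts build_item_list_url(offsets.first)
```

Each offset in `offsets` would be passed to the helper in turn to fetch the 1,000 introductions.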

Run the Markov chain

Prepare MeCab

There are various ways to set it up, but I followed the article below. http://qiita.com/grachro/items/4fbc9bf8174c5abb7bdd

Markov chain script preparation

I used this script as a reference. https://github.com/o-tomox/TextGenerator

Since I run Python 3, I ported the above script from Python 2 to Python 3. https://gist.github.com/naoyashiga/4dfaa7e2a5222a9cadd9
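
For readers who want to see the idea behind Markov chain generation without reading the Python scripts, here is a minimal sketch in Ruby. It is not the referenced script: it keeps the table in memory, and it assumes tokens are separated by whitespace, so Japanese text would first have to be segmented with MeCab (for example with `mecab -Owakati`). The names `build_chain`, `generate`, `BEGIN_MARK`, and `END_MARK` are hypothetical.

```ruby
# Minimal word-level Markov chain sketch (second order: the next token
# depends on the previous two tokens).
# Assumption: tokens are split on whitespace; Japanese text needs MeCab first.

BEGIN_MARK = "__BEGIN__"
END_MARK   = "__END__"

# Build a table mapping each pair of consecutive tokens to the list of
# tokens observed right after that pair.
def build_chain(sentences)
  chain = Hash.new { |hash, key| hash[key] = [] }
  sentences.each do |sentence|
    tokens = [BEGIN_MARK, BEGIN_MARK] + sentence.split + [END_MARK]
    tokens.each_cons(3) { |a, b, c| chain[[a, b]] << c }
  end
  chain
end

# Walk the table from the begin markers until END_MARK is drawn or
# max_tokens tokens have been produced.
def generate(chain, max_tokens: 50, rng: Random.new)
  state = [BEGIN_MARK, BEGIN_MARK]
  output = []
  max_tokens.times do
    candidates = chain.fetch(state, [])
    break if candidates.empty?
    nxt = candidates.sample(random: rng)
    break if nxt == END_MARK
    output << nxt
    state = [state[1], nxt]
  end
  output.join(" ")
end

corpus = [
  "the cat sat on the mat",
  "the cat ate the fish",
  "the dog sat on the rug"
]
chain = build_chain(corpus)
puts generate(chain, rng: Random.new(42))
```

Because repeated continuations are stored as repeated entries, sampling uniformly from each list reproduces the observed frequencies. The `max_tokens` argument gives a rough upper bound on the length of the generated text.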

Automatically generate the introductory text

Fetch the introductory texts

$ ruby dmm.rb

Write to the DB

$ python PrepareChain.py

Generate the text

$ python GenerateText.py

Output result

It worked, and I got generated text!

Oops, the content seems too obscene! Unfortunately I can't publish it here.

If you are interested, try it yourself!

Consideration and prospects

- I want to try deep learning
- I want to be able to adjust the amount of generated text
- I want to generate from a larger corpus of sentences

Postscript

I tried using a neural network in a follow-up article: "DMM API, char-rnn (recurrent neural network) automatically generates an introductory text for AV works" (not good but great).
