Trying morphological analysis and Markov chains with Django (with a lot of room for improvement)

Introduction

A while ago I collected Twitter data and generated sentences from it with a Markov chain. This time I tried to incorporate that into Django. To say it up front, the result has big problems. I am leaving a record of the progress so far.

Past article: "Accumulate information by Twitter search, analyze morphological elements, generate sentences with Markov chains, and tweet."

.has_key cannot be used in Python 3, so a check such as

if markov.has_key(w):

has to be rewritten as

if w in markov:
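As a small sketch of the same change in context, the membership test with `in` can be used when recording which word follows which. The helper name `add_transition` is my own and only for illustration.

```python
def add_transition(markov, prev_word, next_word):
    """Record that next_word was seen right after prev_word."""
    # Python 2 style was: if markov.has_key(prev_word): ...
    if prev_word in markov:
        markov[prev_word].append(next_word)
    else:
        markov[prev_word] = [next_word]


markov = {}
add_transition(markov, 'a', 'b')
add_transition(markov, 'a', 'c')
print(markov)  # {'a': ['b', 'c']}
```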

Preparing to use MeCab with Python 3

$ brew install mecab
$ brew install mecab-ipadic
$ pip install mecab-python3

I installed it as shown above.
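To confirm the installation works, something like the following can be run. This is only a sketch: the sample sentence and the helper names `wakati_to_words` and `check_mecab` are my own, and it assumes that a missing module or dictionary shows up as `ImportError` or `RuntimeError`.

```python
def wakati_to_words(wakati_output):
    """Split MeCab's -Owakati output (space-separated tokens) into a list."""
    # str.split() with no argument also drops the trailing space and newline.
    return wakati_output.split()


def check_mecab(sample='すもももももももものうち'):
    """Return the tokens for sample, or None when MeCab is not usable."""
    try:
        import MeCab
        tagger = MeCab.Tagger('-Owakati')
    except (ImportError, RuntimeError):
        # Module not installed, or no dictionary could be loaded.
        return None
    return wakati_to_words(tagger.parse(sample))


print(check_mecab())
```

If this prints `None`, the module or the dictionary is probably not installed correctly.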

Code

As usual, I fetch the RSS feed of DMM's adult (18+) videos.

This does almost the same thing as my earlier article "Simple RSS reader made with Django", so please refer to it.

Get and save the title and description.

views.py


import random
import re

import feedparser
import MeCab

from django.http import HttpResponse
from django.shortcuts import (render, redirect,)

def index(request):
    url = 'http://www.dmm.co.jp/digital/videoa/-/list/rss/=/sort=date/'
    feeder = feedparser.parse(url)


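    # Open the file once. Reopening it with 'w' for every entry would
    # overwrite it each time and keep only the last entry.
    with open('text.txt', 'w', encoding='utf-8') as f:
        for entry in feeder['entries']:
            f.write(entry['description'] + entry['title'])

    with open('text.txt', 'r', encoding='utf-8') as f:
        mecab_read = f.read()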

    tagger = MeCab.Tagger('-Owakati')
    wordlist = tagger.parse(mecab_read)
    wordlist = wordlist.rstrip(' \n').split(' ')

    f = open('l.txt', 'w')
    f.write(str(wordlist))
    f.close()

    markov = {}
    w = ''

    for x in wordlist:
        if w:
            if w in markov:
                new_list = markov[w]
            else:
                new_list =[]

            new_list.append(x)
            markov[w] = new_list
        w = x

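    # Start from the first word, then follow the chain: each next word is
    # drawn from the words observed right after the current one.
    choice_words = wordlist[0]
    sentence = ''
    count = 0

    while count < 20:
        # Fall back to any word when the current one has no recorded successor.
        choice_words = random.choice(markov.get(choice_words, wordlist))
        sentence += choice_words
        count += 1

    # Keep only the part before the first space, if there is one.
    sentence = sentence.split(' ', 1)[0]
    # Strip ASCII punctuation and symbols from the generated sentence.
    p = re.compile('[!-/:-@[-`{-~]')
    sus = p.sub('', sentence)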

    context = {
        'wordlist': wordlist,
        'sus': sus,
        }

    return render(request,'index.html',context)
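Separated from Django, the Markov chain part can be sketched as two small functions: one builds the table of successors, the other walks it. The function names, the `max_words` parameter and the toy word list are assumptions of mine, not part of the view.

```python
import random


def build_chain(words):
    """Map each word to the list of words observed right after it."""
    chain = {}
    for prev_word, next_word in zip(words, words[1:]):
        chain.setdefault(prev_word, []).append(next_word)
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from start, stopping early at a word with no successor."""
    word = start
    result = [word]
    for _ in range(max_words - 1):
        successors = chain.get(word)
        if not successors:
            break
        word = rng.choice(successors)
        result.append(word)
    # Japanese text has no spaces between words, so join without a separator.
    return ''.join(result)


words = ['a', 'b', 'c', 'a', 'b', 'd']  # toy data standing in for MeCab output
chain = build_chain(words)
print(chain)
print(generate(chain, words[0]))
```

Because successors are stored with duplicates, `random.choice` picks frequent successors more often, which is what makes this a first-order Markov chain rather than a uniform shuffle.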

index.html


{% extends "base.html" %}
{% block body %}
  <div class="container">
    <div class="row">

      <div class="col-md-12">
        <p class="1">{{ wordlist }}</p>
        <p class="2">{{ sus }}</p>
      </div>

    </div>
  </div>

{% endblock %}

Result

[Image: スクリーンショット 2016-11-21 13.41.16.png (screenshot of the result)]

I ended up with this crappy result. The descriptions contain a lot of price information, so that has to be removed...

I also learned that unless the actress names are cut out, the output makes no sense.
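One possible way to do that cleanup before passing the text to MeCab is sketched below. The price formats in the regular expression are my guess (the real feed may differ), the list of names would have to be supplied from somewhere, and the name in the example is a placeholder.

```python
import re

# Assumed price formats such as "1,980円", "¥1980" or "￥500".
PRICE_RE = re.compile(r'[¥￥]\s*\d[\d,]*|\d[\d,]*\s*円')


def clean_text(text, names=()):
    """Remove price-like strings and any of the given names from text."""
    text = PRICE_RE.sub('', text)
    for name in names:
        text = text.replace(name, '')
    return text


print(clean_text('新作1,980円で配信'))  # 新作で配信
print(clean_text('¥500山田花子出演', names=['山田花子']))  # 出演
```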

I will keep working at it.

What I originally wanted to do

I wanted to generate AV titles with a Markov chain.
