■ [Google Colaboratory] Preprocessing for Natural Language Processing & Morphological Analysis (janome)

Reading the data with "with open"

Let's read **Ryunosuke Akutagawa's "Hana" ("The Nose")** from Aozora Bunko. The character encoding of the file is **shift_jis**.

# Read the text file; the encoding argument must match the file (Shift_JIS here)
with open('/hana.txt', mode='r', encoding='shift_jis') as f:
  nose_hana = f.read()

print(nose_hana)
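
The read step can be tried without the real file. The following is a minimal sketch, assuming a one-sentence sample and a temporary file in place of /hana.txt; the helper name read_text is hypothetical and not part of the original notebook.

```python
import os
import tempfile

# Hypothetical sample standing in for the Aozora Bunko file contents.
sample = "禅智内供の鼻と云えば、池の尾で知らない者はない。"

# Write the sample as Shift_JIS so there is a file to read back.
with tempfile.NamedTemporaryFile(
    mode="w", encoding="shift_jis", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name


def read_text(path, encoding="shift_jis"):
    """Read a whole text file, decoding it with the given encoding."""
    with open(path, mode="r", encoding=encoding) as f:
        return f.read()


text = read_text(path)
os.remove(path)  # Clean up the temporary file
print(text)
```

If the encoding argument does not match the file, open() will typically raise UnicodeDecodeError or return garbled text, so it is worth stating the encoding explicitly rather than relying on the platform default.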

Preprocessing "Hana"

# Data preprocessing
import re

nose = re.sub('《[^》]+》', '', nose_hana)    # Remove ruby (furigana) readings
nose = re.sub('[|｜―「」\n]', '', nose)      # Remove ruby start markers (| and ｜), dashes, 「」 and line breaks
nose = re.sub('[ ]', '', nose)                # Remove half-width spaces
nose = re.sub('[\u3000]', '', nose)           # Remove full-width spaces (U+3000)

sentence_end = '。'

# Split into sentences, then put the sentence-ending '。' back on each one
nose_list = nose.split(sentence_end)
nose_list.pop()                               # Drop the fragment after the last '。'
nose_list = [x + sentence_end for x in nose_list]

print(nose_list)
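
The preprocessing above can also be wrapped in a function and checked on a short sample. This is a sketch under assumptions: the function name split_sentences and the two-sentence sample are hypothetical, and the whitespace handling uses \s (which in Python 3 also matches the full-width space U+3000) instead of the separate substitutions used in the notebook.

```python
import re


def split_sentences(raw, sentence_end="。"):
    """Strip Aozora-style ruby and whitespace, then split into sentences."""
    text = re.sub(r"《[^》]+》", "", raw)        # ruby readings such as 《いけ》
    text = re.sub(r"[|｜―「」\s]", "", text)    # markers, dashes, quotes, all whitespace
    parts = text.split(sentence_end)
    parts.pop()                                 # fragment after the last sentence end
    return [p + sentence_end for p in parts]


# Hypothetical sample with ruby annotations, a line break and a full-width indent.
raw = (
    "禅智内供《ぜんちないぐ》の鼻と云えば、池《いけ》の尾《お》で知らない者はない。\n"
    "\u3000長さは五六寸あって上唇《うわくちびる》の上から顋《あご》の下まで下っている。\n"
)

sentences = split_sentences(raw)
for s in sentences:
    print(s)
```

Note that this sketch, like the notebook code, does not strip the header and footer notes that Aozora Bunko files usually carry; those would need separate handling.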

Wakati-gaki (word segmentation)

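from janome.tokenizer import Tokenizer

s = Tokenizer()  # Create the tokenizer instance

for sentence in nose_list:
  # wakati=True returns surface forms only; in recent janome versions
  # tokenize() returns a generator, so wrap it in list() before printing
  print(list(s.tokenize(sentence, wakati=True)))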

Analyzing the wakati results

# Count how often each word appears with collections.Counter
import collections

s = Tokenizer()  # Create the tokenizer instance
words = []
for sentence in nose_list:
  words += s.tokenize(sentence, wakati=True)  # Append this sentence's words

c = collections.Counter(words)
print(c)
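
Printing the whole Counter is hard to read for a full story. As a small sketch, assuming a hand-written token list in place of the janome output and a hypothetical set of punctuation marks to skip, most_common() lists only the top entries:

```python
import collections

# Hypothetical, already-segmented tokens standing in for the janome output.
words = ["内供", "の", "鼻", "。", "鼻", "の", "先", "。", "鼻"]

# Hypothetical set of punctuation tokens to leave out of the count.
skip = {"。", "、"}

counter = collections.Counter(w for w in words if w not in skip)

# The two most frequent words with their counts, most frequent first.
top = counter.most_common(2)
print(top)
```

The same filtering idea could be extended to particles or other high-frequency function words, depending on what the analysis is meant to show.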

Reference

Installation of morphological analysis tool (janome)
