[Natural language processing] Preprocessing Japanese text

This post summarizes some of the preprocessing steps I use on Japanese text for natural language processing. (It will be updated from time to time.)

Full-width -> half-width

>>> import unicodedata
>>> 
>>> text = u'１９９４'
>>> print(unicodedata.normalize('NFKC', text))
1994
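As a slightly fuller sketch of the same idea, the snippet below wraps the NFKC call in a small helper and runs it over a few kinds of input. The helper name `normalize_width` and the sample strings are illustrative assumptions, not part of the original post.

```python
import unicodedata


def normalize_width(text):
    """Apply Unicode NFKC normalization (hypothetical helper name).

    Full-width ASCII letters and digits become half-width, and
    half-width katakana become ordinary full-width katakana.
    """
    return unicodedata.normalize("NFKC", text)


# Illustrative inputs: full-width digits, full-width letters,
# and half-width katakana followed by a half-width voicing mark.
samples = ["１９９４", "ＡＢＣ", "ｶﾞ"]
for s in samples:
    print(repr(s), "->", repr(normalize_width(s)))
```

Note that NFKC folds every compatibility character, not only width variants. A circled digit such as ① becomes a plain 1, for example, so it is worth checking that this is acceptable for your data before applying it everywhere.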

Cloud = proper noun???

I think most people parse Japanese with MeCab.

I also think many of them use mecab-ipadic-NEologd as the dictionary, but here is one oddity I found while using it.

$ mecab -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd
雲
雲	名詞,固有名詞,一般,*,*,*,雲のむこう、約束の場所,クモノムコウヤクソクノバショ,クモノムコウヤクソクノバショ
EOS

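Kumo no Mukou, Yakusoku no Basho ...? When I looked it up, it turned out to be an anime film directed by Makoto Shinkai ("The Place Promised in Our Early Days"). In other words, with this dictionary the single word for "cloud" is tagged as a proper noun whose base form is the full film title.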
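One way to spot entries like this is to parse the MeCab output and flag proper nouns whose base form differs from the surface form. The following is a minimal sketch that works on an output line as plain text, so it does not need MeCab installed. The helper names, the sample line (modeled on the output above, with field values that are my assumption) and the heuristic itself are illustrative, not something the dictionary or MeCab provides.

```python
# Names for the 9 comma-separated fields of an IPAdic-style feature string:
# part of speech, three subcategories, conjugation type, conjugation form,
# base form, reading, pronunciation.
FIELDS = ["pos", "pos1", "pos2", "pos3", "ctype", "cform",
          "base", "reading", "pron"]


def parse_mecab_line(line):
    """Split one 'surface<TAB>feature' line into a dict (hypothetical helper)."""
    surface, feature = line.split("\t", 1)
    token = dict(zip(FIELDS, feature.split(",")))
    token["surface"] = surface
    return token


def is_suspicious_proper_noun(token):
    """Heuristic (an assumption): a proper noun whose base form is set
    and is not the same string as the surface form."""
    return (token.get("pos1") == "固有名詞"
            and token.get("base", "*") not in ("*", token["surface"]))


# Sample line modeled on the output above; exact values are assumed.
sample = ("雲\t名詞,固有名詞,一般,*,*,*,"
          "雲のむこう、約束の場所,クモノムコウヤクソクノバショ,クモノムコウヤクソクノバショ")
token = parse_mecab_line(sample)
print(token["surface"], token["base"], is_suspicious_proper_noun(token))
```

Tokens flagged this way could then be reviewed by hand or handled with a user dictionary, depending on what the downstream task needs.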
