I heard that Python makes it easy to try out various natural language processing tools, so I played around with a few. I don't understand the underlying algorithms at all, but it's amazing that you can do something interesting in just a few lines.

Execution environment

Google Colab notebook

What I tried

First, install transformers and define what you need.

The source code of transformers is on GitHub (huggingface/transformers).

pip install transformers

`lang.py`


import torch  # backend used by the pipelines
from transformers import pipeline

# Each call loads a default pretrained model for the task (downloaded on first use).
sentiment_analysis = pipeline('sentiment-analysis')
question_answering = pipeline('question-answering')
fill_mask = pipeline('fill-mask')
feature_extraction = pipeline('feature-extraction')

This time I played with the above four. Let's look at each below.

sentiment-analysis

It outputs how positive or negative the input sentence is.

`lang.py`


sentiment_analysis("Because of the pandemic, I decided to refrain from going out.")
# => [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]

The sentence is judged to be negative with high confidence.
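The result is a list with one dict per input, so it is easy to post-process. Here is a minimal sketch that turns the label and score into a single signed number. `signed_score` is a helper I made up for illustration, and it assumes the only labels are 'POSITIVE' and 'NEGATIVE', as in the output above.

```python
# Sample result copied from the run above; no model is loaded here.
result = [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]

def signed_score(prediction):
    """Hypothetical helper: map one prediction dict to the range [-1, 1].

    Assumes the label is either 'POSITIVE' or 'NEGATIVE'.
    """
    sign = 1.0 if prediction['label'] == 'POSITIVE' else -1.0
    return sign * prediction['score']

scores = [signed_score(p) for p in result]
print(scores)  # [-0.9692758917808533]
```

A signed value like this might be handy for sorting or averaging many sentences, but that is just one possible design.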

question-answering

If you give it a question and a context passage that contains the answer, it returns the answer.

`lang.py`


question_answering({
    'question': 'What is the cause of the pandemic?',
    'context' : 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
})
# => {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}

It got the right answer. (However, when I tried various other inputs, it sometimes returned an error. It seems it won't work unless the sentence is easy for the algorithm to understand?)
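The `start` and `end` values in the result look like character offsets into the context. The sketch below checks that against the output above, without loading any model. `answer_or_none` and its `threshold` value are my own illustration, not part of transformers.

```python
# Context and result copied from the run above; no model is loaded here.
context = 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
result = {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}

# Slice the context with the returned offsets.
span = context[result['start']:result['end']]
print(span)  # coronavirus

def answer_or_none(result, threshold=0.5):
    """Hypothetical helper: keep the answer only if the score reaches the threshold."""
    return result['answer'] if result['score'] >= threshold else None

print(answer_or_none(result))  # coronavirus
```

Filtering on the score might be one way to skip shaky answers, although the threshold of 0.5 is an arbitrary choice.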

fill-mask

If you give it a sentence with `<mask>` in one place, it returns words that seem to fit in the blank.

`lang.py`


fill_mask("I have to be in bed all day today because I get <mask>.")
'''
 => [{'score': 0.2714517414569855,
  'sequence': '<s> I have to be in bed all day today because I get tired.</s>',
  'token': 7428},
 {'score': 0.19346608221530914,
  'sequence': '<s> I have to be in bed all day today because I get sick.</s>',
  'token': 4736},
 {'score': 0.07417058944702148,
  'sequence': '<s> I have to be in bed all day today because I get headaches.</s>',
  'token': 20816},
 {'score': 0.05399525910615921,
  'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>',
  'token': 37197},
 {'score': 0.05070624500513077,
  'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>',
  'token': 33782}]
'''

All of them look plausible. (Sorry that all the example sentences are a bit depressing.)
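Each candidate comes back as a whole sentence wrapped in `<s>` and `</s>`, so pulling out just the filled-in word takes a little string handling. Here is a minimal sketch using the output above. `filled_word` is a hypothetical helper, and it assumes the sequence differs from the template only at the `<mask>` position.

```python
# Template and results copied from the run above; no model is loaded here.
template = "I have to be in bed all day today because I get <mask>."
results = [
    {'score': 0.2714517414569855,
     'sequence': '<s> I have to be in bed all day today because I get tired.</s>'},
    {'score': 0.19346608221530914,
     'sequence': '<s> I have to be in bed all day today because I get sick.</s>'},
    {'score': 0.07417058944702148,
     'sequence': '<s> I have to be in bed all day today because I get headaches.</s>'},
    {'score': 0.05399525910615921,
     'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>'},
    {'score': 0.05070624500513077,
     'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>'},
]

# Text before and after the mask in the template.
prefix, suffix = template.split("<mask>")

def filled_word(sequence):
    """Hypothetical helper: strip the <s>/</s> markers and cut out the filled word."""
    text = sequence.replace("<s>", "").replace("</s>", "").strip()
    return text[len(prefix):len(text) - len(suffix)]

words = [filled_word(r['sequence']) for r in results]
print(words)  # ['tired', 'sick', 'headaches', 'insomnia', 'sleepy']
```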

feature-extraction

It returns vectors that represent the features of the input sentence. Unlike the three above, the return value is purely numerical, but I thought it would make it easy to feed sentences into my own model. (I'd like to try that someday.)

`lang.py`


array = feature_extraction("I catch a cold.")

import numpy as np
np.array(array).shape
# => (1, 7, 768)

array[0][0][:10]
'''
 => [0.3683673143386841,
 0.008590285666286945,
 0.04184938594698906,
 -0.08078824728727341,
 -0.20844608545303345,
 -0.03908906877040863,
 0.19680079817771912,
 -0.12569604814052582,
 0.010193285532295704,
 -1.1207540035247803]
'''

It returned a nested list with the shape and values above. Still, that is a lot of data for representing the single sentence "I catch a cold."

Here is one more example.

`lang.py`


array = feature_extraction("I catch a cold and I am sleepy.")

import numpy as np
np.array(array).shape
# => (1, 11, 768)

array[0][0][:10]
'''
 => [0.3068505525588989,
 0.026863660663366318,
 0.17733855545520782,
 0.03574731573462486,
 -0.12478257715702057,
 -0.22214828431606293,
 0.2502932548522949,
 -0.17025449872016907,
 -0.09574677795171738,
 -0.9091089963912964]
'''

The second dimension changed from 7 to 11, so it presumably tracks the number of tokens in the input. The last dimension, 768, doesn't seem to change.
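Since the second dimension depends on the sentence, the outputs can't be compared directly. One common workaround is to average over that dimension so every sentence becomes a single 768-dimensional vector. The sketch below does this with NumPy. It uses random numbers with the same shapes as above as stand-ins for real pipeline output, so the similarity value itself is meaningless. `sentence_vector` and `cosine_similarity` are helper names I made up.

```python
import numpy as np

def sentence_vector(features):
    """Hypothetical helper: average the token vectors of one sentence.

    Expects a nested list shaped (1, tokens, hidden) and returns shape (hidden,).
    """
    arr = np.asarray(features)
    return arr[0].mean(axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in data: random values shaped like the two outputs above (assumption,
# not real model output).
rng = np.random.default_rng(0)
fake_short = rng.normal(size=(1, 7, 768)).tolist()
fake_long = rng.normal(size=(1, 11, 768)).tolist()

v1 = sentence_vector(fake_short)
v2 = sentence_vector(fake_long)
print(v1.shape, v2.shape)  # (768,) (768,)

similarity = cosine_similarity(v1, v2)
print(similarity)
```

With real output, `fake_short` and `fake_long` would be replaced by the lists returned from `feature_extraction(...)`. Whether plain averaging gives good sentence vectors is something I have not checked.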
