I heard that various natural language processing tools can be tried from Python, so I played around with them. I don't understand the underlying algorithms at all, but it's amazing that you can do something interesting in just a few lines.
Google Colab Notebook
First, install transformers and define what you need.
pip install transformers
lang.py
import torch
from transformers import pipeline
sentiment_analysis = pipeline('sentiment-analysis')
question_answering = pipeline('question-answering')
fill_mask = pipeline("fill-mask")
feature_extraction = pipeline("feature-extraction")
This time I played with the above four. Let's look at each below.
sentiment-analysis
It outputs how positive or negative the input sentence is.
lang.py
sentiment_analysis("Because of the pandemic, I decided to refrain from going out.")
# => [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]
The sentence is predicted to be negative with a high score (about 0.97).
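If a single number is handier than a label plus a score, the result can be folded into one signed value. This is only a small sketch: `signed_polarity` is a hypothetical helper name, it assumes the label is either 'POSITIVE' or 'NEGATIVE', and the `result` list is copied from the output above rather than produced by running the model again.

```python
# Output copied from the sentiment_analysis call above (not re-computed here).
result = [{'label': 'NEGATIVE', 'score': 0.9692758917808533}]

def signed_polarity(prediction):
    """Map a {'label', 'score'} dict to a value between -1 and 1.

    Hypothetical helper: assumes the label is 'POSITIVE' or 'NEGATIVE'.
    """
    sign = 1.0 if prediction['label'] == 'POSITIVE' else -1.0
    return sign * prediction['score']

polarity = signed_polarity(result[0])
print(round(polarity, 3))  # -0.969
```

A value close to -1 means strongly negative and a value close to +1 means strongly positive, which makes it easier to sort or plot many sentences at once.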
question-answering
If you give it a question and a context passage that contains the answer, it returns the answer.
lang.py
question_answering({
'question': 'What is the cause of the pandemic?',
'context' : 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
})
# => {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}
It gave the correct answer. (However, when I tried various other inputs it sometimes raised an error, so it seems that it does not work unless the sentences are easy for the algorithm to understand.)
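In the output above, `start` and `end` appear to be character positions in `context`. The following sketch checks that by slicing the string; the inputs and the result are copied from the example above, so no model is run here, and the variable names `span` and `highlighted` are just for illustration.

```python
# Inputs and output copied from the question_answering call above.
context = 'The coronavirus triggered an outbreak, and society was thrown into chaos.'
result = {'answer': 'coronavirus', 'end': 15, 'score': 0.6689822122523879, 'start': 4}

# Slicing the context with start/end reproduces the answer text.
span = context[result['start']:result['end']]
print(span)  # coronavirus

# Mark the answer inside the original sentence with square brackets.
highlighted = context[:result['start']] + '[' + span + ']' + context[result['end']:]
print(highlighted)
```

This is handy when you want to show where in the passage the answer came from, not just the answer itself.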
fill-mask
If you give it a sentence containing <mask>, it predicts words that fit in that position.
lang.py
fill_mask("I have to be in bed all day today because I get <mask>.")
'''
=> [{'score': 0.2714517414569855,
'sequence': '<s> I have to be in bed all day today because I get tired.</s>',
'token': 7428},
{'score': 0.19346608221530914,
'sequence': '<s> I have to be in bed all day today because I get sick.</s>',
'token': 4736},
{'score': 0.07417058944702148,
'sequence': '<s> I have to be in bed all day today because I get headaches.</s>',
'token': 20816},
{'score': 0.05399525910615921,
'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>',
'token': 37197},
{'score': 0.05070624500513077,
'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>',
'token': 33782}]
'''
All of the candidates look plausible. (Sorry that all the example sentences are a bit gloomy.)
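The returned sequences include the whole sentence and the `<s>` / `</s>` markers, so it can be convenient to pull out only the predicted word. Below is a minimal sketch that works on the output copied from above; `filled_word` is a hypothetical helper, and it assumes each sequence is the original sentence with only `<mask>` replaced.

```python
template = "I have to be in bed all day today because I get <mask>."

# Output copied from the fill_mask call above (not re-computed here).
results = [
    {'score': 0.2714517414569855,
     'sequence': '<s> I have to be in bed all day today because I get tired.</s>',
     'token': 7428},
    {'score': 0.19346608221530914,
     'sequence': '<s> I have to be in bed all day today because I get sick.</s>',
     'token': 4736},
    {'score': 0.07417058944702148,
     'sequence': '<s> I have to be in bed all day today because I get headaches.</s>',
     'token': 20816},
    {'score': 0.05399525910615921,
     'sequence': '<s> I have to be in bed all day today because I get insomnia.</s>',
     'token': 37197},
    {'score': 0.05070624500513077,
     'sequence': '<s> I have to be in bed all day today because I get sleepy.</s>',
     'token': 33782},
]

# Text before and after the mask in the original sentence.
prefix, suffix = template.split('<mask>')

def filled_word(sequence):
    """Hypothetical helper: strip the markers, then cut off prefix and suffix."""
    text = sequence.replace('<s>', '').replace('</s>', '').strip()
    return text[len(prefix):len(text) - len(suffix)].strip()

words = [(filled_word(r['sequence']), r['score']) for r in results]
for word, score in words:
    print(f'{word:10s} {score:.3f}')

# Total score covered by the five candidates shown.
top5_mass = sum(score for _, score in words)
print(round(top5_mass, 2))  # 0.64
```

If the scores are read as probabilities, the five candidates together cover roughly 0.64, so a fair amount of weight is spread over words that are not shown.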
feature-extraction
It returns vectors that represent the features of the input sentence. Unlike the three above, the return value is purely numerical, but I thought that using it would make it easy to feed sentences into a model of my own. (I'd like to try that someday.)
lang.py
import numpy as np
array = feature_extraction("I catch a cold.")
# Convert the nested list to a numpy array just to inspect its shape
np.array(array).shape
# => (1, 7, 768)
array[0][0][:10]
'''
=> [0.3683673143386841,
0.008590285666286945,
0.04184938594698906,
-0.08078824728727341,
-0.20844608545303345,
-0.03908906877040863,
0.19680079817771912,
-0.12569604814052582,
0.010193285532295704,
-1.1207540035247803]
'''
It returned a nested list with the shape and values shown above. Still, it is striking how many numbers are used to represent the single sentence "I catch a cold."
Here is one more example.
lang.py
# numpy is already imported as np in the previous cell
array = feature_extraction("I catch a cold and I am sleepy.")
np.array(array).shape
# => (1, 11, 768)
array[0][0][:10]
'''
=> [0.3068505525588989,
0.026863660663366318,
0.17733855545520782,
0.03574731573462486,
-0.12478257715702057,
-0.22214828431606293,
0.2502932548522949,
-0.17025449872016907,
-0.09574677795171738,
-0.9091089963912964]
'''
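The second dimension changed from 7 to 11. It appears to follow the number of tokens in the sentence (the words and the period, plus what look like two special tokens added by the model). The last dimension, 768, does not change.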
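Since the second dimension depends on the sentence, a model of your own would usually want one fixed-size vector per sentence. One common way to get that is to average over the token dimension (mean pooling). This is a technique I am adding for illustration, not something tried in the experiment above; `mean_pool` is a hypothetical helper, and the inputs are tiny stand-ins with 4 dimensions instead of 768 so that the sketch runs without the model.

```python
def mean_pool(features):
    """Average the token vectors.

    features: nested list shaped (1, n_tokens, dim), like the pipeline output.
    Returns a plain list of length dim.
    """
    tokens = features[0]  # drop the leading batch dimension
    return [sum(column) / len(tokens) for column in zip(*tokens)]

# Stand-ins for the pipeline output: 3 tokens and 5 tokens, 4 dimensions each.
shorter = [[[float(t + d) for d in range(4)] for t in range(3)]]
longer = [[[float(t * d) for d in range(4)] for t in range(5)]]

short_vec = mean_pool(shorter)
long_vec = mean_pool(longer)

# Both sentences now have a vector of the same length.
print(len(short_vec), len(long_vec))  # 4 4
```

With the real output, the same function should turn both the (1, 7, 768) and the (1, 11, 768) results into lists of 768 numbers, which can then be compared or passed to another model.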