Can BERT tell the difference between "あめ (candy)" and "あめ (rain)"?

background

I have used Word2Vec before, but its word vectors are not context-sensitive.

"Tomorrow's weather forecast is candy." "I bought candy at a candy store."

When there are two sentences like the above Although the meanings of "rain" and "candy" are different, they are the same word, so they recognize the same thing.
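To illustrate the Word2Vec side, here is a toy sketch I added for this write-up (the corpus and parameters are made up, not part of the experiment): a static embedding keeps a single entry per surface form, so every occurrence of あめ gets the same vector.

from gensim.models import Word2Vec

# Toy corpus: あめ appears once meaning "rain" and once meaning "candy".
sentences = [
    ["明日", "の", "天気", "予報", "は", "あめ", "です"],
    ["駄菓子", "屋", "で", "あめ", "を", "買っ", "た"],
]
model = Word2Vec(sentences, vector_size=16, min_count=1)

# Word2Vec has exactly one vector for the surface form あめ,
# regardless of which sentence it came from.
print(model.wv["あめ"])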

BERT, on the other hand, is said to produce context-aware representations, so I wanted to check whether the two occurrences of あめ above really come out with different meanings.

plan

Target data

I run the check on the corpus below (three sentences in which あめ means rain and three in which it means candy), following the ELMo implementation described in the article here. The sentences appear in step 1 of the implementation below.

BERT model

For BERT, I use the huggingface/transformers library to obtain the distributed representations.

Implementation and results

The implementation was carried out in the following steps.

  1. Word split
  2. Number the words
  3. Convert to model input format (tensorized)
  4. Model preparation and input
  5. Calculate the similarity between outputs (rain and rain, rain and candy, candy and candy)
import torch
import numpy as np
from transformers import BertJapaneseTokenizer, BertForMaskedLM

tokenizer = BertJapaneseTokenizer.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')

def tokenize(text):
    return tokenizer.tokenize(text)

def word_to_index(text):
    return tokenizer.convert_tokens_to_ids(text)

def to_tensor(tokens):
    return torch.tensor([tokens])


def cos_sim(vec1, vec2):
    x = vec1.detach().numpy()
    y = vec2.detach().numpy()

    x_l2_norm = np.linalg.norm(x, ord=2)
    y_l2_norm = np.linalg.norm(y, ord=2)
    xy = np.dot(x,y)

    return xy / (x_l2_norm * y_l2_norm)


if __name__ == "__main__":
    # Corpus: the target word is written in hiragana (あめ) in every sentence,
    # so "rain" and "candy" share exactly the same surface form.
    d_rainy_01 = "[CLS]明日の天気予報はあめです。[SEP]"                    # Tomorrow's weather forecast is rain.
    d_rainy_02 = "[CLS]今朝はあめだったので犬の散歩に行かなかった。[SEP]"  # I skipped the dog's walk because of the rain this morning.
    d_rainy_03 = "[CLS]梅雨なので毎日あめが降っている。[SEP]"              # It's the rainy season, so it rains every day.
    d_candy_01 = "[CLS]駄菓子屋であめを買った。[SEP]"                      # I bought candy at a candy store.
    d_candy_02 = "[CLS]あめを舐めながら仕事をする。[SEP]"                  # I work while sucking on candy.
    d_candy_03 = "[CLS]すっぱいあめは苦手だ。[SEP]"                        # I'm not good with sour candy.

    # 1.Word split
    tokenize_rainy_01 = tokenize(d_rainy_01)
    tokenize_rainy_02 = tokenize(d_rainy_02)
    tokenize_rainy_03 = tokenize(d_rainy_03)
    tokenize_candy_01 = tokenize(d_candy_01)
    tokenize_candy_02 = tokenize(d_candy_02)
    tokenize_candy_03 = tokenize(d_candy_03)

    # 2.Number words
    indexes_rainy_01 = word_to_index(tokenize_rainy_01)
    indexes_rainy_02 = word_to_index(tokenize_rainy_02)
    indexes_rainy_03 = word_to_index(tokenize_rainy_03)
    indexes_candy_01 = word_to_index(tokenize_candy_01)
    indexes_candy_02 = word_to_index(tokenize_candy_02)
    indexes_candy_03 = word_to_index(tokenize_candy_03)

    # 3.Convert to model input format(Tensorization)
    tensor_rainy_01 = to_tensor(indexes_rainy_01)
    tensor_rainy_02 = to_tensor(indexes_rainy_02)
    tensor_rainy_03 = to_tensor(indexes_rainy_03)
    tensor_candy_01 = to_tensor(indexes_candy_01)
    tensor_candy_02 = to_tensor(indexes_candy_02)
    tensor_candy_03 = to_tensor(indexes_candy_03)

    # 4.Model preparation and input
    bert = BertForMaskedLM.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
    bert.eval()

    # Every sentence writes the target word the same way, so all six
    # lookups search for the same token 'あめ'.
    index_rainy_01 = tokenize_rainy_01.index('あめ')
    index_rainy_02 = tokenize_rainy_02.index('あめ')
    index_rainy_03 = tokenize_rainy_03.index('あめ')
    index_candy_01 = tokenize_candy_01.index('あめ')
    index_candy_02 = tokenize_candy_02.index('あめ')
    index_candy_03 = tokenize_candy_03.index('あめ')
    # Take the model's first output and slice out the vector at the
    # position of 'あめ' in each sentence.
    vec_rainy_01 = bert(tensor_rainy_01)[0][0][index_rainy_01]
    vec_rainy_02 = bert(tensor_rainy_02)[0][0][index_rainy_02]
    vec_rainy_03 = bert(tensor_rainy_03)[0][0][index_rainy_03]
    vec_candy_01 = bert(tensor_candy_01)[0][0][index_candy_01]
    vec_candy_02 = bert(tensor_candy_02)[0][0][index_candy_02]
    vec_candy_03 = bert(tensor_candy_03)[0][0][index_candy_03]

    # 5. Calculate the similarity between the outputs
    #    (rain vs. rain, rain vs. candy, candy vs. candy)
    print("rainy_01 and rainy_02 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_01, vec_rainy_02)))
    print("rainy_01 and rainy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_01, vec_rainy_03)))
    print("rainy_02 and rainy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_02, vec_rainy_03)))
    print("-" * 30)

    print("rainy_01 and candy_01 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_01)))
    print("rainy_01 and candy_02 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_02)))
    print("rainy_01 and candy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_03)))
    print("-" * 30)

    print("rainy_02 and candy_01 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_01)))
    print("rainy_02 and candy_02 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_02)))
    print("rainy_02 and candy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_03)))
    print("-" * 30)

    print("rainy_03 and candy_01 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_01)))
    print("rainy_03 and candy_02 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_02)))
    print("rainy_03 and candy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_03)))
    print("-" * 30)

    print("candy_01 and candy_02 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_candy_01, vec_candy_02)))
    print("candy_01 and candy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_candy_01, vec_candy_03)))
    print("candy_02 and candy_03 : cos similarity of 'あめ': {:.2f}".format(cos_sim(vec_candy_02, vec_candy_03)))

To summarize the results:

             rainy_01  rainy_02  rainy_03  candy_01  candy_02  candy_03
rainy_01        *        0.79      0.88      0.83      0.83      0.83
rainy_02        *         *        0.79      0.77      0.75      0.77
rainy_03        *         *         *        0.87      0.89      0.84
candy_01        *         *         *         *        0.93      0.90
candy_02        *         *         *         *         *        0.90
candy_03        *         *         *         *         *         *

The meanings of "rain" and "candy" should be different, yet the rain-candy similarities are about as high as the rain-rain and candy-candy ones, so the values barely tell the two senses apart. Why did the result deviate from my expectations?

bonus

NICT released a pre-trained Japanese BERT model in March 2020, so I compared it with bert-base-japanese-whole-word-masking. Obtaining the similarities with the NICT model through the same procedure gives the following.

             rainy_01  rainy_02  rainy_03  candy_01  candy_02  candy_03
rainy_01        *        0.83      0.82      0.86      0.82      0.85
rainy_02        *         *        0.88      0.87      0.79      0.84
rainy_03        *         *         *        0.84      0.80      0.86
candy_01        *         *         *         *        0.82      0.85
candy_02        *         *         *         *         *        0.81
candy_03        *         *         *         *         *         *
Averaging the pairwise cosine similarities by pair type:

                     bert-base-japanese-whole-word-masking   NICT
Same meaning                         0.865                   0.835
Different meanings                   0.820                   0.837
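For reference, the averages follow directly from the pairwise matrices (values copied by hand from the tables above; shown here for the bert-base-japanese-whole-word-masking model):

import numpy as np

# Same-meaning pairs: rainy-rainy and candy-candy.
same_meaning = [0.79, 0.88, 0.79, 0.93, 0.90, 0.90]
# Different-meaning pairs: rainy-candy.
diff_meaning = [0.83, 0.83, 0.83, 0.77, 0.75, 0.77, 0.87, 0.89, 0.84]

print("{:.3f}".format(np.mean(same_meaning)))  # 0.865
print("{:.3f}".format(np.mean(diff_meaning)))  # 0.820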

in conclusion

The result was not what I expected... I'm not sure whether this approach is the right one, so if you know a better way, please let me know!
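One possible culprit, though this is an untested guess on my part: the first output of BertForMaskedLM is the masked-LM prediction scores over the whole vocabulary, not the hidden states, so the code above compares vocabulary logits rather than contextual token vectors. A minimal sketch of the alternative, reusing tensor_rainy_01 and index_rainy_01 from the script above:

import torch
from transformers import BertModel

bert = BertModel.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
bert.eval()

with torch.no_grad():
    # The first output of BertModel is the last hidden state:
    # shape (batch, seq_len, hidden_size).
    hidden = bert(tensor_rainy_01)[0]

# 768-dimensional context vector at the position of 'あめ'.
vec_rainy_01 = hidden[0][index_rainy_01]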
