I have used Word2Vec before, but it does not produce context-sensitive representations. Consider these two sentences:
"The weather forecast for tomorrow is あめ (rain)." "I bought あめ (candy) at a candy store."
In Japanese, あめ (ame) written in hiragana can mean either rain (雨) or candy (飴). The meanings are different, but the surface form is the same word, so Word2Vec assigns both occurrences a single vector and treats them as the same thing.
BERT, on the other hand, is said to produce representations that take context into account, so I wanted to check whether the two occurrences of "あめ" above actually come out with different meanings.
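To make the Word2Vec side concrete, here is a minimal sketch with gensim on a toy corpus I made up (not part of the experiment below): whatever the context, the same surface form maps to one and the same vector.

from gensim.models import Word2Vec

# Toy corpus: "あめ" appears in both a rain context and a candy context.
sentences = [
    ["明日", "の", "天気", "は", "あめ", "です"],
    ["あめ", "を", "舐め", "ながら", "仕事", "を", "する"],
]
model = Word2Vec(sentences=sentences, vector_size=50, min_count=1, seed=0)

# Word2Vec has exactly one vector per surface form:
# both occurrences of "あめ" map to this same vector.
print(model.wv["あめ"][:5])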
Referring to the ELMo implementation in the article here, I run the check with the corpus below.
For BERT, I use huggingface/transformers to obtain the distributed representations.
The implementation proceeds in the following steps: (1) word splitting, (2) converting words to IDs, (3) converting to the model's input format (tensorization), (4) preparing the model and feeding it the input, and (5) computing similarities between the outputs.
import torch
import numpy as np
from transformers import BertJapaneseTokenizer, BertForMaskedLM

tokenizer = BertJapaneseTokenizer.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
def tokenize(text):
    # 1. Split the text into tokens with the Japanese BERT tokenizer
    return tokenizer.tokenize(text)

def word_to_index(tokens):
    # 2. Convert tokens to vocabulary IDs
    return tokenizer.convert_tokens_to_ids(tokens)

def to_tensor(tokens):
    # 3. Wrap the ID list in a (1, seq_len) tensor for the model
    return torch.tensor([tokens])

def cos_sim(vec1, vec2):
    # Cosine similarity between two torch vectors
    x = vec1.detach().numpy()
    y = vec2.detach().numpy()
    x_l2_norm = np.linalg.norm(x, ord=2)
    y_l2_norm = np.linalg.norm(y, ord=2)
    xy = np.dot(x, y)
    return xy / (x_l2_norm * y_l2_norm)
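As a quick sanity check of cos_sim (toy vectors, not part of the experiment): identical vectors should give 1.0 and orthogonal vectors 0.0.

v1 = torch.tensor([1.0, 0.0])
v2 = torch.tensor([0.0, 1.0])
print(cos_sim(v1, v1))  # -> 1.0
print(cos_sim(v1, v2))  # -> 0.0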
if __name__ == "__main__":
    # The ambiguous word あめ appears in every sentence:
    # rainy_* use it in the sense of rain, candy_* in the sense of candy.
    d_rainy_01 = "[CLS]明日の天気予報はあめです。[SEP]"  # The weather forecast for tomorrow is rain.
    d_rainy_02 = "[CLS]今朝はあめだったので犬の散歩に行かなかった。[SEP]"  # I didn't walk the dog this morning because it was raining.
    d_rainy_03 = "[CLS]梅雨なので毎日あめが降っている。[SEP]"  # It's the rainy season, so it rains every day.
    d_candy_01 = "[CLS]駄菓子屋であめを買った。[SEP]"  # I bought candy at a candy store.
    d_candy_02 = "[CLS]あめを舐めながら仕事をする。[SEP]"  # I work while sucking on candy.
    d_candy_03 = "[CLS]すっぱいあめは苦手だ。[SEP]"  # I don't like sour candy.
    # 1. Word splitting
    tokenize_rainy_01 = tokenize(d_rainy_01)
    tokenize_rainy_02 = tokenize(d_rainy_02)
    tokenize_rainy_03 = tokenize(d_rainy_03)
    tokenize_candy_01 = tokenize(d_candy_01)
    tokenize_candy_02 = tokenize(d_candy_02)
    tokenize_candy_03 = tokenize(d_candy_03)
    # 2. Convert words to IDs
    indexes_rainy_01 = word_to_index(tokenize_rainy_01)
    indexes_rainy_02 = word_to_index(tokenize_rainy_02)
    indexes_rainy_03 = word_to_index(tokenize_rainy_03)
    indexes_candy_01 = word_to_index(tokenize_candy_01)
    indexes_candy_02 = word_to_index(tokenize_candy_02)
    indexes_candy_03 = word_to_index(tokenize_candy_03)
    # 3. Convert to the model's input format (tensorization)
    tensor_rainy_01 = to_tensor(indexes_rainy_01)
    tensor_rainy_02 = to_tensor(indexes_rainy_02)
    tensor_rainy_03 = to_tensor(indexes_rainy_03)
    tensor_candy_01 = to_tensor(indexes_candy_01)
    tensor_candy_02 = to_tensor(indexes_candy_02)
    tensor_candy_03 = to_tensor(indexes_candy_03)
    # 4. Prepare the model and find the position of あめ in each sentence
    bert = BertForMaskedLM.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
    bert.eval()
    index_rainy_01 = tokenize_rainy_01.index('あめ')
    index_rainy_02 = tokenize_rainy_02.index('あめ')
    index_rainy_03 = tokenize_rainy_03.index('あめ')
    index_candy_01 = tokenize_candy_01.index('あめ')
    index_candy_02 = tokenize_candy_02.index('あめ')
    index_candy_03 = tokenize_candy_03.index('あめ')
    # Feed each sentence through the model and take the output vector
    # at the position of あめ (output[0] is the MLM head's scores)
    vec_rainy_01 = bert(tensor_rainy_01)[0][0][index_rainy_01]
    vec_rainy_02 = bert(tensor_rainy_02)[0][0][index_rainy_02]
    vec_rainy_03 = bert(tensor_rainy_03)[0][0][index_rainy_03]
    vec_candy_01 = bert(tensor_candy_01)[0][0][index_candy_01]
    vec_candy_02 = bert(tensor_candy_02)[0][0][index_candy_02]
    vec_candy_03 = bert(tensor_candy_03)[0][0][index_candy_03]
    # 5. Compute the similarities between the outputs
    #    (rain vs. rain, rain vs. candy, candy vs. candy)
    print("Cos similarity of あめ in rainy_01 and rainy_02: {:.2f}".format(cos_sim(vec_rainy_01, vec_rainy_02)))
    print("Cos similarity of あめ in rainy_01 and rainy_03: {:.2f}".format(cos_sim(vec_rainy_01, vec_rainy_03)))
    print("Cos similarity of あめ in rainy_02 and rainy_03: {:.2f}".format(cos_sim(vec_rainy_02, vec_rainy_03)))
    print("-" * 30)
    print("Cos similarity of あめ in rainy_01 and candy_01: {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_01)))
    print("Cos similarity of あめ in rainy_01 and candy_02: {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_02)))
    print("Cos similarity of あめ in rainy_01 and candy_03: {:.2f}".format(cos_sim(vec_rainy_01, vec_candy_03)))
    print("-" * 30)
    print("Cos similarity of あめ in rainy_02 and candy_01: {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_01)))
    print("Cos similarity of あめ in rainy_02 and candy_02: {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_02)))
    print("Cos similarity of あめ in rainy_02 and candy_03: {:.2f}".format(cos_sim(vec_rainy_02, vec_candy_03)))
    print("-" * 30)
    print("Cos similarity of あめ in rainy_03 and candy_01: {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_01)))
    print("Cos similarity of あめ in rainy_03 and candy_02: {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_02)))
    print("Cos similarity of あめ in rainy_03 and candy_03: {:.2f}".format(cos_sim(vec_rainy_03, vec_candy_03)))
    print("-" * 30)
    print("Cos similarity of あめ in candy_01 and candy_02: {:.2f}".format(cos_sim(vec_candy_01, vec_candy_02)))
    print("Cos similarity of あめ in candy_01 and candy_03: {:.2f}".format(cos_sim(vec_candy_01, vec_candy_03)))
    print("Cos similarity of あめ in candy_02 and candy_03: {:.2f}".format(cos_sim(vec_candy_02, vec_candy_03)))
To summarize the results:

| | rainy_01 | rainy_02 | rainy_03 | candy_01 | candy_02 | candy_03 |
|---|---|---|---|---|---|---|
| rainy_01 | * | 0.79 | 0.88 | 0.83 | 0.83 | 0.83 |
| rainy_02 | * | * | 0.79 | 0.77 | 0.75 | 0.77 |
| rainy_03 | * | * | * | 0.87 | 0.89 | 0.84 |
| candy_01 | * | * | * | * | 0.93 | 0.90 |
| candy_02 | * | * | * | * | * | 0.90 |
| candy_03 | * | * | * | * | * | * |
The "rain" and "candy" senses of あめ should be different, yet the values did not separate the way I expected: the same-sense pairs are barely more similar than the cross-sense pairs. Why?
NICT released a pretrained Japanese BERT model in March 2020, so I also compared it against bert-base-japanese-whole-word-masking. The similarities obtained with the NICT model by the same procedure are as follows.
| | rainy_01 | rainy_02 | rainy_03 | candy_01 | candy_02 | candy_03 |
|---|---|---|---|---|---|---|
| rainy_01 | * | 0.83 | 0.82 | 0.86 | 0.82 | 0.85 |
| rainy_02 | * | * | 0.88 | 0.87 | 0.79 | 0.84 |
| rainy_03 | * | * | * | 0.84 | 0.80 | 0.86 |
| candy_01 | * | * | * | * | 0.82 | 0.85 |
| candy_02 | * | * | * | * | * | 0.81 |
| candy_03 | * | * | * | * | * | * |
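The averages can be computed directly from the tables above: the same-sense group is the 6 rainy–rainy and candy–candy pairs, the different-sense group the 9 rainy–candy pairs. For example, for bert-base-japanese-whole-word-masking:

import numpy as np

# Values copied from the bert-base-japanese-whole-word-masking table above
same = [0.79, 0.88, 0.79, 0.93, 0.90, 0.90]  # rainy-rainy and candy-candy pairs
diff = [0.83, 0.83, 0.83, 0.77, 0.75, 0.77, 0.87, 0.89, 0.84]  # rainy-candy pairs
print("Same meaning: {:.3f}".format(np.mean(same)))        # -> 0.865
print("Different meanings: {:.3f}".format(np.mean(diff)))  # -> 0.820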
The averages for each model:

| | bert-base-japanese-whole-word-masking | NICT |
|---|---|---|
| Same meaning | 0.865 | 0.835 |
| Different meanings | 0.820 | 0.837 |
The result was not what I expected... With bert-base-japanese-whole-word-masking the same-sense pairs are only slightly more similar, and with NICT the different-sense pairs even come out marginally higher. I am not sure whether this is the right way to do it, so if you know, please tell me!
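One variant I suspect is worth trying (my own assumption, not something I have verified): BertForMaskedLM's first output is the MLM head's scores over the vocabulary, not the encoder's hidden states, so taking the word vector from BertModel's last hidden layer instead might behave differently. A minimal sketch:

from transformers import BertJapaneseTokenizer, BertModel
import torch

tokenizer = BertJapaneseTokenizer.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
model = BertModel.from_pretrained('cl-tohoku/bert-base-japanese-whole-word-masking')
model.eval()

def word_vector(text, word):
    # Assumes `word` (e.g. あめ) survives tokenization as a single token
    tokens = tokenizer.tokenize(text)
    ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
    with torch.no_grad():
        hidden = model(ids)[0]  # (1, seq_len, hidden_size) last hidden states
    return hidden[0][tokens.index(word)]

# e.g. cos_sim(word_vector(d_rainy_01, 'あめ'), word_vector(d_candy_01, 'あめ'))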