After a sentence is broken down into words, each word is converted into numbers. For example, this = [0.2, 0.4, 0.5], is = [0.1, 0.7, 0.35]. These numbers represent the features of each word, and vectors like [0.2, 0.4, 0.5] and [0.1, 0.7, 0.35] are called word vectors.
For example, suppose the only sentence you want to analyze this time is "I am John Cena."
I = [ 1 , 0 , 0 , 0 ]
am = [ 0 , 1 , 0 , 0 ]
John Cena = [ 0 , 0 , 1 , 0 ]
. = [ 0 , 0 , 0 , 1 ]
Each word can be converted into a one-hot vector like this. An encoder then transforms this word vector, turning the word into a feature representation. The vector obtained by passing a one-hot vector through the encoder is called an embedding vector.
As an example: I = [1, 0, 0, 0] ⇒ Encoder ⇒ $x_1$ = [0.3, -0.3, 0.6, 2.2]
This $x_1$ is the embedding vector.
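Here is a minimal sketch of this one-hot-to-embedding step in Python (NumPy). The embedding matrix is random purely for illustration; a real model learns these numbers during training:

```python
import numpy as np

vocab = ["I", "am", "John Cena", "."]

# One-hot vectors: a 1 in each word's own position, 0 elsewhere
one_hot = np.eye(len(vocab))
print(one_hot[0])            # "I" -> [1. 0. 0. 0.]

# The "encoder" here is just a weight matrix (random for illustration;
# a real model learns it during training)
np.random.seed(0)
E = np.random.randn(len(vocab), 4)

# Multiplying a one-hot vector by E simply picks out that word's row,
# which is that word's embedding vector
x1 = one_hot[0] @ E          # the embedding vector for "I"
print(x1)
```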
This idea is explained in detail here (I used it as a reference): https://ishitonton.hatenablog.com/entry/2018/11/25/200332
A Transformer takes a string as input and returns a string as output. Internally, it consists of a stack of encoders and decoders, as shown in the figure above. The input string first enters the encoder; the contents of the encoder are shown below.
This self-attention looks at the relationships between the words in the input string. A strong relationship between two words shows up as high similarity between their word vectors, and similarity is measured by the inner product of the vectors (a matrix product when done for all words at once). The result is then transformed by an ordinary feed-forward neural network.
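As a sketch of that idea, here is a bare-bones self-attention computation in NumPy. The word vectors are made-up numbers, and for simplicity the queries, keys, and values are the raw embeddings themselves; a real Transformer first multiplies them by learned weight matrices:

```python
import numpy as np

# Made-up 4-dim embedding vectors for the words "I", "am", "John Cena"
X = np.array([[0.3, -0.3, 0.6,  2.2],
              [0.1,  0.7, 0.35, 0.0],
              [1.0,  0.2, -0.5, 0.4]])

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Similarity between every pair of words = inner product of their vectors,
# scaled by sqrt(dimension) as in the Transformer paper
scores = X @ X.T / np.sqrt(X.shape[1])

# Softmax turns the similarities into attention weights that sum to 1 per word
weights = softmax(scores)

# Each word's new vector is a weighted mix of all the word vectors
output = weights @ X
print(weights.round(2))
```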
The decoder then uses the encoder's output to predict the next word.
This encoder-decoder attention (E-D attention) looks at the relationship between the input and the output.
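A minimal sketch of this encoder-decoder attention, again with made-up numbers: the queries come from the decoder side while the keys and values come from the encoder side, which is exactly how the output gets to look back at the input:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Encoder output: one made-up 4-dim vector per input word
enc = np.array([[0.3, -0.3, 0.6,  2.2],
                [0.1,  0.7, 0.35, 0.0]])

# Decoder state: made-up vector for the word generated so far
dec = np.array([[0.5, 0.1, -0.2, 0.9]])

# Queries come from the decoder, keys and values from the encoder,
# so each output position attends over the whole input sequence
scores = dec @ enc.T / np.sqrt(enc.shape[1])
context = softmax(scores) @ enc
print(context.round(2))
```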
That is a rough overview of the Transformer. If you want a proper explanation: https://qiita.com/omiita/items/07e69aef6c156d23c538
I mostly used this as a reference. Insanely easy to understand! https://www.youtube.com/watch?v=BcNZRiO0_AE
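Finally, for a concrete feel of the string-in, string-out behavior described above, here is a minimal sketch using PyTorch's built-in nn.Transformer. All the sizes are toy numbers, and the random tensors stand in for embedded token sequences:

```python
import torch
import torch.nn as nn

# Toy sizes: 16-dim model, 4 attention heads, 2 encoder and 2 decoder layers
model = nn.Transformer(d_model=16, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2)

# Random tensors standing in for embedded token sequences
src = torch.rand(5, 1, 16)   # input side: (source length, batch, d_model)
tgt = torch.rand(3, 1, 16)   # output side so far: (target length, batch, d_model)

out = model(src, tgt)        # decoder representations used to predict the next word
print(out.shape)             # torch.Size([3, 1, 16])
```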