Aidemy 2020/10/30
Hello, this is Yope! I am a liberal arts student, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that many people have read my previous summary article. Thank you! This is the first post on topic extraction from Japanese text. Nice to meet you.
What to learn this time
・Deep learning in natural language processing
・Embedding, RNN, LSTM, Softmax, etc.
・Deep learning used in natural language processing includes __machine translation, automatic summarization, and automatic question answering__.
・In such natural language processing, using a neural network model has the advantage that word vectors (Embedding) can be learned with __backpropagation__ while taking context into account and keeping the number of dimensions low.
Embedding
・Embedding means __"embedding"__, and it is the first process performed when building a neural network that handles words. Specifically, it is the process of embedding each word, which is a symbol, into a __d-dimensional vector (d is about 100 to 300)__.
・Embedding is written like __"model.add(Embedding(arguments))"__. The arguments are as follows.
・input_dim: vocabulary size (number of word types)
・output_dim: the dimension d of the word vector
・input_length: the length of each sentence
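For reference, here is a minimal sketch of such an Embedding layer in keras; the vocabulary size, dimension, and sentence length below are assumed values for illustration, not taken from the original article.

```python
# Minimal sketch of an Embedding layer (values are assumptions for illustration)
from keras.models import Sequential
from keras.layers import Embedding

vocab_size = 10000  # input_dim: number of word types in the vocabulary
embed_dim = 200     # output_dim: dimension d of each word vector (about 100 to 300)
max_len = 30        # input_length: length of each (padded) sentence

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=embed_dim, input_length=max_len))
# Input: integer word IDs of shape (batch_size, max_len)
# Output: word vectors of shape (batch_size, max_len, embed_dim)
```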
RNN
・RNN stands for "recurrent neural network", a type of neural network often used in deep learning for natural language processing. It is __well suited to handling input sequences of arbitrary length (variable-length series)__.
LSTM
・LSTM is a type of RNN that makes up for a shortcoming of the plain RNN. Since an RNN is a neural network that is deep in the time direction, it has the drawback of "forgetting" values entered at early time steps. In other words, an RNN is not good at long-term memory, whereas LSTM can handle both short-term and long-term memory, as the name "Long Short-Term Memory" suggests.
・LSTM can also be imported from keras and implemented easily. Like Embedding, it is used with __"model.add(LSTM())"__. The arguments are as follows.
・units: dimension of the hidden state vector (about 100 to 300)
・return_sequences: with "True", the hidden state vectors for __all time steps of the input sequence__ are output; with "False", __only the hidden state vector at the last time step T__ is output.
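For reference, a minimal sketch of adding an LSTM after the Embedding layer; the number of units and the other values are assumptions for illustration.

```python
# Minimal sketch of an LSTM layer (values are assumptions for illustration)
from keras.models import Sequential
from keras.layers import Embedding, LSTM

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=200, input_length=30))
# return_sequences=True  -> hidden state vectors for all time steps: (batch_size, 30, 128)
# return_sequences=False -> hidden state vector at the last time step only: (batch_size, 128)
model.add(LSTM(units=128, return_sequences=False))
```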
BiLSTM
・An LSTM reads the input sequence x in order from time 1 to the end, but it is also possible to __read the sequence in order from the back__. Applying this, a method called __"BiLSTM"__, which reads the input from __both directions__, is often used. In Japanese it is called a __"bidirectional recurrent neural network"__.
・The advantage of BiLSTM is that both "information propagated from the beginning" and "information propagated from the end" can be obtained at the same time.
・To implement it, use keras __"model.add(Bidirectional(arguments))"__. The first argument takes the __LSTM()__ from the previous section as it is, and the second argument, __"merge_mode"__, specifies how to combine the two directional LSTMs.
・For merge_mode, specify one of __['sum', 'mul', 'concat', 'ave']__: sum adds the two outputs element-wise, mul multiplies them, concat concatenates them, and ave averages them.
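For reference, a minimal sketch of wrapping the LSTM from the previous section with Bidirectional; the values are assumptions for illustration.

```python
# Minimal sketch of a BiLSTM layer (values are assumptions for illustration)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=200, input_length=30))
# merge_mode='concat' joins the forward and backward outputs, so the last dimension is 2 * units
model.add(Bidirectional(LSTM(units=128, return_sequences=True), merge_mode='concat'))
```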
Softmax
・Not only in natural language processing, but in any deep learning that performs class classification, the __Softmax function__ is used in the layer closest to the output layer of the neural network. This function was also used in "Deep Learning Basics".
・As confirmed in "gender identification", using Softmax for the output produces an output in which the __probability distribution over the classes sums to 1__.
・Previously it was used like __"model.add(Activation("softmax"))"__ in the __Sequential model__, but with the __"Functional API"__, which describes the model without using Sequential, you write __"Activation('softmax')(x)"__ and pass a tensor of shape __[batch size, number of classes]__ as x.
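For reference, a minimal sketch of a softmax output written in the Functional API style; the feature size and number of classes are assumptions for illustration.

```python
# Minimal sketch of a softmax output layer with the Functional API (values are assumptions)
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Activation

num_classes = 3
inputs = Input(shape=(10,))         # a feature vector of length 10 (assumed)
x = Dense(num_classes)(inputs)      # logits of shape [batch size, number of classes]
outputs = Activation('softmax')(x)  # probability distribution that sums to 1 per sample
model = Model(inputs, outputs)

probs = model.predict(np.random.rand(2, 10))
print(probs.sum(axis=1))            # each row sums to approximately 1
```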
Attention
・Attention refers to the __"attention mechanism"__. It is a mechanism that appears frequently in __machine translation and automatic summarization__.
・For example, in automatic question answering, suppose there is a list s for a question sentence and a list t for the corresponding answer sentence. When you want the machine to __"judge whether t is valid as an answer to s"__, an RNN converts __each of these sentences into hidden state vectors__, and the features of t are computed while taking into account __the hidden state vectors of s at each time step__. This makes it possible to obtain information about t that reflects the information of s (__where in s to pay attention__), which enables the judgment above.
・Attention cannot be implemented with the __Sequential model__, so use the "Functional API" seen in the previous section. Unlike the Sequential model, which simply stacks layers, the Functional API is more involved, but it makes it possible to __build models freely__.
・With the Functional API, a model is built with __"Model(inputs, outputs)"__. Therefore, the __"inputs" and "outputs"__ passed when building the model must be created in advance.
・To __create the inputs__, make an input layer with __"Input(shape=(sentence length,))"__ (batch_size can be omitted), apply __Embedding__ to it, then pass it through a BiLSTM, and it is done. __If there are multiple input layers, create several in the same way__.
・The __output is created by implementing Attention__. Here, for the BiLSTM outputs "bilstm1" and "bilstm2" of the two sentences, compute their matrix product with "__dot()__", apply the __Softmax function__ to it, compute the matrix product of this with bilstm1, and concatenate the result with bilstm2 using "__concatenate()__" to form the output layer. This completes the data to pass to outputs.
・The specific code is as follows.
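A minimal sketch of such an Attention model with the Functional API is shown below; the vocabulary size, dimensions, and variable names are assumptions for illustration.

```python
# Minimal sketch of BiLSTM + Attention with the Functional API (values are assumptions)
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional, Activation, dot, concatenate

vocab_size = 10000  # assumed vocabulary size
embed_dim = 200     # word vector dimension d
seq_len = 30        # sentence length
units = 128         # LSTM hidden state dimension

# Input layers for the question sentence s and the answer sentence t
input_s = Input(shape=(seq_len,))
input_t = Input(shape=(seq_len,))

# Embedding -> BiLSTM for each sentence (return_sequences=True keeps all time steps)
embed_s = Embedding(vocab_size, embed_dim)(input_s)
embed_t = Embedding(vocab_size, embed_dim)(input_t)
bilstm1 = Bidirectional(LSTM(units, return_sequences=True), merge_mode='concat')(embed_s)
bilstm2 = Bidirectional(LSTM(units, return_sequences=True), merge_mode='concat')(embed_t)

# Attention: matrix product of the two hidden state sequences, softmax, then weight bilstm1
product = dot([bilstm2, bilstm1], axes=2)         # shape (batch, seq_len, seq_len)
attention = Activation('softmax')(product)        # attention weights over s
context = dot([attention, bilstm1], axes=(2, 1))  # s-information weighted for each step of t
outputs = concatenate([context, bilstm2])         # combine with t's own representation

# A classification layer would normally follow here, depending on the task (omitted)
model = Model(inputs=[input_s, input_t], outputs=outputs)
```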
Dropout
・(Review) Dropout is a method that randomly sets some of the data to __"0"__ during training in order to prevent overfitting.
・With the Sequential model it is used as __"model.add(Dropout(rate))"__, but with the Functional API, as used this time, it is written as __"Dropout(rate)(x)"__.
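For reference, a minimal sketch of Dropout written in the Functional API style; the layer sizes and rate are assumptions for illustration.

```python
# Minimal sketch of Dropout in the Functional API style (values are assumptions)
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Activation

inputs = Input(shape=(256,))
x = Dense(64)(inputs)
x = Dropout(0.5)(x)                 # randomly sets 50% of the values to 0 during training
x = Dense(3)(x)
outputs = Activation('softmax')(x)
model = Model(inputs, outputs)
```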
Summary
・In deep learning for natural language processing, a process called Embedding is performed first to convert words into vectors.
・A model called an RNN is used for this deep learning. Among RNNs, the overwhelming majority use LSTM, which is good at long-term memory. In addition, by using "BiLSTM", which applies LSTM from both ends of the data, the information of the entire input sequence can be taken into account.
・By implementing Attention at the output, the information of one vector can be extracted while taking the information of another vector into account. This technique is common in natural language processing.
・Similarly, overfitting can be prevented by applying Dropout at the output layer.
That's all for this time. Thank you for reading to the end.