Merry Christmas. When I grow old, I want to join the Santa Claus Association and give children dreams.
By the way, do you like Windows? I love it.
However, probably because few people in the data analysis field use Windows machines, the latest analysis libraries and frameworks often have poor Windows support, and building an environment can be difficult.
Some developers who only have a Windows machine have wonderful motivations such as "I want to do data analysis" or "I want to do natural language processing", but stall at the start because "no Mac is provided ('· × ·`) ... I can't even get started" (a common situation at SIers). For them, this article introduces the procedure for running a natural language processing program with VS Code + Docker. Let's Dive into Docker for Debugging!!!
This time, I will try a basic "sentiment analysis" program; sentiment analysis is one of the tasks of natural language processing.
The library used is Transformers from Hugging Face, which specializes in natural language processing. Reference article: Transformers of Hugging Face attracting attention in natural language processing (NLP)
# INPUT
text = ["Very yeah",
        "I'm not feeling well today",
        "It's subtle",
        "Okay",
        "I don't think it's good"]
------------------------------------------------------------------------
# OUTPUT
[{'label': 'positive', 'score': 0.9899728894233704}] #Very yeah
[{'label': 'Negative', 'score': 0.8069409132003784}] #I'm not feeling well today
[{'label': 'Negative', 'score': 0.7249351143836975}] #It's subtle
[{'label': 'positive', 'score': 0.6537005305290222}] #Okay
[{'label': 'Negative', 'score': 0.9345374703407288}] #I don't think it's good
You enter any text, and the program determines whether it is Positive or Negative. That's exciting.
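As a minimal sketch of how output in this shape could be consumed, the helper below formats one result (a list holding a single dict with 'label' and 'score', as in the OUTPUT listing). The function name `format_result` and the 0.7 threshold are assumptions made for illustration; they are not part of the program in this article.

```python
def format_result(text, result, threshold=0.7):
    """Format one sentiment result of the form [{'label': ..., 'score': ...}].

    `threshold` is a hypothetical cut-off below which the prediction
    is flagged as low confidence.
    """
    top = result[0]  # the result list holds a single dict per input text
    note = "" if top["score"] >= threshold else " (low confidence)"
    return f"{text}: {top['label']} {top['score']:.2f}{note}"


if __name__ == "__main__":
    # Values copied from the OUTPUT listing above
    print(format_result("Very yeah", [{"label": "positive", "score": 0.9899728894233704}]))
    print(format_result("Okay", [{"label": "positive", "score": 0.6537005305290222}]))
```

Flagging low scores this way makes borderline cases such as "Okay" easier to spot when reading the results.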
It is assumed that the following environment is prepared.
Well, let's write the program first.
There are only two files to prepare. The file structure looks like this.
First, the Dockerfile.
Dockerfile
FROM continuumio/anaconda3
WORKDIR /app
# RUN conda install -y tensorflow
RUN pip install -U pip && \
    pip install mecab-python3 && \
    pip install fugashi && \
    pip install ipadic && \
    pip install torch && \
    pip install transformers
Next is the main program (Python) to be executed.
main.py
from transformers import pipeline
from transformers import BertForSequenceClassification
from transformers import BertJapaneseTokenizer


def nlp_main():
    # Texts to analyze
    text_list = ["Very yeah", "I'm not feeling well today", "It's subtle", "Okay", "I don't think it's good"]
    # Load the pretrained Japanese sentiment model and its tokenizer
    model = BertForSequenceClassification.from_pretrained('daigo/bert-base-japanese-sentiment')
    tokenizer = BertJapaneseTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
    # Pipeline for sentiment analysis
    nlp_sentiment_analyzer = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
    # Run the analysis on each text and print the result
    for index, text in enumerate(text_list):
        print(f"No{index}『{text}』:{nlp_sentiment_analyzer(text)}")


if __name__ == '__main__':
    nlp_main()
~~ Simple is best. ~~
Now that we have defined the Dockerfile, let's build it. Normally you would have to install the libraries and so on directly in the native Windows environment, but with Docker you can build the environment in a container very easily.
With just this, you can build the Docker image from VS Code. Isn't it easy? The build takes about 10 minutes. * For reference, my machine is a Core i7-1065G7 @ 1.3GHz, 1.5GHz with 16GB of memory.
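For readers who prefer the command line to the VS Code UI, roughly the same build and run could be done with the Docker CLI. The sketch below only assembles the commands and prints them; it does not execute anything. The image tag `nlp-sentiment` and the example project path are hypothetical, while `/app` is the WORKDIR from the Dockerfile in this article.

```python
import shlex

IMAGE_TAG = "nlp-sentiment"  # hypothetical image name, not from the article


def docker_build_cmd(context_dir=".", tag=IMAGE_TAG):
    """Command that builds an image from the Dockerfile in `context_dir`."""
    return ["docker", "build", "-t", tag, context_dir]


def docker_run_cmd(project_dir, tag=IMAGE_TAG):
    """Command that mounts the project at /app and runs main.py in a container."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{project_dir}:/app",  # /app matches WORKDIR in the Dockerfile
        tag, "python", "main.py",
    ]


if __name__ == "__main__":
    print(shlex.join(docker_build_cmd()))
    # "C:/work/nlp" is a placeholder path; use your own project folder
    print(shlex.join(docker_run_cmd("C:/work/nlp")))
```

Keeping the commands as argument lists means they could later be passed to `subprocess.run` unchanged if you want to automate the build.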
As a trial, first run the program normally from the console, without the debugger.
python main.py
Now for the main topic: how to run the program under the debugger.
Install the VS Code extension so that you can run the debugger.
Now let's configure the debug settings.
Click the "Debug icon" and then click the "create a launch.json file" link.
After clicking, you are asked what to debug. Select "Python".
Then select "Python File".
The automatically generated file "launch.json" is displayed.
Rewrite it as follows. Change: "program": "${workspaceRoot}/main.py"
launch.json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${workspaceRoot}/main.py",
            "console": "integratedTerminal"
        }
    ]
}
For details on the debug settings, see: Visual Studio Code Debugging
Now run the debugger.
Video commentary
When the debug session starts, the Debug toolbar appears at the top of the editor.
Continue / Pause (F5): run to the next breakpoint
Step Over (F10): execute one step at a time
Step Into (F11): go inside the function called in the current step
Step Out (Shift + F11): finish the current function and return to the caller
Restart (Ctrl + Shift + F5): restart the program
Stop (Shift + F5): end the program
A "step" is one line of source code.
Video commentary
If you have come this far and have ever debugged with Visual Studio or Eclipse (old), you should be fine from here, right?
Also, for each input text, the Positive/Negative classification and its confidence score are displayed, and the results look reasonable to the human eye. It's amazing.
Building a natural language processing environment directly on native Windows is difficult, but by putting Docker in between, you can build one easily. You can also debug with VS Code.
If you want to do analysis in a Windows environment, please give it a try.
Being able to develop with Docker also means you can launch a powerful instance such as AWS EC2 and work there; in fact, VS Code can also debug remotely on such a running instance. In other words, GPU instances can also be used.
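As a rough sketch of what using a GPU instance could look like in code: the `pipeline` function in Transformers accepts a `device` argument, where -1 means CPU and 0 means the first CUDA GPU. The helper name `pick_pipeline_device` is hypothetical, and the sketch assumes the container has been given access to the GPU.

```python
def pick_pipeline_device():
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # installed by the Dockerfile in this article
    except ImportError:
        return -1  # no torch available, so fall back to CPU
    return 0 if torch.cuda.is_available() else -1


if __name__ == "__main__":
    # In main.py the value could be passed to the pipeline like this:
    #   pipeline('sentiment-analysis', model=model, tokenizer=tokenizer,
    #            device=pick_pipeline_device())
    print(pick_pipeline_device())
```

With this fallback, the same main.py runs unchanged on a Windows laptop without a GPU and on a GPU instance.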