Watson's NLC (Natural Language Classifier) has been part of the Watson API lineup almost from the beginning, and has now been in service for roughly three to four years. Initially it offered only an API interface; a beta UI tool was provided along the way, and an improved UI tool is now available as a feature of Watson Studio. Using the UI tool on Studio has already been covered in another article, Learning NLC with Watson Studio. That tool also includes a simple test function that can be run from the UI, but it only lets you run one test at a time, so you cannot send a whole set of test data at once and obtain an overall accuracy figure. I created a simple tool to fill this gap, and this article introduces it.
First, let's take a quick look at what this tool can do. All the materials used here are uploaded to GitHub, so you can try it in your own environment. Unfortunately, NLC has no Lite plan, but the Standard plan's instance itself is free, and the free allowances below should be enough to run this sample:
1 free Natural Language Classifier per month
1,000 free API calls per month
4 free training events per month
It is assumed that a model has already been trained; it does not matter whether you used the Studio features or called the API directly. For this sample, I trained three genres (classes) using text from Wikipedia: "Japanese history (j-hist)", "Japanese geography (j-geo)", and "science".
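For reference, NLC training data is a headerless CSV that pairs each text sample with its class label. The rows below are made-up stand-ins for the Wikipedia excerpts, just to show the shape of a file like nlc-train.csv:

```python
import csv
import io

# NLC training data is a headerless CSV: each row pairs a text
# sample with a class label. These rows are hypothetical examples.
rows = [
    ("The Edo period lasted from 1603 to 1868.", "j-hist"),
    ("Mount Fuji is the highest mountain in Japan.", "j-geo"),
    ("Photosynthesis converts light energy into chemical energy.", "science"),
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
training_csv = buf.getvalue()
print(training_csv)
```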
The verification data uses the following Excel file.
The first row of the Excel file holds the column names. The column named **text** contains the text data passed to NLC when it is called, and **class** holds the correct class name for that text. Any other columns are ignored during the test, so they may or may not be present. This sample Excel includes a key column **text_id** for convenience, but such a column is optional.
The output data created by the test is also an Excel file; a sample is shown below. (Its appearance has been polished to make it easier to read; the actual output is plainer.)
The first three columns are the test input data itself. They are followed by the results of applying the model to each text, output as pairs of **class name** and **confidence**. The number of candidate classes to output can be set with a variable in the notebook; in this example, n = 3.
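The per-row output can be sketched as follows. The response shape (a "classes" array sorted by descending confidence) follows the NLC classify API; the column names class_1, confidence_1, ... and the confidence values are my own illustrative assumptions, not necessarily the notebook's exact names:

```python
# A simplified NLC classify response: "classes" is sorted by
# descending confidence. The values here are made-up examples.
response = {
    "top_class": "j-hist",
    "classes": [
        {"class_name": "j-hist", "confidence": 0.91},
        {"class_name": "j-geo", "confidence": 0.06},
        {"class_name": "science", "confidence": 0.03},
    ],
}

n = 3  # number of candidate classes to keep, as in the notebook

def top_n_columns(resp, n):
    """Flatten the top-n classes into (class, confidence) column pairs."""
    row = {}
    for i, c in enumerate(resp["classes"][:n], start=1):
        row[f"class_{i}"] = c["class_name"]
        row[f"confidence_{i}"] = c["confidence"]
    return row

row = top_n_columns(response, n)
print(row)
```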
The Excel output is as shown above, but as a bonus feature the notebook also displays the following confusion matrix (Confusion Matrix), so you can see at a glance which classes are predicted accurately and to what degree.
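A minimal sketch of how such a confusion matrix and an overall accuracy figure can be computed with pandas, using made-up labels (the notebook's actual implementation may differ):

```python
import pandas as pd

# Hypothetical correct labels ("class") vs. predicted top classes
# ("class_1") for a handful of test rows.
df = pd.DataFrame({
    "class":   ["j-hist", "j-hist", "j-geo", "j-geo", "science", "science"],
    "class_1": ["j-hist", "j-geo",  "j-geo", "j-geo", "science", "science"],
})

# Rows = correct class, columns = predicted class.
matrix = pd.crosstab(df["class"], df["class_1"])
accuracy = (df["class"] == df["class_1"]).mean()
print(matrix)
print(f"accuracy = {accuracy:.3f}")
```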
I will explain the environment required to run the sample.
You must have an IBM Cloud account with NLC and Watson Studio available as services. Unfortunately, NLC requires a credit-card account. Please refer to the links below for the specific procedures.
Easy Jupyter Notebook in the Cloud-Procedures for using Jupyter Notebook in IBM Cloud-
Procedure for upgrading to IBM Cloud (formerly Bluemix) credit account
The following article describes how to associate the Spark and Watson ML services with Studio, but you can associate NLC with the same procedure. (Its service group is "**AI**", the same as Watson ML.)
Register additional services in Watson Studio
The necessary materials to move the sample are as follows.
| File name | Purpose | Link |
|---|---|---|
| nlc-test-tool-v1.ipynb | Accuracy verification script body | For download / For code confirmation |
| nlc-test-sample.xlsx | Excel sample for verification | For download |
| nlc-test-sample-output.xlsx | Verification result Excel sample | For download |
| nlc-train.csv | CSV sample for learning | For download / For data confirmation |
Now, I will explain the procedure for actually running the sample application.
This sample application uses a model trained with the above nlc-train.csv as training data.
The learning procedure using Watson Studio is explained in another article Learning NLC with Watson Studio, so please refer to this.
After learning, check the Model ID of the model you created on the asset management screen of Watson Studio.
Copy it and save it with a text editor, etc., as you will use it later.
Open https://cloud.ibm.com/services to view the list of IBM Cloud services, then click the NLC link.
The screen will look like the one below. In order: ① click **Service Credentials**, ② click **View credentials**, ③ click the **clipboard icon** to copy the credentials.
Paste the copied credentials into a text editor and save them; we will use this information later.
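As a sketch of what these credentials are eventually used for: the NLC v1 REST API classifies one text via a POST to `/v1/classifiers/{model_id}/classify`, authenticated with the API key using Basic auth (user `apikey`). The host URL, API key, and model ID below are placeholders (the real URL depends on your instance's region), and the request is only built here, not sent:

```python
import base64
import json
import urllib.request

# Placeholder values standing in for the saved credentials.
apikey = "YOUR_APIKEY"
url = "https://api.us-south.natural-language-classifier.watson.cloud.ibm.com"
model_id = "YOUR_MODEL_ID"

def build_classify_request(text):
    """Build (but do not send) the REST request for one classification."""
    endpoint = f"{url}/v1/classifiers/{model_id}/classify"
    token = base64.b64encode(f"apikey:{apikey}".encode()).decode()
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_classify_request("Mount Fuji is the highest mountain in Japan.")
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```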
Upload the Excel file nlc-test-sample.xlsx
to be used for verification to the cloud.
The procedure is as follows:
View the main project management screen from https://dataplatform.cloud.ibm.com/projects
Select **Data** from **Add to project** at the top of the screen (see below)
If the upload is successful, nlc-test-sample.xlsx
should appear under **Data Assets**, as shown in the following figure.
Load the pre-downloaded nlc-test-tool-v1.ipynb
into Watson Studio's Jupyter Notebook.
For the loading procedure, refer to Easy Jupyter Notebook in the Cloud-Procedures for using Jupyter Notebook in IBM Cloud-.
The loaded Jupyter Notebook needs to be modified in a few places to match your environment. First, set up your COS credentials. Immediately after loading, the notebook should look like the figure below; click the "**+**" icon at the top of the screen to insert an empty cell.
In the state shown in the figure below:
① Click the **file** icon at the top of the screen
② From the file list, click the nlc-test-sample.xlsx
uploaded earlier
③ Click **Insert Credentials** from the menu that appears
The previously inserted empty cell should now contain content like the figure below; copy the lines from **IAM_SERVICE_ID** through **BUCKET** to the clipboard.
Paste the copied information into the "**COS Credentials**" cell below it, deleting the original dummy entries.
In the end, it should look like the following. (Note that only the bottom FILE: infile
line keeps the original information.)
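For reference, the auto-inserted credentials cell typically has a shape like the following. All values here are placeholders, and the exact set of keys may differ depending on the Studio version:

```python
# Shape of the credentials dict that Studio's "Insert Credentials"
# generates (placeholder values). The entries from IAM_SERVICE_ID
# through BUCKET are what gets copied into the "COS Credentials" cell;
# FILE keeps its original value there.
credentials = {
    'IAM_SERVICE_ID': 'iam-ServiceId-xxxxxxxx',
    'IBM_API_KEY_ID': 'xxxxxxxxxxxxxxxxxxxx',
    'ENDPOINT': 'https://s3-api.us-geo.objectstorage.service.networklayer.com',
    'IBM_AUTH_ENDPOINT': 'https://iam.ng.bluemix.net/oidc/token',
    'BUCKET': 'xxxxxxxx',
    'FILE': 'nlc-test-sample.xlsx',
}
print(sorted(credentials))
```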
Once this is done, delete the cells that were added just for this work. (Select the cell you want to remove and click the scissors icon.)
Next, set the model_id. Paste the Model ID you saved earlier into the model_id line of the "**Variable Definition**" cell, enclosing the string in single quotes.
Finally, set the NLC credentials. Paste the saved NLC credentials into the "**NLC Credentials**" cell, then delete the extra parentheses and the first dummy line so that it looks like the figure below.
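As an illustration only, the two edited cells should end up looking something like this. All values are placeholders, and the exact variable names in the notebook may differ:

```python
# "Variable Definition" cell: the Model ID saved earlier, in single quotes
model_id = 'xxxxxxx-nlc-xxxxx'

# "NLC Credentials" cell: what remains after deleting the extra
# parentheses and the dummy first line of the pasted credentials
nlc_credentials = {
    "apikey": "xxxxxxxxxxxxxxxxxxxx",
    "url": "https://gateway.watsonplatform.net/natural-language-classifier/api",
}
print(model_id, sorted(nlc_credentials))
```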
That completes all the preparatory work.
In **Jupyter Notebook**, you run a cell by selecting it and pressing Shift + Enter, which executes it and advances the selection to the next cell. Return to the first cell and press Shift + Enter repeatedly; the notebook should evaluate the model with the test data, generate the Excel output, display the confusion matrix, and so on.
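Conceptually, the notebook's main loop does something like the following. The NLC call is replaced here by a stub classifier so the flow is runnable without credentials; the column names and the stub's logic are illustrative assumptions, not the notebook's exact code:

```python
import pandas as pd

# Stub standing in for the real NLC classify call, so the flow
# can be shown without a live service.
def classify_stub(text):
    label = "j-geo" if "mountain" in text else "j-hist"
    return {"classes": [{"class_name": label, "confidence": 0.9}]}

# Stand-in for the verification Excel read from COS
# (the real notebook would use pd.read_excel on the COS object).
test_df = pd.DataFrame({
    "text": ["The Edo period lasted from 1603 to 1868.",
             "Mount Fuji is the highest mountain in Japan."],
    "class": ["j-hist", "j-geo"],
})

# Classify every row and collect the top prediction.
results = []
for text in test_df["text"]:
    top = classify_stub(text)["classes"][0]
    results.append((top["class_name"], top["confidence"]))

out_df = test_df.copy()
out_df["class_1"], out_df["confidence_1"] = zip(*results)
# out_df.to_excel("nlc-test-sample-output.xlsx", index=False)  # needs openpyxl
print(out_df)
```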
If an error occurs in any cell, see the error message to determine the problem.
If there is no error display and the execution result is as shown in the figure below, the tool has been executed successfully.
The notebook writes the generated Excel back to COS, but downloading it from the Studio screen takes one more step, explained below. (If you generate the output Excel with the same file name again, this procedure is unnecessary from the second run onward.)
First, select "**Add to project**" -> "**Data**" from the project management screen. (The same procedure as for the Excel file upload.)
If you set the tab at the top of the screen to "Files", the newly generated nlc-test-sample-output.xlsx
file should appear in the list.
(If it does not appear, try closing and reopening the project.)
On the screen below:
① click the check box for this file, ② click the icon with the vertically aligned dots, ③ select "**Add as data asset**" from the menu.
Then, as shown in the screen below, the output Excel will also appear in the **Data asset** field. Click the icon under Actions and select **Download** from the menu to download the output Excel file.
The raw Excel before processing looks like this.
At some point, it seems that an Excel file with only one sheet became readable as a Studio Data Asset. Attached is the result of reading the Excel registered as a Data Asset from Studio by the above procedure.