I want to perform simple natural language processing (morphological analysis + α) using MeCab in the pre-processing of Azure Data Factory. It would be convenient if you could implement it as a function and call it later from various services such as LogicApps. So I considered two implementation methods.
Since it does not perform heavy processing like machine learning, Azure Functions will be sufficient, so I implemented it.
If you write the conclusion first ** ・ Functions triggered by HTTP Request of Azure Functions can be implemented by referring to the following URL **
Create an Azure Functions project using Visual Studio Code https://docs.microsoft.com/ja-jp/azure/azure-functions/functions-create-first-function-vs-code?pivots=programming-language-csharp
** ・ Mecab can also be used on the Functions side by adding "mecab-python3" to requirements.txt of .vscode **
** ・ The monthly free tier of Azure Functions is rather generous, so you can try it for free for a while **
The rest is a memorandum of some stumbling points.
There are many points of lack of understanding, so please point out any mistakes.
An event-driven serverless computing platform. It is activated by the set trigger and consumes computing resources only during the execution of the function, so wasteful costs can be reduced.
There are three hosting plans for Functions, but the leftmost "pay-as-you-go plan" is the so-called normal serverless. This time I chose this.
Azure Functions-Price setting https://azure.microsoft.com/ja-jp/services/functions/#pricing
Billing is determined by the number of executions, execution time, and memory usage. There is a free frame that is reset every month, and it seems that it is free up to "1 million executions" and "400,000 GB seconds".
You can choose from .Net Core, Node.js, Python, Java, Powershell Core. One thing to note is that if you choose Python for the runtime stack, the OS will only support Linux. Currently, Linux's pay-as-you-go plan is not supported in both regions of Japan, so it is necessary to select another region. (As of May 2020)
Supported regions are "[Available products by region](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=functions®ions=us-east,us-east-2,us -central, us-north-central, us-south-central, us-west-central, us-west, us-west-2, japan-west, japan-east) ".
Triggers for launching Functions include HTTP Request, Timer, BLOB, eventhub, etc. This time it seems to be easy to use, so I implemented it with HTTP Request for the time being. It seems easy to use when the BLOB file is created.
I wanted to use Python for a pay-as-you-go plan, so I created a region in East Asia for the time being. (Anyway, as long as it supports the pay-as-you-go plan Linux. Geographically, Korea Central in Seoul was closer.)
You can now choose Linux pay-as-you-go. We also need to prepare a storage account to link from Functions, so we created a new one this time.
Application Insight for monitoring is also created at this timing. Now you can see the execution log.
This will create four resources.
It is unclear why the resource called App Service Plan is created even though the consumption (serverless) is selected in the price plan.
In the case of .NET, you can create a function on the screen of Azure portal, but unfortunately it is not supported in the case of Python. Therefore, the procedure is to develop and test locally and then deploy to Functions.
The specific procedure was advanced by referring to the following article.
Create an Azure Functions project using Visual Studio Code https://docs.microsoft.com/ja-jp/azure/azure-functions/functions-create-first-function-vs-code?pivots=programming-language-csharp
The following environment is required locally, so install it. ・ Because. js -Python 3.8, Python 3.7, or Python 3.6 ・ Visual Studio Code · Python extensions for Visual Studio Code · Azure Functions extension for Visual Studio Code ・ Azure Functions Core Tools
Permission error when running Functions Local.
Error message `The file cannot be read because script execution is disabled on this system. ``
I had to change the PowerShell Execution Policy. I raised ExecutionPlicy to RemoteSigned only for the first time, and then returned to Restricted and the error disappeared.
About Execution Policies https://docs.microsoft.com/ja-jp/powershell/module/microsoft.powershell.core/about/about_execution_policies?view=powershell-7
Error with the following code on \ _ \ _ init \ _ \ _. Py
import azure.functions as func
Error message
Unable to import 'azure.functions' pylint(import-error) [3, 1]
It seems that the problem was caused by a mixture of multiple Python versions locally, and it was solved by switching to another runtime.
If you add the library to requirements.txt of .vscode, it seems that pip install will be done on the Functions side. So I added the following. ・ Mecab-python3
The rest is as usual.
__init__.py
import MeCab
mecab = MeCab.Tagger("-Ochasen")
parsedsentence = mecab.parse(sentence)
The use of MeCab in Functions was surprisingly quick. Functions are sufficient for those that meet this purpose. (However, I don't understand why it works, including where the mecab dictionary itself is ...)
The pay-as-you-go plan has a maximum timeout of 10 minutes, so it's just for lightweight execution. If you want to do something a little more elaborate, it seems better to use another hosting plan for Functions or even adopt DataBricks.
Azure Functions scale and hosting https://docs.microsoft.com/ja-jp/azure/azure-functions/functions-scale#service-limits
You can see the cost by calling Metric from the Azure portal. The result of executing the simple function "Return the result of morphological analysis with Mecab for one input sentence" once is as follows.
The unit may be Function Execution Unit. Since the unit of Function Execution Unit is [MB milliseconds], convert this to [GB seconds], which is the unit of billing.
This time 163.58 k = 163,580 MB milliseconds, so 163,580 / 1,024,000 = 0.15974609375 【GB seconds】
It's free of charge up to "1 million executions" and "400,000 GB seconds" every month, so if you do some testing, it seems that it will fit in the free frame at all. Of course, please note that you will be charged for BLOBs created at the same time.
Recommended Posts