Lambda: a serverless computing service provided by AWS that runs your code. It also scales automatically with the number of requests, so you do not need to build an environment, set up load balancing, or perform maintenance.
You pay only for the time your code runs, so you are not charged while it is idle. In other words, there is no cost for keeping a server running, which makes it very economical.
Because Lambda gives you no server to log in to, you cannot connect directly and install the libraries you need. Instead, you upload libraries built for Lambda's preinstalled Linux environment.
The Selenium used this time is uploaded in the same way.
First, you need to install Selenium and the WebDriver in some environment. There are two ways to do this:
・ Launch a Python environment with Docker and install there
・ Use Cloud9
This time we use Cloud9, which is quicker and easier.
Cloud9: a service that lets you write and run code from your browser.
Open Cloud9 in the AWS console.
Create an environment named python_for_lambda and leave everything else at the defaults.
This alone gives you an environment where Python can be run.
$ python -V
Python 3.7.9
Install Selenium right away. Specify the directory **python/lib/python3.7/site-packages** as the installation destination.
$ pip install selenium -t python/lib/python3.7/site-packages
Collecting selenium
Using cached https://files.pythonhosted.org/packages/80/d6/4294f0b4bce4e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl
Collecting urllib3 (from selenium)
Using cached https://files.pythonhosted.org/packages/f5/71/45d36a8df6861b2c017f3d094538c0fb98fa61d4dc43e69b9/urllib3-1.26.2-py2.py3-none-any.whl
Installing collected packages: urllib3, selenium
Successfully installed selenium-3.141.0 urllib3-1.26.2
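The version in the target directory has to match the Python version of the Lambda runtime. As a small sketch (the helper name `site_packages_target` is made up for illustration and is not part of any tool), the path can be derived from the interpreter that runs pip, so the two do not drift apart:

```python
import sys


def site_packages_target(version_info=None):
    """Return the layer-relative site-packages path for a Python version.

    version_info defaults to the interpreter running this script.
    """
    major, minor = (version_info or sys.version_info)[:2]
    return f"python/lib/python{major}.{minor}/site-packages"


# Prints the directory to pass to `pip install -t` for the current interpreter.
print(site_packages_target())
```

On the Python 3.7.9 environment shown above, this yields the same python/lib/python3.7/site-packages directory used in the pip command.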
Next, install Chrome. Since we want headless operation, install chromedriver and headless-chromium.
# Create the directory to save into in advance
$ mkdir -p headless/python/bin
$ cd headless/python/bin
# Download headless-chromium
$ curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
# Extract the file and delete the zip
$ unzip -o headless-chromium.zip -d .
$ rm headless-chromium.zip
# Download chromedriver
$ curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > chromedriver.zip
# Extract the file and delete the zip
$ unzip -o chromedriver.zip -d .
$ rm chromedriver.zip
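The same download-and-extract steps could also be scripted in Python. This is only a sketch: the function names `extract_executables` and `download_and_extract` are hypothetical, and the URLs are the ones from the commands above. One difference from the `unzip` command is that Python's `zipfile` module does not restore Unix permission bits, so the sketch sets the executable bit explicitly.

```python
import os
import stat
import urllib.request
import zipfile

# URLs taken from the commands above.
HEADLESS_CHROMIUM_URL = (
    "https://github.com/adieuadieu/serverless-chrome/releases/download/"
    "v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip"
)
CHROMEDRIVER_URL = "https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip"


def extract_executables(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and mark it executable."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            path = archive.extract(name, dest_dir)
            if os.path.isfile(path):
                # zipfile does not restore Unix mode bits, so add +x explicitly.
                mode = os.stat(path).st_mode
                os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
                extracted.append(path)
    return extracted


def download_and_extract(url, dest_dir, zip_name="download.zip"):
    """Download a zip into dest_dir, extract it there, then delete the zip."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, zip_name)
    urllib.request.urlretrieve(url, zip_path)
    try:
        return extract_executables(zip_path, dest_dir)
    finally:
        os.remove(zip_path)
```

Calling `download_and_extract(HEADLESS_CHROMIUM_URL, "headless/python/bin")` and then the same with `CHROMEDRIVER_URL` would mirror the shell session above.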
Right-click each folder in the Cloud9 file tree and select Download to download it as a zip.
・ selenium: the **python** folder
・ chromedriver, headless-chromium: the **headless** folder
headless:
headless
┗ python
  ┗ bin
    ┣ chromedriver
    ┗ headless-chromium
selenium:
python
┗ lib
  ┗ python3.7
    ┗ site-packages
      ┣ selenium
      ┣ selenium-3.141.0.dist-info
      ┣ urllib3
      ┗ urllib3-1.26.2.dist-info
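If you prefer to build the zip files yourself rather than download them from the file tree, the layout can be sketched as below. The helper names `build_layer_zip` and `top_level_entries` are hypothetical. The sketch assumes every path inside the zip should start with `python/`, which is consistent with the /opt/python/... paths used later in the function code.

```python
import shutil
import zipfile


def build_layer_zip(parent_dir, top_level="python", zip_base="python"):
    """Zip parent_dir/top_level so that archive paths start with top_level/.

    zip_base is the output path without the .zip extension.
    Returns the path of the created zip file.
    """
    return shutil.make_archive(zip_base, "zip", root_dir=parent_dir, base_dir=top_level)


def top_level_entries(zip_path):
    """Return the set of first path components found in the zip."""
    with zipfile.ZipFile(zip_path) as archive:
        return {name.split("/", 1)[0] for name in archive.namelist()}
```

For example, `build_layer_zip("headless", zip_base="headless")` would produce headless.zip containing python/bin/chromedriver, and `top_level_entries("headless.zip")` lets you confirm that `python` is the only top-level entry before uploading.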
If you already use Selenium or chromedriver, copies may already exist on your PC, but **they will not run on Lambda unless they are builds compatible with the Linux environment**. We therefore recommend using the ones installed on Cloud9, which is itself a Linux environment.
Lambda
Now that the required libraries are installed, we will upload them to Lambda.
First, open Lambda in the AWS console.
Select **Layers** from the console.
A layer is an archive of the libraries and other content a function needs, made available when the code runs. Let's upload the selenium and headless files we just downloaded.
Contents | Layer name | File |
---|---|---|
chromedriver, headless-chromium | headless | headless.zip |
selenium | selenium | python.zip |
Set the runtime to python3.7.
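The console upload can also be done from a script. The following is a minimal sketch using boto3's `publish_layer_version`; it assumes boto3 is installed and AWS credentials are configured, and the helper names `layer_request` and `publish_layer` are hypothetical.

```python
def layer_request(layer_name, zip_path, runtime="python3.7"):
    """Build the keyword arguments for Lambda's PublishLayerVersion call."""
    with open(zip_path, "rb") as handle:
        content = handle.read()
    return {
        "LayerName": layer_name,
        "Content": {"ZipFile": content},
        "CompatibleRuntimes": [runtime],
    }


def publish_layer(layer_name, zip_path, runtime="python3.7"):
    """Publish the zip as a new layer version and return its ARN."""
    import boto3  # imported here so layer_request() works without boto3 installed

    client = boto3.client("lambda")
    response = client.publish_layer_version(**layer_request(layer_name, zip_path, runtime))
    return response["LayerVersionArn"]
```

With the table above, the two calls would be `publish_layer("headless", "headless.zip")` and `publish_layer("selenium", "python.zip")`.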
Next, create the function. The code comes later, so first create an empty function.
From the dashboard, select Create function.
Set the function name to lambda_function_for_headless_chrome and the runtime to Python 3.7, then create it.
Now you have an environment in which to run Python.
Then add the layers you just created to this function.
Click Layers and select Add a layer.
Add the headless and selenium layers one at a time from **Custom layers**.
When Layers shows (2), the layers have been added.
Now let's run Selenium from Python.
Replace the code that already exists in the function's code editor with the following.
lambda_function_for_headless_chrome
# Packages in the layer's python directory are imported without any path setup
from selenium import webdriver


def lambda_handler(event, context):
    URL = "https://news.yahoo.co.jp/"

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--window-size=880x996")
    options.add_argument("--no-sandbox")
    options.add_argument("--homedir=/tmp")
    options.binary_location = "/opt/python/bin/headless-chromium"

    # Browser definition: the first argument is the path to chromedriver
    browser = webdriver.Chrome(
        "/opt/python/bin/chromedriver",
        options=options
    )

    # Open the page, read its title, then close the browser
    browser.get(URL)
    title = browser.title
    browser.close()
    return title
The point to watch here is the **chromedriver path**. It is specified **under /opt** because **folders uploaded as a Lambda layer are automatically extracted under /opt**.
Selenium, on the other hand, needs no path.
In a Lambda layer, anything placed in either of the following locations **is loaded automatically**:
・ **python**
・ **python/lib/python3.x/site-packages** (3.x being the version in use)
That is why the import works this time without specifying a path.
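To check what the layers actually provided, a throwaway diagnostic handler can help, for example when the import fails. This is a sketch, not part of the original procedure: the helper names `layer_import_paths` and `list_dir_if_present` are hypothetical, and the handler simply reports which `sys.path` entries point into /opt and what is present under /opt/python.

```python
import os
import sys


def layer_import_paths(paths=None):
    """Return the sys.path entries that point into /opt, where layers are extracted."""
    paths = sys.path if paths is None else paths
    return [p for p in paths if p == "/opt" or p.startswith("/opt/")]


def list_dir_if_present(path):
    """Return the sorted directory listing, or an empty list if path is missing."""
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def lambda_handler(event, context):
    # Hypothetical diagnostic handler: report what the layers made available.
    return {
        "layer_import_paths": layer_import_paths(),
        "opt_python": list_dir_if_present("/opt/python"),
        "opt_python_bin": list_dir_if_present("/opt/python/bin"),
    }
```

If `opt_python_bin` comes back empty, the zip was probably not laid out with `python/bin` at its top level.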
Finally, adjust the **basic settings** a little.
**Selenium takes longer to run than an ordinary program**, so it is very likely to time out with the default setting. Set a longer timeout.
Also, **128 MB of memory was not enough, so change it to 256 MB** before running the test.
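The same two settings can be changed with boto3's `update_function_configuration`. This is a minimal sketch that assumes boto3 and credentials are available. The helper names are hypothetical, and the timeout is left as a required argument because the article does not state a specific value.

```python
def basic_settings(function_name, timeout_seconds, memory_mb=256):
    """Build keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    # Lambda accepts timeouts from 1 to 900 seconds.
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    return {
        "FunctionName": function_name,
        "Timeout": timeout_seconds,
        "MemorySize": memory_mb,
    }


def apply_basic_settings(function_name, timeout_seconds, memory_mb=256):
    """Apply the timeout and memory settings to an existing function."""
    import boto3  # imported here so basic_settings() works without boto3 installed

    client = boto3.client("lambda")
    return client.update_function_configuration(
        **basic_settings(function_name, timeout_seconds, memory_mb)
    )
```

For instance, `apply_basic_settings("lambda_function_for_headless_chrome", 60)` would set a 60-second timeout and 256 MB of memory. The 60 is only a placeholder, so pick a value that fits how long your page takes to load.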
Now let's create a test event and run the code.
Click Test and enter an event name; leave everything else at the defaults and create it.
Once it is created, click Test again to run it!
If a successful execution result appears after the waiting screen has been shown for a while, it worked.
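The test can also be triggered from a script instead of the console. The sketch below uses boto3's `invoke` and assumes boto3 and credentials are available. The helper names `decode_invoke_response` and `run_test` are hypothetical, and the empty JSON payload stands in for the default test event, since the handler ignores its input.

```python
import json


def decode_invoke_response(response):
    """Return (ok, value) from a boto3 Lambda invoke() response dict."""
    body = response["Payload"].read()
    value = json.loads(body) if body else None
    # A synchronous invoke returns StatusCode 200; FunctionError is set
    # only when the function itself raised an error.
    ok = response.get("StatusCode") == 200 and "FunctionError" not in response
    return ok, value


def run_test(function_name="lambda_function_for_headless_chrome"):
    """Invoke the function synchronously with an empty event."""
    import boto3  # imported here so decode_invoke_response() works without boto3

    client = boto3.client("lambda")
    response = client.invoke(FunctionName=function_name, Payload=b"{}")
    return decode_invoke_response(response)
```

On success, `value` is whatever the handler returned, which for the code above is the page title.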
Next, we will write the production code to implement more practical periodic processing.
If it doesn't work, please refer to the following.
When "'lambda_function': No module named 'selenium'" appears
When "'chromedriver' executable may have wrong permissions." appears
[Periodically run Python scraping on AWS Lambda](https://qiita.com/eisu26/items/be7a75edf7a798f17f11)
How to make AWS Lambda Layers when running selenium x chrome on AWS Lambda