I am building an application that periodically scrapes a site with Lambda and stores the results in DynamoDB.
A quick Google search turns up plenty of articles on this, which I am grateful for. I followed the article below, but I got stuck on one point for about two days, so I am leaving a note here.
https://masakimisawa.com/selenium_headless-chrome_python_on_lambda/
The environment was as follows.
chromedriver: chromedriver_linux64.zip
serverless-chrome v1.0.0-55 https://github.com/adieuadieu/serverless-chrome/releases/tag/v1.0.0-55
stable-headless-chromium-69.0.3497.81-amazonlinux-2017-03.zip
selenium 3.14, taken from Cloud9 (Amazon Linux)
The Lambda layer is built for Python 3.6, and the Lambda runtime is of course also Python 3.6.
I checked the versions many times, but the following error still occurred on Lambda:
Chrome failed to start: exited abnormally\n (unknown error: DevToolsActivePort file doesn't exist)
Many people say this can be fixed by adding some options, as in the following answer: https://stackoverflow.com/questions/50642308/webdriverexception-unknown-error-devtoolsactiveport-file-doesnt-exist-while-t
The code in that answer is Java, so I translated it to Python and added the following options:
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
However, a different error then occurred:
unknown error: unable to discover open window in chrome
In the end, the problem was solved by keeping some of the options above and adding the following one, as suggested in the question linked below:
options.add_argument("--single-process")
https://stackoverflow.com/questions/60229291/aws-lambda-ruby-crawler-selenium-chrome-driver-unknown-error-unable-to-discov
Besides options.add_argument("--headless"), these are the options I finally added as troubleshooting:
options.add_argument("--single-process")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--no-sandbox")
When uploading the layer to Lambda, I put headless-chrome and chromedriver in a directory called headless-chrome and zipped it. When I created the zip with 7-Zip, it failed with an error along the lines of "executable may have wrong permissions". After looking into it, it seems that deployment packages zipped on Windows do not work well, because the zip does not carry the Unix execute permission. Zipping it on WSL Ubuntu instead resolved the problem.
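As an alternative to zipping on WSL (this is not what I did, just a sketch of another approach), Python's zipfile module can write the Unix permission bits into each entry explicitly, which should also work when run on Windows. The function name zip_with_exec_bits is made up for this example, and it marks every file as executable (0755) for simplicity.

```python
import zipfile
from pathlib import Path


def zip_with_exec_bits(src_dir, zip_path):
    """Zip src_dir (including its top-level directory name) with 0755 on every file."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in sorted(src.rglob("*")):
            if not path.is_file():
                continue
            # Keep the top-level directory name and use forward slashes in the archive.
            arcname = (Path(src.name) / path.relative_to(src)).as_posix()
            info = zipfile.ZipInfo(arcname)
            info.create_system = 3  # 3 = Unix, so the permission bits are honored
            info.external_attr = 0o100755 << 16  # regular file, rwxr-xr-x
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, path.read_bytes())


if __name__ == "__main__":
    # Assumes a local directory named headless-chrome holding the two binaries.
    zip_with_exec_bits("headless-chrome", "headless-chrome.zip")
```

The stored mode can be checked afterwards with zipfile.ZipFile(...).infolist(), where external_attr >> 16 holds the Unix mode of each entry.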