Background
I used to save the pip package using venv
in Python3.6 or later, but when I was investigating whether it could be done on AWS, there was an affordable function called layer, so I tried using it.
venv
This is OK on the work PC
python -m venv [Virtual environment name]
./[Virtual environment name]/bin/activate
([Virtual environment name]) pip install [package name]
[Web scraping using AWS lambda-Development (Web scraping)](https://qiita.com/satsukiya/items/b9d02abd7fa96cd59355#development-web%E3%82%B9%E3%82%AF%E3% Create a zip file for uploading on AWS in the same way as 83% AC% E3% 82% A4% E3% 83% 94% E3% 83% B3% E3% 82% B0).
mkdir packages
cd packages
pip install [package] -t ./
pip install [package] -t ./
......
zip -r ./myDeploymentPackage.zip ./packages
Layer To create a pip environment, create a layer.
Development (error)
Here, [Web scraping using AWS lambda-Development (Web scraping)](https://qiita.com/satsukiya/items/b9d02abd7fa96cd59355#development-web%E3%82%B9%E3%82%AF% Let's write a code using beautiful soup with E3% 83% AC% E3% 82% A4% E3% 83% 94% E3% 83% B3% E3% 82% B0).
import json
import requests
from bs4 import BeautifulSoup
def lambda_handler(event, context):
# TODO implement
response = requests.get('https://mainichi.jp/editorial/')
soup = BeautifulSoup(response.text)
pages = soup.find("ul", class_="list-typeD")
articles = pages.find_all("article")
links = [ "https:" + a.a.get("href") for a in articles]
return {
'statusCode': 200,
'links' : links
}
Then
{
"errorMessage": "Unable to import module 'lambda_function'"
}
I get an import error.
[Basic precautions when using Lambda @ Python layers (for those who cannot import files uploaded to layers)-Notes](https://qiita.com/k_hoso/items/78beb33e53abfdddabe7#%E6% B3% A8% E6% 84% 8F% E7% 82% B9)
Looking at, it seems that the zip file is decompressed under / opt /
.
In this case, there is a package installed by pip on the work PC in / opt / packages /
, so
import sys
sys.path.append('/opt/packages')
It is necessary to pass the path in advance.
Development (modified)
import sys
sys.path.append('/opt/packages')
import json
import requests
from bs4 import BeautifulSoup
def lambda_handler(event, context):
# TODO implement
response = requests.get('https://mainichi.jp/editorial/')
soup = BeautifulSoup(response.text)
pages = soup.find("ul", class_="list-typeD")
articles = pages.find_all("article")
links = [ "https:" + a.a.get("href") for a in articles]
return {
'statusCode': 200,
'links' : links
}
{
"statusCode": 200,
"links": [
"https://mainichi.jp/articles/20201120/ddm/005/070/067000c",
"https://mainichi.jp/articles/20201120/ddm/005/070/065000c",
"https://mainichi.jp/articles/20201119/ddm/005/070/119000c",
"https://mainichi.jp/articles/20201119/ddm/005/070/118000c",
"https://mainichi.jp/articles/20201118/ddm/005/070/115000c",
"https://mainichi.jp/articles/20201118/ddm/005/070/114000c",
"https://mainichi.jp/articles/20201117/ddm/005/070/114000c",
"https://mainichi.jp/articles/20201117/ddm/005/070/113000c",
"https://mainichi.jp/articles/20201116/ddm/005/070/043000c"
]
}
Post Scripting I wish I could do it without "/ packages /" when compressing with zip: expressionless:
Recommended Posts