I needed to scrape a web page in Python, so I am writing down how I did it. The environment is prepared with Docker, and the scraper runs inside the container.
Directory structure
$ ls
README.md docker-compose.yaml scraping
$ ls scraping/
Dockerfile requirements.txt scrap.py scraping.py
docker-compose.yaml
version: '3.8'
services:
  scraping:
    build: ./scraping
Dockerfile
FROM python:latest
COPY . /work
WORKDIR /work
RUN apt-get update
# Install the Python dependencies (Beautiful Soup and requests)
RUN pip install -U pip
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["scrap.py"]
requirements.txt
beautifulsoup4
requests
Getting the h1 elements
scrap.py
import requests
from bs4 import BeautifulSoup

# Fetch the Yahoo! JAPAN top page
url = "https://www.yahoo.co.jp"
response = requests.get(url)

# Parse the HTML and print the text of every h1 element
soup = BeautifulSoup(response.text, "html.parser")
titles = soup.find_all("h1")
for title in titles:
    print(title.text)
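The script above does everything at module level, which is fine for a memo. As a rough sketch, the same idea can be split into a fetch step and a parse step so that the parsing can be tried on a plain string without hitting the network. The function names `extract_texts` and `fetch_html`, the 10-second timeout, and the sample HTML are my own assumptions for illustration and are not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def extract_texts(html: str, tag: str = "h1") -> list[str]:
    """Return the text of every <tag> element, skipping empty ones."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for element in soup.find_all(tag):
        # Join nested strings with a space and trim surrounding whitespace.
        text = element.get_text(" ", strip=True)
        if text:
            texts.append(text)
    return texts


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page; raises requests.HTTPError on 4xx/5xx responses."""
    # The timeout value is an arbitrary choice for this sketch.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Offline check with a hand-written snippet (hypothetical sample HTML).
sample = "<h1> Yahoo! JAPAN </h1><h1></h1><h2>News</h2>"
print(extract_texts(sample))
```

To scrape the real page, something like `extract_texts(fetch_html("https://www.yahoo.co.jp"))` should behave much like the original loop, except that empty headings are dropped and a failed request raises an exception instead of being parsed as if it had succeeded.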
Execution result
$ docker-compose up --build
.
.
.
scraping_1 | Yahoo! JAPAN
scraping_1 | Search
scraping_1 | About JavaScript settings
scraping_1 | Recommended browser
scraping_1 | Notice
scraping_1 | Main services
scraping_1 | news
scraping_1 | Major news
scraping_1 | His Majesty the Emperor "Deep Reflection" This year as well
scraping_1 | A love story from a lonely battlefield
scraping_1 | 5000 dead without fighting an abandoned island
scraping_1 | Be wary of disaster-grade heat in the afternoon
scraping_1 | Former Recruitment Kobun "Japan and Efforts"
scraping_1 | Lawson stamp purchase rampant background
scraping_1 | Breaking news exchange match Iwaki vs.Kokushikan
scraping_1 | Riseisha finishes summer without losing even once
scraping_1 | Silent prayer at the memorial service
scraping_1 | Information about individuals
scraping_1 | Your status
scraping_1 | Today's date
b-model_scraping_1 exited with code 0