Scraping with Python

I needed to do some scraping in Python, so here are my notes. The environment is set up with Docker, and the scraper runs inside the container.

`Directory structure`


$ ls
README.md  docker-compose.yaml  scraping
$ ls scraping/
Dockerfile  requirements.txt  scrap.py  scraping.py

`docker-compose.yaml`


version: '3.8'

services:
  scraping:
    build: ./scraping

`Dockerfile`


FROM python:latest

COPY . /work
WORKDIR /work

RUN apt-get update

# Install the Python packages listed in requirements.txt (bs4, requests)
RUN pip install -U pip
RUN pip install -r requirements.txt

ENTRYPOINT ["python"]
CMD ["scrap.py"]

`requirements.txt`


bs4
requests

Fetch the page and print the text of every h1 element.

`scrap.py`


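import requests
from bs4 import BeautifulSoup

url = "https://www.yahoo.co.jp"

# Fetch the page; give up after 10 seconds and raise on HTTP error statuses.
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(response.text, "html.parser")

# Collect every <h1> element on the page.
titles = soup.find_all("h1")

# Print the text content of each heading.
for title in titles:
    print(title.text)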
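
To check the extraction step without network access or a Docker build, the h1 collection can be sketched with the standard library alone. This is not what `scrap.py` does: it swaps BeautifulSoup for `html.parser.HTMLParser`, the same parser backend that `scrap.py` passes to BeautifulSoup. The names `H1Collector`, `extract_h1_texts`, and `sample_html` are hypothetical, and nested h1 elements are merged into one entry as a simplification.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collects the text inside every <h1> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside an <h1>
        self.parts = []   # text fragments of the heading being read
        self.titles = []  # finished heading texts, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H1" also arrives as "h1".
        if tag == "h1":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the fragments, including text from child tags.
                self.titles.append("".join(self.parts))
                self.parts = []

    def handle_data(self, data):
        # Keep text only while inside an <h1>.
        if self.depth > 0:
            self.parts.append(data)


def extract_h1_texts(html):
    """Return the text of each <h1> in the given HTML string."""
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Assumed sample markup, not taken from the real page.
sample_html = (
    "<html><body>"
    "<h1>Main <a href='#'>news</a></h1>"
    "<p>skip</p>"
    "<h1>Notice</h1>"
    "</body></html>"
)
print(extract_h1_texts(sample_html))  # ['Main news', 'Notice']
```

Keeping the parsing in a function that takes a string makes it possible to test the logic against saved HTML before pointing it at a live site.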

`Execution result`


$ docker-compose up --build
.
.
.
scraping_1  | Yahoo! JAPAN
scraping_1  | Search
scraping_1  | About JavaScript settings
scraping_1  | Recommended browser
scraping_1  | Notice
scraping_1  | Main services
scraping_1  | news
scraping_1  | Major news
scraping_1  | His Majesty the Emperor "Deep Reflection" This year as well
scraping_1  | A love story from a lonely battlefield
scraping_1  | 5000 dead without fighting an abandoned island
scraping_1  | Be wary of disaster-grade heat in the afternoon
scraping_1  | Former Recruitment Kobun "Japan and Efforts"
scraping_1  | Lawson stamp purchase rampant background
scraping_1  | Breaking news exchange match Iwaki vs.Kokushikan
scraping_1  | Riseisha finishes summer without losing even once
scraping_1  | Silent prayer at the memorial service
scraping_1  | Information about individuals
scraping_1  | Your status
scraping_1  | Today's date
b-model_scraping_1 exited with code 0
