Web scraping with BeautifulSoup4 (serial number page)

Web scraping with Beautiful Soup 4

I wrote a script that builds a list of URLs for bulk downloading from a site whose pages share a common URL with a sequential number, so I am noting it here.

Installation

$ apt-get install python3-lxml  # python-lxml for Python 2
$ pip install beautifulsoup4

Source

`scraper.py`


# -*- coding: utf-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

try:
    # Python 3
    from urllib import request
except ImportError:
    # Python 2
    import urllib2 as request

from bs4 import BeautifulSoup
import codecs
import time

def getSoup(url):
    response = request.urlopen(url)
    body = response.read()
    # Parse HTML
    return BeautifulSoup(body, 'lxml')

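wait_sec = 3                  # Pause between requests, in seconds
domain = 'http://hoge.com'    # Placeholder site
result_file = 'list.txt'      # Output file; URLs are appended to it
i = 1
while True:
    # Pages are numbered with two zero-padded digits: /01/, /02/, ...
    url = '{domain}/{index:0>2}/'.format(domain=domain, index=i)
    try:
        soup = getSoup(url)
    except IOError:
        # Stop at the first page that cannot be fetched (e.g. a 404)
        break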

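    div = soup.find('div', attrs={'id': 'div_id'})
    if div is None:
        # Stop cleanly if the page lacks the target container
        break
    all_a = div.find_all('a', attrs={'class': 'a_class'})
    # Collect the image URL from every link that wraps an <img> with a src
    src_list = [a.img['src'] for a in all_a
                if a.img is not None and a.img.get('src')]
    if src_list:
        with codecs.open(result_file, 'a', 'utf-8') as f:
            # End with a newline so the next page's first URL
            # does not run into this page's last one
            f.write('\n'.join(src_list) + '\n')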
    print(i)
    i += 1

    time.sleep(wait_sec)
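
The sequentially numbered URLs come from the `{index:0>2}` format spec, which right-aligns the number in a field two characters wide and pads it with zeros. The following is a minimal sketch of just that step, with no network access. The helper `build_urls` and its parameters `first`, `count`, and `width` are hypothetical names introduced for illustration and are not part of the script above; `http://hoge.com` is the same placeholder domain.

```python
def build_urls(domain, first=1, count=3, width=2):
    """Return `count` page URLs starting at serial number `first`.

    Hypothetical helper: `width` is the number of zero-padded digits.
    """
    return [
        # The nested {width} field fills in the padding width at format time
        '{domain}/{index:0>{width}}/'.format(domain=domain, index=i, width=width)
        for i in range(first, first + count)
    ]


if __name__ == '__main__':
    for url in build_urls('http://hoge.com'):
        print(url)
    # http://hoge.com/01/
    # http://hoge.com/02/
    # http://hoge.com/03/
```

Numbers wider than the field are not truncated, so page 100 would still appear as `/100/` with `width=2`. If the target site uses three digits, the width in the format spec has to change to match.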

Reference page

[Python: Scraping websites with BeautifulSoup4](http://momijiame.tumblr.com/post/114227737756/python-beautifulsoup4-%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6-web-%E3%82%B5%E3%82%A4%E3%83%88%E3%82%92%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0%E3%81%99%E3%82%8B)

Scraping with Python and Beautiful Soup
