I was asked on short notice to save data spanning multiple pages into a database, so I threw this together in a hurry. CSS selectors are incredibly handy, aren't they?
scl.py
import requests, bs4
import sqlite3

page = 0
url = 'https://www.〜'

# One connection for the whole run; the table is created once up front.
con = sqlite3.connect('url.db')
c = con.cursor()
c.execute('CREATE TABLE IF NOT EXISTS urldata(urls unique)')

while page < 55:
    page += 1
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'lxml')

    # Pull the href of every plan link on the page and store the full URL.
    for u in soup.select('.plan-module > .plan-link.plan-image-container'):
        urls = 'https://www.〜' + u.attrs['href']
        # INSERT OR IGNORE so the unique constraint skips duplicates
        # instead of raising sqlite3.IntegrityError.
        c.execute('INSERT OR IGNORE INTO urldata VALUES (?)', [urls])
    con.commit()

    # Move on to the next page of the listing.
    url = 'https://www.〜?=' + str(page)

con.close()
print('success')
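To sanity-check the result afterwards, the saved rows can be read back from url.db (a minimal sketch, assuming the urldata table and urls column above):

```python
import sqlite3

# Print every URL collected into url.db.
con = sqlite3.connect('url.db')
for (saved_url,) in con.execute('SELECT urls FROM urldata'):
    print(saved_url)
con.close()
```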
However, it turned out that the pagination is rendered dynamically, so this approach doesn't work without Selenium.
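For reference, here is a hedged sketch of how the same loop might look with Selenium driving the pagination. It assumes a local Chrome/chromedriver setup, and the next-page selector `.pagination-next` is a placeholder, not verified against the actual site:

```python
import time
import bs4
import sqlite3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get('https://www.〜')

con = sqlite3.connect('url.db')
con.execute('CREATE TABLE IF NOT EXISTS urldata(urls unique)')

for _ in range(55):
    # Parse whatever the browser has actually rendered.
    soup = bs4.BeautifulSoup(driver.page_source, 'lxml')
    for u in soup.select('.plan-module > .plan-link.plan-image-container'):
        con.execute('INSERT OR IGNORE INTO urldata VALUES (?)',
                    ['https://www.〜' + u.attrs['href']])
    con.commit()

    try:
        # Placeholder selector for the next-page button (assumption).
        driver.find_element(By.CSS_SELECTOR, '.pagination-next').click()
    except NoSuchElementException:
        break
    time.sleep(1)  # crude wait for the next page to render

con.close()
driver.quit()
```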