How to get data from a web page
Use the re module from the standard library. See the official documentation for details.
import re
from html import unescape
from urllib.parse import urljoin
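# Read the HTML of a web page into a string ('page.html' is a placeholder file name)
with open('page.html', encoding='utf-8') as f:
    html = f.read()
# Cut out each fragment of interest with re.findall() (the pattern is a placeholder)
for partial_html in re.findall(r'<li>.*?</li>', html, re.DOTALL):
    # Pull specific information out of each fragment with re.search()
    match = re.search(r'<a href="(.*?)">(.*?)</a>', partial_html, re.DOTALL)
    if match:
        url = urljoin('https://example.com/', match.group(1))  # placeholder base URL
        title = unescape(match.group(2))  # decode character references such as &amp;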
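To make the two-step pattern above concrete, here is a minimal sketch that runs it on a small inline HTML string instead of a real page. The sample markup, the `extract_links` helper name, both patterns, and the `https://example.com/` base URL are assumptions made for this example and do not come from the original.

```python
import re
from html import unescape
from urllib.parse import urljoin

# Hypothetical sample page; a real script would read this from a file or an HTTP response.
SAMPLE_HTML = """
<ul>
  <li><a href="/books/1">Python &amp; Scraping</a></li>
  <li><a href="/books/2">Regex <b>Basics</b></a></li>
</ul>
"""


def extract_links(html, base_url):
    """Return (absolute_url, title) pairs for every <li> that contains a link."""
    results = []
    # Step 1: cut the page into fragments, one per <li> element.
    for partial_html in re.findall(r'<li>.*?</li>', html, re.DOTALL):
        # Step 2: pull the href and the link body out of each fragment.
        match = re.search(r'<a href="(.*?)">(.*?)</a>', partial_html, re.DOTALL)
        if match is None:
            continue
        url = urljoin(base_url, match.group(1))
        # Strip nested tags, then decode character references such as &amp;.
        title = unescape(re.sub(r'<.*?>', '', match.group(2)))
        results.append((url, title))
    return results


links = extract_links(SAMPLE_HTML, 'https://example.com/')
for url, title in links:
    print(url, title)
```

Regular expressions like these only work while the markup keeps the exact shape the patterns expect, so this approach is best kept for small, predictable pages.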
lxml is a Python binding for libxml2 and libxslt, well-known XML processing libraries written in C. See the official documentation for details.
import lxml.html
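# Parse an HTML file ('page.html' is a placeholder file name)
tree = lxml.html.parse('page.html')
html = tree.getroot()
# Convert the href attributes of a elements (and other links) to absolute URLs, using the argument as the base URL
html.make_links_absolute('https://example.com/')  # placeholder base URL
# Select elements with a CSS selector (needs the cssselect package; 'a' is a placeholder selector)
for a in html.cssselect('a'):
    # Get the element's attribute and text
    url = a.get('href')
    title = a.text_content()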
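As a rough illustration of the same flow without a file on disk, the sketch below parses an inline HTML string with `lxml.html.fromstring()`. The sample markup and the `https://example.com/` base URL are assumptions made for this example. It also swaps in lxml's built-in `xpath()` for `cssselect()` so that only lxml itself needs to be installed. With the cssselect package available, `root.cssselect('li a')` should select the same elements.

```python
import lxml.html

# Hypothetical sample page; a real script would parse a file or an HTTP response.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li><a href="/books/1">Python &amp; Scraping</a></li>
    <li><a href="/books/2">Regex <b>Basics</b></a></li>
  </ul>
</body></html>
"""

# Parse the string and get the root element of the document.
root = lxml.html.fromstring(SAMPLE_HTML)
# Rewrite every link in the tree as an absolute URL based on the given base URL.
root.make_links_absolute('https://example.com/')

links = []
# Select every a element nested inside an li element.
for a in root.xpath('//li//a'):
    # get() reads an attribute; text_content() joins the text of the element and its children.
    links.append((a.get('href'), a.text_content()))

for url, title in links:
    print(url, title)
```

Because lxml builds a real element tree, nested tags and character references are handled by the parser, so none of the manual tag stripping or unescaping from the regular-expression approach is needed here.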