As the title says. Even after quite a bit of googling I could not find a solution in Japanese or English, so I am leaving a note here to refer back to if it happens again.
The site where this actually happened was not Wikipedia, but another https:// ~
site.
On the PC I am using now, the same site scrapes without any error. Why? Perhaps an environment problem?
import requests

url = 'https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8'
response = requests.get(url)  # this is where the SSLError was raised
result = response.text
print(result)
I did not save the error message from that time, but I remember it was an SSLError
that included the words bad handshake.
I did not want to resort to the verify=False
strategy, so after a lot of digging I found that using urllib and
ssl let me scrape without the error.
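One caveat about this workaround that is worth knowing: a bare ssl.SSLContext() created with no arguments does not verify server certificates at all, so in practice it skips the same checks that verify=False skips. The standard-library defaults can be inspected directly:

```python
import ssl

# A plain SSLContext is created with verification turned off
ctx = ssl.SSLContext()
print(ctx.verify_mode == ssl.CERT_NONE)  # True: certificates are not checked
print(ctx.check_hostname)                # False: hostnames are not checked

# The safer alternative: loads system CA certificates and verifies hostnames
safe_ctx = ssl.create_default_context()
print(safe_ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(safe_ctx.check_hostname)                    # True
```

If the original bad handshake was caused by a missing or outdated CA bundle on the machine, passing ssl.create_default_context() instead may fix the error without disabling verification.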
Once you get this far, all that is left is to extract the elements you need with Beautiful Soup 4.
import urllib.request
import ssl

url = 'https://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8'
# note: a bare SSLContext does not verify certificates (see the caveat above about verify=False)
context = ssl.SSLContext()
req = urllib.request.Request(url=url)
with urllib.request.urlopen(req, context=context) as f:
    result = f.read().decode()
print(result)
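To illustrate the Beautiful Soup 4 step mentioned above, here is a minimal sketch. It parses a small inline HTML snippet standing in for the fetched `result` string, so the snippet's structure and tag names are assumptions, not the real page:

```python
from bs4 import BeautifulSoup

# Stand-in for the `result` string fetched above (assumed structure)
result = """
<html>
  <head><title>メインページ - Wikipedia</title></head>
  <body>
    <h1 id="firstHeading">メインページ</h1>
    <p>ウィキペディアへようこそ</p>
  </body>
</html>
"""

soup = BeautifulSoup(result, 'html.parser')
print(soup.title.string)                        # メインページ - Wikipedia
print(soup.find('h1', id='firstHeading').text)  # メインページ
```

From here, find() / find_all() with tag names, ids, or CSS classes pull out whichever elements you want to use.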