I was trying to write a web scraping program with Python's urllib and BeautifulSoup. However, the very first urllib.request.urlopen(...) call failed with an error saying that no response was received.
It seems the connection could not be established because of a proxy server. The proxy was configured in Internet Explorer as follows.
- [Tools] -> [Internet Options] -> [Connections] -> [LAN Settings]
- [x] Use automatic configuration script
The automatic configuration script was http://proxy.-----.co.jp/proxy.pac. (The ----- part is masked; it is not the actual name.)
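To see which proxy settings Python itself picks up, a quick check like the following may help. This is only a minimal sketch: urllib.request.getproxies() reads environment variables and, on Windows, the system Internet settings. As far as I know it does not evaluate an automatic configuration script, so an empty result here would be consistent with the failure described above.

```python
import urllib.request

# getproxies() returns a dict mapping a scheme ("http", "https", ...) to a
# proxy URL, collected from environment variables or the OS settings.
detected = urllib.request.getproxies()

for scheme, proxy_url in sorted(detected.items()):
    print(scheme, "->", proxy_url)

if not detected:
    # Nothing was found, so urlopen would try to connect directly.
    print("No proxy settings detected")
```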
I solved it by creating a ProxyHandler for urllib.request before calling urlopen, passing it to build_opener, and installing the resulting opener with install_opener.
The sample code is below.
scrapetest.py
import urllib.request

# Proxy setting taken from the LAN settings above (host name masked)
proxies = {'http': 'http://proxy.-----.co.jp/proxy.pac'}
proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)
# Make every later urlopen call go through this opener
urllib.request.install_opener(opener)

html = urllib.request.urlopen("http://www.pythonscraping.com/pages/page1.html")
print(html.read())
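As a variation on the code above, the opener can also be used directly instead of being installed globally, which keeps the proxy setting local to the scraper. The sketch below is an assumption-laden example, not the method used in this post: build_proxy_opener is a hypothetical helper name, and proxy.example.com:8080 is a placeholder for the real proxy server's host and port. ProxyHandler expects the address of the proxy server itself, and to my knowledge urllib does not interpret .pac scripts.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests via proxy_url.

    build_proxy_opener is a hypothetical helper written for this example.
    """
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Placeholder address (an assumption): replace with the real proxy host:port.
opener = build_proxy_opener("http://proxy.example.com:8080")

# The opener can then fetch pages without calling install_opener:
# html = opener.open("http://www.pythonscraping.com/pages/page1.html")
# print(html.read())
```

Because nothing is installed globally, other code in the same process that calls urllib.request.urlopen keeps its default behavior.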
The development environment is Python 3.5.2 (Anaconda) on Windows.
I am learning web scraping programming from the following book: Web Scraping with Python (O'Reilly).