Installation of required packages
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"
$ sudo aptitude install phantomjs xvfb
$ pip install selenium pyvirtualdisplay beautifulsoup4
from selenium import webdriver
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()
# <Display cmd_param=['Xvfb', '-br', '-nolisten', 'tcp', '-screen', ' - snip -
driver = webdriver.PhantomJS()
driver.get("http://www.example.com")
type(driver.page_source)
# <class 'str'>
driver.page_source
# '<!DOCTYPE html><html itemscope="" itemtype="http://schema.org/Web - snip -
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")
i = [{"href": x.get("href"), "text": x.string, "class": x.get("class")} for x in soup.find_all("a")]
print(i)
# [{'class': None, 'text': 'MENU', 'href': 'javascript:;'}, {'class': None, 'text': 'top page', 'href': '/'}, {'class': None, 'text': 'platform', 'href': '/pf/'}, - snip -
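The hrefs collected above are a mix of relative paths and `javascript:` pseudo-links, so they usually need cleaning before they can be followed. The following is a minimal sketch using only the standard library; the helper name `absolutize_links` and the base URL are assumptions for illustration, and the sample list simply mirrors the shape of the output shown above.

```python
from urllib.parse import urljoin, urlparse


def absolutize_links(links, base_url):
    """Return copies of the link dicts whose hrefs resolve to http(s) URLs."""
    result = []
    for link in links:
        href = link.get("href")
        if not href:
            continue  # anchor without an href attribute
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip javascript:, mailto:, and similar pseudo-links
        # dict(link, href=...) builds a new dict, so the input is not modified
        result.append(dict(link, href=absolute))
    return result


# Sample data in the same shape as the list printed above
sample = [
    {"class": None, "text": "MENU", "href": "javascript:;"},
    {"class": None, "text": "top page", "href": "/"},
    {"class": None, "text": "platform", "href": "/pf/"},
]

cleaned = absolutize_links(sample, "http://www.example.com/")
for link in cleaned:
    print(link["text"], link["href"])
# top page http://www.example.com/
# platform http://www.example.com/pf/
```

In the session above, the same function could be applied to `i` with the URL passed to `driver.get()` as the base.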
As of September 2016, the phantomjs package shipped with Ubuntu 16.04 still has the problem reported in the bug below, so when using PhantomJS on Ubuntu 16.04 it is better to install it by the normal procedure rather than from the package. https://bugs.launchpad.net/ubuntu/+source/phantomjs/+bug/1578444
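If PhantomJS is installed manually, the script has to find that binary rather than the packaged one. The sketch below is one possible way to do this with the standard library: it checks a list of preferred directories first and only then falls back to `PATH`. The function name `find_phantomjs` and the directory `~/phantomjs/bin` are hypothetical and should be adjusted to wherever the binary was actually unpacked.

```python
import os
import shutil


def find_phantomjs(preferred_dirs=()):
    """Return the path of a phantomjs executable, or None if none is found.

    Directories in preferred_dirs are checked before PATH, so a manually
    installed binary takes priority over the distribution package.
    """
    for directory in preferred_dirs:
        candidate = os.path.join(directory, "phantomjs")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to whatever is on PATH (may be the packaged version)
    return shutil.which("phantomjs")


# "~/phantomjs/bin" is a hypothetical install location
path = find_phantomjs([os.path.expanduser("~/phantomjs/bin")])
print(path or "phantomjs not found")
```

In Selenium releases that still include the PhantomJS driver, the resulting path can be passed as `webdriver.PhantomJS(executable_path=path)`.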