**May 6, 2018: I wrote a new article that reflects the situation now that Headless Chrome is available on the Stable channel. See also here.**
The other day, Vitaly, the maintainer of PhantomJS, announced that he was stepping down. PhantomJS has served me well as an easy way to use a headless browser. Since the recommendation going forward is to use Headless Chrome, I tried it out.
There are plenty of samples that use Node.js, but I wanted to use Python for various reasons, so here I use Headless Chrome via Selenium.
Headless Chrome is a mode that runs without displaying a window, available starting with Google Chrome 59. It is useful for automated testing and web scraping.
As of April 28, 2017, it appears to be available in the Mac and Linux builds on the Dev and Canary channels. I tried it with the Mac build of the Canary channel. I also tried the Windows build of the Canary channel, but the window was displayed even when I specified --headless. I expect it to become available there soon [^1].
[^1]: Reference: https://bugs.chromium.org/p/chromium/issues/detail?id=712981
For now, the easiest route is chrome-remote-interface for Node.js, which also has the most information available, so it is a good place to start.
Note that running regular Chrome on a virtual display such as Xvfb has long been described as "headless" as well. When you search, check which meaning a given article is using.
Using headless mode is not much different from using regular Chrome: Selenium drives Chrome through ChromeDriver. When creating the Chrome WebDriver, pass a ChromeOptions object as an argument, and use it to specify the path of the Chrome binary to run and its command-line arguments.
The environment I tried is as follows.
Assumption: Python 3.6 is installed.
Google Chrome Canary (not needed once --headless is available on the Stable channel). Canary can coexist with Stable.
(venv) $ pip install selenium
The script below runs a Google search. I took the sample code from Python Crawling & Scraping and replaced the part that used PhantomJS with Headless Chrome.
selenium_google.py
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
# Path to Chrome (should become unnecessary once --headless is available on the Stable channel).
options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
# Enable headless mode (comment out the next line to see the window).
options.add_argument('--headless')
# Create a Chrome WebDriver object.
driver = webdriver.Chrome(chrome_options=options)

# Open the Google top page.
driver.get('https://www.google.co.jp/')
# Make sure the title contains 'Google'.
assert 'Google' in driver.title

# Enter the search term and submit it.
input_element = driver.find_element_by_name('q')
input_element.send_keys('Python')
input_element.send_keys(Keys.RETURN)
time.sleep(2)  # Chrome transitions via Ajax, so wait 2 seconds for now.
# Make sure the title contains 'Python'.
assert 'Python' in driver.title

# Take a screenshot.
driver.save_screenshot('search_results.png')

# Print the search results.
for a in driver.find_elements_by_css_selector('h3 > a'):
    print(a.text)
    print(a.get_attribute('href'))

driver.quit()  # Quit the browser.
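The fixed `time.sleep(2)` in the script above is a stopgap. One alternative is to poll the page title until it contains the expected text, giving up after a timeout. The helper below is only a minimal sketch of that idea: `wait_for_title` and its parameters are hypothetical names, not part of Selenium, and it only assumes the driver exposes a `title` attribute. Selenium also ships `WebDriverWait` together with `expected_conditions.title_contains`, which cover the same need.

```python
import time


def wait_for_title(driver, text, timeout=10.0, interval=0.2):
    """Poll driver.title until it contains text, or raise TimeoutError.

    Hypothetical helper: driver can be any object with a title attribute,
    such as a Selenium WebDriver.
    """
    deadline = time.monotonic() + timeout
    while text not in driver.title:
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'title did not contain {!r} within {} seconds'.format(text, timeout))
        # Sleep briefly between checks instead of busy-waiting.
        time.sleep(interval)
```

In selenium_google.py, this would presumably replace the `time.sleep(2)` line with `wait_for_title(driver, 'Python')`.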
When I ran selenium_google.py as follows, the search results were printed without a browser window being displayed.
(venv) $ python selenium_google.py
Python -Wikipedia
https://ja.wikipedia.org/wiki/Python
Python Tutorial — Python 3.6.1 document
http://docs.python.jp/3/tutorial/
Python basic course(1 What is Python?) - Qiita
http://qiita.com/Usek/items/ff4d87745dfc5d9b85a4
10 contents that even beginners can study Python almost for free-paiza development diary
http://paiza.hatenablog.com/entry/2015/04/09/%E5%88%9D%E5%BF%83%E8%80%85%E3%81%A7%E3%82%82%E3%81%BB%E3%81%BC%E7%84%A1%E6%96%99%E3%81%A7Python%E3%82%92%E5%8B%89%E5%BC%B7%E3%81%A7%E3%81%8D%E3%82%8B%E3%82%B3%E3%83%B3%E3%83%86%E3%83%B3%E3%83%8410
[Must-see for beginners] What is Python? Thorough explanation of language characteristics, share, and work market|samurai...
http://www.sejuku.net/blog/7720
Don't be bitten by Python:List of security risks to watch out for|programming...
http://postd.cc/a-bite-of-python/
What is Python-Hatena Keyword-Hatena Diary
http://d.hatena.ne.jp/keyword/Python
Learning site from introduction to application of Python
http://www.python-izm.com/
Learn with Python An introduction to programming from the basics(1)Programming in Python...
http://news.mynavi.jp/series/python/001/
Download Python | Python.org
https://www.python.org/downloads/
driver.quit()
.save_screenshot() produced a 1x1 image. Maybe some option is needed.
At least on OS X, Headless Chrome was easy to use. It would be even easier if it were available on the Stable channel. If you remove the --headless option, the window is displayed, so I'm happy that debugging looks easy.
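For the 1x1 screenshot, one thing that might be worth trying is giving headless Chrome an explicit window size through its `--window-size` command-line switch. This is an untested guess rather than a confirmed fix for the environment above. The sketch below only builds the argument list and hands it to an options object; `headless_arguments` and `apply_arguments` are hypothetical helper names, and 1280x1024 is an arbitrary example size.

```python
def headless_arguments(width=1280, height=1024):
    """Build Chrome arguments for a headless run with a fixed window size."""
    if width <= 0 or height <= 0:
        raise ValueError('width and height must be positive')
    return ['--headless', '--window-size={},{}'.format(width, height)]


def apply_arguments(options, arguments):
    """Add each argument to a ChromeOptions-like object.

    Assumes only that options has an add_argument() method, as Selenium's
    ChromeOptions does.
    """
    for argument in arguments:
        options.add_argument(argument)
    return options
```

In selenium_google.py, `apply_arguments(options, headless_arguments())` would stand in for the single `options.add_argument('--headless')` call, before the driver is created.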