HP: PhantomJS
For reference: Why is "WebKit" between Google and Apple so important?
In a nutshell, PhantomJS is a WebKit-based headless browser. WebKit is a rendering engine used mainly in web browsers. PhantomJS uses JavaScriptCore, the JavaScript engine built into WebKit (Safari uses it too). It can also be used for scraping and screen capture.
While I was trying out scraping with Python 3.4, I ran into a site that renders its content with JavaScript. I decided to follow the article "Quick web scraping with Python (while supporting JavaScript loading)" (http://qiita.com/beatinaniwa/items/72b777e23ef2390e13f8#comment-8ec96aa8e93ceb9cf3ea).
Installation
So, on to the actual installation, here with Homebrew:
$ brew install phantomjs
Experiment #0
I will try various things by following the Quick Start published on the web. Basically, everything is written in JavaScript.
Experiment #1 Hello, World!
In the terminal, move to the directory where you want to keep the project (or create one with `mkdir`). Create a file in the project directory with `touch hello.js` and start editing it with `open hello.js`.
hello.js
console.log('Hello, World!');
phantom.exit();
When I run `phantomjs hello.js` in the terminal, it prints `Hello, World!`. The first line, `console.log`, writes the string `Hello, World!` to the terminal. The second line, `phantom.exit()`, terminates execution. Note that if you leave out `phantom.exit()`, PhantomJS never terminates and the script appears to hang.
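To make the role of `phantom.exit()` a little more concrete, here is a minimal sketch in which the script finishes some delayed work before exiting. The `greeting` helper and the 500 ms delay are made up for illustration, and the `typeof phantom` guard is only there so that the plain JavaScript part can also be loaded outside PhantomJS.

```javascript
// Build the greeting text (plain JavaScript, no PhantomJS API involved).
// `greeting` is a hypothetical helper, not part of PhantomJS.
function greeting(name) {
    return 'Hello, ' + name + '!';
}

// The global `phantom` object only exists when the script runs under PhantomJS.
if (typeof phantom !== 'undefined') {
    console.log(greeting('World'));

    // Simulate asynchronous work; the 500 ms delay is an arbitrary choice.
    setTimeout(function () {
        console.log('Work finished, exiting.');
        // Exit only once the work is done; 0 signals success to the shell.
        phantom.exit(0);
    }, 500);
}
```

The point of the sketch is the placement of `phantom.exit()`: it goes at the end of the last callback, not at the end of the file, because anything still pending when it is called is abandoned.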
Experiment #2 Page Loading
With PhantomJS you can create a web page object and use it to load, analyze, and render any web page.
page_loading.js
// Create a web page object (the headless browser)
var page = require('webpage').create();
// Open the specified URL
page.open('https://google.com', function(status) {
    console.log("Status: " + status);
    if (status === "success") {
        // Save a screen capture
        page.render('google.png');
    }
    phantom.exit();
});
If it succeeds, you should see `Status: success`, and a screen capture of Google should be saved as `google.png` in your working directory.
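As a small variation, the sketch below derives the output file name from the URL and sets the viewport before rendering. It assumes you want one capture per URL; the `captureName` helper and the 1280x800 size are my own choices for illustration, while `page.viewportSize`, `page.open`, and `page.render` are the PhantomJS calls involved.

```javascript
// Turn a URL into a file name that is safe to write to disk.
// `captureName` is a hypothetical helper, not part of PhantomJS.
function captureName(url) {
    var name = url
        .replace(/^[a-z]+:\/\//i, '')      // drop the scheme (http://, https://)
        .replace(/[^a-z0-9.\-]+/gi, '_')   // replace runs of other characters with _
        .replace(/^_+|_+$/g, '');          // trim leading and trailing underscores
    return name + '.png';
}

// Only run the browser part under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var url = 'https://google.com';

    // Size of the virtual browser window used for layout (arbitrary values).
    page.viewportSize = { width: 1280, height: 800 };

    page.open(url, function (status) {
        console.log('Status: ' + status);
        if (status === 'success') {
            page.render(captureName(url));
        }
        // Report failure through the exit code.
        phantom.exit(status === 'success' ? 0 : 1);
    });
}
```

With the URL above, the capture would be written to `google.com.png`.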
You can also measure how long a page takes to load. For example, to find out how long http://www.google.com takes to load:
loadspeed.js
var page = require('webpage').create(),
    system = require('system'),
    t, address;
// Pass the URL to load as the first argument (the <...> part of the usage message)
if (system.args.length === 1) {
    console.log('Usage: loadspeed.js <http://www.google.com>');
    phantom.exit();
}
t = Date.now();
address = system.args[1];
page.open(address, function(status) {
    if (status !== 'success') {
        console.log('FAIL to load the address');
    } else {
        t = Date.now() - t;
        console.log('Loading ' + system.args[1]);
        console.log('Loading time ' + t + ' msec');
    }
    phantom.exit();
});
Running `phantomjs loadspeed.js http://www.google.com` prints a result such as:
Loading http://www.google.com
Loading time 698 msec
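A single measurement can vary quite a bit, so here is a rough sketch that loads the address several times in a row and prints the minimum, maximum, and mean. The `summarize` helper, the `loadOnce` function, and the run count of 3 are assumptions added for illustration; later runs may also be faster simply because of caching.

```javascript
// Compute min, max, and mean of a non-empty list of millisecond values.
// `summarize` is a hypothetical helper, not part of PhantomJS.
function summarize(times) {
    var sum = 0, min = Infinity, max = -Infinity, i;
    for (i = 0; i < times.length; i++) {
        sum += times[i];
        if (times[i] < min) { min = times[i]; }
        if (times[i] > max) { max = times[i]; }
    }
    return { min: min, max: max, mean: sum / times.length };
}

// Only run the browser part under PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var runs = 3;      // arbitrary number of repetitions
    var times = [];

    // Load the page once, record the elapsed time, then either repeat or report.
    var loadOnce = function (address) {
        var page = require('webpage').create();
        var start = Date.now();
        page.open(address, function (status) {
            if (status !== 'success') {
                console.log('FAIL to load the address');
                phantom.exit(1);
                return;
            }
            times.push(Date.now() - start);
            page.close();
            if (times.length < runs) {
                loadOnce(address);
            } else {
                var s = summarize(times);
                console.log('min ' + s.min + ' / max ' + s.max + ' / mean ' + s.mean + ' msec');
                phantom.exit(0);
            }
        });
    };

    if (system.args.length < 2) {
        console.log('Usage: loadspeed.js <URL>');
        phantom.exit(1);
    } else {
        loadOnce(system.args[1]);
    }
}
```

Each run starts the next one from inside the `page.open` callback, so the loads happen one after another and do not compete for bandwidth.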
Experiment #3 Code Evaluation
Basically, you can run JavaScript inside the loaded page and get the result back using the `evaluate()` function. For example, to get the title of a web page:
// Create a web page object (the headless browser)
var page = require('webpage').create();
// Open the URL
page.open('http://www.google.com', function(status) {
    // Run JavaScript inside the page and get the result back
    var title = page.evaluate(function() {
        return document.title;
    });
    console.log('Page title is ' + title);
    phantom.exit();
});
When I run it in the terminal, it prints `Page title is Google`.
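The same idea extends to pulling more than one value out of a page. The sketch below, under the assumption that you want the list of link targets, passes a selector into `evaluate()` as an extra argument and forwards messages logged inside the page through `page.onConsoleMessage`. The `unique` helper and the `a[href]` selector are illustrative choices of mine.

```javascript
// Remove duplicates while keeping first-seen order.
// `unique` is a hypothetical helper, not part of PhantomJS.
function unique(items) {
    var seen = {}, out = [], i;
    for (i = 0; i < items.length; i++) {
        if (!Object.prototype.hasOwnProperty.call(seen, items[i])) {
            seen[items[i]] = true;
            out.push(items[i]);
        }
    }
    return out;
}

// Only run the browser part under PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();

    // console.log calls made inside the page are not shown unless forwarded.
    page.onConsoleMessage = function (msg) {
        console.log('[page] ' + msg);
    };

    page.open('http://www.google.com', function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
            phantom.exit(1);
            return;
        }
        // This function runs inside the page, so it cannot see variables from
        // this script; the selector is handed over as an argument instead.
        var hrefs = page.evaluate(function (selector) {
            var nodes = document.querySelectorAll(selector), list = [], i;
            console.log('matched ' + nodes.length + ' elements');
            for (i = 0; i < nodes.length; i++) {
                list.push(nodes[i].href);
            }
            return list;
        }, 'a[href]');

        unique(hrefs).forEach(function (href) {
            console.log(href);
        });
        phantom.exit(0);
    });
}
```

Only simple data such as strings, numbers, arrays, and plain objects should be returned from `evaluate()`, which is why the sketch returns a list of strings rather than the DOM nodes themselves.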
Supplemental tryout
You can even use PhantomJS to get driving directions based on Google Maps.
direction.js
// Get driving directions using the Google Directions API.
var page = require('webpage').create(),
    system = require('system'),
    origin, dest, steps;
if (system.args.length < 3) {
    console.log('Usage: direction.js origin destination');
    console.log('Example: direction.js "San Diego" "Palo Alto"');
    phantom.exit(1);
} else {
    origin = system.args[1];
    dest = system.args[2];
    page.open(encodeURI('http://maps.googleapis.com/maps/api/directions/xml?origin=' + origin +
            '&destination=' + dest + '&units=imperial&mode=driving&sensor=false'), function (status) {
        if (status !== 'success') {
            console.log('Unable to access network');
        } else {
            steps = page.content.match(/<html_instructions>(.*)<\/html_instructions>/ig);
            if (steps == null) {
                console.log('No data available for ' + origin + ' to ' + dest);
            } else {
                steps.forEach(function (ins) {
                    // Decode the escaped markup, put each <div> on its own line, then strip all tags
                    ins = ins.replace(/\&lt;/ig, '<').replace(/\&gt;/ig, '>');
                    ins = ins.replace(/\<div/ig, '\n<div');
                    ins = ins.replace(/<.*?>/g, '');
                    console.log(ins);
                });
                console.log('');
                console.log(page.content.match(/<copyrights>.*<\/copyrights>/ig).join('').replace(/<.*?>/g, ''));
            }
        }
        phantom.exit();
    });
}
Save it to an appropriate directory and run it in the terminal as `phantomjs direction.js <origin> <destination>`. As a trial, I ran `phantomjs direction.js Tokyo Osaka` and got the following:
Head south
Turn right at Tokyo Metropolitan Government South (intersection) toward Metropolitan Road 431
At Tsunohazu Citizens Center (intersection),continue onto Metropolitan Road 431
Turn left at Nishi-Shinjuku 4-chome (intersection) onto Yamate-dori/Route 317
Continue straight to stay on Yamate Dori/Route 317
Take the ramp on the right to Metropolitan Expressway Central Circular Route
Toll road
Continue onto Exit Hatsudai Minami Tollhouse
Toll road
Merge onto Metropolitan Expressway Central Circular Route
Toll road
Take exit Ohashi JCT on the right toward Tomei Expressway Inner Circular
Toll road
Keep left at the fork,follow signs for Tomei and merge onto Metropolitan Expressway No. 3 Shibuya Line
Toll road
Keep right to continue on Tomei Expressway
Toll road
Keep right at the fork to stay on Tomei Expressway,follow signs for right route, Shizuoka, Gotemba
Toll road
Take exit Gotemba JCT toward Shin Tomei / Shizuoka / Nagoya
Toll road
Continue onto Shin Tomei Expressway
Toll road
Continue onto Exit Hamamatsu Inasa JCT
Toll road
Keep right at the fork,follow signs for Tomei / Tokyo / Nagoya and merge onto Shin Tomei Expressway
Toll road
Take exit Mikkabi JCT on the right toward Tomei / Nagoya
Toll road
Merge onto Tomei Expressway
Toll road
Take exit Toyota JCT toward Tokai Kanjo Expressway, Isewangan Expressway, Toyota East Exit, Toki JCT, Yokkaichi, Shin-Meishin Expressway
Toll road
Keep right at the fork,follow signs for Isewangan Expressway, Yokkaichi, Shin-Meishin Expressway and merge onto Isewangan Expressway
Toll road
Take exit Tobishima IC on the right toward Isewangan Expressway
Toll road
Continue onto Isewangan Expressway
Toll road
Take exit Yokkaichi JCT on the right toward Higashi-Meihan Road / Osaka / Ise Road
Toll road
Merge onto Higashi-Meihan Expressway
Toll road
Take exit Kameyama JCT toward Shin-Meishin Expressway, Kyoto, Osaka
Toll road
Continue onto Shin-Meishin Expressway
Toll road
Take exit Kusatsu JCT toward Kusatsu PA / Meishin / Keiji / Kyoto / Osaka
Toll road
Keep right at the fork to continue on Exit Kusatsu PA,follow signs for Meishin and merge onto Meishin Expressway
Toll road
Keep right at the fork to stay on Meishin Expressway,follow signs for right route
Toll road
Take exit Toyonaka IC toward Hanshin Expressway / Toyonaka Exit / Osaka City
Toll road
Keep left at the fork,follow signs for Osaka City / General Road Exit / Hanshin Expressway
Toll road
Keep right at the fork to continue on Exit High-speed Toyonaka IC,follow signs for Hanshin Expressway
Toll road
Continue onto Exit Meishin Hanshin Tollhouse
Toll road
Merge onto Hanshin Expressway No. 11 Ikeda Line
Toll road
Merge onto Hanshin Expressway No. 1 Loop Line
Toll road
Take exit Toshitaka Kitahama toward Kitahama exit
Partial toll road
Turn right at Sugaharacho Nishi (intersection) onto Sakaisuji
Slight right to stay on Sakaisuji
Slight right onto Tenjinbashi
Turn right at Tenjinbashi (intersection) onto Tosabori-dori/Prefectural Road 168
Turn right at Kitahama 2 (intersection) toward Nakanoshima-dori
Turn is not allowed 8:00 AM – 8:00 PM
Turn left toward Nakanoshima-dori
Turn left onto Nakanoshima-dori
Destination will be on the left
Accuracy aside, it works for the time being. It turns out that you can try quite a lot with PhantomJS alone. Reference: direction.js
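For experimenting further, the two pure pieces of a script like this, building the request URL and cleaning up one instruction, can be pulled out into small functions. This is only a sketch: `buildDirectionsUrl` and `cleanInstruction` are names I made up, it assumes the instructions arrive with their HTML markup escaped as `&lt;` and `&gt;`, and it swaps in `encodeURIComponent` per parameter because `encodeURI` on the whole URL leaves a `&` inside a place name unescaped.

```javascript
// Build the Directions API request URL, escaping each parameter separately.
// `buildDirectionsUrl` is a hypothetical helper.
function buildDirectionsUrl(origin, dest) {
    return 'http://maps.googleapis.com/maps/api/directions/xml' +
        '?origin=' + encodeURIComponent(origin) +
        '&destination=' + encodeURIComponent(dest) +
        '&units=imperial&mode=driving&sensor=false';
}

// Turn one <html_instructions> element into plain text.
// `cleanInstruction` is a hypothetical helper.
function cleanInstruction(ins) {
    return ins
        .replace(/&lt;/ig, '<')          // decode escaped angle brackets
        .replace(/&gt;/ig, '>')
        .replace(/<div/ig, '\n<div')     // put each <div> note on its own line
        .replace(/<.*?>/g, '');          // strip every remaining tag
}

// Example with a made-up instruction string.
console.log(cleanInstruction(
    '<html_instructions>Head &lt;b&gt;south&lt;/b&gt;</html_instructions>'
));
```

Keeping these parts free of PhantomJS calls makes them easy to try out on their own before wiring them back into `page.open`.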