Prepare a scraping environment with Docker and Java

What is this?

This post sets up a scraping environment in Java. I don't want to install Chrome locally, so Docker (selenium/standalone-chrome) provides the browser instead.

References

Thank you very much.
https://qiita.com/wizpra-koyasu/items/7b7e0938ad6d36caf4be
https://stackoverflow.com/questions/12836114/selenium-webdriver-remote-setup
https://www.seleniumhq.org/docs/03_webdriver.jsp

Prerequisites

Run Selenium Standalone-chrome

This is very easy. In Kitematic, click + NEW, search for standalone-chrome, and click CREATE. Once the container has started successfully, ACCESS URL shows which port it is published on, so make a note of it. If you know how, I think it is better to set the port explicitly under Ports so that it does not change.
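Since the published port can differ from machine to machine, it may help to keep it out of the source code. The following is only a minimal sketch, not part of the original setup: the class name HubUrl and the environment variable names SELENIUM_HOST and SELENIUM_PORT are hypothetical, and 32778 is simply the port that appeared in my ACCESS URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper: builds the Selenium hub URL from a host and a port.
final class HubUrl {

    // Returns http://<host>:<port>/wd/hub, the endpoint the standalone server listens on.
    static URL build(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        try {
            return new URL("http", host, port, "/wd/hub");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("invalid hub address", e);
        }
    }

    public static void main(String[] args) {
        // SELENIUM_HOST and SELENIUM_PORT are assumed names, not something Selenium defines.
        String host = System.getenv().getOrDefault("SELENIUM_HOST", "localhost");
        int port = Integer.parseInt(System.getenv().getOrDefault("SELENIUM_PORT", "32778"));
        System.out.println(build(host, port));
    }
}
```

The resulting URL can be passed to RemoteWebDriver in place of a hard-coded string.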

Call Selenium from Java

This is almost identical to the official sample code.

package org.openqa.selenium.example;

import java.net.MalformedURLException;
import java.net.URL;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.openqa.selenium.support.ui.ExpectedCondition;
import org.openqa.selenium.support.ui.WebDriverWait;

public class Selenium2Example {
    public static void main(String[] args) {
        // Ask the remote Selenium server for a Chrome session.
        // Notice that the remainder of the code relies on the interface,
        // not the implementation.
        DesiredCapabilities capability = DesiredCapabilities.chrome();

        WebDriver driver;
        try {
            // Replace 32778 with the port shown at ACCESS URL
            driver = new RemoteWebDriver(new URL("http://localhost:32778/wd/hub"),
                    capability);
        } catch (MalformedURLException e) {
            // The hub URL is malformed, so there is nothing to drive
            e.printStackTrace();
            return;
        }

        try {
            // And now use this to visit Google
            driver.get("http://www.google.com");
            // Alternatively the same thing can be done like this
            // driver.navigate().to("http://www.google.com");

            // Find the text input element by its name
            WebElement element = driver.findElement(By.name("q"));

            // Enter something to search for
            element.sendKeys("Cheese!?");

            // Now submit the form. WebDriver will find the form for us from the element
            element.submit();

            // Check the title of the page
            System.out.println("Page title is: " + driver.getTitle());

            // Google's search is rendered dynamically with JavaScript.
            // Wait for the page to load, timeout after 10 seconds
            (new WebDriverWait(driver, 10)).until(new ExpectedCondition<Boolean>() {
                @Override
                public Boolean apply(WebDriver d) {
                    return d.getTitle().toLowerCase().startsWith("cheese!");
                }
            });

            // Should see: "cheese! - Google Search"
            System.out.println("Page title is: " + driver.getTitle());
        } finally {
            // Close the browser even if something above fails
            driver.quit();
        }
    }
}

Because the code uses Selenium, add the library with Maven or Gradle. I used Maven.

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>hoge</groupId>
	<artifactId>fuga</artifactId>
	<version>0.0.1-SNAPSHOT</version>

	<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
	<dependencies>
		<dependency>
			<groupId>org.seleniumhq.selenium</groupId>
			<artifactId>selenium-java</artifactId>
			<version>2.41.0</version>
		</dependency>
	</dependencies>

</project>

If you run it with this setup, you should get the following output. It's easy.

Page title is: Cheese!? -Google search
Page title is: Cheese!? -Google search

Question

Where is a screenshot stored when I take one? This is the first time I have used a remote WebDriver, and I wonder whether the generated image can be handled on the client side, or whether it is kept on the server and has to be collected from there. I'm fairly sure it's the former. I'll find out tomorrow.
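As far as I know, the answer is the former: the WebDriver protocol returns the screenshot to the client as Base64-encoded PNG data, so the client decides where to save it. The following is only a sketch of that client-side step, using the JDK alone. The class name ScreenshotSaver and the file name screenshot.png are hypothetical. The commented-out line assumes that the driver is wrapped with Augmenter, which I believe is needed for RemoteWebDriver in Selenium 2.x before it can be cast to TakesScreenshot. I have not run it against this setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper: saves a Base64-encoded screenshot on the client machine.
final class ScreenshotSaver {

    // Decodes the Base64 text; the MIME decoder tolerates line breaks in the input.
    static byte[] decode(String base64Png) {
        return Base64.getMimeDecoder().decode(base64Png);
    }

    // Writes the decoded bytes to the target file and returns its path.
    static Path save(String base64Png, Path target) {
        try {
            return Files.write(target, decode(base64Png));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With a live session, the string would come from Selenium (assumption, not run here):
        // String base64 = ((TakesScreenshot) new Augmenter().augment(driver))
        //         .getScreenshotAs(OutputType.BASE64);
        // Stand-in data so the sketch runs without a browser: the first four bytes of a PNG file.
        String base64 = Base64.getEncoder()
                .encodeToString(new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        Path saved = save(base64, Paths.get("screenshot.png"));
        System.out.println("Saved " + saved.toAbsolutePath());
    }
}
```

Nothing has to be collected from the container afterwards, because the image bytes arrive in the response to the client.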
