Scraping with puppeteer in Nuxt on Docker

Introduction

Ah, I want to scrape.

So I will do it with Nuxt on Docker. For Node.js, the library called puppeteer seems to be the recommended choice for scraping, so I will quickly scrape from Nuxt's serverMiddleware.

I grew up being told not to bother other people, so I will scrape my own site, toribure (https://toribure.herokuapp.com/), a simple brainstorming tool that can be used alone or as a team. (No login is required. Please try it ♪)

That was a little publicity. There is an image of a cute bird (from Irasutoya) on the top page. This time, I will scrape that image and display it.

Preparing Nuxt on Docker

There are many other articles covering this part, so I will keep it brief. For reference, my environment was as follows.

$ docker -v
  Docker version 19.03.13-beta2, build ff3fbc9d55
$ docker-compose -v
  docker-compose version 1.26.2, build eefe0d31

Make a Nuxt app

$ docker run --rm -it -w /app -v `pwd`:/app node yarn create nuxt-app scraping
? Project name: scraping
? Programming language: JavaScript
? Package manager: Yarn
? UI framework: None
? Nuxt.js modules: Axios
? Linting tools: 
? Testing framework: None
? Rendering mode: Universal (SSR / SSG)
? Deployment target: Server (Node.js hosting)
? Development tools: 

Only Axios will be used later, so I make a point of selecting it here.

For reference, at the time of writing, the node:latest image was version 14.9.0, create-nuxt-app was v3.2.0, and nuxt was 2.14.0.

Prepare the Dockerfile and docker-compose.yml

From here on, the newly created scraping/ directory is the working directory.

$ cd scraping

Dockerfile


FROM node

ENV HOME=/app     \
    LANG=C.UTF-8  \
    TZ=Asia/Tokyo \
    HOST=0.0.0.0

WORKDIR ${HOME}

RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
      --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

COPY package.json ${HOME}
COPY yarn.lock ${HOME}
RUN yarn install

COPY . ${HOME}
EXPOSE 3000
CMD ["yarn", "run", "dev"]

For details on the RUN apt-get ... step, see the Troubleshooting guide in the puppeteer documentation. In short, puppeteer fails with an error unless a browser and fonts are available in the container.

docker-compose.yml


version: "3"

services:
  nuxt:
    build: .
    volumes:
      - .:/app
    ports:
      - 3000:3000

Once both files are in place, build the image once.

$ docker-compose build

Make the scraping app

Install puppeteer

I will add it with yarn.

$ docker-compose run --rm nuxt yarn add puppeteer

Make an API

We will use serverMiddleware. Following the express-template that is also introduced officially, we will expose the scraping as an API through serverMiddleware.

nuxt.config.js


export default {
  // ... (existing settings omitted)
  serverMiddleware: {
    '/api': '~/api'
  }
}

This routes requests under /api to ~/api/index.js, so let's create that file.

$ mkdir api
$ touch api/index.js api/scraping.js

I made two files: index.js receives the requests, and scraping.js does the actual processing.

api/index.js


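const app = require('express')()
const scraping = require('./scraping')

// GET /api/get_image: respond with the URL of the scraped image
app.get('/get_image', async (req, res) => {
  try {
    const image = await scraping.getImage()
    res.send(image)
  } catch (err) {
    // Express 4 does not catch rejected promises from async handlers
    console.error(err)
    res.status(500).send('Scraping failed')
  }
})

module.exports = {
  path: '/api',
  handler: app
}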

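With this, the getImage() function of scraping.js is called when /api/get_image is accessed. If require('express') fails because express is not installed in your project, add it with docker-compose run --rm nuxt yarn add express.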
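
To see why the route is registered as /get_image rather than /api/get_image, here is a minimal sketch of a serverMiddleware handler written without Express. It assumes, as connect-style mounting does, that the /api prefix is stripped from req.url before the handler runs. The function name apiHandler and the placeholder response are hypothetical and not part of the original app.

```javascript
// Hypothetical connect-style handler, the plain form of what serverMiddleware accepts.
// Assumption: it is mounted at '/api', so a request to /api/get_image
// arrives here with req.url === '/get_image'.
function apiHandler(req, res, next) {
  if (req.method === 'GET' && req.url === '/get_image') {
    res.statusCode = 200
    res.setHeader('Content-Type', 'text/plain; charset=utf-8')
    res.end('https://example.com/bird.png') // placeholder, not a real scraped URL
    return
  }
  // Anything else is passed on to the next middleware (eventually Nuxt itself)
  next()
}

// In a real api/index.js this would be exported, for example:
//   module.exports = { path: '/api', handler: apiHandler }
```

The Express app in this article does the same matching for you, which is why its routes are written relative to /api.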

api/scraping.js


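const puppeteer = require('puppeteer')

async function getImage() {
  const browser = await puppeteer.launch({
    args: [
      '--no-sandbox', // the container runs as root, where Chrome's sandbox cannot start
      '--disable-dev-shm-usage' // avoid Docker's small default /dev/shm
    ]
  })
  try {
    const page = await browser.newPage()
    await page.goto("https://toribure.herokuapp.com/")
    // Runs inside the page: take the first img under the first main element
    const image = await page.evaluate(() => {
      return document.getElementsByTagName("main")[0].getElementsByTagName("img")[0].src
    })
    return image
  } finally {
    // Close the browser even on failure so Chrome processes do not pile up
    await browser.close()
  }
}

module.exports = {
  getImage
}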

This mostly follows the official puppeteer README. With page.evaluate you can get and manipulate elements inside the page. If you look at the HTML structure of the scraping target (https://toribure.herokuapp.com/) with the developer tools, you will see that there is only one main element in total, and the bird image we are after is the only img element under it. (The structure is messy, but the bird is cute.) Once you know that, all you have to do is get the element just as in ordinary JavaScript.
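
The same lookup can also be written with a CSS selector. The sketch below uses page.$eval, which runs a function against the first element matching a selector. The helper name getFirstImageSrc and the selector 'main img' are my own choices for illustration, and the stub page at the bottom only stands in for a real puppeteer page so that the snippet runs without a browser.

```javascript
// Hypothetical helper: returns the src of the first element matching `selector`.
// `page` is expected to be a puppeteer Page; page.$eval rejects if nothing matches.
async function getFirstImageSrc(page, selector = 'main img') {
  return page.$eval(selector, (img) => img.src)
}

// Stub that mimics the one method used above, so the sketch runs without Chrome.
const fakePage = {
  async $eval(selector, pageFunction) {
    const fakeElement = { src: 'https://example.com/bird.png' } // placeholder value
    return pageFunction(fakeElement)
  }
}

getFirstImageSrc(fakePage).then((src) => console.log(src))
```

With a real page, the call would be `const image = await getFirstImageSrc(page)` in place of the page.evaluate block.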

This is the end of coding on the API side.

Keep the front end simple

I'm getting tired, so the front end just displays the image when the button is pressed.

pages/index.vue


<template>
  <div>
    <button @click="showBird">Scraping!!</button>
    <br>
    <img v-if="src" :src="src">
  </div>
</template>

<script>
export default {
  data() {
    return {
      src: ""
    }
  },
  methods: {
    async showBird() {
      this.src = await this.$axios.$get("/api/get_image")
    }
  }
}
</script>

Complete!!

Operation check

(Screen recording: test.gif)

A cute bird came out ♪

In closing

Once you have come this far, the rest is DOM manipulation: if you understand the structure of the target page and write the JavaScript, you can scrape almost anything. Some sites prohibit scraping, so please keep that in mind while trying out your ideas!
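
One small, partial way to pay attention is to look at the site's robots.txt before scraping. The sketch below is deliberately simplified and is not a full robots.txt parser. The function name isDisallowed and the sample rules are made up for illustration, and a site's terms of use may prohibit scraping even when robots.txt does not.

```javascript
// Simplified robots.txt check: only looks at "User-agent: *" groups and treats
// each Disallow value as a plain path prefix (no wildcards, no Allow rules).
function isDisallowed(robotsTxt, path) {
  let appliesToAll = false
  let lastWasUserAgent = false
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split('#')[0].trim() // drop comments
    const sep = line.indexOf(':')
    if (sep === -1) continue
    const field = line.slice(0, sep).trim().toLowerCase()
    const value = line.slice(sep + 1).trim()
    if (field === 'user-agent') {
      // Consecutive User-agent lines belong to the same group
      appliesToAll = lastWasUserAgent ? (appliesToAll || value === '*') : value === '*'
      lastWasUserAgent = true
      continue
    }
    lastWasUserAgent = false
    if (field === 'disallow' && appliesToAll && value !== '' && path.startsWith(value)) {
      return true
    }
  }
  return false
}

// Made-up rules for illustration
const sample = [
  'User-agent: *',
  'Disallow: /private/',
  '',
  'User-agent: examplebot',
  'Disallow: /'
].join('\n')

console.log(isDisallowed(sample, '/private/page')) // true
console.log(isDisallowed(sample, '/'))             // false
```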

