There are some samples that use puppeteer-heroku-buildpack, but there aren't many examples that use Docker, so make a note.
Almost anything works with an HTTP server that just runs one container. All you have to do is start the HTTP server on the port number specified by $ PORT
.
The details are written around.
First, try deploying something that just uses Python's http.server.
Dockerfile
FROM python:3.8-alpine
heroku.yml
build:
docker:
web: Dockerfile
run:
web: python -m http.server $PORT
$ heroku create
$ heroku stack:set container
Just git push. It's really easy.
$ git push heroku master
puppeteer has the ability to download Chromium, but it's best to use Google Chrome (Stable) anyway. (prejudice)
So, try deploying puppeteer-core, which does not have Chromium download function, and Google Chrome in one Docker image.
Dockerfile
# ref: https://github.com/puppeteer/puppeteer/blob/main/.ci/node12/Dockerfile.linux
FROM node:12-slim
# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_EXECUTABLE_PATH /usr/bin/google-chrome-stable
ENV PUPPETEER_PRODUCT chrome
RUN mkdir /app
WORKDIR /app
COPY package.json /app/package.json
RUN npm install
COPY index.js /app/index.js
CMD npm start
For index.js, just create an HTTP server appropriately, search hogehoge on Google when parameters like ? Q = hogehoge
are specified, and display the text of the search result screen. With things.
index.js
const http = require("http");
const url = require("url");
const puppeteer = require("puppeteer-core");
//create a server object:
http
.createServer((req, res) => {
const requestUrl = url.parse(req.url, true);
let keyword = requestUrl.query["q"];
if (!keyword) {
res.write("request with ?q=xxxx");
res.end();
} else {
puppeteer
.launch({
args: ["--no-sandbox", "--disable-setuid-sandbox"],
executablePath: process.env.PUPPETEER_EXECUTABLE_PATH,
})
.then(async (browser) => {
const page = (await browser.pages())[0] || (await browser.newPage());
//Automatic operation from here
await page.goto("https://google.com/");
await page.type("input[name='q']", requestUrl.query["q"]);
await Promise.all([
page.waitForNavigation(),
page.keyboard.press("Enter"),
]);
let text = await page.evaluate(() => document.body.textContent);
await page.close();
res.write(text);
res.end();
});
}
})
.listen(process.env.PORT);
package.json is just like npm init --yes`` npm install puppeteer-core
without any twist.
package.json
{
"name": "heroku-docker-playground",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"start": "node index.js",
"test": "echo \"Error: no test specified\" && exit 1"
},
"repository": {
"type": "git",
"url": "git+https://github.com/YusukeIwaki/heroku-docker-playground.git"
},
"keywords": [],
"author": "",
"license": "ISC",
"bugs": {
"url": "https://github.com/YusukeIwaki/heroku-docker-playground/issues"
},
"homepage": "https://github.com/YusukeIwaki/heroku-docker-playground#readme",
"dependencies": {
"puppeteer-core": "^5.2.1"
}
}
heroku.yml
build:
docker:
web: Dockerfile
run:
web: npm start
It takes a long time because docker build is done at the time of deployment, but if you wait, the HTTP server will start naturally.
With this kind of feeling, if the Google case result text appears in a row, it is OK to communicate.
If puppeteer-core works, playwright-core will work as well.
For playwright, the Dockerfile is published by Microsoft, so you can refer to it. https://github.com/microsoft/playwright/blob/master/docs/docker/Dockerfile.bionic
It's a complete digression, but since playwright isn't intuitively available on Azure Functions at this point, Heroku is highly recommended for anyone who just wants to get playwright running in response to an HTTP request!
Recommended Posts