I built a Kindle Prime Reading listing site with Scrapy and GitHub Actions

Background

While staying home, I signed up for Amazon Prime, but I rarely use it for anything other than buying rice and drinks. Just the other day, I started using the benefit called Prime Reading. I wanted to see which books are available, but checking them page by page is tedious, so I built a list and search site using Scrapy.

The site is here: https://kpr.gimo.me/

What I used

- Scrapy (fetching and parsing HTML, etc.)
- DataTables (displaying the data in a table)
- GitHub Pages (hosting the site)
- GitHub Actions (automation)

Development flow

Scrapy

With Scrapy, you write a Spider that defines how the data is fetched and extracted. The source is here: https://github.com/masakichi/KindleSpider/blob/master/KindleSpider/spiders/PrimeReading.py

Once the Spider is written, you can fetch all the books in about a minute with the command scrapy crawl PrimeReading -o public/output.json.
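To give a feel for what the crawl command writes, here is a minimal sketch of the output file. It does not use Scrapy itself; it only mimics the layout of Scrapy's JSON feed export, which is a single JSON array with one object per item. The field names are taken from the table configuration used on the site, while the Book class, the to_feed helper, the field types, and the sample values are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Book:
    """One scraped record. Field names mirror the table columns; types are assumed."""
    asin: str
    title: str
    author: str
    star: float
    rating_count: int
    price: str
    publish_date: str
    cover: str


def to_feed(books):
    """Serialize records as one JSON array, the layout of a JSON feed export."""
    return json.dumps([asdict(book) for book in books], ensure_ascii=False, indent=2)


# Placeholder values for illustration only, not real scraped data.
sample = [
    Book("B000000000", "Example Title", "Example Author", 4.5, 120,
         "500", "2020-01-01", "https://example.com/cover.jpg"),
]
feed = to_feed(sample)
print(feed)
```

Passing ensure_ascii=False keeps Japanese titles readable in the file instead of turning them into escape sequences.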

Write a minimal index.html

The data acquired by Scrapy has to be rendered in HTML. Fortunately, this is easy with DataTables, a jQuery plugin. About 20 lines of code give you a full-featured table with sorting and search.

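// Initialize DataTables on the table element with id "prime-reading".
$('#prime-reading').DataTable({
    "paging": false,          // show every book on a single page
    "order": [[4, 'desc']],   // initial sort: rating_count (column index 4), descending
    // Load the rows from the Scrapy output; the empty dataSrc means the JSON is a plain array.
    "ajax": { "url": "./output.json", "dataSrc": "", "cache": true },
    // Japanese translations for the DataTables UI strings.
    "language": {
        "url": "./Japanese.json"
    },
    "columns": [
        { "data": "asin", "visible": false },
        // Render the title as a link to the Amazon product page; the cover URL is kept in a data attribute.
        { "data": "title", "render": function (data, type, row) { return `<div><a class="title" data-image="${row.cover}" href="https://www.amazon.co.jp/dp/${row.asin}/" target="_blank">${data}</a></div>` }, "width": "40%" },
        { "data": "author" },
        { "data": "star" },
        { "data": "rating_count" },
        { "data": "price" },
        { "data": "publish_date" },
        { "data": "cover", "visible": false },
    ]
});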
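Because the table is driven entirely by output.json, it can help to check the file before it is deployed. The following is a rough sketch of such a check, not part of the actual site: it verifies that the top-level value is an array (which is what the empty dataSrc implies) and that every row has the keys listed under columns. The function name find_problems and the sample row are hypothetical.

```python
import json

# Keys taken from the "columns" configuration of the table.
REQUIRED_KEYS = ("asin", "title", "author", "star", "rating_count",
                 "price", "publish_date", "cover")


def find_problems(text):
    """Return a list of human-readable problems found in an output.json payload."""
    try:
        records = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    # With "dataSrc": "" the top-level JSON value must itself be the array of rows.
    if not isinstance(records, list):
        return ["top-level JSON value must be an array"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"row {index}: not an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
    return problems


# Placeholder row for illustration only; it lacks most of the required keys.
sample = '[{"asin": "B000000000", "title": "Example Title"}]'
for problem in find_problems(sample):
    print(problem)
```

A check like this could run as an extra step after the crawl, so that a broken scrape fails the build instead of producing an empty or half-filled table.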

Publish on GitHub Pages

With the index.html and output.json above, you can publish the site on GitHub Pages. Plenty of instructions for this are available online, so I will omit them here.

Automate everything with GitHub Actions

If you define the workflow in YAML as shown below, the data is fetched and extracted and the site is deployed automatically whenever code is pushed, and also every day at 00:00 UTC (9:00 a.m. Japan time).

name: publish to gh-pages

on:
  push:
    branches:
      - master
  schedule:
    - cron: "0 0 * * *"

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: dschep/install-pipenv-action@v1
      - run: pipenv install
      - run: TZ='Asia/Tokyo' date --iso-8601="minutes" > public/update_time.txt
      - run: pipenv run scrapy crawl PrimeReading -o public/output.json
      - name: Deploy to GitHub Pages
        if: success()
        uses: crazy-max/ghaction-github-pages@v2
        with:
          target_branch: gh-pages
          build_dir: public
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
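The cron expression in the workflow is evaluated in UTC, so it is worth double-checking the Japan-time equivalent. The short sketch below does that conversion and formats the result with minute precision, similar in spirit to the timestamp the workflow writes to update_time.txt. The helper name to_jst and the example date are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset with no daylight saving time.
JST = timezone(timedelta(hours=9), "JST")


def to_jst(utc_time):
    """Convert a timezone-aware UTC datetime to Japan Standard Time."""
    return utc_time.astimezone(JST)


# The date is arbitrary; only the time of day matters for the cron schedule.
run_utc = datetime(2020, 5, 1, 0, 0, tzinfo=timezone.utc)
run_jst = to_jst(run_utc)
print(run_jst.isoformat(timespec="minutes"))  # 2020-05-01T09:00+09:00
```

A fixed offset is enough here because Japan does not observe daylight saving time; for other regions a proper time zone database would be the safer choice.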

Impressions

- Prime Reading seems to include a lot of magazines.
- GitHub Actions is convenient, and getting 2,000 free minutes a month is great.
- Eiji Yoshikawa's Sangokushi can now be read for free on Prime Reading.