A story about building a service that suggests website improvements using machine learning APIs

Preface

We have developed **"Visible"**, an open source web service that diagnoses websites from an accessibility perspective and, based on the information it collects, suggests best practices using an AI platform.

Visible: accessibility diagnosis and fix suggestions

- **Website URL:** https://visi.dev
- **GitHub repository (Star please!):** https://github.com/visible/visible

Services that diagnose websites, such as Google's Lighthouse, have existed for a long time, but **Visible is a new service that not only diagnoses but also suggests improvements**. It is also designed to deepen your understanding of accessibility, and the command-line version can be run standalone.

In 2020 the project was adopted by Mitou Junior ("Unexplored Junior", https://jr.mitou.org/), a "support program for elementary, junior high, and high school creators with original ideas and outstanding technical skills", and received technical and financial support. The final results presentation will be streamed on YouTube Live on November 1st!

Feature overview

Enter the URL of a website, and the page is diagnosed and improvements are suggested automatically.

Screenshot of the diagnosis results

Below are examples of the suggested fixes.

`alt` attribute

The <img> element should have an alt attribute that explains the content of the image to screen readers and SEO crawlers. When it is missing, Visible generates a caption using the Google Cloud Platform Vision API and suggests it as an improvement.

`lang` attribute

If the language of a web page is not explicitly specified, user agents that need language information (screen readers, for example) run into problems, so Visible detects the language from the page content with the Translate API and suggests it.

Color contrast ratio

Improvement suggestions that do not use machine learning are of course also possible. When the color contrast ratio is low, the page is hard to use for people with color vision differences, so Visible suggests a fix that raises the contrast.

Contrast
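The contrast check can be illustrated with the contrast ratio formula from WCAG 2.x. This is a minimal sketch, not Visible's actual implementation: the names `Rgb`, `contrastRatio`, and `meetsAA` are made up for illustration, and the 4.5:1 threshold is the WCAG AA requirement for normal-size text.

```typescript
// An sRGB color; each channel is in the range 0-255.
type Rgb = [number, number, number];

// Convert one sRGB channel to its linear value, as defined by WCAG 2.x.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Hypothetical helper: does this pair satisfy WCAG AA for normal-size text?
function meetsAA(foreground: Rgb, background: Rgb): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

A rule built on this could flag any text whose ratio is below the threshold and then search for a nearby color that passes; how Visible picks the replacement color is not covered here.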

Accessibility best practices are standardized by the W3C under names such as WCAG, and Visible also includes several other rules based on those standards.

Proposal mechanism

The diagnostic program runs in Puppeteer, which can drive Chrome headlessly. It executes programs that implement the checkpoint (Rule) interface, attaches file information and links to the values returned from each rule, and finally displays the result as a diff.

Workflow. Explained in detail below.

The system consists of three components: the core, plugins, and applications. The core exposes a minimal implementation and a set of interfaces, and the actual processing is written in plugins that implement those interfaces. The plugin format is modeled on ESLint.

The operation flow is shown in the diagram. It is explained in detail below.

Like rules, the improvement-generation algorithm and the headless browser can also be extended as plugins, so methods other than Google Cloud Platform can be used.
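To make the core/plugin split more concrete, here is a rough TypeScript sketch of what a rule plugin could look like. All names here (`PageContext`, `RuleReport`, `RulePlugin`, `createHtmlLangRule`, `runRules`) are hypothetical and are not taken from Visible's codebase. A real rule would inspect the page through the headless browser; this sketch uses a simple regular expression on the HTML string so that it stays self-contained.

```typescript
// Hypothetical shapes for illustration; Visible's real interfaces may differ.
interface PageContext {
  url: string;
  html: string;
}

interface Fix {
  before: string; // snippet found on the page
  after: string; // suggested replacement, to be shown as a diff
}

interface RuleReport {
  ruleId: string;
  outcome: "passed" | "failed";
  message: string;
  fix?: Fix;
}

interface Rule {
  id: string;
  run(page: PageContext): Promise<RuleReport[]>;
}

interface RulePlugin {
  rules: Rule[];
}

// Language detection is injected, so the rule is not tied to one provider.
type DetectLanguage = (text: string) => Promise<string>;

function createHtmlLangRule(detectLanguage: DetectLanguage): Rule {
  const id = "html-has-lang";
  return {
    id,
    async run(page: PageContext): Promise<RuleReport[]> {
      const tag = /<html\b[^>]*>/i.exec(page.html)?.[0];
      if (tag === undefined) return []; // no <html> tag, nothing to check
      if (/\slang\s*=/i.test(tag)) {
        return [{ ruleId: id, outcome: "passed", message: "lang attribute is present" }];
      }
      const lang = await detectLanguage(page.html);
      const after = tag.replace(/^<html/i, `<html lang="${lang}"`);
      return [
        { ruleId: id, outcome: "failed", message: "lang attribute is missing", fix: { before: tag, after } },
      ];
    },
  };
}

// The core only knows the interfaces: it runs every rule of every plugin.
async function runRules(plugins: RulePlugin[], page: PageContext): Promise<RuleReport[]> {
  const reports: RuleReport[] = [];
  for (const plugin of plugins) {
    for (const rule of plugin.rules) {
      reports.push(...(await rule.run(page)));
    }
  }
  return reports;
}
```

Because the detector is passed in as a function, swapping Google Cloud Platform for another provider would only require a different `DetectLanguage` implementation, which is the kind of extensibility the paragraph above describes.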

Technology used

The project is developed in TypeScript. The tech stack is as follows:

**Core**

- Puppeteer: headless browser using Chrome. Used to run the diagnostic program.
- domhandler: HTML AST.
- PostCSS: known as a preprocessor, but used here for its AST, following stylelint's example.

**Web backend**

- Clean Architecture: a well-known backend design approach with four layers.
- TypeORM: ORM for TypeScript.
- Bull: job queue framework using Redis.
- Apollo: GraphQL implementation for Node.js.

**Web front end**

- Next.js: a BFF for React that handles SSR / SSG / Lambda.
- Apollo: GraphQL implementation for Node.js.
- Tailwind CSS: utility-first CSS framework.
- i18next: internationalization library for JS.

**Other**

- GitHub Actions: used for CI / CD.
- Docker: used to deploy the web version.
- Yargs: the framework used for the CLI version.
- Lerna: a tool for JS monorepos.

Development story

While I've always been interested in web accessibility as a technology, I've never had to rely on assistive technology myself, and my only motivation for correct markup was search engine optimization. As a result, I often built websites that lacked accessibility, and I thought it would be nice to have software like ESLint that would teach me how to fix them.

At the same time, the age limit for applying to Mitou Junior is 17, and I was already 17, so I wanted to give it a try while I still could. I started building a prototype so that I could apply with it when the opportunity came.

Prototype

I have been writing programs since junior high school and have worked part-time as a developer, so I had written a fair amount of code. Even so, finishing a prototype in the few months before the application deadline was quite hard (honestly, the application stage may have been busier than the period after adoption), so I needed to set a clear goal.

A prototype was not required for the application itself; sending a document outlining the product would have been enough. But I decided that, to be adopted, I had to show my technical ability and prove that I could see the project through to the end. Conversely, as long as I could show that, it would be enough, so I judged what would be interesting, narrowed the diagnosable items down to the minimum, dropped the "fix suggestion" feature, and decided to go only as far as "enter a URL and get the diagnosis results". After that, I wrote down the fuzzy concept in my head in a notebook, turned it into a domain model, and started writing code with the tech stack I knew best.

Screenshot of the prototype at the time of application

A working prototype was finished in about two months, and I passed the document screening. Next came an online interview in which the mentors asked me about the product. To be honest, I don't remember what I said, but I do remember being asked what goals the product had for the future, and I could only give a vague answer...

After adoption

Since I already had a prototype at the time of adoption, development proceeded in an agile fashion. Mitou Junior provides regular mentoring, and in my project I reported progress to my mentor once a week and received feedback, so I set a milestone for roughly each week and developed features that were broken down to fit.

Until users could actually use the service, I prioritized tasks purely by how strong the impact would be when I said "it can do this". In hindsight, I don't think that was a very good method. Thanks to it, though, I feel I kept up speed by focusing on business logic without fussing over framework details.

Examples of impact-oriented features

First user test

During the program, Mitou Junior participants present three times: right after adoption, at the midpoint, and at the end (online this year). When the first presentation came around, I presented the product at its "prototype + α" stage.

At that time, I shared the URL of the deployed version so that the audience could actually try it. However, the server went down right after the presentation, either because I had misconfigured Docker's shared memory or because I was responding synchronously where it should have been asynchronous, so I could not get the feedback I had hoped for (sigh).

Interviews with people who specialize in A11y

About a quarter of the way through the program, with the mentors' help, I had the opportunity to get feedback from the accessibility team at a certain company.

I was able to ask what processes are followed in real accessibility work, what tools are used, and what problems are specific to team development. Through this interview, I was able to make the storyline concrete and narrow down the target users. I think task priorities became clearer from this point on.

Subsequent improvement

Using the second presentation at Mitou Junior (with load countermeasures in place this time, of course...) and my Twitter followers, I ran the feedback loop in small increments: people actually use the service and then answer a questionnaire.

I try to collect feedback from multiple angles: a feedback form built in Google Forms, using the kinds of questions recommended in results for "how to create a good survey", and an embedded Google Analytics tag.

In particular, I place importance on what information users want. For example, I assigned URLs to unimplemented features, prioritized the features by how much traffic each URL received, and reflected that in development.

Screenshot of the Google Form

Where I got stuck

These may be quite niche, but here are some notes from development.

Information obtained from the CSS DOM cannot be mapped to an AST

Source code such as HTML and CSS is parsed by the browser, converted to the DOM, and made available to JavaScript. However, the CSS information returned by methods such as getComputedStyle does not tell you which file or which declaration a style came from.

Therefore, I decided to use the Chrome DevTools Protocol (CDP), the API behind the Google Chrome developer tools. CDP reports loaded style sheets through the CSS.styleSheetAdded event, so when a problem is detected, I look up the corresponding CSS file and CSS property from the node ID and convert them to a PostCSS AST for processing.

The question of where to put the ORM in Clean Architecture

The book explains that "the interface adapter layer is the layer that converts data into the format required by the framework layer". By that reading, the part that issues SQL queries belongs to the interface adapter layer, and executing them on a concrete RDBMS belongs to the framework layer. With an ORM, however, the boundary between these two steps is blurred, and a quick search shows different people giving completely different answers.

TypeORM abstracts away which RDBMS is used (within limits, of course), and the choice is finally made in ormconfig. So I decided that, as long as the code does not touch those details, it can live in the interface adapter layer, and I implemented the DAL there.

As an aside, the presenter that converts domain models into the API's shape also preserves the direction of dependencies by using types defined on the presenter side instead of handling GraphQL definitions directly.
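The presenter idea can be sketched as follows. This is an illustration under assumptions, not the project's code: the `Diagnosis` fields and the `DiagnosisOutput` type are hypothetical. The point is that the presenter owns its output type, so nothing in this layer imports GraphQL-generated definitions.

```typescript
// Domain model (inner layer). The fields are hypothetical.
interface Diagnosis {
  id: string;
  url: string;
  createdAt: Date;
  reports: { outcome: "passed" | "failed" }[];
}

// Output type owned by the presenter; it does not reference GraphQL types.
interface DiagnosisOutput {
  id: string;
  url: string;
  createdAt: string; // ISO 8601 string, ready for serialization
  failedCount: number;
}

// Converts the domain model into the shape the API layer will expose.
function presentDiagnosis(diagnosis: Diagnosis): DiagnosisOutput {
  return {
    id: diagnosis.id,
    url: diagnosis.url,
    createdAt: diagnosis.createdAt.toISOString(),
    failedCount: diagnosis.reports.filter((r) => r.outcome === "failed").length,
  };
}
```

A GraphQL resolver in the outer layer would then map `DiagnosisOutput` to the schema, so the dependency points from the framework layer inward and never the other way around.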

Using Docker in a monorepo (Yarn Workspaces) is too painful

This probably matters little with microservices, where modules rarely depend on each other. But if you want to use Yarn workspaces and Docker when some packages are shared by the front end and the back end, you get stuck: you cannot create a lockfile per package, and because the packages under node_modules are symbolic links, simply copying them does not work.

For now, I've written a messy workaround in the Dockerfile and it works. Yarn v2 (berry) provides a command called yarn workspaces focus, which seems to install only the dependencies of the package you want so that it can run independently, so I want to migrate soon. However, I haven't touched it yet because I don't fully understand Plug'n'Play and the things around it.

styled-components is painful

This was my first time properly building design-system-like UI components. When one component has multiple variants (as below), styled-components makes readability terrible, and I eventually escaped to Tailwind.

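import styled, { css } from 'styled-components';

// Theme keys such as primaryFgColor are assumed to be declared on the app's theme.
const Button = styled.button<{ variant: 'primary' | 'secondary' }>`
  font-size: 12px;

  ${(props) =>
    props.variant === 'primary'
      ? css`
          color: ${props.theme.primaryFgColor};
          background-color: ${props.theme.primaryBgColor};
        `
      : css`
          color: ${props.theme.secondaryFgColor};
          background-color: ${props.theme.secondaryBgColor};
        `}
`;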

Details: https://qiita.com/rigarashi/private/5c97be5ed8fb15ea2d96

There are libraries such as styled-system and xstyled that bring the utility-first approach to CSS-in-JS, but I gave up on them because the theme was not statically typed.
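For comparison, with a utility-first framework the variant logic can become a plain lookup table. The following is only a sketch of that idea, not the code used in Visible: `buttonClassName` and the chosen classes are illustrative, using utility classes from Tailwind's default theme (`text-xs` corresponds to a 12px font size).

```typescript
type ButtonVariant = "primary" | "secondary";

// One entry per variant; adding a variant means adding one line here.
const variantClasses: Record<ButtonVariant, string> = {
  primary: "text-white bg-blue-600",
  secondary: "text-gray-800 bg-gray-200",
};

// Builds the className string for a button, with optional extra classes.
function buttonClassName(variant: ButtonVariant, extra = ""): string {
  return ["text-xs", variantClasses[variant], extra].filter(Boolean).join(" ");
}
```

In a React component this string would simply be passed as `className`, and because `variantClasses` is typed as a `Record` over `ButtonVariant`, the compiler reports any variant that is missing an entry.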

Parallel processing with Puppeteer

The puppeteer-cluster library looked nice, but it seems to be unmaintained, and all it can do is receive a page instance in a callback and perform simple processing. That was too limiting for this use case, which involves things like converting to an Observable, so I dropped it.

Instead, I use so-called object pooling: the pool tracks which instances are busy and which are idle, and hands work to the ones that are free. Forking worker processes would probably be better, but it did not seem worthwhile because Puppeteer already runs Chromium in a separate process when it launches. In addition, the source of the GraphQL subscriptions is just a singleton, so without Redis the browser cannot be notified unless the call happens in the same process.
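The object pooling approach can be sketched generically. This is a minimal sketch under assumptions, not the pool used in Visible: the `Pool` class and its method names are hypothetical, and the pooled resource is a type parameter so that it could be a Puppeteer page or anything else.

```typescript
// A minimal async object pool: at most `resources.length` tasks run at once.
class Pool<T> {
  private readonly idle: T[];
  private readonly waiters: Array<(resource: T) => void> = [];

  constructor(resources: T[]) {
    this.idle = [...resources];
  }

  // Resolve immediately if a resource is free, otherwise wait in a queue.
  private acquire(): Promise<T> {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop() as T);
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Hand the resource to the next waiter, or mark it as idle.
  private release(resource: T): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next(resource);
    } else {
      this.idle.push(resource);
    }
  }

  // Run a task with a pooled resource and always give the resource back.
  async use<R>(task: (resource: T) => Promise<R>): Promise<R> {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}
```

With a pool of pages created up front, each diagnosis would call `pool.use(...)` and wait its turn when every instance is busy. The `finally` block ensures a failed task still returns its resource to the pool.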

Afterword

Please forgive the slightly clickbait-like title. It is the result of thinking about how to draw attention to accessibility, which is often neglected.

The web version and other editions are now available, so please try them and send us your **feedback**.

- **Website URL:** https://visi.dev
- **Feedback:** https://forms.gle/SvSkkKMs1NDbm4vn9