The story of migrating a stray batch without an owner from EC2 to a Docker environment

This is the day 14 article of the CrowdWorks Advent Calendar 2020.

Introduction

Hi, this is @shimopata. Lately I'm hooked on craft cola [^1]. Last year I wrote an article titled "Story of successfully converting 200 million production data". That was a valuable experience of a kind you rarely get in personal development, and this year too I was involved in work I would not have experienced outside a company that runs its own in-house services. I doubt many people will go through exactly the same thing, but I am writing it down in the hope that it is useful to someone.

Overview

The other day, I migrated a group of batches that had been running on EC2 to the batch server platform. The platform was introduced earlier on the CrowdWorks engineer blog; it is a batch execution environment built on Fargate. Earlier article → "Story of making batch serverless and migrating to Fargate due to EOL of Amazon Linux" [^2]

The idea of the migration is to give each batch its own container environment, as shown in the following figure. [^3]

upload_2c5be6cd6de0d619a85e1cf9c55c7fd3.png

My team was mainly responsible for migrating the layers above the container running on Fargate. That meant writing a Dockerfile and modifying each batch as needed so that it would run inside the container. Six kinds of batches were migrated and each was handled differently, so I will pick just one and walk through how we handled it.

Why you need to migrate in the first place

CrowdWorks runs a variety of batches, spread across several instances. Some of them were running on Amazon Linux, and we were notified that support for Amazon Linux would end at the end of December 2020.

A maintenance support period is stated to continue after that date, but with future vulnerabilities and extensibility in mind we decided to migrate, for the following reasons.

- Only important and critical security updates are provided, for a reduced set of packages, so not everything is covered.
- Support for new features may not be guaranteed.

Current issues of batch operation

Many of the batches to be migrated had been in operation since before I joined the company, and because of so-called historical reasons and staff turnover, we faced the following issues.

- Batches from multiple contexts ran on the same server, and the relationships between them were unclear.
- There was little documentation about the servers and batches, and what existed had not been updated and no longer matched reality.
- Few people knew why each batch was needed or what would happen if it were stopped.
- The engineers from that time were no longer around because of transfers and the like.

These were stray batches, so to speak: not stray cats, but batches with no owner. We had to clarify how each one related to the others and what it affected.

What I did for batch migration

The work was roughly divided into five steps: research, design, implementation, migration, and document maintenance. I will explain each in turn.

Research

Before migrating, we investigated how the current batch worked in the first place and what its execution environment looked like. Specifically, we looked into the following.

- What is it used for, and what business problem does it solve?
  - We interviewed the people who had been involved in the past and reconfirmed whether the batch was needed at all. If it turns out to be unnecessary, there is no need to migrate it.
- cron definition
  - I checked the execution interval and which user runs it.
- Contents of the files run from cron
  - I read the shell files and scripts defined there and understood what they do.
- Environment variables registered on the server
  - I cross-checked the environment variables against the scripts being run to see which variables are used and how.
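As one way to make the cron part of this survey repeatable, the sketch below pulls the schedule, user, and command out of crontab text. It assumes the system crontab format (five time fields, then a user, then the command). The `CronEntry` and `parse_system_crontab` names and the sample line are hypothetical and are not taken from the actual server.

```python
from dataclasses import dataclass


@dataclass
class CronEntry:
    schedule: str  # the five time fields, e.g. "0 3 * * *"
    user: str      # account the job runs as (system crontab format only)
    command: str   # everything after the user field


def parse_system_crontab(text: str) -> list[CronEntry]:
    """Parse lines of the form: min hour day month weekday user command."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 6)
        # Skip environment assignments (PATH=...) and anything too short to be a job.
        if len(parts) < 7 or "=" in parts[0]:
            continue
        entries.append(CronEntry(" ".join(parts[:5]), parts[5], parts[6]))
    return entries


# Hypothetical sample, for illustration only.
SAMPLE = """\
# analysis export
PATH=/usr/bin:/bin
0 3 * * * batchuser /opt/batch/export.sh >> /var/log/export.log 2>&1
"""

for entry in parse_system_crontab(SAMPLE):
    print(entry.schedule, entry.user, entry.command)
```

Per-user crontabs have no user column, and shorthand such as `@daily` is not handled here, so treat this as a starting point rather than a complete parser.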

Doing this reveals the whole picture of the system. Once you can see the whole picture, create a service overview diagram. The overview diagram for the batch we were in charge of looks like this.

upload_6e1c020c34e3f50182910b49fa9d6907.png

  1. Launch batch
  2. Execute SQL against the database for data analysis
  3. Save the execution result on the server
  4. Transfer to the specified external server
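
To make the four steps concrete, here is a minimal sketch of such a flow in Python. It is not the actual batch: the database is assumed to be reachable through a `sqlite3` connection as a stand-in, `run_batch` and the `transfer` callback are hypothetical names, and the real method of transferring to the external server is not described in this article.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Callable


def run_batch(conn: sqlite3.Connection, sql: str, out_path: Path,
              transfer: Callable[[Path], None]) -> int:
    """Run the query, save the result as CSV, and hand the file to `transfer`.

    Returns the number of data rows written (the header is not counted).
    """
    cursor = conn.execute(sql)                       # step 2: execute SQL
    header = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    with out_path.open("w", newline="") as f:        # step 3: save the result
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    transfer(out_path)                               # step 4: transfer the file
    return len(rows)
```

Passing the transfer step in as a function keeps the sketch independent of whichever protocol the external server actually requires.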

Design

With a complete picture of the current batch in hand, we drew an overview of the system after migration. At this stage we examined the impact of changing the execution platform from EC2 to Fargate. For this batch, the SQL execution result had been kept on the EC2 instance as a CSV file for one month as a backup. With Fargate as the execution environment, I judged that saving it to S3 would serve us better in the long run, so I revised the design accordingly.
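
The move from local disk to S3 can be sketched as below. Everything named here is an assumption for illustration: the `backups/...` key layout, the function names, and the bucket. The client argument is expected to behave like a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method is the only call used. The one-month retention that the EC2 server used to provide would then be handled on the S3 side, for example with a lifecycle expiration rule.

```python
from datetime import date
from pathlib import Path


def backup_key(batch_name: str, run_date: date, filename: str) -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"backups/{batch_name}/{run_date:%Y/%m/%d}/{filename}"


def backup_to_s3(s3_client, bucket: str, batch_name: str,
                 csv_path: Path, run_date: date) -> str:
    """Upload the CSV produced by the batch and return the key it was stored under."""
    key = backup_key(batch_name, run_date, csv_path.name)
    # A boto3 S3 client provides upload_file(Filename, Bucket, Key).
    s3_client.upload_file(str(csv_path), bucket, key)
    return key
```

Putting the run date in the key keeps each day's file separate, so a rerun on another day does not overwrite an earlier backup.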

The final design looks like this.

upload_8a6ccd4b7e27b1ca3291f10faff93298.png

Drawing a design diagram like this has several benefits:

- It is easier to explain at review time what you are building.
- It can be reused as handover material.
- Above all, it deepens your own understanding of the system.

For these reasons I recommend making one.

Implementation

I started by writing a Dockerfile. I used Alpine Linux as the base image, for the following reasons.

- The batch requirements are simple: execute SQL and transfer files.
- Alpine Linux contains only the minimum required packages, which helps keep the environment secure.
- The Alpine Linux image is small, so it builds quickly.

Fast builds matter a great deal. Creating an environment that can run a batch goes something like this:

- Install the required packages → rebuild
- Check the behavior and find that prerequisite packages are missing → rebuild
- Find that the permissions needed to run the batch are missing → rebuild
- Compare with the source environment and find that a user has to be created → rebuild

In other words, every change to the Dockerfile means another build and another check. If it were only once or twice, build time would not matter, but as shown above you rebuild many times through trial and error, so development efficiency depends heavily on a small image and a quick build. I therefore recommend a lightweight Docker image with as little unnecessary content as possible.

Migration

After writing the Dockerfile and modifying the batch, I made a migration plan. Why plan the migration? Because a failed migration carries many risks, for example:

- It may destroy existing data.
- The original batch is not idempotent, so re-running it may not restore the original data.

We therefore made a migration plan from the following perspectives.

- Clarify the expected state when the migration is complete
  - A batch finishing without errors does not prove that it is working correctly. For example, the batch may run successfully and still fail to create the intended data, and in some cases the number of output records varies with the timing of execution. Always decide in advance what counts as a completed migration.
- Clarify the impact if a batch fails
  - When execution fails, someone is likely to be affected. I checked in advance what kind of impact there could be.
- Whether it is possible to switch back, and when and how to do so
  - When a batch you expected to succeed fails, people get flustered and easily make needless operational mistakes and misjudgments. I checked in advance whether we could switch back to the original batch and how to do it.

Planning ahead can feel like a chore, but I recommend writing the plan down as far as possible, because it pays off in an emergency.

For this batch, the plan was as follows.

- What is the expected result?
  - The CSV file generated by the batch has the same number of lines before and after migration.
- What is the impact of a batch failure?
  - Some of the data presented to users becomes outdated.
- Is it possible to switch back, and when and how?
  - Yes, by running the pre-migration batch.

Since we could switch back, and running the original batch would restore the original state, we could proceed knowing that this was not high-risk work.
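
The completion criterion used for this batch, equal CSV line counts before and after migration, is simple enough to check mechanically. The following is a minimal sketch, assuming both output files have been copied somewhere they can be read side by side; the function names are hypothetical.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count physical lines without loading the whole file into memory."""
    with path.open("rb") as f:
        return sum(1 for _ in f)


def same_line_count(before: Path, after: Path) -> bool:
    """True when the pre-migration and post-migration outputs have equal line counts."""
    return count_lines(before) == count_lines(after)
```

This counts physical lines, so a CSV with newlines inside quoted fields would need a real CSV parser instead. As noted above, the counts are also only comparable when both batches ran against the same data at a comparable time.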

Document maintenance

The hardest part of migrating the system was working out the current specifications. Sometimes the batch documentation contained the information we needed, but sometimes it did not say which systems the batch was linked to, what the cron settings were, which user ran the batch, and so on. Whenever something was unclear, I had to log in to the server and check the values before I could continue with the migration, which was laborious. So once the migration to the batch server platform was complete, we discussed what information would have made the migration easier had it been documented, and decided to record that information in a README.

Here is the material created during that discussion.

upload_081307001f0ab5dd0d44d1270d3d3a13.png

I will introduce a few of the items.

Write down what this service (batch) is doing and what it is solving

For example, write "Batch that creates a summary table in the analysis database". It also helps future maintainers a great deal if you briefly describe what problem had arisen and what was built to solve it. For people who come later, it is often hard to tell whether a batch is still needed. In fact, it is not uncommon for a batch to become unnecessary over time because of other factors. The behavior can be read from the code and configuration files, but the original requirements are hard to recover from them. So describe not only what the batch does, but also why it was built and what problem existed at the time (what you wanted to solve).

Describe the overall outline of the function

Systems and batches often use resources beyond the application server and the database, such as AWS resources (S3, SQS, and so on) or the API of an external application. Even a simple diagram showing how these resources fit together makes things much easier to understand for engineers who join later. Personally, I recommend draw.io, because it makes it easy to produce readable diagrams.

Description of deployment flow

For a recently built system, deployment is usually done with a CI/CD tool such as CircleCI. Indeed, most of our systems are largely automated with such tools. However, if for some reason a manual deployment is required, document the steps.

Impact when stopped

This is related to "what the service is doing", but also describe the impact if the service or batch stops. It is a great help to your successors to know how high a priority the response should be when the service goes down for some reason.

Conclusion

The issues we had identified before the migration were resolved as follows.

- Batches from multiple contexts ran on the same server, and the relationships between them were unclear.
  - Solved by managing a separate container for each context.
- There was little documentation about the servers and batches, and what existed had not been updated and no longer matched reality.
  - Solved by preparing documents.
- Few people knew why each batch was needed or what would happen if it were stopped.
  - Solved by preparing documents.

I'm glad we were able to complete the migration while paying off this existing debt.

Now that I have finished writing, this feels like a very niche article, but I hope it helps.

Tomorrow's article is by @tmknom, on organizational management for the batch migration. I have a feeling it will be a blockbuster, so please read it as well!

[^1]: If you are interested, you can buy it online, or search for a recipe and make it yourself.
[^2]: If you like, please read this article as well!
[^3]: Batches in the same context share a container, so some environments run two or more batches in one container.
