System operation elements

- Library management
- Monitoring
  - Liveness monitoring
  - Performance monitoring
  - Security monitoring
- Capacity management
- Log management
  - Security log
  - Business application log
- Backup management
- Job operation

Operational elements not covered this time

- Batch operation
- Incident management
- Patch and version control
- Security control

Docker basics to keep in mind

First, here are two points to keep in mind.

- Data written inside a container does not persist: once the container is stopped and discarded, everything not stored outside it is gone.
- When the number of containers is scaled up and down (n+ instances), or when many systems run at the same time, you need a defined way to tell the individual containers apart.

Creating a Docker Image

Everything starts from a Docker image. In principle, the entire setup is written as code in a file called a Dockerfile. Unlike a conventional OS setup, you do not keep making manual changes after installation; the whole configuration is described in code.

Note: Strictly speaking, you can still modify a container after it has started, and in practice this is often done. The description above is simplified for brevity.
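As a rough illustration of "everything is described in code", the following Python sketch renders a minimal Dockerfile as text. The function name `render_dockerfile`, the base image, the package list, and the paths are all hypothetical, and the `apt-get` line assumes a Debian or Ubuntu base image; it is not taken from the original setup.

```python
import json


def render_dockerfile(base_image, packages, app_src, app_dest, command):
    """Return the text of a minimal Dockerfile (illustrative only)."""
    lines = [f"FROM {base_image}"]
    if packages:
        # Assumption: the base image is Debian/Ubuntu based, so apt-get exists.
        lines.append(
            "RUN apt-get update && apt-get install -y " + " ".join(packages)
        )
    # Copy the application into the image instead of installing it by hand.
    lines.append(f"COPY {app_src} {app_dest}")
    # Exec-form CMD is written as a JSON array of strings.
    lines.append("CMD " + json.dumps(command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "ubuntu:22.04",
        ["python3"],
        "./app",
        "/opt/app",
        ["python3", "/opt/app/main.py"],
    ))
```

Because the result is plain text, it can be kept under version control and reviewed like any other source file.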

Container creation and operation

A Docker container is started from a Docker image, and the application runs inside it. A running container behaves basically like ordinary Linux.

Note: Strictly speaking, the container does not run its own OS. It is only made to look like a separate OS, so there are restrictions around the kernel and hardware. That is outside the scope of this article, so for simplicity I will proceed as if the base OS itself were running.

You can log in to a running container (over telnet or ssh if such a server is installed in the image, or with `docker exec`) and type commands or run programs on the command line. An image is always built on top of a chosen base OS, which is why that Linux is what appears to be running.

Note: Network settings, drive mounts, and so on can be customized when the container starts.

Stopping a container

When operation is finished, the container is stopped. Normally the container is discarded at this point, and all data inside it, including logs, is deleted.

Note: Strictly speaking, you can mount a file system when creating the container, much like using a network drive in Windows. Data written to such a mount remains at the mount source even after the container is stopped.
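The mount described above corresponds to the `-v host_dir:container_dir` option of `docker run`. Below is a minimal Python sketch that only assembles such a command line without executing it. The helper name `docker_run_args` and all image names, container names, and paths are made up for illustration.

```python
import shlex


def docker_run_args(image, name, mounts=None, ports=None):
    """Build the argument list for a detached `docker run` (not executed here)."""
    args = ["docker", "run", "-d", "--name", name]
    # -v HOST_DIR:CONTAINER_DIR keeps data on the host after the container is gone.
    for host_dir, container_dir in (mounts or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    # -p HOST_PORT:CONTAINER_PORT publishes a container port on the host.
    for host_port, container_port in (ports or {}).items():
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args


if __name__ == "__main__":
    cmd = docker_run_args(
        "billing-api:1.4.0",            # hypothetical image
        "billing-api-prod-001",         # hypothetical container name
        mounts={"/srv/billing/data": "/data"},
        ports={8080: 80},
    )
    print(shlex.join(cmd))
```

Anything the application writes under `/data` in this example would end up in `/srv/billing/data` on the host and survive the container.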

Issues and countermeasures from a system operation perspective

Library management

If you release a library (program) into a running container, all of those files disappear when the container is stopped and discarded. The next start begins again from scratch, from the state of the Docker image.

There are the following options:

1. Recreate the image for each release, with the program installed again.
2. Mount a file area containing the latest program when starting the container.
3. Release the latest program into the container after it has started.

I have listed several options, but options 2 and 3 offer little benefit from an operational point of view, so I adopt option 1. Note: When an individual uses Docker during development, option 2 is convenient, so individuals tend to adopt it.
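Rebuilding the image for every release could look roughly like the following Python sketch, which assembles a `docker build -t` command with a version tag and the matching `docker run` command. The function name `release_commands`, the repository name, and the version scheme are assumptions for illustration; the commands are only built, not executed.

```python
def release_commands(repository, version, container_name, context_dir="."):
    """Return the commands for one release: build a tagged image, then start it."""
    # One image tag per release, so every release can be traced to an image.
    tag = f"{repository}:{version}"
    return [
        # Build a fresh image containing the newly released program.
        ["docker", "build", "-t", tag, context_dir],
        # Start a container from exactly that image.
        ["docker", "run", "-d", "--name", container_name, tag],
    ]


if __name__ == "__main__":
    for cmd in release_commands("billing-api", "1.4.0", "billing-api-prod-001"):
        print(" ".join(cmd))
```

With a tag per release, rolling back amounts to starting a container from the previous tag.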

However, there is still room to consider whether a release from the development environment to the production environment should be made per image or per program:

- Create the image in the development environment and release the whole image to production.
- Move only the program from development to production and create the image on the production side.

Which is better is still an open question.

Monitoring

Since a running container is ordinary Linux, it can be monitored in the same way as on-premise. However, a selling point of containers is that many of them can be started, so you need to consider how to identify and monitor individual containers when n+ instances are running. If the number of instances is always fixed, the advantage of containers shrinks, but monitoring can be done as before.

When a variable number of containers (n+) is started, each container identifies itself to the monitoring manager by a self-reported name. A naming convention for these self-reported names therefore has to be designed.
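One possible shape for such a naming convention is sketched below in Python. The format `<system>-<app>-<env>-<3-digit index>` and the environment names `dev`, `stg`, and `prod` are purely assumptions chosen for illustration; the article does not prescribe a specific convention.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <system>-<app>-<env>-<3-digit index>
NAME_PATTERN = re.compile(
    r"^(?P<system>[a-z0-9]+)-(?P<app>[a-z0-9]+)"
    r"-(?P<env>dev|stg|prod)-(?P<index>\d{3})$"
)


class ContainerName(NamedTuple):
    system: str
    app: str
    env: str
    index: int


def build_name(system: str, app: str, env: str, index: int) -> str:
    """Compose a self-reported name and reject anything off-convention."""
    name = f"{system}-{app}-{env}-{index:03d}"
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name violates the convention: {name!r}")
    return name


def parse_name(name: str) -> Optional[ContainerName]:
    """Split a name into its parts, or return None if it is off-convention."""
    m = NAME_PATTERN.match(name)
    if m is None:
        return None
    return ContainerName(m["system"], m["app"], m["env"], int(m["index"]))


if __name__ == "__main__":
    name = build_name("billing", "api", "prod", 7)
    print(name, parse_name(name))
```

The same pattern can be used on the monitoring manager side to group incoming names by system and application.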

Liveness monitoring

There are the following options:

1. Put an agent in each container and have the agent output the information needed for monitoring.
2. Monitor through container management software, of which many products are available.

If the environment has to be operated side by side with an on-premise environment, a combination of both seems likely. With option 1 alone, container-specific information cannot be obtained; with option 2 alone, some of the information collected by the existing on-premise monitoring may be missing.
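As a small example of option 2, the Docker CLI itself can report container states with `docker ps -a --format "{{.Names}}\t{{.Status}}"`. The Python sketch below parses output of that shape and compares it with a list of containers that are expected to be up. The helper names and the sample data are hypothetical, and a real setup would more likely rely on dedicated management software.

```python
def parse_docker_ps(output):
    """Map container name -> True if its status column starts with 'Up'."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<name>\t<status>", e.g. "web\tUp 2 hours".
        name, _, status = line.partition("\t")
        states[name] = status.startswith("Up")
    return states


def find_down(expected, states):
    """Return expected containers that are stopped or missing entirely."""
    return sorted(name for name in expected if not states.get(name, False))


if __name__ == "__main__":
    # Sample text in the shape produced by the docker ps format string above.
    sample = (
        "billing-api-prod-001\tUp 2 hours\n"
        "billing-api-prod-002\tExited (1) 5 minutes ago\n"
    )
    expected = [
        "billing-api-prod-001",
        "billing-api-prod-002",
        "billing-api-prod-003",
    ]
    print(find_down(expected, parse_docker_ps(sample)))
```

A container that is missing from the output is treated the same as one that has exited, since both need attention.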

Performance monitoring

There are the following options:

1. Put an agent in each container and have the agent output the information needed for monitoring.
2. Monitor through container management software, of which many products are available.

If the environment has to be operated side by side with an on-premise environment, a combination of both seems likely. With option 1 alone, container-specific information cannot be obtained; with option 2 alone, there may be more or less information than the existing on-premise monitoring collects.

Also, unlike liveness monitoring, performance data has to be stored as logs. Because all data is lost when the container is stopped and discarded, it is essential to send the data outside the container and store it there.
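A minimal sketch of collecting performance data outside the container: `docker stats --no-stream --format "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"` prints one line per container, which the Python code below turns into timestamped JSON lines suitable for appending to an external store. The record layout, the helper names, and the sample values are assumptions for illustration.

```python
import json


def parse_docker_stats(output, collected_at):
    """Convert '<name>\\t<cpu%>\\t<mem%>' lines into a list of records."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, cpu, mem = line.split("\t")
        records.append({
            "collected_at": collected_at,
            "container": name,
            # Values arrive as strings such as "12.50%".
            "cpu_percent": float(cpu.rstrip("%")),
            "mem_percent": float(mem.rstrip("%")),
        })
    return records


def to_json_lines(records):
    """Serialize records one per line, ready to append to a file outside the container."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


if __name__ == "__main__":
    sample = "billing-api-prod-001\t12.50%\t3.20%\n"
    print(to_json_lines(parse_docker_stats(sample, "2024-01-01T00:00:00Z")), end="")
```

Because the collection runs on the host side, the history remains available after the measured container has been discarded.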

Security monitoring

General security is the same as for a Linux OS, so this section focuses on the OS security log (the login/logout log).

The basics are the same as above. Since a running container is ordinary Linux, the logs can be checked in the same way as on-premise. Likewise, all data is lost when the container is stopped and discarded, so it is essential to send the data outside and store it.

One caution: until now, when a security breach was suspected, I believe the investigation started from the basic security log and then went comprehensively through the various other logs in the OS. With containers, any log that is not explicitly kept outside the container disappears entirely.
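One way to keep logs outside the container is sketched below in Python, under two assumptions: the container's standard output is forwarded with Docker's `syslog` logging driver, and `/var/log` inside the container is mounted from a per-container host directory. Whether login and logout records are actually written under `/var/log` depends on the image, and the host names, paths, and the helper name `logging_run_args` are hypothetical. The command is only assembled, not executed.

```python
def logging_run_args(image, name, syslog_address, host_log_dir):
    """Build a `docker run` command line that keeps logs outside the container."""
    return [
        "docker", "run", "-d",
        "--name", name,
        # Forward the container's stdout/stderr to an external syslog server.
        "--log-driver", "syslog",
        "--log-opt", f"syslog-address={syslog_address}",
        # Tag each message with the container name so it can be traced back.
        "--log-opt", f"tag={name}",
        # Assumption: OS-level log files live under /var/log in this image.
        "-v", f"{host_log_dir}/{name}:/var/log",
        image,
    ]


if __name__ == "__main__":
    print(" ".join(logging_run_args(
        "billing-api:1.4.0",
        "billing-api-prod-001",
        "udp://loghost.example.com:514",
        "/srv/container-logs",
    )))
```

Using the container name both as the syslog tag and as the host directory name ties every preserved log back to the naming convention.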

Backup management

The image plays the role of the backup file.

In addition to the image file itself, also back up what was used to create it: the Dockerfile source and the files used when starting the container (docker-compose.yml and so on).
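The backup targets above can be written down as a small plan. The Python sketch below builds a `docker save -o` command for the image and lists where the build files would be copied. The backup directory layout, the archive naming, and the helper name `backup_plan` are assumptions, and nothing is executed or copied here.

```python
from pathlib import PurePosixPath


def backup_plan(image, backup_dir, build_files):
    """Return (docker save command, [(source, destination), ...])."""
    # Turn "repo/name:tag" into a file-system friendly archive name.
    archive_name = image.replace("/", "_").replace(":", "_") + ".tar"
    archive = PurePosixPath(backup_dir) / archive_name
    # `docker save -o FILE IMAGE` writes the image to a tar archive.
    save_cmd = ["docker", "save", "-o", str(archive), image]
    # The files used to build and start the container are backed up alongside.
    copies = [
        (src, str(PurePosixPath(backup_dir) / PurePosixPath(src).name))
        for src in build_files
    ]
    return save_cmd, copies


if __name__ == "__main__":
    cmd, copies = backup_plan(
        "billing-api:1.4.0", "/backup", ["Dockerfile", "docker-compose.yml"]
    )
    print(" ".join(cmd))
    for src, dest in copies:
        print(f"copy {src} -> {dest}")
```

An archive written this way can be restored with `docker load -i`.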

Job operation

Since a running container is ordinary Linux, you can log in as on-premise and perform operations such as executing jobs.

But which container should you log in to? A mechanism is needed for tracking which containers are running and for identifying them, so that this can be determined.

This is within what container management software supports, but it still looks difficult. An application developer says "job xxx of my system's application", while the operations staff face countless containers and struggle to tell which one the developer means. The person in charge of the application, in turn, does not know how containers other than their own are operated.

So how is this solved? With one or two development teams it is manageable, but what if dozens of teams build hundreds of systems and the number of containers reaches the thousands? If the naming convention used at container startup is not followed, identification becomes difficult, so early detection of violators is also necessary.
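Detecting violators early and finding the containers that belong to a given application can both be driven by the naming convention. The Python sketch below assumes a hypothetical convention of `<system>-<app>-<env>-<3-digit index>`; the pattern, the function name `index_containers`, and the sample names are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical convention: <system>-<app>-<env>-<3-digit index>
NAME_PATTERN = re.compile(
    r"^(?P<system>[a-z0-9]+)-(?P<app>[a-z0-9]+)"
    r"-(?P<env>dev|stg|prod)-(?P<index>\d{3})$"
)


def index_containers(names):
    """Group conforming names by (system, app) and collect the violators."""
    by_app = defaultdict(list)
    violators = []
    for name in names:
        m = NAME_PATTERN.match(name)
        if m is None:
            # Off-convention names cannot be attributed to any system.
            violators.append(name)
        else:
            by_app[(m["system"], m["app"])].append(name)
    return dict(by_app), violators


if __name__ == "__main__":
    running = [
        "billing-api-prod-001",
        "billing-api-prod-002",
        "crm-batch-prod-001",
        "test_container",
    ]
    by_app, violators = index_containers(running)
    # Candidate containers for "a job of the billing system's api application".
    print(by_app.get(("billing", "api"), []))
    print("violators:", violators)
```

Run periodically against the list of running containers, such a check surfaces off-convention names before they accumulate.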

What needs to be confirmed about the design

- Understand how individual containers are identified and how they operate.
- Decide how individual containers are accessed, in particular how IP addresses are assigned and what naming convention is used for container names (or host names).

Re-study Docker from a system operation perspective
