Just reading the title makes my heart clench... This is a memo based on my experience so far, written about half for my own reference.
Reference: https://developer.mozilla.org/ja/docs/Web/HTTP/Status
Of the HTTP status codes, the ones that indicate an error are the 400 series and the 500 series. The rough differences are as follows.
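- 400 series: the problem is on the request side (client error)
- 500 series: the error occurred on the server side
- No status code at all: access itself is not possible, for example because the server is down or name resolution fails
The response differs depending on which case applies, so separate them first.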
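As a rough illustration of that first split, here is a minimal sketch that sorts a status code into a category. The function name `classify_status` and the category labels are my own choices for this example, not something from a standard tool.

```shell
# classify_status CODE: print a rough category for an HTTP status code.
# (Hypothetical helper; the labels are only for illustration.)
classify_status() {
  case "$1" in
    4[0-9][0-9]) echo "client error" ;;
    5[0-9][0-9]) echo "server error" ;;
    [1-3][0-9][0-9]) echo "not an error" ;;
    *) echo "unknown" ;;
  esac
}

classify_status 404
classify_status 502
```

Anything that falls into "unknown" (including no code at all) is worth treating as the "cannot access" case.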
If you have not changed the output path on Linux, you will probably find most logs under /var/log/.
apache: /var/log/httpd/
nginx : /var/log/nginx/
php (check this when running nginx + php-fpm): /var/log/php-fpm/
If you want to see what is running, use ps aux.
It prints a large amount of information, so combine it with grep to narrow it down to what you are looking for.
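One small annoyance with ps aux plus grep is that the grep process itself shows up in the result. A common workaround is to wrap the first character of the name in brackets. The sketch below assumes two helper names of my own, `bracket_pattern` and `find_procs`.

```shell
# bracket_pattern NAME: turn "nginx" into "[n]ginx" so that grep's own
# command line (which contains the pattern text) does not match itself.
bracket_pattern() {
  first=${1%"${1#?}"}   # first character of NAME
  rest=${1#?}           # everything after it
  printf '[%s]%s\n' "$first" "$rest"
}

# find_procs NAME: list processes whose `ps aux` line mentions NAME.
find_procs() {
  ps aux | grep -- "$(bracket_pattern "$1")"
}

# Example (prints nothing if no matching process exists):
# find_procs nginx
```

The helper does not handle an empty name or a name that starts with a regex special character; for a quick check during an incident that is usually fine.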
With AWS, you can check a lot of information from the console
This is more important than it sounds. Panicking never does any good... Calm down by organizing the current situation or by consulting someone senior.
As mentioned above, there are many possible reasons for being unable to access the site, so isolate the cause from here. In most cases the browser shows the status code on the screen.
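If the browser does not make the code obvious, curl can print it directly. This is a minimal sketch; the function name `http_status`, the 10 second timeout, and the example URL are assumptions for illustration.

```shell
# http_status URL: print only the HTTP status code of the response.
# curl prints 000 when no HTTP response arrived at all
# (server down, name resolution failure, timeout, and so on).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Example with a placeholder URL:
# http_status https://example.com/
```

A result of 000 is a hint that you are in the "cannot access at all" case rather than a 400 or 500 series error.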
Status code errors are easy to deal with because the codes are finely divided by cause.
For the time being, start by checking the error log. The response varies depending on what the log says, so I omit the details here.
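For that first look at the error log, something like the following is usually enough. It is only a sketch: the function name `recent_errors` and the list of severity keywords are my own assumptions, and the log file name in the example may differ on your server.

```shell
# recent_errors FILE [N]: show the last N (default 20) lines of FILE
# that mention an error-like severity.
recent_errors() {
  grep -iE 'error|crit|alert|emerg' "$1" | tail -n "${2:-20}"
}

# Example, assuming nginx writes its error log to the default location:
# recent_errors /var/log/nginx/error.log 50
```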
From here on, an example
There are various causes for not being able to access the site at all, so I treat this case separately. The possible causes follow.
Test connectivity with ping {IP/hostname} (localhost is used here as a dummy).
$ ping localhost
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=6.893 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.115 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.117 ms
^C
--- localhost ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.076/1.800/6.893/2.940 ms
If the server is alive, replies come back like this. Plain ping cannot target a specific port number, so use another method for that. I use nping, which is convenient: https://qiita.com/Yu-s/items/4b4f683fda374c8ddcc9
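If nping is not installed, a port check can also be sketched with bash alone. This is an alternative to nping, not the method from the linked article; the function name `port_open` and the 3 second timeout are assumptions.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash (not plain sh).
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example with a placeholder host and port:
# port_open localhost 80 && echo "open" || echo "closed or unreachable"
```

A failure only tells you that the connection could not be opened; it does not distinguish a closed port from a firewall or a dead host.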
In most cases you should also be able to log in with ssh or similar. If you can no longer log in with a command that worked before, the server is probably down.
On AWS you can check this from EC2 Dashboard > Instances > Instance Status. If the state is stopped, the instance is down. (If you do not use aws-cli or autoscaling, someone may have stopped it intentionally... it should not stop on its own...) You can also see whether the status checks have failed; if they have, the instance is effectively down as well.
Note, however, that the state may show running even while the instance keeps restarting automatically (that is, it is actually down).
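The same information can be pulled with aws-cli if it is already configured. A minimal sketch, where the function name `instance_health` and the instance ID are placeholders of mine:

```shell
# instance_health INSTANCE_ID: print "<state> <system check> <instance check>".
# --include-all-instances makes stopped instances show up as well.
instance_health() {
  aws ec2 describe-instance-status \
    --instance-ids "$1" \
    --include-all-instances \
    --query 'InstanceStatuses[0].[InstanceState.Name, SystemStatus.Status, InstanceStatus.Status]' \
    --output text
}

# Example with a placeholder instance ID:
# instance_health i-0123456789abcdef0
```

A state of running with both checks ok is what you hope to see; anything else is a reason to look closer in the console.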
If you have confirmed that the server is alive but you still cannot access it by domain name, name resolution is probably failing.
dig is the quickest way to check: https://www.atmarkit.co.jp/ait/articles/1711/09/news020.html
You can also do it with nslookup: https://www.atmarkit.co.jp/ait/articles/1710/27/news021.html
$ dig www.google.com
; <<>> DiG 9.10.6 <<>> www.google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 3344
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 89 IN A 172.217.24.132
;; Query time: 13 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Sep 11 13:51:58 JST 2020
;; MSG SIZE rcvd: 59
If the output has no ;; ANSWER SECTION:, name resolution has failed.
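To turn that check into a yes/no answer, dig's +short output can be tested for emptiness. A minimal sketch; the function name `resolves` and the hostname in the example are assumptions.

```shell
# resolves HOST: succeed if dig returns at least one answer for an A query.
# `dig +short` prints only the answer data, so empty output means no answer.
resolves() {
  answer=$(dig +short "$1" A)
  [ -n "$answer" ]
}

# Example with a placeholder hostname:
# resolves www.example.com && echo "resolved" || echo "no answer"
```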
With a 500 series error, the error may have occurred in either of the following:
- Server software
- The framework that runs the code
Since there are two candidates, look at both kinds of logs first. How to respond depends on the content of the error, but below are the errors I have actually run into.
The following appeared in the nginx error log:
write() to "/var/log/nginx/access.log" was incomplete: 83 of 314 while logging request
In other words, "I could not finish writing the access log." When I looked, only that day's access log was in a mess, so I guessed that the storage was exhausted. Deleting the offending file should solve it for the time being, but deleting the access log did not solve it... (maybe there were other heavy files as well). I did not have time to look for them, and the setup was autoscaled, so I launched a new server and swapped it in as a stopgap.
Later, when I investigated the root cause, I noticed that the logs were not being rotated by logrotate... I had updated nginx a few days earlier, and it seems I forgot to restore the log rotation configuration at that time. Besides this case, the site becomes inaccessible whenever storage or memory is exhausted, so it is worth keeping a note of the commands to check them.
$ df -h    # Storage check
$ free -m  # Memory check
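When storage does turn out to be the problem, the next questions are how full the disk is and which files are eating it. The sketch below is one way to answer both; the function names `disk_used_percent` and `largest_files` are my own, and it assumes filesystem names without spaces.

```shell
# disk_used_percent PATH: print the used percentage (digits only) of the
# filesystem that holds PATH. -P forces one line per filesystem.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# largest_files DIR [N]: list the N (default 10) largest files under DIR,
# sizes in KiB, biggest first.
largest_files() {
  find "$1" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example:
# disk_used_percent /
# largest_files /var/log 5
```

Running `largest_files` against /var/log first would have been a quick way to find the "other heavy files" in the incident above.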
One day the site suddenly went down with 500 errors, so I checked the error log. Looking at the cakephp2 log, I saw an error related to a new feature I had built a few days earlier! It should have been fixed...? I realized the reason right away: I had forgotten to reflect the fix in the launch configuration... Apparently the flow was: traffic goes up => autoscaling launches a server that reproduces the error => inaccessible.
The error was around the cakephp2 cache and was the kind that can be fixed by deleting the cache files, so I handled it with: clear the cache again => create an AMI in that state => specify it in the launch configuration.
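For reference, clearing the cache files can be sketched as below. This assumes the default CakePHP 2 layout, where cache files live under the app directory's tmp/cache and each subdirectory holds a placeholder file named empty; the function name `clear_cake_cache` and the example path are hypothetical, so check your own cache settings before running anything like it.

```shell
# clear_cake_cache APP_DIR: delete cached files under APP_DIR/tmp/cache,
# keeping the directories and the "empty" placeholder files.
# (Assumes the default CakePHP 2 cache location.)
clear_cake_cache() {
  find "$1/tmp/cache" -type f ! -name empty -exec rm -f {} +
}

# Example with a hypothetical application path:
# clear_cake_cache /var/www/myapp/app
```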
I panic like crazy when this happens, but...
・Calm down
・Isolate the cause
・Consult someone senior
These three go a long way.