I upgraded my Azure VM from Ubuntu 18.04 (LTS) to 20.04 (LTS) and rebooted, but after that I could no longer connect remotely over SSH.
The serial console in the Azure portal was also unresponsive. The boot diagnostics screenshot for the VM in the Azure portal showed the following message.
'grub_file_filters' not found
GRUB seems to be broken.
I had performed the upgrade on the Azure VM (Ubuntu 18.04 LTS) over SSH from Ubuntu on WSL, following the procedure below.
Check and apply updates in advance
$ sudo apt update
$ sudo apt upgrade
Perform the upgrade
$ sudo do-release-upgrade
At the prompts I basically answered y (yes), accepted all upgrade suggestions, and chose LXD 4.0
Reboot to complete the upgrade
The VM failed to come back up after the reboot ('grub_file_filters' not found)
I wanted to reinstall GRUB and repair it, but since this happened in an Azure VM environment I wasn't sure how to proceed, and I found the following information from Microsoft.
I didn't find a matching case under "Recommended steps", so I followed the link.
How to recover Azure Linux virtual machines from kernel-related boot issues
I thought "Method 2: Offline repair" under "How to update the configuration file" might apply, so I looked at the link.
Troubleshoot a Linux VM by attaching the OS disk to a recovery VM with the Azure CLI
As described in the "Recovery process overview" below, I solved the problem by creating a disk from a snapshot of the problem VM's OS disk, mounting it on a repair VM, fixing it there, and then swapping it back onto the original Azure VM.
Stop the affected VM.
Take a snapshot from the OS disk of the VM.
Create a disk from the OS disk snapshot.
Attach and mount the new OS disk to another Linux VM for troubleshooting purposes.
Connect to the troubleshooting VM. Edit files or run any tools to fix issues on the new OS disk.
Unmount and detach the new OS disk from the troubleshooting VM. Change the OS disk for the affected VM.
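Assuming the Azure CLI is installed and `az login` has been run, the first three steps of the overview above can be sketched as a single Bash function. The resource names (`rg`, `vm`, `snap`, `fixdisk`) are placeholders to fill in, not real values from this post:

```shell
# Sketch of steps 1-3 of the recovery flow with the Azure CLI.
# Assumptions: az is installed and logged in; rg/vm/snap/fixdisk are
# placeholder names you must replace with your own.
recover_os_disk() {
  rg="myResourceGroup"      # resource group of the affected VM
  vm="myBrokenVm"           # the affected VM
  snap="osdisk-snapshot"    # name for the snapshot
  fixdisk="osdisk-repair"   # name for the disk built from the snapshot

  # 1. Stop the affected VM
  az vm stop --resource-group "$rg" --name "$vm"

  # 2. Take a snapshot of its OS disk (looked up via the disk ID)
  osdiskid=$(az vm show -g "$rg" -n "$vm" \
    --query "storageProfile.osDisk.managedDisk.id" -o tsv)
  az snapshot create --resource-group "$rg" --source "$osdiskid" --name "$snap"

  # 3. Create a new managed disk from the snapshot
  snapid=$(az snapshot show -g "$rg" -n "$snap" --query id -o tsv)
  az disk create --resource-group "$rg" --name "$fixdisk" \
    --sku Standard_LRS --size-gb 30 --source "$snapid"
  # Steps 4-6 (attach to a repair VM, fix, swap back) are done below.
}
```

This just groups the individual commands from the walkthrough for reference; the attach, repair, and swap steps are interactive and are covered one by one below.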
From the Azure portal, stop the problematic Azure VM.
You can also install the latest version of the Azure CLI and stop the VM with it.
> az login
> az vm stop --resource-group <Resource group name> --name <Virtual machine name>
Take a snapshot of the OS disk of the problematic Azure VM.
I used the Azure CLI (creating the snapshot from the disk ID).
> $osdiskid=(az vm show -g <Resource group name> -n <Virtual machine name> --query "storageProfile.osDisk.managedDisk.id" -o tsv)
> az snapshot create --resource-group <Resource group name> --source "$osdiskid" --name <Snapshot name>
Create a managed disk for repair based on the snapshot you just created.
I used the Azure CLI (creating the managed disk from the snapshot).
> $resourceGroup="<Resource group name>"
> $snapshot="<Snapshot name>"
> $osDisk="<New disk name = for repair>"
> $diskSize=30 #Specify the required size (GB)
> $storageType="Standard_LRS" #Specify the required storage type
> $osType="Linux" #Specify the required OS type
> $snapshotId=(az snapshot show --name $snapshot --resource-group $resourceGroup --query id -o tsv)
> az disk create --resource-group $resourceGroup --name $osDisk --sku $storageType --size-gb $diskSize --source $snapshotId
From the Azure portal, I created a new Azure VM to use for the repair.
Reference
Creating and preparing Azure credentials to create an Azure virtual machine with Ansible
I created a virtual machine of the same size in the same virtual network and resource group, along with a public IP address, network security group, and network interface.
I set a DNS name label on the new virtual machine and confirmed that I could connect to it over SSH.
From the Azure portal, I attached the disk created from the snapshot to the repair VM.
You can also attach the disk with the Azure CLI.
> $newdiskid=(az disk show -g $resourceGroup -n $osDisk --query id -o tsv)
> az vm disk attach --disk $newdiskid --resource-group $resourceGroup --vm-name <Virtual machine name for repair>
I connected to the repair VM over SSH and mounted the attached disk.
Check the disk usage of the Azure VM for repair.
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 2.0G 0 2.0G 0% /dev
tmpfs 394M 676K 393M 1% /run
/dev/sdc1 29G 1.4G 28G 5% /
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/sdc15 105M 3.6M 101M 4% /boot/efi
/dev/sdb1 7.9G 36M 7.4G 1% /mnt
tmpfs 394M 0 394M 0% /run/user/1000
List block devices to see which disks to mount.
$ lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep -i "sd"
sda 3:0:0:0 30G
├─sda1 29.9G
├─sda14 4M
└─sda15 106M
sdb 1:0:1:0 8G
└─sdb1 8G /mnt
sdc 0:0:0:0 30G
├─sdc1 29.9G /
├─sdc14 4M
└─sdc15 106M /boot/efi
Since sdb and sdc are already mounted, sda appears to be the disk to repair.
Create a mount point and mount sda1.
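This check can also be scripted. A small Bash sketch (assuming lsblk's raw `-r` output with NAME, TYPE, and MOUNTPOINT columns, and simple `sdX`/`sdXN` device naming) that prints each whole disk none of whose partitions are mounted:

```shell
# Print each whole disk that has no mounted partitions -- on the repair VM
# this should print only the attached disk to fix (sda in the output above).
# Assumption: simple sdX/sdXN naming; NVMe-style names would need extra care.
find_unmounted_disk() {
  lsblk -rno NAME,TYPE,MOUNTPOINT | awk '
    $2 == "disk" { disks[$1] = 1 }
    $2 == "part" && $3 != "" {
      parent = $1
      sub(/[0-9]+$/, "", parent)   # sda1 -> sda
      mounted[parent] = 1
    }
    END { for (d in disks) if (!(d in mounted)) print d }'
}
```

Eyeballing the lsblk table as the post does is fine; the function is just a convenience if you do this often.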
$ sudo mkdir /datadrive
$ sudo mount /dev/sda1 /datadrive
Make sure you have mounted the disk to be repaired.
$ ls /datadrive/
bin dev home initrd.img.old lib64 media opt root sbin srv tmp var vmlinuz.old
boot etc initrd.img lib lost+found mnt proc run snap sys usr vmlinuz
To work on the mounted disk in a chroot environment, bind-mount sys, proc, and dev into it, then chroot.
$ sudo mount --bind /sys /datadrive/sys
$ sudo mount --bind /proc /datadrive/proc
$ sudo mount --bind /dev /datadrive/dev
$ sudo chroot /datadrive
Reinstall GRUB on the disk to be repaired.
$ sudo grub-install /dev/sda
Installing for i386-pc platform.
Installation finished. No error reported.
After finishing, exit the chroot, unmount the bind mounts, and then unmount the repaired disk.
$ exit
$ sudo umount /datadrive/dev /datadrive/proc /datadrive/sys
$ sudo umount /datadrive
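The mount, chroot, and cleanup sequence can be wrapped in a pair of helper functions. This is only a sketch of the commands from this walkthrough, assuming the disk to repair is /dev/sda1 and the mount point is /datadrive as above:

```shell
# Helpers wrapping the chroot setup and teardown used in this post.
# Assumptions: the disk to repair is /dev/sda1, mounted at /datadrive.
enter_repair_chroot() {
  sudo mkdir -p /datadrive
  sudo mount /dev/sda1 /datadrive
  sudo mount --bind /sys  /datadrive/sys
  sudo mount --bind /proc /datadrive/proc
  sudo mount --bind /dev  /datadrive/dev
  sudo chroot /datadrive            # run grub-install /dev/sda in here
}

leave_repair_chroot() {
  # After exiting the chroot: undo the bind mounts before the disk itself,
  # otherwise unmounting /datadrive fails with "target is busy".
  sudo umount /datadrive/dev /datadrive/proc /datadrive/sys
  sudo umount /datadrive
}
```

The important detail is the order in `leave_repair_chroot`: the bind mounts must be released before the disk can be unmounted.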
From the Azure portal, detach the repaired disk from the repair VM.
You can also detach the disk with the Azure CLI.
> az vm disk detach -g <Resource group name> --vm-name <Virtual machine name for repair> --name <New disk name = for repair>
If you no longer need it, stop the Azure VM for repair.
From the Azure portal, I stopped the problematic Azure VM and swapped its OS disk for the repaired disk.
I used the "OS Disk Swap" function on the virtual machine's disks in the Azure portal.
You can also swap the disk with the Azure CLI.
> az vm stop -n <Virtual machine name> -g <Resource group name>
> $newdiskid=(az disk show -g <Resource group name> -n <New disk name = for repair> --query id -o tsv)
> az vm update -g <Resource group name> -n <Virtual machine name> --os-disk $newdiskid
> az vm start -n <Virtual machine name> -g <Resource group name>
I confirmed that I can connect over SSH to the Azure VM after swapping in the repaired disk. The upgrade to 20.04 (LTS) has also been completed.
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-1031-azure x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Sat Nov 7 13:14:17 JST 2020
System load: 0.04 Processes: 139
Usage of /: 26.2% of 28.90GB Users logged in: 1
Memory usage: 61% IPv4 address for eth0: 10.0.0.5
Swap usage: 0%
* Introducing self-healing high availability clustering for MicroK8s!
Super simple, hardened and opinionated Kubernetes for production.
https://microk8s.io/high-availability
0 updates can be installed immediately.
0 of these updates are security updates.
Last login: Fri Nov 6 13:55:00 2020 from 133.200.8.0
Delete anything you no longer need, such as the snapshot and the disk that was replaced.
If you want to keep the Azure VM for repair, it's safe to leave it stopped (deallocated).
I'm glad I was able to repair it in the end, but I should have taken a backup before the upgrade.