| certifications | open source - Jon Welling
What to Do With a Kernel Panic
The first time I saw a kernel panic I experienced a rollercoaster of emotions. First, of course, panic, then confusion, followed by a bit of laughter. Why is a computer panicking? It doesn't have feelings. At any rate, I'm sure a lot of system administrators don't approach a kernel panic with the same dose of humor that I would. It can be a serious and scary issue. So, let's discuss what a kernel panic is and how to fix it.
What is a Kernel Panic?
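'Kernel panic' may not be the best-worded error message. It's not clear what's going on other than that the computer is having an issue involving the kernel. A kernel panic is what the Linux kernel does when it hits an error it cannot safely recover from: it prints a message and halts. The kernel is the one reporting the problem, but at boot time the root cause often lies outside the kernel itself, in the files and configuration around it. Let's talk about the Linux boot process quickly so we can explain that.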
How Does the Linux Boot Process Work?
The Linux boot process happens in several stages. When a computer first turns on, the BIOS or UEFI firmware takes control. Its job is to enumerate the hardware, make sure everything on the computer is in working order, and then hand control to the bootloader to start the OS installed on the computer's storage drives.
On a traditional BIOS system, the firmware hands control to the bootloader by reading the MBR, or master boot record. The MBR is not a partition; it is the first 512-byte sector of the boot drive, and it holds a small amount of boot code along with the partition table. Traditionally, that boot code contained the instructions for finding the files needed to start the OS. (UEFI systems skip the MBR boot code and load a bootloader file from a dedicated EFI System Partition instead, but the rest of the process is similar.)
The MBR is incredibly small, though. It has room for only about 446 bytes of boot code, which is not large enough to hold the entire bootloader a modern OS needs. This is especially true with multi-OS environments.
Taking a quick detour, the most common bootloader for Linux today is Grub2. Most major Linux distributions use it as their default bootloader. Grub2 can work with various types of hardware (including non-x86 architectures) and filesystems, and that flexibility makes it too large to fit entirely in the MBR. So, the boot code in the MBR is not Grub2 itself. It is a small first stage whose only job is to find and start the rest of Grub2, which lives elsewhere on the drive.
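To make the size constraint concrete, here is a minimal Python sketch that decodes the classic MBR layout: boot code first, then four 16-byte partition entries starting at byte 446, then the two-byte 0x55AA signature. The function names and the demo sector are made up for illustration; the sketch works on an in-memory copy of the sector and does not touch a real drive.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # everything before this is boot code
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")

    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * 16
        entry = sector[start:start + 16]
        status, part_type = entry[0], entry[4]
        # Starting sector and sector count are little-endian 32-bit values.
        first_lba, sectors = struct.unpack_from("<II", entry, 8)
        if part_type == 0:      # type 0 marks an unused slot
            continue
        partitions.append({
            "index": index + 1,
            "bootable": status == 0x80,
            "type": part_type,
            "first_lba": first_lba,
            "sectors": sectors,
        })
    return partitions


def make_demo_sector():
    """Build a fake MBR holding one bootable Linux (type 0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    sector[446:462] = struct.pack(
        "<B3sB3sII", 0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800
    )
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    for partition in parse_mbr(make_demo_sector()):
        print(partition)
```

On a real machine, the first 512 bytes of the boot drive could be read (as root) and passed to the same function. The device name varies from system to system, so check it before reading anything.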
Once Grub2 is running, it reads its configuration and loads two files from the boot partition into memory: the Linux kernel and a file called initrd or initramfs (depending on the Linux OS). Grub2 then starts the kernel. The initrd is like a tiny temporary filesystem that contains just enough drivers and tools to carry on with the rest of the boot process. The kernel unpacks it and runs its startup script, which loads the kernel modules needed to reach the storage drive and then finds and mounts the real root filesystem. Once the root filesystem is mounted, the rest of the OS starts. A boot-time kernel panic occurs because something goes wrong with this part of the boot process.
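The bootloader tells the kernel where the root filesystem lives through the kernel command line, usually with a `root=` parameter. On a running system that line can be read from `/proc/cmdline`. The sketch below parses such a line; the function names and the sample line are invented for illustration.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of parameters.

    Bare flags such as "ro" or "quiet" map to True.
    """
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def describe_root(params):
    """Return (kind, value) describing how the root filesystem is named."""
    root = params.get("root")
    if not isinstance(root, str) or not root:
        return ("missing", None)
    for prefix in ("UUID", "PARTUUID", "LABEL"):
        if root.startswith(prefix + "="):
            return (prefix, root[len(prefix) + 1:])
    return ("device", root)


if __name__ == "__main__":
    sample = "BOOT_IMAGE=/vmlinuz-6.1.0 root=UUID=1234-abcd ro quiet"
    print(describe_root(parse_cmdline(sample)))
```

If `root=` names a device or UUID that no longer exists, the initrd has nothing to mount, which is one common route to a boot-time panic.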
The TLDR Version of Why Kernel Panics Happen
A boot-time kernel panic occurs when the kernel and its initrd can't find or mount the root filesystem, or when the kernel can't finish starting properly. To boil it down a little more, a kernel panic can occur because:
- The root filesystem can't be mounted because of a filesystem issue, or the kernel or initrd file on /boot is missing or corrupted.
- The root filesystem changed locations (a new drive, a new partition layout, or a new UUID) and the boot configuration or initrd still points to the old one.
- Patches or kernel modules were installed on a system but the initrd file wasn't updated to reflect that properly.
- Patches or kernel modules were installed that are incompatible with the computer's hardware or its kernel.
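The text on the panic screen usually hints at which of these causes applies. The sketch below maps a few well-known message fragments to a rough first guess. The hint wording and the `triage` function are illustrative only, and the mapping is a starting point for investigation, not a diagnosis.

```python
# (fragment to look for, rough hint) pairs; matching is case-insensitive.
HINTS = [
    ("unable to mount root fs",
     "Root filesystem not found: check root=, the initrd, and storage drivers."),
    ("cannot open root device",
     "Root device missing: the drive or partition may have moved or failed."),
    ("no working init found",
     "Root mounted but no init program: the OS install may be damaged."),
    ("attempted to kill init",
     "The init process died: suspect a broken initrd or damaged system files."),
]


def triage(console_text):
    """Return the hints whose fragment appears in the panic text."""
    lowered = console_text.lower()
    matches = [hint for fragment, hint in HINTS if fragment in lowered]
    return matches or ["No known pattern matched; read the full message."]


if __name__ == "__main__":
    message = ("Kernel panic - not syncing: VFS: Unable to mount root fs "
               "on unknown-block(0,0)")
    for hint in triage(message):
        print(hint)
```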
How Do You Fix a Kernel Panic?
There are a lot of ways to fix a kernel panic, and the exact methods vary between Linux distributions. It would be beyond the scope of this article to explain every one of them. Instead, we'll discuss the overall process for triaging and fixing a kernel panic.
Fix 1: Did you install updates recently?
It's easy to install updates in Linux and not fully realize everything that is updating. Linux updates may include new kernel images, or driver updates that bring new kernel modules or patches. If an update like that doesn't rebuild the initrd or update the Grub configuration files properly, it can cause a kernel panic.
First, pay close attention to the Grub menu when it first loads. Many Linux distributions keep the old kernel installed and leave a Grub menu entry for it, often inside an "Advanced options" submenu. Instead of booting the new kernel, try booting your computer with the old one.
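If the system is reachable some other way (for example, from a live environment with the boot partition mounted), the available entries can also be read out of the generated Grub configuration file. The following is a rough sketch that pulls `menuentry` and `submenu` titles out of the file's text with a regular expression. It assumes the usual quoted-title layout, and the sample text is made up.

```python
import re

# Matches lines such as:  menuentry 'Title' --class ... {
ENTRY = re.compile(r"""^\s*(menuentry|submenu)\s+(['"])(.*?)\2""")


def list_entries(grub_cfg_text):
    """Return (kind, title) tuples in the order they appear."""
    entries = []
    for line in grub_cfg_text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append((match.group(1), match.group(3)))
    return entries


SAMPLE = """
menuentry 'Ubuntu' --class ubuntu {
}
submenu 'Advanced options for Ubuntu' {
    menuentry 'Ubuntu, with Linux 5.15.0-91-generic' --class ubuntu {
    }
    menuentry 'Ubuntu, with Linux 5.15.0-88-generic' --class ubuntu {
    }
}
"""

if __name__ == "__main__":
    for kind, title in list_entries(SAMPLE):
        print(kind, "->", title)
```

Seeing more than one kernel version in the list is a good sign that an older kernel is still installed and worth trying.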
If that does not work, try booting from a live Linux environment. Most Linux distributions include a live environment on their install media. Make sure to use the same distribution you currently have installed on the computer.
Once you can boot into the OS, either through the live environment or through an old kernel, remake the Grub configuration files. That re-creates the configuration for the current system with the kernels and configs currently installed. If you are working from a live environment, mount the installed system's root and boot partitions first, and make sure the tool targets the installed system (for example, by running it from a chroot) and not the live environment.
If remaking the Grub configuration files does not work, rebuild the initrd as well. On some distributions, the default initrd includes only the kernel modules this particular computer needs to get off the ground, and the rest of the kernel modules are loaded later in the boot process. The tools that build the initrd generally offer an option to create a generic image that includes all, or nearly all, kernel modules instead. That makes the initrd larger and slower to load, but it is a good way to recover a Linux system that won't boot.
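As a rough illustration of the live-environment case, the sketch below only builds and prints the commands involved; it does not run them. The device names `/dev/sda1` and `/dev/sda2` are placeholders, and the tool name and output path vary by distribution (some use `grub2-mkconfig` and `/boot/grub2/grub.cfg`), so treat every value here as an assumption to verify on your own system.

```python
import shlex


def grub_config_plan(root_dev, boot_dev=None, mount_point="/mnt",
                     mkconfig="grub-mkconfig",
                     cfg_path="/boot/grub/grub.cfg"):
    """Return the commands (as argv lists) to regenerate the Grub config."""
    plan = [["mount", root_dev, mount_point]]
    if boot_dev:
        # Only needed when /boot is a separate partition.
        plan.append(["mount", boot_dev, f"{mount_point}/boot"])
    # Make the live system's device and kernel interfaces visible in the chroot.
    for pseudo in ("dev", "proc", "sys"):
        plan.append(["mount", "--bind", f"/{pseudo}", f"{mount_point}/{pseudo}"])
    # Run the generator inside the installed system, not the live one.
    plan.append(["chroot", mount_point, mkconfig, "-o", cfg_path])
    return plan


if __name__ == "__main__":
    for argv in grub_config_plan("/dev/sda2", "/dev/sda1"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to double-check the device names before typing anything as root.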
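The tool that rebuilds the initrd differs between distribution families. The sketch below reads the text of `/etc/os-release` and suggests a command. The mapping covers only three families, and the dracut line uses `--no-hostonly` to request a generic image. The function names are invented for illustration, and the exact options are worth checking against your distribution's documentation before use.

```python
# Distribution family -> command that rebuilds the initrd for installed kernels.
INITRD_COMMANDS = {
    "debian": ["update-initramfs", "-u", "-k", "all"],
    "fedora": ["dracut", "--force", "--no-hostonly", "--regenerate-all"],
    "arch": ["mkinitcpio", "-P"],
}


def parse_os_release(text):
    """Parse KEY=value lines, stripping optional quotes around values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def pick_initrd_command(os_release_text):
    """Return a suggested command, or None for an unknown distribution."""
    fields = parse_os_release(os_release_text)
    # ID names the distribution; ID_LIKE lists the families it derives from.
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for name in candidates:
        if name in INITRD_COMMANDS:
            return INITRD_COMMANDS[name]
    return None


if __name__ == "__main__":
    print(pick_initrd_command('ID=ubuntu\nID_LIKE=debian\n'))
```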
Once you have the OS booted properly again, remake the Grub configuration files one more time just to clean things up.
Fix 2: Is the storage drive working properly?
A kernel panic can also happen because the boot and root partitions are not available. There are any number of possible reasons. For instance, a hardware upgrade might have changed how the drives are detected, or a storage drive might be failing.
The recovery process is largely the same as above, but you'll want to add a couple of steps. After you load a Linux live environment, confirm that the storage drive and its partitions show up, and check the filesystems while they are still unmounted, since filesystem checkers such as fsck should not be run on a mounted filesystem. Then mount the partitions manually to confirm they are readable. After that is confirmed, use tools to check the drive itself for issues or failures. Any drive utility capable of reading the SMART information for a drive should work.
If you are having an issue with a drive, make sure to back up any important information right away. If the storage drive is not experiencing catastrophic failure, you may be able to clone it to a new, working storage drive to fix this issue.
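One common SMART utility is `smartctl` from smartmontools. As a small sketch, the function below scans the text of its health report for the overall verdict line. It assumes the usual wording of that line, which can differ between drive types and tool versions, and the function name is made up for illustration.

```python
HEALTH_LABELS = (
    "smart overall-health self-assessment test result",  # ATA and NVMe drives
    "smart health status",                               # SCSI and SAS drives
)


def smart_health(smartctl_output):
    """Return True if healthy, False if failing, None if no verdict found."""
    for line in smartctl_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip().lower() in HEALTH_LABELS:
            return value.strip().upper() in ("PASSED", "OK")
    return None


if __name__ == "__main__":
    sample = (
        "=== START OF READ SMART DATA SECTION ===\n"
        "SMART overall-health self-assessment test result: PASSED\n"
    )
    print(smart_health(sample))
```

A passing verdict does not guarantee a healthy drive, so it is still worth looking at the detailed attributes (reallocated or pending sectors, for example) when a drive is under suspicion.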
The Last Resort
When all else fails, re-install the bootloader from scratch. If you haven't installed any recent updates, there haven't been any hardware upgrades, and the storage drives and partitions appear to be working, something may have happened with the bootloader itself. In this case, re-install the bootloader along with its configuration files.
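For Grub2, re-installing usually means running `grub-install` from the installed system (or a chroot of it). The right arguments depend on whether the machine boots through BIOS or UEFI. The sketch below only builds the command; the disk name, the EFI directory, and the x86 targets are assumptions for a typical PC, and some distributions name the tool `grub2-install` or recommend re-installing their bootloader packages instead.

```python
import os


def firmware_is_uefi(efi_path="/sys/firmware/efi"):
    """On Linux, this directory exists only when booted in UEFI mode."""
    return os.path.isdir(efi_path)


def grub_install_command(disk, uefi, efi_dir="/boot/efi",
                         installer="grub-install"):
    """Return the argv list for re-installing Grub2 on an x86 PC."""
    if uefi:
        # UEFI: Grub goes into the EFI System Partition, not the MBR.
        return [installer, "--target=x86_64-efi",
                f"--efi-directory={efi_dir}"]
    # BIOS: Grub's first stage is written to the MBR of the given disk.
    return [installer, "--target=i386-pc", disk]


if __name__ == "__main__":
    print(grub_install_command("/dev/sda", firmware_is_uefi()))
```

After re-installing the bootloader, regenerate its configuration files so the menu matches the kernels that are actually installed.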