Fixing a Linux boot issue :: "kernel panic" message at boot
After installing an updated driver, the initial RAM disk, or initrd,
can end up broken, so that the server reboots into a "kernel panic"
message. Let's find out what we can do to resolve a problem like this.
Fixing a broken initrd
The initrd is a temporary file system that is loaded at boot, before the real root file system is available. If it is broken, the boot fails. So, if you ever want to get your system running again, you'll need to correct this error.
Step 1: Determining what is wrong
Troubleshooting is almost a science in itself, and I could spend many articles on it, but I'll try to keep it brief. During the system boot procedure, several phases occur, starting in GRUB, the Linux boot loader. Roughly, these are the following:
1. GRUB loads the kernel.
2. GRUB loads the initrd.
3. The root file system is accessed by the kernel.
4. The /sbin/init process takes over.
5. The initial boot stage happens.
6. The default runlevel is activated.
7. A login prompt occurs.
When a problem occurs, try to pinpoint it to one of these seven
phases. In some cases it is possible to tell exactly what happens; more often
you can only give a rough indication of what is happening. In the case of a
kernel panic, you can be sure about one thing: GRUB has loaded successfully
and you are not yet at phase 4 of the boot procedure, where the init process
takes over. If a kernel panic occurs immediately after a driver installation,
this is often caused by an error in the initrd.
How can we be sure? Sometimes it is quite obvious that the error
is in the initrd, as GRUB tells you that it failed to load the file
/boot/initrd. In other cases some forensic work is needed, as only a vague
driver error message is generated. In the latter case, you have to check
whether the driver that fails is included in the initrd, as this helper file
is used by the kernel to load drivers that are needed immediately. On SUSE
Linux Enterprise, the file /etc/sysconfig/kernel contains a list of all
drivers that should be included in the initrd. When you run the mkinitrd
command, these drivers are written to your new initrd. When this happens
automatically, for example as part of a driver installation, something can go
wrong and leave you with a broken initrd.
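To check whether a failing driver is supposed to end up in the initrd at all, the module list can be read from the configuration file. What follows is only a minimal sketch, assuming the file holds a single-line INITRD_MODULES="..." assignment as on SUSE Linux Enterprise. The function names initrd_modules and module_listed are made up for this example and are not part of any SUSE tool.

```shell
#!/bin/sh
# Sketch: print the modules listed in INITRD_MODULES, one per line.
# Default path is the SUSE file named in the article; pass another file to override.
initrd_modules() {
    conf="${1:-/etc/sysconfig/kernel}"
    # Take the last INITRD_MODULES="..." assignment, strip the quotes,
    # and split the space-separated list into one module per line.
    sed -n 's/^INITRD_MODULES="\(.*\)"[[:space:]]*$/\1/p' "$conf" \
        | tail -n 1 \
        | tr -s ' ' '\n'
}

# Return 0 if the module named in $1 is listed, 1 otherwise.
# $2 is an optional path to the configuration file.
module_listed() {
    initrd_modules "${2:-/etc/sysconfig/kernel}" | grep -qx -- "$1"
}
```

From the rescue system, where the server's root file system is mounted under /mnt, a call could look like module_listed ext3 /mnt/etc/sysconfig/kernel, followed by a check of the exit status.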
Step 2: Fixing it
If an error occurs in the initrd, you will not be able to boot your server anymore. So, to fix it, you need the rescue system that is available from the installation DVD. This rescue system loads a complete Linux system from the installation media. The next step is to mount your server's own Linux file systems from within the rescue system. Next, you need to run mkinitrd. You can only do this once the local file systems are all mounted, because the initrd has to be written to the local file systems. However, there is a caveat.
The problem with this approach lies in access to the disk devices, in
combination with the necessary use of a chroot environment. To start, you need
to mount your server's file systems on a temporary mount point like /mnt. Let's
say that you have the /boot directory on /dev/sda1 and your / directory on
/dev/sda2. To mount them, you need the following two commands:
1. mount /dev/sda2 /mnt
2. mount /dev/sda1 /mnt/boot
Since the mkinitrd command wants to write the new initrd to /boot
and the /boot on your hard drive is now at /mnt/boot, you need to change the
root directory to /mnt. You can use chroot to do that:
chroot /mnt
The contents of /mnt now become /, so all path references are OK. But we still have a problem. If you look in the /proc and /dev directories in your new root environment, you'll see that /proc is empty and /dev is as good as empty. Both are dynamically created file systems that are populated at the moment your server boots. This means that they were created under / when the server booted from the rescue DVD. Now that the new root is /mnt, you cannot access them anymore. We need to fix this.
1. Type exit to leave the chroot environment. You are now back in the rescue system, with your server's local file systems mounted under /mnt.
2. Use mount -t proc none /mnt/proc to make the proc file system available from the /mnt environment.
3. Use mount -o bind /dev /mnt/dev to make the original /dev, which was filled by the udev process when booting, available from /mnt/dev.
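Before entering the chroot again, it can be worth confirming that both mounts took effect. The following is a rough sketch of such a check. The function name chroot_ready is invented for this example, and the two files it looks for (proc/mounts and dev/null) are simply entries that are normally present once proc is mounted and /dev is populated.

```shell
#!/bin/sh
# Sketch: check that a chroot target has a usable /proc and /dev.
# $1 is the chroot target; it defaults to /mnt as used in the article.
chroot_ready() {
    target="${1:-/mnt}"
    # proc/mounts is normally present once a proc file system is mounted there.
    if [ ! -e "$target/proc/mounts" ]; then
        echo "proc does not appear to be mounted under $target" >&2
        return 1
    fi
    # dev/null is normally present in any populated /dev.
    if [ ! -e "$target/dev/null" ]; then
        echo "dev does not appear to be populated under $target" >&2
        return 1
    fi
    return 0
}
```

Running chroot_ready /mnt in the rescue shell returns 0 when both entries exist, and prints a short message to standard error otherwise.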
Now that you have the repair environment all in place, you need to
check that the line in /etc/sysconfig/kernel that is used to generate a new
initrd is as it should be. You are looking for the following line:
INITRD_MODULES="ata_piix processor thermal fan jbd ext3 dm_mod edd pciback"
This line will be different on every server, so check to make sure
that all modules necessary to start your server are included (your
server's documentation will help you with that).
Now under /mnt you have the complete environment that is needed to
repair your server, so take the following two steps to fix your server.
1. Activate /mnt using cd /mnt and make it your new root environment using chroot . (the dot stands for the current directory).
2. Issue the command mkinitrd to write the new initrd to /boot.
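Put together, the whole repair can be sketched as one small script to run from the rescue system. This is only an illustration under the article's assumptions: /dev/sda2 holds / and /dev/sda1 holds /boot. The names run, repair_initrd, DRY_RUN, ROOT_DEV, BOOT_DEV and TARGET are invented for this example. It also calls chroot /mnt mkinitrd in one go, which runs mkinitrd inside the new root instead of opening an interactive chroot shell first.

```shell
#!/bin/sh
# Sketch of the repair sequence; device names are the article's examples.
ROOT_DEV="${ROOT_DEV:-/dev/sda2}"   # partition holding /
BOOT_DEV="${BOOT_DEV:-/dev/sda1}"   # partition holding /boot
TARGET="${TARGET:-/mnt}"            # temporary mount point in the rescue system

# With DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the server's file systems, make /proc and /dev available,
# then run mkinitrd inside the chroot. Stops at the first failure.
repair_initrd() {
    run mount "$ROOT_DEV" "$TARGET" &&
    run mount "$BOOT_DEV" "$TARGET/boot" &&
    run mount -t proc none "$TARGET/proc" &&
    run mount -o bind /dev "$TARGET/dev" &&
    run chroot "$TARGET" mkinitrd
}
```

Calling DRY_RUN=1 repair_initrd prints the five commands for review without touching the disks; calling repair_initrd on its own runs them and needs root privileges.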
You have now fixed the initrd. Reboot your server and check that
everything is working all right.
Now you know how to fix your server when the initrd fails. This
information is most helpful for boot problems after major system modifications.
Keep this information handy so you'll be able to apply a quick fix to your
server if and when something goes wrong during an upgrade.