Tag Archives: Recovery

Rebooting with ‘The Big Hammer’

Today I had a machine I was working on spit the dummy in a really bad way. It had a tonne of IO errors on its root filesystem and eventually decided to remount it read-only. Of course this meant that it was almost entirely wedged. I tried the reboot command and the init command, and everything would lock up my terminal. Not having console or physical access to the machine, I couldn’t simply hit the power button, so I used the Linux magic SysRq commands:

echo 1 > /proc/sys/kernel/sysrq
echo b > /proc/sysrq-trigger

Of course the disk errors meant that it was unable to boot, but ‘The Big Hammer’ struck me as something extremely useful. Be aware that ‘b’ reboots immediately, without syncing or unmounting anything.
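If the machine is wedged but the disks are still healthy, a slightly gentler variant is to ask the kernel to sync and remount everything read-only before the reboot. Here is a minimal sketch of that sequence. The function name emergency_reboot and the SYSRQ_* variables are my own inventions so the paths can be pointed somewhere harmless for a dry run; the s, u and b keys are the standard SysRq ones.

```shell
# Sketch only. emergency_reboot is a hypothetical helper name, and the
# SYSRQ_* variables exist purely so the paths can be overridden for a dry run.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
SYSRQ_DELAY="${SYSRQ_DELAY:-2}"

emergency_reboot() {
    echo 1 > "$SYSRQ_ENABLE"     # enable all SysRq functions
    echo s > "$SYSRQ_TRIGGER"    # s: sync dirty data to disk
    sleep "$SYSRQ_DELAY"         # give the sync a moment to finish
    echo u > "$SYSRQ_TRIGGER"    # u: remount all filesystems read-only
    sleep "$SYSRQ_DELAY"
    echo b > "$SYSRQ_TRIGGER"    # b: reboot immediately
}
```

Calling emergency_reboot as root with the defaults really does reboot the box, so try it against temporary files first. On a machine whose disk is already throwing IO errors the sync may achieve nothing, in which case the plain two-liner is all you have.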

Oh My God – I broke my LVM

So today I did about the stupidest thing I could have done at the time. I was planning on clearing my USB hard drive so I could start my new backup plan on it. Of course any Linux geek knows the easy way to erase a hard drive is to do a ‘dd if=/dev/zero of=/dev/sdb1’. On almost all my computers there is only one hard drive, which maps to /dev/sda, so a USB drive shows up as /dev/sdb. Of course you know exactly where I’m going here, don’t you? This was my home server, with two hard drives combined into one volume group. The first hard drive is /dev/sda, the second /dev/sdb, and the USB hard drive got mapped to /dev/sdc. So in my case that command obliterated the first 125MB of my second drive before I noticed.
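In hindsight, a tiny guard in front of dd would have saved me. The sketch below refuses to touch a disk unless lsblk reports its transport as usb. The names is_usb_disk and safe_wipe are hypothetical, the LSBLK variable is only there so the lookup can be stubbed out, and safe_wipe deliberately prints the dd command instead of running it.

```shell
# Sketch only. is_usb_disk and safe_wipe are hypothetical helper names.
# LSBLK can be overridden with a stub for testing.
LSBLK="${LSBLK:-lsblk}"

# Succeeds only if the whole disk given as $1 (e.g. /dev/sdc, not a
# partition such as /dev/sdc1) is attached via USB.
is_usb_disk() {
    tran=$("$LSBLK" -dno TRAN "$1" 2>/dev/null) || return 1
    [ "$(printf '%s' "$tran" | tr -d '[:space:]')" = "usb" ]
}

# Dry run: prints the dd command it would use rather than executing it.
safe_wipe() {
    dev=$1
    if ! is_usb_disk "$dev"; then
        echo "refusing to wipe $dev: not a USB disk" >&2
        return 1
    fi
    echo "would run: dd if=/dev/zero of=$dev bs=1M"
}
```

On my server, safe_wipe /dev/sdb would presumably have refused, since that drive was an internal one. This is no substitute for actually reading the lsblk output before typing dd, but it is one more chance to catch the mistake.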

My machine was still running, so I knew I hadn’t wiped anything immediately important. The first thing I thought of doing was checking exactly what I had wiped and what chance I had of backing anything up before bailing out. Looking at the LVM layout revealed that I’d probably just destroyed the file system I stored my local Fedora repository on, something I could do without. So I unmounted it, removed it from /etc/fstab and ran lvremove. This is exactly where I realised the gravity of the situation. LVM was complaining that it couldn’t locate one of the physical volumes. Of course it couldn’t: I’d just blown away all the metadata for it.
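For working out which logical volumes actually live on a given physical volume, the devices column of lvs is the thing to look at. The following is a rough sketch, assuming the tools can still read the volume group metadata (on a damaged volume group the devices column may well show an unknown device instead). The function name lvs_on_pv is made up, and the LVS variable is only there so the command can be stubbed.

```shell
# Sketch only. lvs_on_pv is a hypothetical helper name.
# LVS can be overridden with a stub for testing.
LVS="${LVS:-lvs}"

# Print the name of every logical volume in volume group $1 that has at
# least one segment on physical volume $2 (e.g. /dev/sdb1).
lvs_on_pv() {
    "$LVS" --noheadings -o lv_name,devices "$1" |
        awk -v pv="$2" '{
            # The devices field looks like /dev/sdb1(0), comma-separated
            # when a segment spans several devices.
            n = split($2, d, ",")
            for (i = 1; i <= n; i++)
                if (index(d[i], pv "(") == 1 && !seen[$1]++)
                    print $1
        }'
}
```

Something like lvs_on_pv VolGroup00 /dev/sdb1 would then list the volumes at risk, which is a much better basis for deciding what to back up first than my guesswork was.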

Did you know LVM keeps backups of the metadata? Yes, it keeps them in /etc/lvm/backup (for slightly older copies see /etc/lvm/archive) and you can use these to recover the metadata. I thought a good time to do this would be now, before the reboot that could end it all. Try as I might, LVM refused to recreate a volume that already existed, and it also complained about the device being in use. I count myself extremely lucky to have been able to do what I did next. To me it felt incredible, but when you really think about it, it makes sense.
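Those backup files are plain text, and they record the UUID of each physical volume, which is exactly what you need in order to recreate one. Below is a small sketch for pulling the UUID out. It assumes the usual layout where each pv block has an id line followed by a device line; the function name pv_uuid_from_backup is hypothetical. Note that the device entry in the file is only a hint, so check the result by eye.

```shell
# Sketch only. pv_uuid_from_backup is a hypothetical helper name.
# $1 = LVM backup file (e.g. /etc/lvm/backup/VolGroup00)
# $2 = device recorded in the file (e.g. /dev/sdb1)
pv_uuid_from_backup() {
    awk -v dev="$2" '
        # Remember the most recent id = "..." line seen.
        /^[ \t]*id = "/ { split($0, a, "\""); id = a[2] }
        # When the device line matches, that id belongs to this PV.
        /^[ \t]*device = "/ {
            split($0, b, "\"")
            if (b[2] == dev) print id
        }
    ' "$1"
}
```

For example, pv_uuid_from_backup /etc/lvm/backup/VolGroup00 /dev/sdb1 should print the UUID to hand to pvcreate later on.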

I downloaded the Fedora 11 Live CD and burned it to CD. Yep, that’s right: while knocking on death’s door, my machine managed to launch a torrent client, download a 700MB ISO and burn it to a CD. After that I backed up the /etc/lvm folder to the USB hard drive that caused this mess. Finally I rebooted into the Live environment. The very next step was to recreate the partition table with fdisk.

Then I recreated the physical volume metadata that was destroyed with the following command:

pvcreate -ff -u DsuvMV-1HVj-SQOU-wZkT-N9M0-LMZd-gPws1U \
 --restorefile /media/usbdisk/lvm/backup/VolGroup00 /dev/sdb1

This forces the creation of a PV with a specific UUID, ignoring any PVs that exist with the same UUID. The --restorefile option makes pvcreate read the backup file so that the new PV is laid out the way the volume group expects (the physical extents end up in the same place); it does not restore the volume group metadata itself. Follow up with this command to restore the full metadata:

vgcfgrestore -f /media/usbdisk/lvm/backup/VolGroup00 -v VolGroup00

Now our LVM metadata is all correct, but at this point we still need to activate the logical volumes.

vgchange -ay
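Since the order of these three commands matters, here they are strung together as one sketch. It is a dry run by default: each command is printed rather than executed. The function name restore_vg and the RUN variable are my own inventions, and the arguments are whatever applies to your system.

```shell
# Sketch only. restore_vg is a hypothetical helper name.
# Dry run by default: each command is printed, not executed.
# Set RUN to the empty string to execute for real, at your own risk.
RUN="${RUN-echo}"

# $1 = volume group, $2 = PV device, $3 = PV UUID, $4 = backup file
restore_vg() {
    vg=$1
    pv=$2
    uuid=$3
    backup=$4
    # Stop at the first failure; later steps depend on earlier ones.
    $RUN pvcreate -ff -u "$uuid" --restorefile "$backup" "$pv" &&
    $RUN vgcfgrestore -f "$backup" -v "$vg" &&
    $RUN vgchange -ay "$vg"
}
```

Read the printed commands carefully before running anything for real; pvcreate -ff pointed at the wrong device is how you end up writing a post like this one.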

Finally you should fsck your logical volumes to make sure everything is working properly and you don’t get any nasty surprises later. All that is left then is to reboot into your recovered system.
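As a rough illustration of that last check, the sketch below walks every logical volume in a volume group and runs fsck -n on it, which for the ext filesystems means check only and change nothing. The function name check_lvs and the RUN and LVS variables are hypothetical, and it prints the commands by default instead of running them. It also assumes every volume holds a filesystem fsck understands, so expect a complaint for something like a swap volume.

```shell
# Sketch only. check_lvs is a hypothetical helper name.
# Dry run by default: set RUN to the empty string to really run fsck.
RUN="${RUN-echo}"
# LVS can be overridden with a stub for testing.
LVS="${LVS:-lvs}"

# Run a read-only fsck over every logical volume in volume group $1.
check_lvs() {
    for lv in $("$LVS" --noheadings -o lv_path "$1"); do
        $RUN fsck -n "$lv"
    done
}
```

If the read-only pass reports problems, that is the point to decide whether to let fsck repair the volume or to copy the data off first.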

Now that’s something they don’t teach you in RHCE!

Random thought: Who needs enemies when I have my own stupidity to contend with?