Network RAID Woes

11.20.11

About 5 years ago, I made the decision to centralize all of the media that we have for our home on one server. I built the server from scratch and used Fedora as the OS, complete with RAID 5, logical volumes, etc. It worked brilliantly until it was time to upgrade drives, and then it was a pain in the ass. So, I decided to move from my hobby project to a real product, a live network-attached RAID system, selecting the Western Digital ShareSpace 4TB model.

The ShareSpace has configurable RAID levels, lets you selectively update physical disks, has 1 Gb ethernet, and comes in an appliance form factor. What it doesn’t have is the ability to recover from 2 sequential power outages where the second outage happens before the filesystem check from the first outage completes. A UPS (yes, I have one, only 15 min I’m afraid) isn’t sufficient to protect against this when the filesystem check takes hours to complete.

This exact condition happened Friday night, and when I got around to looking at things, my RAID 5 volume had failed on the ShareSpace and was unrecoverable from the web-based utility. I was bummed. I still had all the physical media, but it would take weeks to rip and transfer back to whatever I eventually replace the ShareSpace with. So, I got back to my roots to do a little hacking on the appliance. After all, what did I have to lose?

To make a long story short, it appears that WD used a very similar approach in building out ShareSpace that I used to build my own hobby server lo those many years ago. All my meta-device, physical volume, and logical volume experience looked like it could be useful here. So, to help anyone else who might find themselves in this position, here are steps you can take to get to a place where you can mount the network disk and move files off after one of these episodes. This worked for me, YMMV. If you don’t have Unix experience, offer a friend a six-pack and a pizza to help out 😉

  1. Establish an SSH session to the device. This requires that you first enable SSH access in the web interface as admin. Use any terminal with an ssh client: on a Mac, just open the Terminal app; on a PC, something like PuTTY will serve. Point ssh at your ShareSpace, e.g. $ ssh admin@192.168.1.100, substituting the correct IP address. The default password is welc0me.
  2. Next, check your DataVolume status by trying to access it: cd /DataVolume. If there’s data there, don’t proceed; you have a different problem. If there’s nothing there, then you can choose to proceed at your own risk. Right now, a professional might be able to recover your data. If you do the following steps, that option will no longer exist. BE SURE YOU WANT TO TAKE THAT RISK, or STOP.
  3. Let’s reassemble the RAID array as /dev/md2. Use: mdadm --assemble --force /dev/md2 /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4
  4. Next create the physical volume. Use: pvcreate /dev/md2
  5. Next create the volume group. Use: vgcreate lvmr /dev/md2
  6. Next create the logical volume. Use: lvcreate -l 714218 lvmr -n lvm0 (the 714218 value is the size in extents and works for the 4TB model in RAID 5)
  7. Now, if we’re lucky, we can fix the file system. Use: fsck.ext3 /dev/lvmr/lvm0 -y (Note: this can take hours, so be patient.)
  8. If the file system comes back clean, then mount it up. Use: mount -t ext3 /dev/lvmr/lvm0 /DataVolume -o rw
  9. cd /DataVolume and ls -l it; your data should be available.
  10. Mount the device over the network and move your data off the ShareSpace to safe keeping.
  11. If you want to continue to use ShareSpace, you’ll need to nuke it and reestablish the RAID etc.
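
For reference, steps 3 through 8 can be collected into a small dry-run script. This is only a sketch: it prints the commands rather than running them, it reuses the device names, volume names, and extent count from the steps above, and the variable and function names (MD, VG, LV, recovery_commands, lv_size_gib) are made up for illustration. The size check assumes LVM's default 4 MiB physical extent size, which may not match what the ShareSpace uses.

```shell
#!/bin/sh
# Dry-run sketch of steps 3-8: prints each command instead of running it.
# All names and values below come from the steps in this post; verify them
# against your own unit before running anything for real.

MD=/dev/md2        # RAID meta-device
VG=lvmr            # volume group name
LV=lvm0            # logical volume name
EXTENTS=714218     # extent count that worked for the 4TB model in RAID 5
PE_MIB=4           # ASSUMPTION: LVM default extent size of 4 MiB

# Print the recovery commands in order, one per line.
recovery_commands() {
    echo "mdadm --assemble --force $MD /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4"
    echo "pvcreate $MD"
    echo "vgcreate $VG $MD"
    echo "lvcreate -l $EXTENTS $VG -n $LV"
    echo "fsck.ext3 /dev/$VG/$LV -y"
    echo "mount -t ext3 /dev/$VG/$LV /DataVolume -o rw"
}

# Logical volume size in whole GiB implied by the extent count.
lv_size_gib() {
    echo $(( EXTENTS * PE_MIB / 1024 ))
}

recovery_commands
echo "# implied logical volume size: $(lv_size_gib) GiB"
```

Under that 4 MiB assumption, 714218 extents comes to roughly 2789 GiB, or about 3 TB, which is in line with what RAID 5 across four 1 TB drives would leave usable. If your unit has different drives, the extent count will almost certainly need to change.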

Well, that’s it. Now I have to find a place large enough to park these files until I establish the long-term solution. It’s too bad cloud services are so expensive at the scale I need. I have a full 2TB of data, so suggestions for alternatives are welcome here or on Twitter @mah1.