Tuesday, March 31, 2009

Want Cheap Storage @ Home?

Okay, I've been asked several times what I use for bulk data storage at home. A lot of the tools I use for production systems are exactly the same, and saving money by using commodity hardware at the office is just as easy. (i.e., if I can build a 12TB single-fault-tolerant array at home for under $2k, then why spend $80k at the office?)

I bought a Norco RPC-4020 case ($290), which has 20 3.5" SATA drive bays (with unfortunately flimsy trays), and mounted in it a cheap dual 2GHz AMD Opteron server motherboard with 4 PCI-X (133MHz/64-bit) slots from eBay ($120; search "Monarch"). The eBay deal included 4GB ECC RAM and a 120GB IDE HD. Then I bought 8x1.5TB Seagate drives ($130/ea, $1060 delivered). I combined those with my old set of 6xHitachi 500GB and 6xWD 750GB drives and 2xSuperMicro AOC-SAT2-MV8 8-port SATA controllers ($100 ea). After a bit of "Aggie Engineering" to get the Opterons cooled properly in the case (without buying new heatsinks), I assembled the parts and was off to the races.

I downloaded and installed OpenSolaris 2008.11 for x86 on the 120GB drive, created 3 RAIDZ pools, created ZFS filesystems on them, and shared those via CIFS to my Windows and Mac machines. If you're brave, you can stripe/cat your raidzs together, but I left them separate so I can more easily upgrade the 300GB drives to 2TB drives later this year (because you can't remove devices from a zpool).
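
The difference between separate pools and one striped pool can be sketched as a dry run. This only prints zpool commands rather than running them. The device names (c0t0d0 and so on), the controller numbering, and the pool names tank2, tank3 and tank are made up for illustration; the drive counts (8, 6, 6) follow the parts list above.

```shell
#!/bin/bash
# Dry-run sketch: prints zpool commands instead of executing them.
# Device names and the pool names tank2/tank3/tank are hypothetical.

# Build COUNT device names on one controller, e.g. "c0t0d0 c0t1d0 ...".
devices() {
  local controller=$1 count=$2 i=0 out=""
  while [ "$i" -lt "$count" ]; do
    out="$out c${controller}t${i}d0"
    i=$((i + 1))
  done
  echo "${out# }"
}

# Three independent RAIDZ pools, one per drive set (what this post does).
separate_pools() {
  echo "zpool create tank1 raidz1 $(devices 0 8)"
  echo "zpool create tank2 raidz1 $(devices 1 6)"
  echo "zpool create tank3 raidz1 $(devices 2 6)"
}

# The "brave" variant: one pool striped across three RAIDZ vdevs.
striped_pool() {
  echo "zpool create tank raidz1 $(devices 0 8) raidz1 $(devices 1 6) raidz1 $(devices 2 6)"
}

separate_pools
striped_pool
```

With separate pools, a whole drive set can be retired by copying its data elsewhere and destroying just that pool; in the striped layout every vdev is a permanent member of the one pool.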

Total bill after incidentals, about 2 large. Total usable storage space, about 12TB. Total days of nerd fun, about 5.

Next on my project list is to rebuild my Vista gaming PC to boot from a solid-state drive and to host an iSCSI target on the storage array. Been wanting to do that for a year now...

Notes:
  1. I do NOT recommend the 1.5TB Seagate drives. The failure rate on them is high (thanks for noticing, RAIDZ). I've RMA'd two and updated the firmware to CC1H on the others. (Google '1.5 TB Seagate Freeze') They also do NOT work with the Adaptec 21610SA.
  2. Some consumer-level SATA drives don't like long cables, even though the SATA spec allows them. I was using a really nice 3U 12-bay external drive enclosure with 3xInfiniband connections to 2xLSI 3800X SAS controllers, but I kept getting intermittent errors. That set me back 2 weeks and almost $800 to figure out. I still don't know if the cables were exactly it or if it was a problem with the LSI 3800X and the Tyan Thunder server MB, but I had to scratch that setup and start over on the controllers. BTW, I still have those two 8-channel LSI controllers, an Adaptec 21610SA and a Norco DS-1220 lying around.

Backup your ZFS files to Mac's HFS+ over a WAN

It makes sense to keep multiple copies of your critical files not only on different computers, but in multiple physical locations. But how do you keep them all in sync? rsync(1) of course! I debated on zfs snapshots, but that doesn't really let me access my files locally on Leopard (10.5) so I decided that keeping a replicated filesystem works best right now. But, there are a few caveats and hidden obstacles. Let me cut to the chase and show you how I do it.

So I have a Solaris server at home (let's call it 'storage') with 10TB of storage on a ZFS filesystem, which I share via CIFS to all of my other computing devices. The setup went like this:

% zpool create tank1 raidz1 blah blah
% zfs create -o casesensitivity=mixed -o nbmand=on tank1/files
% svcadm enable -r smb/server
% zfs set sharesmb=name=files tank1/files
% sharemgr show -vp

I want to back up a portion of those files to my personal external USB drive at the office attached to my MacBook Pro.

I created an HFS+ partition on the Mac using 'Disk Utility', then formatted and mounted it. I did my initial copy while I was attached to my local LAN using a basic rsync command (rsync -avz -e "ssh -l timk" timk@storage:/tank1/files/ /Volumes/Personal/files/)

Now, back at the office, I want to receive incremental updates. I went back through my history and started again with my basic command, added --delete-after (to remove files from my external backup drive which were removed or renamed on my master copy at home) but I was seeing files which had not changed get transferred. This was not right!

2009/03/31 10:05:50 [6346] receiving file list
2009/03/31 10:06:10 [6346] 31752 files to consider
2009/03/31 10:06:10 [6350] >f+++++++ Data/Rebecca's Personal/My Documents/Personal.old/Recipes/Shrimp Etouffeé.doc
2009/03/31 10:06:11 [6350] >f+++++++ Data/Rebecca's Personal/My Documents/Personal/Recipes/Shrimp Etouffeé.doc
2009/03/31 10:06:11 [6346] *deleting Data/Rebecca's Personal/My Documents/Personal.old/Recipes/Shrimp Etouffeé.doc
2009/03/31 10:06:11 [6346] *deleting Data/Rebecca's Personal/My Documents/Personal/Recipes/Shrimp Etouffeé.doc
2009/03/31 10:06:11 [6350] sent 64 bytes received 777331 bytes 34550.89 bytes/sec
2009/03/31 10:06:11 [6350] total size is 63159427722 speedup is 81244.96
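
The confusing part of that log is that the transferred name and the deleted name look identical on screen. A minimal sketch of how two names can look the same and still differ, assuming one side stores the accented letter precomposed and the other stores it decomposed (HFS+ decomposes names; the exact bytes below are illustrative and were not captured from these files):

```shell
#!/bin/bash
# The same visible character "é" in two Unicode normalization forms.
# Octal escapes keep this portable across printf implementations.
nfc=$(printf '\303\251')    # precomposed U+00E9: bytes c3 a9
nfd=$(printf 'e\314\201')   # "e" + combining acute U+0301: bytes 65 cc 81

# Count raw bytes regardless of locale.
byte_count() { printf '%s' "$1" | wc -c | tr -d ' '; }

echo "precomposed: $(byte_count "$nfc") bytes"
echo "decomposed:  $(byte_count "$nfd") bytes"

# A byte-wise comparison sees two different names.
if [ "$nfc" != "$nfd" ]; then
  echo "names differ byte for byte"
fi
```

A tool that compares file names byte for byte sees a new file on the sender and an extra file on the receiver, which would produce exactly this transfer-then-delete pattern.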


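After some research, I realized the accented character in the filename was throwing off rsync: HFS+ munges filenames into a decomposed UTF-8 form (the "é" is stored as "e" plus a combining accent), so the name on the Mac never matches the name on the server byte for byte. By adding an --iconv=UTF8-MAC,UTF-8 option, I could force the character set conversion of the filenames between my Mac's HFS+ and the ZFS on the storage server. But alas, life is never so easy: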

rsync: on remote machine: --iconv=UTF8-MAC: unknown option

Oh. OS X 10.5 ships with rsync 2.6.9 and OpenSolaris 2008.11 (snv_101b) ships with 2.6.9 too, but the --iconv option is only available in rsync 3.x (rsync.samba.org)

tim-kieschnicks-macbook-pro:~ timk$ /usr/bin/rsync --version
rsync version 2.6.9 protocol version 29
Copyright (C) 1996-2006 by Andrew Tridgell, Wayne Davison, and others.

Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
inplace, IPv6, 32-bit system inums, 64-bit internal inums

rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
tim-kieschnicks-macbook-pro:~ timk$

timk@stor:~$ /usr/bin/rsync --version
rsync version 2.6.9 protocol version 29
Copyright (C) 1996-2006 by Andrew Tridgell, Wayne Davison, and others.

Capabilities: 64-bit files, socketpairs, hard links, symlinks, batchfiles,
inplace, no IPv6, 64-bit system inums, 64-bit internal inums

rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
timk@stor:~$
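
The same check can be scripted. A rough sketch that reads `rsync --version` output on stdin and tests whether the major version is at least 3; the function names are mine, and a 3.x build can still be compiled without iconv, so the Capabilities list remains the final word:

```shell
#!/bin/bash
# Print the major version from the first line of `rsync --version` output.
# awk splits on runs of whitespace, so single or double spaces both parse.
rsync_major() {
  awk 'NR == 1 && $1 == "rsync" && $2 == "version" { split($3, v, "."); print v[1] }'
}

# Succeed when the version text on stdin is rsync 3.x or later,
# the first series that has the --iconv option.
supports_iconv() {
  local major
  major=$(rsync_major)
  [ -n "$major" ] && [ "$major" -ge 3 ]
}

# Sample first line copied from the outputs in this post.
if echo "rsync version 2.6.9 protocol version 29" | supports_iconv; then
  echo "2.6.9: new enough for --iconv"
else
  echo "2.6.9: too old for --iconv"
fi
```

In practice the input would come from `rsync --version | supports_iconv` on the Mac and from the same command run over ssh on the server.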


So, a quick update of rsync via MacPorts (sudo port install rsync) and I was good to go on the client. On Solaris, it took a few more minutes to download the packages and dependencies from sunfreeware.com. I needed the following packages:

% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/rsync-3.0.5-sol10-x86-local.gz
% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/popt-1.14-sol10-x86-local.gz
% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/libiconv-1.11-sol10-x86-local.gz
% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/db-4.2.52.NC-sol10-intel-local.gz
% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/libintl-3.4.0-sol10-x86-local.gz
% wget ftp://ftp.sunfreeware.com/pub/freeware/intel/10/libgcc-3.4.6-sol10-x86-local.gz

I uncompressed the whole lot (gzip -d *.gz), ran pkgadd -d pkgname on each package one at a time, and then verified that I was good to go.

timk@stor:~$ which rsync
/usr/local/bin/rsync
timk@stor:~$ rsync --version
rsync version 3.0.5 protocol version 30
Copyright (C) 1996-2008 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, no IPv6, batchfiles, inplace,
append, ACLs, no xattrs,
iconv, no symtimes

rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
timk@stor:~$

tim-kieschnicks-macbook-pro:backup_stor timk$ which rsync
/opt/local/bin/rsync
tim-kieschnicks-macbook-pro:backup_stor timk$ rsync --version
rsync version 3.0.5 protocol version 30
Copyright (C) 1996-2008 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
64-bit files, 32-bit inums, 32-bit timestamps, 64-bit long ints,
socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
append, ACLs, xattrs,
iconv, symtimes, file-flags

rsync comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions. See the GNU
General Public Licence for details.
tim-kieschnicks-macbook-pro:backup_stor timk$
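
For the record, the unpack-and-install step on the Solaris side can be written as a loop. This is a sketch that only prints the commands unless RUN=1 is set; the RUN switch and the function names are my own additions, and the archive names are the ones from the wget list above.

```shell
#!/bin/bash
# Archive names exactly as fetched by the wget commands in this post.
archives="
rsync-3.0.5-sol10-x86-local.gz
popt-1.14-sol10-x86-local.gz
libiconv-1.11-sol10-x86-local.gz
db-4.2.52.NC-sol10-intel-local.gz
libintl-3.4.0-sol10-x86-local.gz
libgcc-3.4.6-sol10-x86-local.gz
"

# Print each command by default; execute it only when RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

install_all() {
  local archive
  for archive in $archives; do
    run gzip -d "$archive"           # leaves the file without the .gz suffix
    run pkgadd -d "${archive%.gz}"   # install from the uncompressed package file
  done
}

install_all
```

Running it as `RUN=1 ./install.sh` (the script name is arbitrary) on the server would execute the commands for real; pkgadd still prompts for confirmation for each package.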



But... one more hurdle to overcome:

rsync: on remote machine: --iconv=UTF-8: unknown option
rsync error: syntax or usage error (code 1) at main.c(1318) [server=2.6.9]
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.5]


The rsync that ssh launched on my storage server was still the old 2.6.9 in /usr/bin (note the [server=2.6.9] in the error). No problem, I can specify the server's rsync path using --rsync-path=/usr/local/bin/rsync.

Now, finally, I can keep my files in sync!!!

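#!/bin/bash
# Sync the 'files' share from the storage server to the external HFS+ drive.

tgt=/Volumes/Personal
src_host=10.0.0.XX
log_dir="$HOME/tmp/$(basename "$0")"

# Try the LAN address first; fall back to the public hostname when away from home.
if ! ping -t 5 -c 1 "$src_host"; then
    echo "Ping failed, using remote host."
    src_host=storage.mydomain.com
fi

# Refuse to run if the external drive is not mounted.
if [ ! -d "$tgt" ]; then
    echo "Personal volume not mounted, exiting"
    exit 1
fi

# Make sure the log directory exists before rsync tries to write to it.
mkdir -p "$log_dir"

/opt/local/bin/rsync -avzi --delete-after --progress \
    --iconv=UTF8-MAC,UTF-8 \
    --rsync-path=/usr/local/bin/rsync \
    --log-file="$log_dir/files-$$.log" \
    -e "ssh -l timk" \
    "timk@${src_host}:/tank1/files/" "${tgt}/files/"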