Recent developments in the Australian internet marketplace have changed the way you can manage your digital assets. This article discusses how you can use these cheap, unlimited internet plans to set up the ultimate backup system using two separate sites (e.g. home and office, or your place and a friend's or colleague's).
You will need:
Long-term data backup is all about redundancy: keeping copies of your data in geographically separate locations. Put simply, having at least two copies, stored in distinctly different locations, is the only real way to achieve a highly reliable backup that can survive all of life's misadventures, such as failing hardware, theft or bushfire. It's not enough to store your backups somewhere else in your house or elsewhere on your own property, as far too many bushfire victims have unfortunately discovered. Even an excellent, reliable backup system like the Drobo is not enough on its own to prevent total loss through theft.
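Here is a rough illustration of why separation matters. Suppose, purely as an assumption, that each site has a 1% chance per year of losing its data to fire, theft or similar, and that such events at two distant sites are independent. The chance of losing both copies is then the product of the individual chances. Two copies on the same property share the same fire, so their combined risk can never be lower than the risk of that single event:

```latex
\begin{aligned}
\text{separate sites:}\quad & P(\text{lose both}) = p_A \, p_B = 0.01 \times 0.01 = 10^{-4} \\
\text{same property:}\quad  & P(\text{lose both}) \ge p_{\text{fire}}
\end{aligned}
```

The figures are only illustrative. The point is that separation turns one shared risk into two independent risks that would both have to strike.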
The first thing you must have is two physically separate locations, such as your home and office, or your house and a friend's. For this to work easily and automatically, both locations must be connected to the internet through reliable ISPs with no quota restrictions. In this age of digital cameras, a single wedding job can easily produce gigabytes of files. The backups run overnight, slowly but surely, so speed matters far less than quota, unless you are generating gigabytes upon gigabytes every day, which very few people do in practice. Ideally both connections would have a static IP address (they can be with different companies, it doesn't matter). This simply makes things easier, as you always know which magic numbers you need to talk to either machine.
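The claim that quota matters more than speed can be checked with some quick arithmetic. The sketch below assumes an upload rate of 100 kB/s (kilobytes, taking 1 kB = 1000 bytes) and a 12-hour overnight window. Both figures are assumptions; substitute your own.

```python
def nightly_capacity_bytes(upload_kbytes_per_s: float, window_hours: float) -> float:
    """Bytes that can be sent upstream during one overnight backup window."""
    bytes_per_second = upload_kbytes_per_s * 1000  # assumes 1 kB = 1000 bytes
    return bytes_per_second * window_hours * 3600

def monthly_volume_gb(upload_kbytes_per_s: float, window_hours: float, nights: int = 30) -> float:
    """Upstream volume in GB if the link were saturated every night for a month."""
    return nightly_capacity_bytes(upload_kbytes_per_s, window_hours) * nights / 1e9

# Assumed figures: 100 kB/s upload, 12-hour window (e.g. 7pm to 7am).
print(f"{nightly_capacity_bytes(100, 12) / 1e9:.2f} GB per night")  # 4.32 GB per night
print(f"{monthly_volume_gb(100, 12):.1f} GB per month")            # 129.6 GB per month
```

A few gigabytes a night comfortably covers a typical day's new files. A month of busy nights, however, could push well over 100 GB upstream, which would exhaust a capped plan quickly.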
To give a concrete example, TPG offers unlimited internet plans over ADSL2+, and we have one at home and one at work. Each account costs $75 a month, although this will almost certainly come down further soon. We get good speeds from these: downloads of up to 1.7Mb/s at work and 1Mb/s at home. Work is very near the exchange while home is about 3km away, and ADSL speeds are very sensitive to cable length from the exchange. Uploads are far less satisfying, but 100kb/s is common. This isn't a big issue, as we run a server computer at each end 24 hours a day, 365 days a year, and we simply run the backups after 7pm once the day's work is done. At the time of writing, a typical PC costs about $100 to $150 a year in electricity, and of course our servers are used for lots of other tasks as well, so this is not really a big expense in the scheme of things.
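The electricity estimate is easy to reproduce. The sketch assumes an average draw of 100 W and a tariff of 15 cents per kWh. Neither figure comes from the article, so plug in your own meter reading and bill rate.

```python
def annual_electricity_cost(avg_watts: float, price_per_kwh: float,
                            hours_per_day: float = 24, days: int = 365) -> float:
    """Cost of running a machine continuously for a year at a given average draw."""
    kwh_per_year = avg_watts * hours_per_day * days / 1000  # watt-hours -> kWh
    return kwh_per_year * price_per_kwh

# Assumed figures: 100 W average draw, $0.15 per kWh.
print(f"${annual_electricity_cost(100, 0.15):.2f} per year")  # $131.40 per year
```

At 876 kWh a year, the result lands inside the $100 to $150 range quoted above.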
Once you have your two servers up and running and connected to the internet, you can move on to setting up your backup system. This is where it can get a little complicated, but hopefully this guide will get you going in the right direction.
The first thing to do is a manual backup of all your data from one site to the other. This means you won't have to send terabytes of legacy data across the internet for the initial backup. It is a one-off establishment step that ensures each site starts with a complete backup copy. From then on, the backups run automatically across the internet.
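Why seed manually? The number of nights needed to push an existing archive through an overnight window answers that. The sketch assumes 1 TB = 10^12 bytes and a 12-hour window. It shows both readings of the "100kb/s" upload figure, because the article does not say whether that means kilobytes or kilobits per second.

```python
import math

def nights_to_upload(total_bytes: float, upload_bytes_per_s: float, window_hours: float = 12) -> int:
    """Whole nights needed to push a data set through the overnight window."""
    per_night = upload_bytes_per_s * window_hours * 3600
    return math.ceil(total_bytes / per_night)

one_tb = 1e12
print(nights_to_upload(one_tb, 100_000))      # 232 nights at 100 kB/s
print(nights_to_upload(one_tb, 100_000 / 8))  # 1852 nights at 100 kbit/s
```

Even on the optimistic reading, seeding a single terabyte would take the best part of a year of nights. Carrying a copy across by hand is far quicker.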
Next, install the software. The easiest way is to designate one machine as the backup management machine (we'll call it the client) and the other as the FTP server (FTP = File Transfer Protocol). You can still back up each machine to the other this way; it just reduces the complexity and software costs. In this scenario, however, you need access to the client machine to manage the backups. If you want to be able to manage the backups from either end, simply duplicate the setup described below, running both the client and server software on each machine.
The client machine requires the backup/synchronisation software. We use synchronisation software here because it makes the files very easily accessible at either end, but you can take a more traditional backup approach if you prefer. A synchronisation system creates a mirror image of your file system on the destination machine, which makes it very easy to retrieve files when you need to. A backup system, by contrast, creates a 'backup volume' on the destination machine, which generally requires special software to retrieve files from.
Note: synchronisation on its own is actually a very poor backup system. For example, if you accidentally delete an important file at one end but don't notice until the next day, the synchroniser will have propagated the deletion to your backup machine overnight and deleted the file there as well. Not great! So if you do use synchronisation, make sure your software supports 'versioning', including for deleted files. These systems generally offer a fail-safe, keeping a copy of changed or deleted files for some period of time in case you realise you've made a mistake and want to retrieve them.
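Here is a minimal one-way mirror in Python that makes the idea of versioning concrete. It is not how SyncBackSE works internally. It simply assumes that any file about to be overwritten or deleted at the destination is first moved into a timestamped folder under a hypothetical `versions` directory, which must sit outside the mirrored destination. Local paths stand in for the remote machine to keep the sketch self-contained.

```python
import filecmp
import shutil
import time
from pathlib import Path

def versioned_mirror(source: Path, dest: Path, versions: Path) -> dict:
    """One-way mirror source -> dest. Files that would be overwritten or deleted
    in dest are moved into versions/<timestamp>/ first instead of being lost.
    `versions` must not live inside `dest`, or it would be mirrored too."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    report = {"copied": [], "archived": []}

    def archive(rel: Path) -> None:
        target = versions / stamp / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(dest / rel), str(target))
        report["archived"].append(str(rel))

    # Pass 1: copy new and changed files, archiving the old destination copy first.
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        if dst_file.exists():
            if filecmp.cmp(src_file, dst_file, shallow=False):
                continue  # identical content, nothing to do
            archive(rel)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 keeps timestamps
        report["copied"].append(str(rel))

    # Pass 2: files deleted from source are archived, never simply deleted.
    for dst_file in list(dest.rglob("*")):
        if dst_file.is_file():
            rel = dst_file.relative_to(dest)
            if not (source / rel).exists():
                archive(rel)
    return report
```

An accidental deletion at the source therefore turns up in the `versions` folder the next morning rather than vanishing from both ends. Empty directories left behind in the destination are not cleaned up in this sketch.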
We use SyncBackSE. It's a bit complicated, but it offers pretty much every option you could possibly want. GoodSync is an easier-to-set-up alternative. Whatever you choose, both your client backup software and your FTP server software must support secure FTP, that is, encrypted FTP, so that your files can't be snooped on as they travel over the public internet.
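What does "secure FTP" involve at the protocol level? Here is a minimal Python sketch using the standard library's `ftplib.FTP_TLS`, which speaks explicit FTPS: plain FTP upgraded to TLS with `AUTH TLS`. SFTP is a different, SSH-based protocol, so check which one your client and server both support. The port (21) and the assumption that the server's certificate is trusted by your system are placeholders for your own setup.

```python
import ftplib
import ssl
from pathlib import Path

def connect_secure(host: str, user: str, password: str, port: int = 21) -> ftplib.FTP_TLS:
    """Open an explicit-FTPS session with both control and data channels encrypted."""
    # Verifies the server certificate against the system CA store; a self-signed
    # certificate on a home server would need a custom SSL context (assumption).
    context = ssl.create_default_context()
    ftp = ftplib.FTP_TLS(context=context)
    ftp.connect(host, port, timeout=30)
    ftp.login(user, password)  # FTP_TLS.login secures the control channel before sending credentials
    ftp.prot_p()               # also encrypt the data channel, so file contents are not sent in the clear
    return ftp

def upload_file(ftp, local_path: Path, remote_name: str) -> None:
    """Upload one file in binary mode over an already-secured session."""
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)
```

Usage would look like `ftp = connect_secure("backup.example.net", "backupuser", "secret")`, then `upload_file(ftp, Path("IMG_0001.jpg"), "IMG_0001.jpg")`, and finally `ftp.quit()`. The host and credentials are hypothetical. Without the `prot_p()` call, only the login would be protected and the files themselves would still cross the internet unencrypted.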
SyncBackSE can define jobs in which either end is the source and either end is the destination, which is why we need it on only one of the machines. Setup is typically easy. Define a new synchronisation job with versioning, point it at the source directory on one machine and the matching destination directory on the other, and tell it to skip any unimportant files that don't need backing up. Then give your profile a test run to check that the results are as expected.
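A job profile comes down to a source, a destination and a set of skip rules. The test run is a dry run: it lists what would be transferred without touching anything. The sketch below models this with a hypothetical `SyncJob` class. The skip patterns (temporary files, Windows thumbnail caches, macOS `.DS_Store` files, Office lock files) are example assumptions, not SyncBackSE defaults.

```python
import fnmatch
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class SyncJob:
    """Hypothetical job profile: a local source, a remote destination and skip rules."""
    source: Path
    destination: str  # remote path on the FTP server (placeholder)
    skip_patterns: list[str] = field(
        default_factory=lambda: ["*.tmp", "Thumbs.db", ".DS_Store", "~$*"])

    def is_skipped(self, rel_path: str) -> bool:
        """True if the file name matches any skip pattern."""
        name = rel_path.rsplit("/", 1)[-1]
        return any(fnmatch.fnmatch(name, pat) for pat in self.skip_patterns)

    def dry_run(self) -> list[str]:
        """List the files a real run would consider, without copying anything."""
        planned = []
        for path in sorted(self.source.rglob("*")):
            if path.is_file():
                rel = path.relative_to(self.source).as_posix()
                if not self.is_skipped(rel):
                    planned.append(rel)
        return planned
```

Reviewing the dry-run list before the first real run is the cheap way to catch a filter that is too broad (silently skipping real work) or too narrow (filling the other end with junk).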
We have SyncBackSE email us daily logs for checking, so we can see which new files were copied across, which were deleted and so on. This means we spend two minutes a day on a basic check that nothing has gone wrong and everything is working as expected. If something has gone wrong, we can always retrieve the old files, as we have versioning enabled.
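SyncBackSE handles its own notifications, so the following is only an illustration of what a useful daily summary contains. It is a minimal sketch using Python's standard `email` and `smtplib` modules. The SMTP host, port 587 with STARTTLS, and the addresses are assumptions.

```python
import smtplib
from email.message import EmailMessage

def build_daily_report(job: str, copied: list, deleted: list,
                       sender: str, recipient: str) -> EmailMessage:
    """Summarise one night's run: counts in the subject, file lists in the body."""
    msg = EmailMessage()
    msg["Subject"] = f"[backup] {job}: {len(copied)} copied, {len(deleted)} deleted"
    msg["From"] = sender
    msg["To"] = recipient
    lines = (["Copied:"] + [f"  {p}" for p in copied]
             + ["", "Deleted (kept in versions):"] + [f"  {p}" for p in deleted])
    msg.set_content("\n".join(lines))
    return msg

def send_report(msg: EmailMessage, smtp_host: str, port: int = 587,
                user: str = None, password: str = None) -> None:
    """Send over SMTP, upgrading the connection to TLS before logging in."""
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.starttls()
        if user:
            smtp.login(user, password)
        smtp.send_message(msg)
```

Putting the counts in the subject line means most days' checks take a glance at the inbox. An unexpectedly large "deleted" number is the signal to open the message and look closer.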
The server machine simply runs secure FTP server software. We use Gene6, which is inexpensive, very secure and has an easy administration interface. You just create an authorised FTP user and place any limits on them you need, such as allowing connections only from the client's IP address or allowing only secure FTP. Then give them access to the appropriate source and destination directories on this machine. After that, the server simply sits and waits for connections, pushing files in and out as needed.
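The IP restriction relies on the static addresses recommended earlier. Below is a hypothetical version of that rule written out in Python with the standard `ipaddress` module: a connection is accepted only if it comes from an allowed address and TLS is already active. Gene6 is configured through its own interface, so this only spells out the logic. The address `203.0.113.7` is a documentation address standing in for the client's real IP.

```python
import ipaddress

def connection_allowed(peer_ip: str, tls_active: bool, allowed: list) -> bool:
    """Hypothetical server-side rule: accept only encrypted sessions from
    the client's known address (or network)."""
    if not tls_active:
        return False  # refuse plain, unencrypted FTP outright
    addr = ipaddress.ip_address(peer_ip)
    # A bare address such as "203.0.113.7" becomes a /32 network of one host.
    return any(addr in ipaddress.ip_network(net, strict=False) for net in allowed)
```

Without a static IP at the client end, this kind of allow-list would have to be loosened every time the address changed. That is one practical reason for the static-IP recommendation.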
This system is about as safe as you can get, as long as you actually read the daily log emails. We've been using it for more than 3 years now, at first for just the really important files and now for multiple terabytes' worth of data. In that time it has let us recover from several near disasters, with failed hard drives being the most common incident. We are completely protected against fire, theft, flooding, anything really. Even if one side of the system is utterly destroyed or goes AWOL, we have a perfect 'live backup' to restore from at any time.
One issue not really catered for by this system as presented is so-called 'disk rot' affecting older files or disks. This happens when you have legacy files on an old disk, fully backed up, but a sector or group of sectors fails. Because you're not accessing those files regularly, the system never picks up the problem. Fortunately, disk rot is actually fairly uncommon, and some modern self-correcting file systems will detect and fix it transparently. You'd have to be very unlucky for the same files to fail on both the main copy and the backup, so you should easily be able to restore from the backup. As an extra layer of security, you could of course run a checksum algorithm over your files at both ends regularly to make sure they remain perfectly intact.
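A checksum check can be as simple as building a manifest of SHA-256 hashes at each site and comparing the two. The sketch assumes you run `build_manifest` on each machine and bring the results together, for example as JSON over the same secure FTP link. The scheduling and transport are assumptions. Because the mirror keeps the same layout at both ends, relative paths line up directly.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): file_sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def compare_manifests(primary: dict, backup: dict) -> dict[str, list[str]]:
    """Report files whose contents differ, or which exist on only one side."""
    return {
        "mismatched": sorted(k for k in primary.keys() & backup.keys() if primary[k] != backup[k]),
        "missing_from_backup": sorted(primary.keys() - backup.keys()),
        "only_in_backup": sorted(backup.keys() - primary.keys()),
    }
```

Run the comparison right after a backup finishes. Otherwise, files changed since the last sync will show up as mismatches too. Keeping the previous manifest from each side lets you tell which copy changed when a mismatch appears.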
Convenience is excellent as well, once you get past the initial complexity of setup. You essentially have a mirrored file system at either end, which makes restoring files trivially easy. There's no need for special software: you just copy the files directly across, and they're easy to find because they're in the same place on both sides of the system. The backups are transparent and run overnight, with no negative impact on the performance of the machines involved. In fact, apart from the daily email check, the entire system simply disappears from view while carrying on doing excellent work for you in the background. Exactly how computers should work!
If you don't have a suitable second site, you can use an online backup service instead. With unlimited internet access this can be a viable option even for large data volumes, although it might mean an initial backup period of several weeks to upload your files. There will of course be ongoing fees for these sorts of services.
For the client software, Super Flexible File Synchronizer is a popular Mac equivalent to SyncBackSE. For the server, OS X actually has a built-in FTP server; here's a simple setup guide. But the built-in server is a bit too basic for my taste. Pure-FTPd is more fully featured, with a user interface guide here.
- Greg -