Build the Ultimate Back Up System

21st September 2015 Digital Asset Management

Recent developments in the Australian internet marketplace, in particular cheap plans without quota limits, have changed the way you can manage your digital assets. This article discusses how you can use these plans to set up the ultimate back up system, using two separate sites (e.g. home and office, or between you and a friend/colleague).

What You Will Need

You will need:

  • Two separate geographic locations. Long term data safety is all about multiple copies in physically different locations.
  • A medium to high speed, high bandwidth internet account at either end. Something like TPG's 'ADSL2+ Unlimited' package, or similar packages offered by many others. You can check for plans on offer in your area here.
  • Two computers with sufficient hard drive space for your back ups that you're happy to leave on 24/7.
    Alternatively, one computer at one end and a NAS offering remote secure FTP at the other - like the excellent and very easy to set up DroboFS with an appropriate FTP app.
  • At one end - backup/synchronisation software (like GoodSync or SyncBackSE) (about $50)
  • At one end - secure FTP server software (like Gene6 or your NAS's FTP software) (about $50)

The Set Up Guide - Quick Summary

  • Once you have your two machines set up and connected to the internet, you will need to set up the software at either end.
  • One end must have the secure FTP server software, the other must have the backup/synchronisation software.
  • Set up your backup jobs in the backup software, and make sure you have versioning enabled if you are using a synchronisation strategy.
  • Schedule the jobs to run nightly, so that the most you can ever lose is a day's work.
  • Check the email log daily to make sure everything is working as expected.

The Set Up Guide - In Detail

Long term data backup is all about redundancy of data in geographically disparate locations. Put simply, having a minimum of two copies, stored in distinctly different locations, is the only real way of achieving a highly reliable back up that can survive all of life's misadventures - such as failing hardware, theft, or bushfire. It's not enough to have your backups stored somewhere else in your house or elsewhere on your own property, as far too many bushfire victims have unfortunately discovered. Even if you have an excellent, reliable backup system like the Drobo, that alone is not enough to prevent total loss through theft.

The first thing you must have is two physically separate locations, such as your home and office, or your house and a friend's. For this to work easily and automatically, these two locations must both be connected to the internet using reliable ISPs that do not have quota restrictions - in this day of digital cameras, it is easy for just one wedding job to result in gigabytes worth of files. The back ups will run overnight, slowly but surely, so the speed is not nearly as important as the quota capacity of the system, unless perhaps you are generating gigabytes upon gigabytes every day, which very few people do in practice. Ideally both connections would offer a static IP address - they can be with different companies, it doesn't matter. This just makes things simpler, as you always know what magic numbers you need to talk to either machine.

To give a concrete example, TPG offer unlimited internet plans over ADSL2+ and we have this at home and at work. The cost is $75 a month for each account, although this will almost certainly come down further soon. We get good speeds from these - downloads of up to 1.7Mb/s at work and 1Mb/s at home - work is very near to the exchange, home is about 3km away, and ADSL speeds are very sensitive to cable length from the exchange. Uploads are far less satisfying but 100kb/s is common. This isn't a big issue as we run a server computer at either end 24 hours a day, 365 days a year, so the speed of the system isn't that relevant - we just run the back ups after 7pm once the day's work is done. The running cost of a typical PC is about $100 to $150 a year in electricity at time of writing - and of course our servers are used for lots of other tasks as well, so this is not really a big expense in the scheme of things.
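
To put rough numbers on why upload speed matters less than quota, here is a small sketch of how much data one overnight window can move. It assumes the 100kb/s upload figure means roughly 100 kilobytes per second, and it assumes a 12 hour window (7pm to 7am). Both figures are illustrative assumptions rather than measurements.

```python
def nightly_capacity_gb(upload_kb_per_s: float, window_hours: float) -> float:
    """Return the approximate data (in GB) that can be uploaded in one window."""
    seconds = window_hours * 3600
    total_kb = upload_kb_per_s * seconds
    return total_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 kB


if __name__ == "__main__":
    # Assumed figures: 100 kB/s upload, 12 hour overnight window.
    print(f"{nightly_capacity_gb(100, 12):.2f} GB per night")  # 4.32 GB per night
```

Under those assumptions a night's run moves a little over 4 GB, which is why the occasional large job may simply take a couple of nights to finish.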

Once you have your two servers up and running, and connected to the internet, you can move on to setting up your back up system. This is where it can get a little complicated, but hopefully this guide will get you going in the right direction.

The first thing to do is a manual backup of all your data from one site to the other. This means you won't have to do an initial back up of terabytes of legacy data across the internet. This is a one-off establishment process that ensures each site starts with a complete backup copy. From then on the backups will be automatic across the internet.

Next, we need to install the software. The easiest way is to designate one machine as the backup management machine (we'll call it the client), and the other as the FTP server (FTP = File Transfer Protocol). You will still be able to back up both machines to the other this way, it just reduces the complexity and software costs. But you will need access to the client machine to manage the backups in this scenario - if you want to be able to manage the backups at either end, you simply duplicate the set up we're about to describe, running both the client software and server software on each machine.

Setting up The Client (managing the backups)

This machine requires the backup/synchronisation software. We use synchronisation software here as it makes the files very easily accessible at either end, but you can go for a more traditional back up approach if you prefer. A synchronisation system creates a mirror image of your file system on the destination machine, which makes it super easy to retrieve files if you need to, whereas a backup system would create a 'backup volume' on the destination system, which would generally require special software to retrieve files from.

Note: Synchronisation on its own is actually a very bad backup system. For example, if you accidentally delete an important file on one end but don't notice till the next day, overnight the synchroniser will have propagated this change to your backup machine and deleted it there as well - not great! So if you do use synchronisation, make sure your software supports 'versioning', including for deleted files - these systems generally offer a fail-safe, keeping a copy of changed or deleted files for some period of time in case you realise you've made a mistake and want to retrieve the file.
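
To illustrate what 'versioning' means in practice, here is a minimal sketch of a one-way mirror that archives any file it is about to overwrite or remove, instead of destroying it. This is not how SyncBackSE or GoodSync work internally; the function name and the layout of the versions folder are hypothetical, and it only works on local folders.

```python
import filecmp
import shutil
import time
from pathlib import Path


def sync_with_versioning(source: Path, dest: Path, versions: Path) -> None:
    """Mirror source into dest, archiving anything that would be lost.

    The versions folder must live outside dest, otherwise archived
    files would be picked up by the next run.
    """
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")

    def archive(path: Path) -> None:
        # Move the old copy aside under a timestamped folder.
        target = versions / stamp / path.relative_to(dest)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))

    # Copy new and changed files, archiving the version being replaced.
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        dst_file = dest / src_file.relative_to(source)
        if dst_file.exists():
            if filecmp.cmp(src_file, dst_file, shallow=False):
                continue  # identical content, nothing to do
            archive(dst_file)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)

    # Files deleted at the source are archived rather than deleted outright.
    for dst_file in list(dest.rglob("*")):
        if dst_file.is_file() and not (source / dst_file.relative_to(dest)).exists():
            archive(dst_file)
```

With this approach an accidental deletion still disappears from the mirror overnight, but the file remains retrievable from the versions folder until you choose to prune it.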

We use SyncBackSE - it's a bit complicated but offers pretty much every option you could possibly want. GoodSync is an easier to set up alternative. Whatever you choose, it's important that both your client backup software and your FTP server software support secure FTP. That is, they must support encrypted FTP, so that as you send your files over the public internet, they can't be snooped on.
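
For readers curious what an encrypted FTP transfer looks like under the hood, here is a minimal sketch using Python's standard ftplib module. 'Secure FTP' can mean FTPS (FTP over TLS) or SFTP (over SSH); this sketch assumes FTPS, and the function names are hypothetical. The host, user name and password would be whatever you set up on your own server.

```python
from ftplib import FTP_TLS
from pathlib import Path


def connect_secure(host: str, user: str, password: str) -> FTP_TLS:
    """Open an FTP-over-TLS session with an encrypted data channel."""
    ftp = FTP_TLS(host)        # connects to the control port (21 by default)
    ftp.login(user, password)  # login() secures the control channel before sending credentials
    ftp.prot_p()               # encrypt the data connection as well
    return ftp


def upload_file(ftp, local_path: Path, remote_name: str) -> None:
    """Upload one file in binary mode over an already-open session."""
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)
```

In practice your synchronisation software does all of this for you; the point is simply that both the login and the file contents travel encrypted.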

In SyncBackSE, we simply define the backup jobs. SyncBackSE can define jobs where either end is the source and either end is the destination, which is why we need it on only one of the machines. The setup is typically easy - define a new synchronisation job with versioning and just point it at the source directory on one machine and the matching destination directory on the other machine. Tell it to skip any unimportant files that don't need backing up, and then give your profile a test run to see that the results will be as expected.

We get SyncBackSE to email us daily logs for checking - we can see which new files were copied across, which were deleted and so on. This means we spend 2 minutes a day doing a basic check that nothing has gone wrong and everything is working as expected. If something has gone wrong, we can always retrieve the old files as we have versioning enabled.

Setting up The Server (with only an FTP server)

This machine simply has secure FTP software on it. We use Gene6 which is inexpensive, very secure, and has an easy administration interface. You just create an authorised FTP user, place some limits on them if you need to (such as allowing connections from only the client's IP address, or allowing only secure FTP), and give them access to the appropriate source and destination directories on this machine. Basically it just sits there and waits for connections and pushes files in and out as need be.

How safe is this? How convenient is this?

This system is about as safe as you can get - as long as you do read the daily log emails that come through. We've been using this system for more than 3 years now - at first with just really important files, and now with multiple terabytes worth of data. In that time we've been able to recover from several very near mishaps, with failed hard drives being the most common incident. We are completely protected against fire, theft, flooding, anything really - even if one side of the system is utterly destroyed or goes AWOL, we have a perfect 'live backup' we can restore from at any time.

One issue that is not really catered for by this system as presented is so-called 'disk rot' with older files/disks. This is a phenomenon where you have legacy files on an old disk, fully backed up, but a sector or group of sectors fail on the disk and because you're not accessing the file regularly, the system never really picks it up. Fortunately, disk rot is actually pretty uncommon, and modern self-correcting file systems will often transparently pick it up and fix the issue. You'd have to be very unlucky for the same sectors on both the main copy and the backup to fail, so you should easily be able to restore from the backup. You could of course run some sort of regular checksum algorithm over your files at both ends to make sure they remain perfectly intact as an extra layer of security.
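
As a rough illustration of the checksum idea, the sketch below builds a manifest of SHA-256 checksums for every file under a folder and reports files whose checksum differs between two manifests. The function names are hypothetical and this is only one possible approach; dedicated tools exist for the same job.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files present in both manifests whose checksums differ."""
    return sorted(name for name in old if name in new and old[name] != new[name])
```

You could build a manifest at each end and compare the two, or compare this month's manifest against last month's: an archived file that you never edited but whose checksum has changed is a candidate for restoring from the other site.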

Convenience is excellent as well after the initial complexity of set up. You can essentially have a mirrored file system at either end, which makes restoring files at either end trivially easy, with no need to load special software to restore any files, just copy the files directly across, and the files are super easy to find as they're in the same place on both sides of the system. The backups are transparent and run overnight, with no negative impact on the performance of the machines involved. In fact, apart from the daily email check, the entire system simply disappears from view but carries on doing excellent work for you in the background - exactly how computers should work!

A more expensive and more limited, but much easier alternative

If you don't have an appropriate second site accessible, then you can use an online backup service. If you have unlimited internet access, this can become a viable option, although for large data volumes it might mean an initial back up period of several weeks to upload your files. There will of course be ongoing fees for these sorts of services.
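
To get a feel for that initial upload period, here is a small estimate, assuming a sustained upload speed of about 100 kilobytes per second and a hypothetical 200 GB archive uploading around the clock. Your own figures will differ.

```python
def upload_days(data_gb: float, upload_kb_per_s: float, hours_per_day: float = 24) -> float:
    """Return the approximate number of days needed to upload data_gb."""
    seconds = data_gb * 1_000_000 / upload_kb_per_s  # decimal units: 1 GB = 1,000,000 kB
    return seconds / (hours_per_day * 3600)


if __name__ == "__main__":
    # Assumed figures: 200 GB archive, 100 kB/s upload, running 24 hours a day.
    print(f"{upload_days(200, 100):.1f} days")  # 23.1 days
```

Restricting uploads to overnight hours stretches this proportionally, so a multi-terabyte archive may not be practical to seed over a connection of this speed.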

Mac software

For the client software, SuperFlexibleFileSynchroniser is a popular Mac equivalent to SyncBackSE. For the server, OS X actually has a built in FTP server - here's a simple set up guide. But the built in server is a bit too basic for my taste - Pure-FTPd is more fully featured, with a user interface guide here.