Q: When is a backup not a backup?

A: When it’s an archive.

Now say it like a mantra:

A backup is not an archive. A backup is not an archive. A backup is not an archive.

We all know this, surely? Well, it turns out we don’t. Lots of products without archive features will tell you that archive is dead: just use a long-term backup and shove it off to S3, or tape, or something. In this particular case, my rage has been triggered by a customer (who shall remain nameless) who needed a quick fix for regulatory compliance and started “archiving” their backups by switching retention to infinite. Simple. Except they now have somewhere in the region of 14PB of infinite-retention file and application backups and want to move to some shiny new backup software. Good luck with that.

Yes, there are options. You can migrate, even automatically, but it is really, really difficult. There is a world of edge cases, and take it from me: I spent four years designing automated migration for a company acquired by IBM. It’s edge cases all the way down. There are companies who will take your tapes or a copy of your disk and provide restores as and when you need them, but they rely on unsupported, reverse-engineered implementations of backup products, and this is hardly something you want to bet your bottom dollar on.

Ok, so don’t do infinite retention backups, you don’t like them, but why?

File level is fine. Actually, do keep file-level backups for long periods; you’re pretty much always going to be able to get file-level backups back. But don’t keep application backups, array snapshots or – worst of all – NDMP backups for anything other than short- to medium-retention operational recovery.

Let’s take NDMP as an extreme example. If you have long-term retention backups in NDMP and you want to change your NAS provider, or leave NAS altogether and use Windows/Linux/UNIX servers to provide your file services, you have a problem. The problem is simply this: the data encapsulated within an NDMP data stream is in the NAS vendor’s own proprietary format. In order to restore NDMP, you need to keep the hardware that supplied it hanging around, tested, supported and operational, just in case. If that hardware were to go out of support and break, you either have to buy new hardware (if suitable hardware is available, and it’s a big IF) or just abandon your NDMP backups. You could migrate by restoring the data and backing it up again at file level using an SMB/NFS mount on a server, but this will be tedious and surprisingly difficult to automate (IBM can help here!). It would be far better to back up snapshots at the file level in the first place, using the features that are available, such as simple SMB/NFS mounts (doesn’t really scale!), NetApp’s SnapDiff backups or MAGS automated high-speed multistream file-level backups. These processes can typically be used to archive files as well. Or you may choose to perform operational backups/restores with NDMP and use file level for long-term retention.
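To make the "simple SMB/NFS mount" option concrete, here is a minimal Python sketch of a file-level copy from a mounted share into an archive directory, with a checksum manifest so you can prove later that what you kept is what you had. The function names, the `data/` layout and `manifest.json` are my own illustration, not anything a backup product does for you, and it is single-threaded, which is exactly why this approach doesn’t really scale.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_tree(source_root: Path, archive_root: Path) -> list[dict]:
    """Copy every regular file under source_root (e.g. an SMB/NFS mount)
    into archive_root/data and record path, size and SHA-256 in a manifest.

    Hypothetical layout: archive_root must live outside source_root.
    """
    manifest = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue  # skip directories and anything that is not a file
        relative = source.relative_to(source_root)
        target = archive_root / "data" / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents plus timestamps
        checksum = sha256_of(target)
        if checksum != sha256_of(source):
            raise IOError(f"checksum mismatch after copying {relative}")
        manifest.append(
            {
                "path": relative.as_posix(),
                "size": target.stat().st_size,
                "sha256": checksum,
            }
        )
    archive_root.mkdir(parents=True, exist_ok=True)
    (archive_root / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `archive_tree(Path("/mnt/nas_share"), Path("/archive/nas_share"))` (both paths are placeholders) leaves you with plain files and a JSON manifest that any operating system will be able to read long after the original filer has gone.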

Ok, so NDMP is an extreme case, but it’s the same for pretty much all other applications. VMware and Hyper-V need to be restored to their own hypervisors, and hypervisor virtual hardware and backup processes do change. Exchange, MSSQL, Oracle, DB2 and all the rest back up in one proprietary format or another. This is fine – even desirable – for backups, but how do you protect these systems long term? You need proper archive: get the data out and store it in a format which can be pulled back as and when you need it. This is typically some sort of plain old file.
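As a rough illustration of "get the data out", the sketch below dumps every table in a database to one CSV file per table. It uses SQLite purely because it ships with Python; it is a stand-in, not a recipe for MSSQL, Oracle or DB2, and the function name and output layout are my own assumptions. A real archive would also need to capture the schema, data types and anything that doesn’t survive a trip through CSV.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(connection: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write each user table to out_dir/<table>.csv with a header row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # sqlite_master lists the tables; names starting "sqlite" are internal.
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    written = []
    for table in tables:
        # Quote the identifier so unusual table names cannot break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = connection.execute(f"SELECT * FROM {quoted}")
        header = [column[0] for column in cursor.description]
        target = out_dir / f"{table}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)  # streams rows rather than loading them all
        written.append(target)
    return written
```

The point is not CSV specifically. The point is that the result can be opened in ten or fifteen years without the database engine, version and backup agent that produced it.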

Another case: an Exchange backup from 10-15 years ago is next to useless. You don’t know which version of Exchange it was from (your backup software likely won’t tell you), and even if you do build a suitable Exchange server and get the version of Exchange correct, what other supporting infrastructure is required? Does the backup product even still support the older versions of Exchange? (There’s more than one case where the answer is a flat-out: No, you just can’t do this.) You’re probably thinking that you’ll just do an item-level restore for the mails you want? The problem is that, in order to do this, you’ll need to mount the Exchange database on a server that can read that database version. However, if your Exchange mails are stored in an archive as individual objects, you can use your email archive software to pull them back trivially.
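To show what "individual objects" buys you, here is a small sketch that stores each mail as its own standard `.eml` file and then finds one again by subject, using nothing but the Python standard library. This is not how any particular email archive product works, and the function names and flat directory layout are made up for the example. The takeaway is that no Exchange server of any version is involved in getting a message back.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def store_message(archive_dir: Path, name: str, sender: str,
                  recipient: str, subject: str, body: str) -> Path:
    """Write one mail as a standalone RFC 5322 .eml file."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{name}.eml"
    target.write_bytes(message.as_bytes())
    return target


def find_by_subject(archive_dir: Path, needle: str) -> list[EmailMessage]:
    """Return every archived mail whose subject contains needle."""
    matches = []
    for path in sorted(archive_dir.glob("*.eml")):
        message = message_from_bytes(path.read_bytes(), policy=policy.default)
        if needle.lower() in str(message["Subject"]).lower():
            matches.append(message)
    return matches
```

A linear scan like `find_by_subject` is obviously no substitute for the index a real archive product keeps, but each object it reads is self-describing and readable on its own.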

In summary: If you need to keep it long term, it needs to be an archive of a plain old file.