There are new features in the multi-award-winning IBM Cloud Object Storage. Along with the standard S3 access protocols, SMB and NFS are being added via a VM gateway. This will enable customers with legacy storage requirements that can’t be retrofitted for S3 to access the platform and the massive economies of scale it delivers.
A: When it’s an archive.
Now say it like a mantra:
A backup is not an archive. A backup is not an archive. A backup is not an archive.
We all know this, surely? Well, it turns out we don’t. Lots of products without archive features will tell you that archive is dead: just use a long-term backup and shove it off to S3, or tape, or something. In this particular case, my rage has been triggered by a customer (who shall remain nameless) who needed a quick fix for regulatory compliance and started “archiving” their backups by switching retention to infinite. Simple. Except now they have somewhere in the region of 14PB of infinite-retention file and application backups and want to move to some shiny new backup software. Good luck with that.
Yes, there are options, you can migrate, even automatically, but it is really, really difficult. There is a world of edge cases, and take it from me – I spent four years designing automated migration for a company acquired by IBM – it’s edge cases all the way down. There are companies who will take your tapes or a copy of your disk and provide restores as and when you need them, but they rely on unsupported, reverse-engineered implementations of backup products, and this is hardly something you want to bet your bottom dollar on.
Ok, so don’t do infinite retention backups. Fine, you don’t like them, but why?
File level is fine; in fact, do keep file-level backups for long periods – you’re pretty much always going to be able to restore them. But don’t keep application backups, array snapshots or – worst of all – NDMP backups for anything other than short-to-medium-retention operational recovery.
Let’s take NDMP as an extreme example. If you have long-term retention backups in NDMP and you want to change your NAS provider, or leave NAS altogether and use Windows/Linux/UNIX servers to provide your file services, you have a problem. The problem is simply this: the data encapsulated within an NDMP data stream is proprietary. In order to restore NDMP, you need to keep the hardware that supplied it hanging around, tested, supported and operational, just in case. If that hardware were to go out of support and break, you’d either have to buy new hardware (if suitable hardware is available, and it’s a big IF) or simply abandon your NDMP backups. You could migrate by restoring the data and backing it up again at file level using an SMB/NFS mount on a server, but this will be tedious and surprisingly difficult to automate (IBM can help here!). It would be far better to back up snapshots at the file level in the first place, using features such as simple SMB/NFS mounts (doesn’t really scale!), NetApp’s SnapDiff backups or MAGS automated high-speed multistream file-level backups. These processes can typically be used to archive files as well. Or you may choose to perform operational backups/restores with NDMP and use file level for long-term retention.
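To make the migration path above concrete, here’s a minimal sketch of the "restore, then re-protect at file level" step: walk a restored NAS share presented over an SMB/NFS mount and copy everything into a staging area that a file-level backup agent can then protect in an open format. The mount and staging paths are hypothetical, and a real migration would also need to handle ACLs, sparse files, retries and verification – the edge cases mentioned earlier.

```python
import os
import shutil

def stage_file_level_copy(mount_point: str, staging_dir: str) -> int:
    """Walk a mounted NAS share (e.g. an SMB/NFS mount of restored NDMP
    data) and copy every file into a staging area, preserving timestamps,
    so a file-level backup agent can protect it. Returns the file count.

    This is an illustrative sketch only; production migration tooling
    needs ACL handling, checksum verification and restartability."""
    copied = 0
    for root, _dirs, files in os.walk(mount_point):
        # Recreate the directory structure relative to the mount point
        rel = os.path.relpath(root, mount_point)
        target = os.path.join(staging_dir, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            # copy2 preserves modification times and permission bits
            shutil.copy2(os.path.join(root, name),
                         os.path.join(target, name))
            copied += 1
    return copied
```

Running this against, say, `/mnt/restored_nas` and `/staging/filelevel` (hypothetical paths) gives you a plain-file tree that any file-level backup or archive product can ingest.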
Ok, so NDMP is an extreme case, but it’s the same for pretty much all other applications. VMware and Hyper-V machines need to be restored to their own hypervisors, and hypervisor virtual hardware and backup processes do change. Exchange, MSSQL, Oracle, DB2 and all the rest back up in one proprietary format or another. This is fine – even desirable – for backups, but how do you protect these systems long term? You need a proper archive: get the data out and store it in a format which can be pulled back as and when you need it – typically some sort of plain old file.
Another case: an Exchange backup from 10-15 years ago is next to useless. You don’t know which version of Exchange it came from (your backup software likely won’t tell you), and even if you do build a suitable Exchange server with the correct version of Exchange, what other supporting infrastructure is required? Does the backup product even still support the older versions of Exchange? (There’s more than one case where the answer is a flat-out: no, you just can’t do this.) You’re probably thinking that you’ll just do an item-level restore for the mails you want? The problem is that, to do this, you’ll need to mount the Exchange database on a server that can read that database version. If, however, your Exchange mails are stored in an archive as individual objects, you can use your email archive software to pull them back trivially.
In summary: If you need to keep it long term, it needs to be an archive of a plain old file.
Here is the announcement for Spectrum Protect 8.1.7 and Spectrum Protect Plus 10.1.3, which go GA on 22nd February.
A few bits that strike me personally are:
Spectrum Protect Plus gains:
- Exchange Support! Yay!
- Protect Plus Server HA support
- Much improved long-term/DR offload to Spectrum Protect and S3 directly, including IBM COS retention lock for immutability. Now supporting all workloads from all sources (Hyper-V, VMware and Windows, Linux and UNIX applications)
- MongoDB support for virtual and physical servers, including data re-use.
Spectrum Protect gains:
- Retention sets!!! The long-time Achilles heel of progressive incremental has been the inability to store individual long-retention backups from a single agent/client, requiring two or more instances of an agent. No longer! Individual backups can now be retained longer, allowing different retentions for weekly, monthly and yearly backups.
I’ve just been speaking with one of our storage sellers on the subject of immutability (Cyber Resiliency is the new black, dontchaknow?). The subject of “nearline” came up, and they couldn’t understand why it might help or even how it differs from normal disk storage. After talking around the subject, it turned out that the seller was primarily a block storage salesperson and we were talking about tape.
So what’s the difference?
“Nearline”, when talking about removable media, is a term for media which are stored offline but can be brought back online automatically. The most common technology for this is a tape library, where robotics move a piece of media from a shelf to a drive. Optical disk autoloaders are also a nearline technology.
“Offline”, when talking about removable media, requires human interaction to bring the media online: tape on a shelf, optical platters, disk cartridges or even your CDs/vinyl (I may be showing my age with that last one…).
Now, when we’re talking about disk technologies, “Nearline” is almost totally different. A Nearline SAS disk, AKA “NL-SAS”, is essentially SATA hardware mechanically, with a controller that presents as SAS. These disks are less performant than SAS disks and SSD/flash storage but more dense, so individual disks come in larger sizes. They are, however, more performant than their traditional SATA equivalents, and an existing array can accept them without modification.
It seems that the name “Nearline” for these disks comes from the need to distinguish them from the traditional primary, high-performance online storage tier.
It is important to make sure that anyone you’re speaking to on the subject of Cyber Resiliency, immutability and more general data protection fully understands what’s meant when we talk about nearline. Many companies have good cause to muddy the waters when comparing these technologies, what with having poor-to-no truly nearline or offline storage options.
Ok, so you think you know the Spectrum Storage portfolio? Guess again! I’d like to introduce the newest member of the portfolio: “Spectrum Discover”.
So, what is Spectrum Discover?
Simply put, it’s a metadata analysis system for analysing Spectrum Scale and IBM Cloud Object Storage datastores, and it comfortably operates at exabyte scale. So, you’ve not got an exabyte? Well, it’s still good for you: it operates equally well on the smallest and the largest datastores. It offers:
- Event notifications and policy-based workflows to automate metadata ingestion and metadata indexing at exabyte scale.
- Fine-grained views of storage consumption based on a wide range of system and custom metadata.
- Fast, efficient search through exabytes of data, resulting in highly relevant results for large-scale analytics.
- Ability to quickly differentiate mission-critical business data from data that can either be deleted or moved to a cheaper, colder tier.
- Policy-based custom tagging that enables organizations to classify and categorize data and align this data with the needs of the business.
- A Software Development Kit (SDK) to build Action Agents that extract metadata from file headers and content, automate data movement and provide integration with open source software, such as Apache Spark, Apache Tika, PyTorch, Caffe and TensorFlow, which facilitates data identification and speeds large-scale data processing.
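To give a feel for what policy-based custom tagging means, here’s a toy sketch of the idea: rules that match on a file’s metadata and merge tags into its record. To be clear, this is not the Spectrum Discover SDK or its policy language – the function, the record fields and the example policies are all made up for illustration.

```python
# Illustrative only: a toy policy-based tagger in the spirit of
# Spectrum Discover's custom tagging. The real product has its own
# policy engine and SDK; nothing here comes from that API.

def tag_records(records, policies):
    """Apply each policy (predicate, tags) pair to metadata records.

    A record is a dict of system metadata (path, size, age_days, ...).
    Every policy whose predicate matches merges its tags into the
    record's 'tags' dict. Returns the records, tagged in place."""
    for rec in records:
        rec.setdefault("tags", {})
        for predicate, tags in policies:
            if predicate(rec):
                rec["tags"].update(tags)
    return records

# Hypothetical policies: flag large, cold files for a cheaper tier,
# and classify data belonging to a project by its path prefix.
policies = [
    (lambda r: r["size"] > 1 << 30 and r["age_days"] > 365,
     {"tier": "cold"}),
    (lambda r: r["path"].startswith("/projects/genomics"),
     {"department": "research"}),
]
```

Feeding file records through `tag_records` with those policies would, for example, tag a two-year-old 2GiB file under `/projects/genomics` as both cold-tier and research data – the same classify-and-act pattern the real product automates at scale.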
Pretty Cool huh?!
Find out more over at the IBM Spectrum Discover web site.
Here it is, a dual release of Spectrum Protect 8.1.6 and Spectrum Protect Plus 10.1.2.
There are oodles of new features, the details of which you can find at the following links:
I’ll just list a few of the new features that are of interest to me:
Spectrum Protect:
- Tiering of inactive data to object storage, with granular age/exception rules
- Ransomware detection has been beefed up and now covers virtual machines as well as physical machines
Spectrum Protect Plus:
- New GUI, much better drill down to data from GUI elements, etc.
- DB2 is now supported for data protection and reuse, physical and virtual
- Encryption at rest in vSnap
You can download a free 30-day trial of Spectrum Protect Plus from the IBM web site; it installs for PoC use cases in under 15 minutes. Dive in and have a go:
Ok, we’ve got a whole load of Spectrum and IBM storage related announcements.
Here’s a blog from Eric Herzog over at LinkedIn: https://www.linkedin.com/pulse/ibm-storage-delivers-new-solutions-your-multi-cloud-eric-herzog/
However, The Register did a nice job of bullet-pointing everything we’ve just announced; take a deep breath and plough on through:
- Spectrum Protect’s automated tiering has been extended to an object storage tier,
- Spectrum Protect Plus (SPP) will have vSnap repository encryption,
- SPP has expanded item-level support in application environments, including newly-added IBM Db2 alongside existing SQL Server and Oracle databases,
- Cloud Object Storage (COS) can now be a backup/archive target for mainframe z/OS,
- COS can be a target for DS8800 arrays with in-flight encryption,
- hardware vendors can self-certify their COS support, sending verification test results to IBM for validation, meaning a faster certification process,
- COS v3.14 will support Lenovo SR630 and SR650 servers,
- Spectrum Scale (parallel filer software) v5.0.2 now has AWS support with bring-your-own licence functionality and AWS QuickStart for rapid deployment,
- Spectrum Scale has added file audit logging, a watch folder, an improved user interface and greater network resilience,
- IBM’s Elastic Storage Server (ESS) has preconfigured systems and implementation services to support NAS protocols,
- IBM Cloud Private has added Spectrum Access Blueprint support for IBM Z,
- there are new multi-cloud IBM storage options for SAP and Epic Electronic Health Records, and (whew)
- the FlashSystem 9100 array now offers VDI.
Phew. And that’s not to mention the convergence of a Z server with flash storage in a single 19″ rack, the new DS8882F, and an upgrade path for older DS arrays.