Photographee.eu - stock.adobe.co
If organisations depend on their data, then they need robust backups – but making sure backup is effective is a challenge. Backing up business data has become easier over the past decade, through improvements in backup technologies, better-performing storage systems and the option to back up to the cloud. Firms no longer rely solely on cumbersome and potentially fragile physical backup media.
But these developments have added complexity at the same time, as firms face ever-greater volumes of data. To keep to recovery time objectives (RTO), IT teams need to ensure backups actually work. And to ensure that, backup infrastructure needs maintenance.
In its 2022 data protection survey, supplier Veeam found that organisations plan to increase spending on backup and disaster recovery by 5.9%. This is unsurprising: the previous year’s survey found that 37% of backup jobs, and 34% of recoveries, failed.
First of all, organisations need to understand where their data is, and its size and format. Structured data held in a database on an on-premise server is fairly easy to manage, for example.
Unstructured data, split across local NAS devices, private and public clouds, is much harder to track. Firms also need to consider data in software-as-a-service (SaaS) applications, in virtual machines (VMs) and potentially in containers, which can be harder still to locate and protect.
Staff planning backup and recovery also need to consider file and volume sizes. This applies to backup locally and to the cloud. Large volumes can strain backup windows and make backups harder to verify, and they also affect recovery. A cheap, cloud-based backup will be a false economy if it takes too long to recover files.
This links back to the organisation’s recovery time objective and recovery point objective (RPO). If recovery plans are not checked, they might not operate as planned. And if backup systems are not maintained, growth in data volumes, the size of VMs or even the number of containers could make recovery within the RTO impossible.
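The link between data volume and RTO can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it uses decimal units (1GB = 8,000 megabits), and it assumes the restore runs at a steady sustained throughput, which real restores rarely achieve.

```python
def estimated_restore_hours(data_gb: float, throughput_mbps: float) -> float:
    """Hours needed to transfer data_gb gigabytes at throughput_mbps megabits per second."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / throughput_mbps
    return seconds / 3600


def meets_rto(data_gb: float, throughput_mbps: float, rto_hours: float) -> bool:
    """True if the estimated transfer time fits inside the recovery time objective."""
    return estimated_restore_hours(data_gb, throughput_mbps) <= rto_hours
```

Under these assumptions, restoring 4,500GB over a sustained 1,000Mbps link takes about 10 hours, so an eight-hour RTO would be missed before any time is allowed for rebuilding servers or validating the data. Re-running the calculation as data volumes grow shows when a plan that once worked no longer does.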
Ensure all data is backed up
Backup procedures often fail, not because of a technical fault or data corruption, but because a critical piece of data or even an entire application or VM is missed.
“For a backup strategy to work, and be both efficient and effective, an organisation needs to understand the data they’re working with, where it is, and the size of the datasets,” says Stephen Young, a director at AssureStor, which sells backup technology into the IT channel. “Taking time and resources to compile a comprehensive data map will help with your backup strategy.”
He adds that firms need to think about where data is backed up. On-premise backups are quick, but do not protect against disruption at the local site. Cloud backup will, but it relies on internet connectivity.
And, although companies now make more use of automated backup management and even artificial intelligence (AI), these are not infallible. “Unfortunately, there is no backup tool that detects infrastructure components that are not backed up,” says Alex MacDonald, chair of SNIA EMEA.
Failures can happen because resources are installed but not included in the backup policy, or because a new resource is installed and the owner does not consider backup. To fix this, all new systems should be covered in the backup policy by default.
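One way to catch such gaps is to compare an inventory of systems against the list of resources the backup policy covers. The following is a minimal sketch, assuming both lists can be exported as plain resource names; the function names and example identifiers are hypothetical and do not correspond to any particular backup product.

```python
from typing import Iterable, List


def unprotected_resources(inventory: Iterable[str], backup_policy: Iterable[str]) -> List[str]:
    """Resources that exist in the inventory but are not covered by the backup policy."""
    return sorted(set(inventory) - set(backup_policy))


def orphaned_policy_entries(inventory: Iterable[str], backup_policy: Iterable[str]) -> List[str]:
    """Policy entries that no longer match anything in the inventory."""
    return sorted(set(backup_policy) - set(inventory))


if __name__ == "__main__":
    # Hypothetical exports from a CMDB and a backup tool.
    inventory = ["vm-web-01", "vm-db-01", "nas-share-finance"]
    policy = ["vm-web-01", "vm-old-07"]
    print("Not backed up:", unprotected_resources(inventory, policy))
    print("Stale policy entries:", orphaned_policy_entries(inventory, policy))
```

The check is only as good as the inventory behind it, which is why the data map matters: a system missing from both lists will not show up in either report.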
Maintain and optimise backups
Up-to-date backup software is generally dependable, and solid-state storage is reliable. But spinning disk media can fail, tape is rated for a limited number of rewrites, and even SSDs have a finite lifespan.
IT departments should monitor hardware lifecycles and plan for replacements, but they should also use monitoring and reporting tools to maintain a view of the backup system.
“Backup technologies have gone a long way in reliability enhancements, and the move from tape to disk, including deduplication platforms have improved backup reliability tremendously,” says SNIA’s MacDonald. “Failures may still occur, but it is very unlikely to be attributed to storage.”
Reporting will give teams details of the last successful backup, and of any errors. They can then use this to pinpoint any areas of risk. Firms can use the ITIL Incident Management process to track failures.
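As an illustration of how such a report might be used, the sketch below takes a list of job runs and flags any job whose last successful backup is older than a chosen threshold, or that has never succeeded. The record layout (job name, finish time, success flag) and the function name are assumptions made for the example, not the output format of any specific tool.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple

JobRun = Tuple[str, datetime, bool]  # (job name, finished at, succeeded)


def stale_jobs(runs: Iterable[JobRun], now: datetime,
               max_age: timedelta) -> Dict[str, Optional[datetime]]:
    """Map each at-risk job to its last successful run (None if it never succeeded)."""
    last_success: Dict[str, datetime] = {}
    seen = set()
    for job, finished_at, succeeded in runs:
        seen.add(job)
        if succeeded and (job not in last_success or finished_at > last_success[job]):
            last_success[job] = finished_at

    at_risk: Dict[str, Optional[datetime]] = {}
    for job in seen:
        last = last_success.get(job)
        if last is None or now - last > max_age:
            at_risk[job] = last
    return at_risk
```

A job that fails every night for a week looks the same in a nightly error list as one that failed once. Tracking the age of the last good copy instead makes the actual exposure visible.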
SNIA is also seeing a move to use AI in backup applications, “to relieve administrators from reviewing thousands of backup jobs nightly”. Making multiple copies of important data further reduces the risks.
And effective maintenance needs to extend to virtual environments, including VMs and containers, too. If these services have changed since the backup software was installed, they might not be backed up properly.
“Nowadays, when it is so easy to spin up virtual machines, things can be missed from the backup,” says Adrian Moir, principal engineer at IT management firm Quest. “Provisioning methodology that automatically picks up new sources and includes notes on notifying whoever is responsible for a backup can improve backup efficiency.”
Organisations should keep the volume of data they back up under review. For on-premise systems this is important when optimising and upgrading hardware. It is even more important with the cloud. Cloud storage’s elastic nature allows firms to store ever more data. This adds to costs, and can make restores impractical.
Last, for physical backups, consider access to media. Where are tapes held, and how quickly can they be retrieved? Firms should assess the security of off-site and main site locations.
Test, and test again
No backup and recovery plan is effective without testing, and testing is a key part of the maintenance cycle.
IT teams need to test that backups work and, critically, whether they can recover from them. This includes restores from and potentially to the cloud, for example, by spinning up not just storage but compute instances, too.
“Some organisations perform periodic test restores to ensure backups are working correctly, while others track production restores and validate that each production instance or resource is restored at least once a year,” says Alex MacDonald at SNIA.
Testing is the most effective way to spot configuration errors, faults, corrupted backups and failures in the backup plan. Testing can be disruptive, especially to production systems, but it is worth it.
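A basic restore test can be automated by comparing checksums of the original and restored files. This is a minimal sketch using SHA-256 from Python's standard library. The function names are hypothetical, and it assumes the source file has not changed since the backup was taken. It only shows that file contents match, not that an application will start from the restored data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(source: Path, restored: Path) -> bool:
    """True if the restored file is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(restored)
```

In practice, teams would more likely record the digest at backup time and compare the restored copy against that stored value, since production files change between the backup and the test.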
Even though backup software is generally reliable, misconfiguration, including files being open during the backup process, or a firewall between a client and backup server, does cause failures.
“Companies must create a backup and recovery testing policy, to make sure that everything runs and restores smoothly,” says Adrian Moir at Quest. “If a backup solution is not set up to notify you about it automatically, then only testing can reveal that.”
Plan for backup product upgrades
Finally, although not strictly a maintenance issue, IT teams should plan for backup software upgrades.
Backup systems do need maintenance and patching. Firms need to deploy supplier updates, as well as operating system updates and patches. This is even more important given the prevalence of ransomware. Ransomware will seek out vulnerabilities, including in backup tools.
Vendors are adding new capabilities, too, including ransomware protection with support for immutable backups, better support for VMs and for containers.
Backup tools, like any other software, are prone to “technical debt”, becoming less efficient over time. Older software might be slower, less robust or have poorer reporting. This is in addition to security patches.
IT teams should keep on top of supplier upgrade cycles, so they can plan updates around their own workloads and ensure there is enough time for testing.