Everything about Data Backup, Data Restore & Data Archiving
This backup and archive page was originally written in 2010, when tape ruled and cloud and flash storage were in their infancy. We have updated the information to reflect the technology and times in 2020, enjoy!
Time and time again I hear the word “backup” and the problems associated with it. Well this document will hopefully convince people and companies that if they archive more, they can backup less!
Why do we need to perform a data backup?
- In the event of a disaster we can restore to a point in time prior to the event (RPO).
- To restore data in the event of malicious or accidental data erasure.
- To protect against viruses and hackers.
- To restore data from one system to another.
- Just in case anything goes wrong with a system upgrade.
Data Backup Issues
The world today is generating information faster than ever before. The issue is that we are not going to get more hours in a day; the backup window is ever decreasing, and companies are fast running out of ideas to back up live data within the time-frame they have.
This is where backup software that supports CDP (Continuous Data Protection) can really help with the diminishing backup window.
Increasing Data Volumes
Data is typically growing faster than current backup systems can cope with. Today some companies selectively back up data daily in order to have at least one copy every 48 hours; this issue is not going to get easier and needs to be resolved.
In 2020 around 44 ZB (Zettabytes) of data will be created, and it's increasing. By 2025 IDC estimates we will be creating 463 EB (Exabytes) of data daily, or around 168 ZB annually; this is a 4-5x increase over the 2020 estimates.
I have lost count of how many times people have mentioned "the backup failed again last night". Backups can fail for so many reasons, and it's not all down to human error. With the ever-increasing data sprawl, trying to keep a handle on this problem is always going to end in tears. To manage our backups we first need to manage our data more efficiently, by prioritising the data into more manageable chunks. Rather than create a single backup job to back up everything, try to prioritise and create clearly defined backup jobs. This way the reports are easier to read and analyse, and it should help you find the individual causes of backup failure rather than wading through one huge report that lists everything. Ensure you then resolve the root cause and run the backup again.
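As a minimal sketch of the idea, a monolithic backup could be split into clearly defined, prioritised jobs so failures can be traced per job rather than buried in one huge report. The job names, priorities and source paths below are purely illustrative assumptions, not any specific backup product's configuration.

```python
# Hypothetical sketch: split one monolithic backup into prioritised jobs
# so each job can be run and reported on individually.
from dataclasses import dataclass, field

@dataclass
class BackupJob:
    name: str
    priority: int                     # 1 = most critical, backed up first
    sources: list = field(default_factory=list)

# Invented example jobs, smallest-priority-number first.
jobs = [
    BackupJob("desktops", 3, ["/home"]),
    BackupJob("finance-db", 1, ["/srv/db/finance"]),
    BackupJob("file-shares", 2, ["/srv/shares"]),
]

# Run (and report on) each job in priority order, not as one giant job.
for job in sorted(jobs, key=lambda j: j.priority):
    print(f"Backing up {job.name}: {job.sources}")
```

A failure in "desktops" then no longer hides a failure in "finance-db": each job produces its own small, readable report.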
Things to consider about data backup:
- Do you need to backup desktops or personal data?
- Do you have the correct agents to backup your operating systems or platforms?
- Should you be looking at an alternative supplier or vendor?
- When did you last replace your backup tapes?
- When did you last clean your tape drive?
- Do you need to backup everything?
- Have you removed or added additional servers or applications?
- Do you have networking issues?
- Do you have the correct access permissions to backup the servers?
- Who has access to the backups and what level can they see?
- When was the last time you performed a restore?
If you back up everything, fantastic, but what if you need to restore everything when disaster strikes? Time is precious. Backup is a matter of housekeeping and continual management, and reducing data volumes will dramatically increase your chances of achieving a 100% success rate.
Every so often a new faster technology comes along to hopefully alleviate some of the issues we are experiencing with backup.
These technologies only provide a short-term fix to a long-term problem.
- LTO-8 – Much greater tape capacity (12TB native) and much faster (360MB/s vs LTO-7's 300MB/s)
- VTL – A disk array with software that mimics a tape library
- Deduplication – Looks for common blocks and creates a hash table
- Compression – Gives us more for less, providing the data is compressible
- RAID Array – Cheaper than a VTL and uses the backup software to create the virtual tape library
- When a new version of backup software comes out, do we upgrade or forklift-upgrade to something else?
- Does the SAN and virtualised environment need additional backup agents?
The direct costs associated with backup increase yearly due to the following:
- Purchase new or upgrade existing technology
- Software and hardware maintenance renewals
- Purchase additional backup software and agents
- Purchase additional tapes, disk storage
- Increased offsite storage costs for DR or cloud
The indirect costs associated with backup are as follows:
- Purchase / upgrade tier 1 storage to cope with increased data volumes
- Slower performance due to increased disk fragmentation
- Greater deployment of servers and management
- Increased running costs
- Increased staffing costs
A backup agent connects to your servers and/or desktops to allow backups to be performed. These agents might also be application-based, such as Exchange, SQL, Oracle, VMware etc., or Open File agents that can back up databases without the need to shut down the applications. Ensuring a 100% successful backup night after night is nigh on impossible; there will always be something that goes wrong: permissions, a missing file, volumes offline etc. As the number of applications grows, so does the complexity.
Data Deduplication reduces disk space by using algorithms to cleverly remove duplicate data blocks. Let’s explain how data deduplication works.
Data deduplication is a technology that initially creates a full backup of your data, but stores it as hash tables rather than files. Deduplication can be performed pre-process, post-process or in-band; let me explain each.
- Pre – The data is analysed and hashed before landing on the disk. Usually requires a large amount of cache and fast processors.
- Post – A backup is taken, and the file is stored on disk, once the backup is finished the hashing takes place. Requires more disk than the Pre process, but typically faster.
- In-band – As the files are backed up, they are hashed on the fly and then placed on disk. Requires fast processors and is typically not as fast as the previous two.
In addition to the above they can also be:
- File level – typically provides compression ratios of up to 5:1 by removing repetitive files
- Fixed block – provides compression ratios of up to 10:1. Views the data as blocks of a given fixed size and removes the repetitive blocks.
- Variable block – by far the best and most flexible technology, providing compression ratios of 50:1 or more, although in real-world applications you will get between 20:1 and 30:1
Now, whatever data deduplication process we deploy, they all effectively do the same job. We now have a full backup and know which repetitive blocks it consists of. The next time we perform the backup, we only need to back up the changed blocks and append the hash tables accordingly. The diagram below will hopefully explain how this works in more detail.
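The fixed-block variant described above can be sketched in a few lines: split the data into fixed-size blocks, hash each block, and store each unique block only once, keeping a per-backup list of hashes (the "recipe") from which the data can be reassembled. The tiny block size here is purely for illustration; real systems use much larger blocks.

```python
# Minimal sketch of fixed-block deduplication, not any vendor's implementation.
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real systems use e.g. 128KB blocks

def dedupe(data: bytes, store: dict) -> list:
    """Split data into fixed blocks; store unseen blocks; return the recipe."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # each unique block stored only once
        recipe.append(digest)
    return recipe

store = {}
recipe = dedupe(b"AAAABBBBAAAACCCC", store)
# 4 blocks referenced, but only 3 unique blocks actually stored
print(len(recipe), len(store))  # 4 3
```

The next backup only adds blocks whose hashes are not already in the store, which is why the changed-blocks-only behaviour described above falls out naturally.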
Data deduplication is a great technology, providing your data isn't video, audio, images or other media that is already compressed. Efficiencies improve the more data you back up: if you back up 5 Windows Server 2016 machines and then another 5, you are backing up the same files across 10 machines and hence achieve increased compression ratios.
Hard disk, tape or cloud?
This depends on your corporate compliance and governance legislation.
Advantages of Cloud:
- Unlimited storage capacity
- Flexible cloud storage tiers
- Perfect for second backup copy
Disadvantages of Cloud:
- Increasing costs over time
- WAN bandwidth
Advantages of hard disk:
- High capacity (16TB 2020)
- High speed
- Low cost
- Great for restores
- Perfect for on-premise to DR or Cloud replication
Disadvantages of hard disk:
- Not portable
- Energy consumption
Advantages of tape:
- Tapes are portable
- Tapes are low cost
- Tapes are energy efficient
- Tapes can backup huge amounts of information
Disadvantages of tape:
- Not particularly suited to individual file restores
- Require the correct tape drive and backup software to read them
- They degrade over time
- Not suitable for archiving
Encrypting data will introduce a backup and restore overhead. There are many stories of encryption keys being lost or stolen, and also of businesses that encrypted their data and now can't recover it.
This could be down to lost encryption keys, faulty systems etc. Police forces do not like encryption as it adds a layer of complexity that causes more headaches than it solves. If you have procedures in place to protect your encryption systems, then encryption is a good idea. If, however, you change systems or software, or lose the encryption keys, what will you do with all your historical backup data?
When synchronising data across wide area links, an important consideration is the bandwidth actually available for synchronisation. You also need to consider when the synchronisation will take place (day/night/weekends), the quality of the link, and the bandwidth left over once normal network traffic is excluded. All of this can have a dramatic effect on our ability to store data offsite. Below is a simple table of typical transfer speeds, showing how much data can be transferred across a WAN link.
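The figures are easy to sanity-check: data moved is simply bandwidth x time x utilisation. A minimal sketch, where the 80% utilisation factor and the chosen link speeds are illustrative assumptions rather than measurements:

```python
# Rough estimate of how much data fits through a WAN link in a given window.

def gb_transferred(link_mbps: float, hours: float, utilisation: float = 0.8) -> float:
    """Data moved in GB (1 GB = 10^9 bytes) over the backup window."""
    bytes_per_sec = link_mbps * 1_000_000 / 8 * utilisation   # bits -> bytes
    return bytes_per_sec * hours * 3600 / 1_000_000_000

# An 8-hour overnight window over some common link speeds:
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mb/s -> {gb_transferred(mbps, 8):,.0f} GB")
```

So a 100 Mb/s link moves roughly 288 GB overnight at 80% utilisation; if your nightly change rate exceeds that, offsite synchronisation over that link will never catch up.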
No matter what backup software a company chooses, daily management reports are run to ensure the backups ran successfully. The more data, servers and applications we add, the more time-consuming and complex our backups become. Why do companies employ staff to manage their backups full-time? It's all down to paranoia: we add more of everything, and our worries grow that something somewhere is going to fail.
When we founded Fortuna Data 26 years ago, staff would spend 1-2 hours a day checking data backups. So what went wrong? Surely today's systems and technology are more reliable than they were? Yes, systems and software are more reliable and advanced, but the problem is we have far more data and servers performing tasks. Businesses that virtualise their infrastructures have 50-300 servers running on much bigger and more powerful hardware; if someone pulls the plug on this, we lose a large number of virtual machines, not just one! So the worry here is how to make the virtualised infrastructure highly available against such an event occurring.
Regulation & Compliance
Today more and more businesses are facing legislation to secure their data for regulatory, GDPR or corporate governance compliance. Trying to identify what should be kept or deleted becomes more difficult every day, as does knowing how long we need to keep it. The problem is that backup software isn't designed to identify aged, changed or modified data; all it is designed to do is reset the archive bit for the next backup. The challenge companies are going to face is where and how to store this legal information.
Data retention is defined by corporate or legal requirements. How long we have to archive and retain data is defined by these rules and policies.
How long do we need to keep our backups? Typically, an organisation would create a backup schedule as follows:
- Daily backups – Kept for 2 weeks
- Weekly backup – Kept for 1 month
- Monthly backups – Kept for 1 year
- Yearly backup – Kept indefinitely
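The schedule above amounts to a simple expiry rule per backup type. A minimal sketch, assuming the retention periods listed (30 days standing in for "1 month" and 365 days for "1 year"):

```python
# Sketch of the retention schedule above: given a backup's type and the
# date it was taken, compute when it can safely be expired.
from datetime import date, timedelta

RETENTION = {
    "daily": timedelta(weeks=2),
    "weekly": timedelta(days=30),    # "1 month", approximated as 30 days
    "monthly": timedelta(days=365),  # "1 year", approximated as 365 days
    "yearly": None,                  # kept indefinitely
}

def expiry(kind: str, taken: date):
    """Return the expiry date, or None if the backup is kept forever."""
    keep = RETENTION[kind]
    return None if keep is None else taken + keep

print(expiry("daily", date(2020, 1, 1)))   # 2020-01-15
print(expiry("yearly", date(2020, 1, 1)))  # None
```

Encoding the policy this way also makes it trivial to list which tapes or disk backups are due for reuse on any given day.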
A huge emphasis is placed on backup with little thought for its primary purpose, which is "RESTORE". You may have vaults full of backup tapes, but they're no good if you changed your backup software 18 months ago! How many companies keep their backup tapes and, when they implement new hardware or software, never bother to see if they can perform a restore of their legacy data? The answer is not that many, and who is going to admit to this oversight?
A restore of your information should be performed at least quarterly, on a variety of different data types backed up from various servers, and not the same server every time. This way, should something serious occur, you can at least be sure the restores will work as planned.
Over the centuries humans have tried to keep a history of events, discoveries and tragedies. Initially man carved etchings into stone or wood; then man discovered paper, and for almost 2000 years this was the way knowledge was passed on to future generations. Now, in the 21st century, the need to keep our digital history is becoming a huge headache for many businesses.
“The latest figures indicate that during 2007-2010, humans created more data than in the previous 2000 years!”
In the world we live in today we are constantly generating information in the form of video, sound, e-mails, texts, music, databases, images, office documents, PDFs, and many other varieties of data created daily. Some of this information is kept for a day, whilst much of it needs to be kept for a month, a year, or more than 50 years. We keep this information for several reasons: it could be legislation, regulation, insurance, assurance, research, intellectual property, personal, historical, medical or many others.
I have been involved in the data storage industry for 38 years. During that time I have seen numerous technologies succeed and then fail, either because their development roadmap was too ambitious, or because someone designed, developed or invented something that appeared better, cheaper, faster, easier to use and more reliable than what was currently available.
We are human, and because of this we are seduced into purchasing these new revolutionary technologies. Unfortunately we do not always consider the long-term downsides of adopting these new products, and it isn't until we come to upgrade, maintain and enhance a product that we realise it is not as revolutionary as first thought and we are locked into a solution that cannot evolve with our day-to-day business demands.
As mentioned throughout this site the issues of backup are becoming increasingly problematic. A failure to address the fundamental problem that data volumes are overwhelming the primary storage systems will inevitably result in data loss.
80% of stored data is inactive after 60 days
Numerous businesses have a huge ongoing investment in purchasing data storage systems, and this investment is increasing year on year as the demand to store more information grows. After 3-5 years this equipment is then replaced.
With the implementation of a data archive solution we can extend the life of this investment by moving data to a secure active archive, thereby freeing up valuable disk space on high-performing storage, slowing down the otherwise necessary ongoing investment in more storage space, and giving a huge ROI benefit. An additional benefit of a tiered data archive is that you might be able to utilise your existing older storage systems to archive data.
As a consequence, the size of online data volumes is spiralling out of control and storage management has become an ever-increasing challenge.
- Server performance and data access are diminishing
- Business legislation and user demands are requiring companies to increase disk space to alleviate the problem
- Data management puts high overheads on networks
- DR policies for recovery take longer as all data needs to be restored rather than the most recent and essential
- The annual cost of managing this data over its lifetime is more than 5x the initial purchase price of the equipment
- Backup windows are diminishing
Buying bigger and faster storage systems does not solve the problem of data identification. How we classify our data is going to become more of an issue than people first thought. We know tools exist to move data by date, last modified, file size, extension, user, file type etc., but how many companies are using this technology?
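As a sketch of what such tools do, the following flags files untouched for 60+ days (the "inactive" threshold quoted earlier) as archive candidates. The use of last-modified time and the 60-day cut-off are illustrative assumptions; real HSM and archiving products may use last-access time, owner, extension or other criteria.

```python
# Illustrative sketch: walk a directory tree and yield files whose
# last-modified time is older than the given number of days.
import os
import time

def archive_candidates(root: str, days: int = 60):
    cutoff = time.time() - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path
```

Run over a file share, a list like this is the starting point for moving cold data off tier 1 storage and out of the nightly backup.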
Once we have an archive, whether static, near-line or offline, you need to consider whether the archive will remain on the same site as the primary data, be relocated to another office, or even be replicated regularly to the DR site. In the event of a disaster your primary concern should be to restore your most recent information first and then restore the archive if required.
All information that we generate has a cost. Companies often say that the latest data residing on tier 1 storage has the most value. I do not agree with this statement, for the simple reason that important architectural or engineering drawings, films, health or legal documents etc. remain important long after they were initially created and, if anything, are more valuable than when first stored.
Archiving will achieve the following:
- Save tier 1 disk space
- Reduce management costs
- Free up valuable system resources
- Increase the ROI on existing storage
- Reduce capital expenditure by deferring upfront purchases, energy and prolong equipment life
- Reduces the number of backup tapes or disk space required
- Enable you to backup less
- Perform faster restores
- Allow you to achieve better RPOs (Recovery Point Objectives)
- Create a bigger backup window
- Enforce and distinguish business data from personal data
- Protect the data that is most important to the business
Current archive storage technologies
Below is a list of differing storage technologies which we could utilise to provide an archive. When comparing them, consider:
- The archive life of the media
- Access speeds and energy consumption of the devices
- Reliance on other technologies to read the data
No matter what type of archiving technology you choose it is important to remember that the media is more important than the device. If you need to keep media for a considerable time, always buy branded media from a known entity and ask for a certificate of conformance. These are sometimes difficult to obtain but should state that the media is certified for “x” years archive life. It might cost you a few pence more in the beginning, but it could save you thousands over the lifetime of the archive.
To ensure a long-term archive you should ideally make three copies of the media and store them in three separate locations, if possible on three differing types of technology, in the hope that in 50 years one of the technologies is still around to read the stored information.
When purchasing any media for backup or archive, it is important to remember that the technology is only as good as the amount of money you have available to protect the data. For example, a backup tape might have a manufacturer's quoted life of 30 years; such claims are made because the tape has been tested in perfect conditions, using accelerated tests to achieve the results. If after 29 years and 364 days the tape fails, you will be sent a new tape from the manufacturer, but the warranty only covers the tape and not the contents, i.e. the data!
Unless you are prepared to keep your tape in perfectly controlled conditions, you will never get a 30-year life out of any tape.
Many years ago, I worked for a company that sold tape drives and media to the Middle East, and for some unknown reason these drives started to fail dramatically. The request was made to return 50 of the drives to the UK for investigation, and another 50 were promptly dispatched. When we received the drives back and examined the problem, we saw the issue was down to rust! The tape drives and media were being used in an environment with high humidity and no air conditioning.
The cost of storing information increases yearly; this could be storage space, media costs, energy consumption, replacement costs etc. Clearly the archive media we choose needs to provide the greatest long-term cost savings for preserving the information, and this is the challenge.
With any technology you should perform the following:
- Periodically do a sample check on your archived media to ensure that the integrity of the media and data is still intact.
- Perform an analysis of current technologies and decide whether to keep or transfer your archive to a newer technology
- Ensure that the equipment is fully maintained and that you continue to purchase branded archive grade media
- Should you decide to change operating systems, always check compatibility with your archived data
- Ensure that the technology you are purchasing has a long track record and is not in the process of selling off products to please investors
- Do not de-dupe or encrypt archived data; ideally, if you do need to restore the information in 20 years, you want it in native format
- If you are required to store data for 10+ years, put it on optical; it's far safer and will save you £'s in the long run
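The periodic integrity check in the first point above can be sketched as a simple checksum manifest: record a hash of each file when the archive is written, then re-hash a sample later and compare. The file names and contents below are invented purely for illustration.

```python
# Sketch of a periodic archive integrity check using SHA-256 checksums.
import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(files: dict) -> dict:
    """files maps name -> content bytes; returns name -> digest."""
    return {name: sha256_of(data) for name, data in files.items()}

def verify(files: dict, manifest: dict) -> list:
    """Return the names whose current digest no longer matches the manifest."""
    return [n for n, d in files.items() if sha256_of(d) != manifest[n]]

archive = {"report.pdf": b"original contents", "scan.tif": b"image data"}
manifest = build_manifest(archive)            # written alongside the media
archive["scan.tif"] = b"bit-rotted data"      # simulate media degradation
print(verify(archive, manifest))  # ['scan.tif']
```

Because the data is kept in native format, the manifest is the only extra artefact needed, and a mismatch tells you exactly which copy to replace from one of the other two locations.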
This document was originally written in 2010, and it is surprising how much is still relevant today. In 2010 the cloud barely existed, virtualisation was taking shape and all-flash storage was a pipe dream.
We spoke quite a bit about tape drives being the main method of backing up large volumes of data; today this can be handled with 16TB disk drives. Object storage, high-speed 32Gb/s Fibre Channel networks and 10/25/40/100GbE Ethernet networks are appearing, increasing data throughput speeds and offering near-unlimited storage space!
Deduplication now utilises flash storage to perform instant restores of data. The problem with using deduplication and encryption together is that it simply doesn't work well: encrypted data does not deduplicate.
Most backup software today has agent-less backup, although for some physical servers and applications a physical backup agent may still be required.
Replicating backup data to the cloud makes perfect sense, providing cloud cost guidelines are adhered to.
Archiving systems can now migrate data directly to the cloud and no longer need to leave a stub.
The main change in backup software over the last 10 years is in management and simplicity, especially if your environment is virtualised!
Backup Software Simplified
Backup software today is far easier to manage, deploy and use. No longer should someone have to monitor backups for hours; the software, once set up, should provide easy-to-read reports, and the graphical user interface is simple to use: create a backup job, set the frequency of backup, apply it to a policy, and it can automatically add machines when they are created.
Software licensing in the past was based on a number of factors, i.e. capacity under license, number of VMs, sockets, agents etc.
We work with backup software vendors that can provide a single price for your entire enterprise estate!
There is no silver bullet for the issues of backup or archiving. With careful planning, a happy medium can be achieved to ensure your most recent data is backed up daily and archived information is retained for "X" days, months or years.
An archive needn't be permanent; it could be a SATA disk array or cloud storage for data that only needs to be backed up and archived once a month. Just deploying this affordable storage would free up your tier 1 disk space, while removing non-corporate data from the network would further free up disk space, reduce network traffic and reduce your backup window. These simple strategies can be achieved with a little time, thought and money. We can provide a solution that really will Backup Less and Archive More.