This data backup and data archiving page was originally written in 2010, when cloud and flash storage didn’t exist and tape ruled. We have updated the information to reflect the technology and times of 2020. Enjoy!
Time and time again I hear the word “backup” and the problems associated with it. Well, this document will hopefully convince people and companies that if they archive more, they can back up less!
The world today is generating information faster than ever before. The issue is that we are not going to get more hours in a day; the backup window is ever decreasing, and companies are fast running out of ideas to back up live data within the time-frame they have.
This is where backup software that supports CDP (continuous data protection) can really help with the diminishing backup window.
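CDP works by capturing changes as they happen rather than waiting for a fixed nightly window, so the “window” effectively disappears. As a rough, vendor-neutral illustration of the idea (real CDP products intercept writes at the block or I/O level; the paths and polling interval below are assumptions), here is a minimal sketch:

```python
# Simplified sketch of the CDP idea: capture changes continuously instead of
# in one nightly job. Real CDP intercepts writes at the block/I-O level; this
# just journals changed files as soon as their modification time moves.
import shutil
import time
from pathlib import Path

SOURCE = Path("/data/live")        # hypothetical live data volume
JOURNAL = Path("/backup/journal")  # hypothetical CDP journal area

def watch(poll_seconds: int = 30) -> None:
    seen: dict[Path, float] = {}
    while True:
        for f in SOURCE.rglob("*"):
            if not f.is_file():
                continue
            mtime = f.stat().st_mtime
            if seen.get(f) != mtime:                 # new or changed file
                seen[f] = mtime
                stamp = time.strftime("%Y%m%dT%H%M%S")
                dest = JOURNAL / stamp / f.relative_to(SOURCE)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest)                # keep a point-in-time copy
        time.sleep(poll_seconds)
```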
Data is typically growing faster than current backup systems can cope with. Today some companies selectively back up data each day just to ensure everything gets at least one copy every 48 hours; this issue is not going to get easier and needs to be addressed.
In 2020 around 44 ZB (zettabytes) of data will be created, and the figure is still climbing. By 2025 IDC estimates we will be creating 463 EB (exabytes) of data daily, or roughly 168 ZB annually, which is close to a fourfold increase on the 2020 estimate.
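As a quick sanity check of those units (the estimates themselves are IDC’s, not ours):

```python
# 1 ZB = 1,000 EB. Converting IDC's daily figure to an annual one.
eb_per_day = 463
zb_per_year = eb_per_day * 365 / 1000        # ~169 ZB a year
growth_vs_2020 = zb_per_year / 44            # ~3.8x the 2020 estimate
print(round(zb_per_year), round(growth_vs_2020, 1))   # 169 3.8
```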
I have lost count of how many times people have told me “the backup failed again last night”. Backups can fail for many reasons, and it’s not all down to human error. With ever-increasing data sprawl, trying to keep a handle on the problem with one monolithic job is always going to end in tears. To manage our backups we first need to manage our data more efficiently, by prioritising it into more manageable chunks. Rather than creating a single job that backs up everything, prioritise the data and create clearly defined backup jobs. That way it is far easier to find the individual cause of a backup failure: instead of wading through one huge report that lists everything, the reports are smaller and easier to analyse. Ensure you then resolve the root cause and run the backup again.
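One simple, vendor-neutral way to express that split is to give every job its own name, priority and schedule, so a failed job points at one small, well-defined area rather than “everything”. The job names, paths and schedules below are made up for illustration:

```python
# Hypothetical, vendor-neutral definition of prioritised backup jobs.
# Small, clearly scoped jobs make failure reports easier to read and
# root causes easier to isolate than one monolithic "backup everything" job.
from dataclasses import dataclass

@dataclass
class BackupJob:
    name: str
    priority: int          # 1 = most critical, backed up first
    paths: list[str]
    schedule: str          # human-readable schedule, for illustration only

JOBS = [
    BackupJob("finance-db",    1, ["/srv/sql/finance"], "daily 19:00"),
    BackupJob("exchange",      1, ["/srv/exchange"],    "daily 20:00"),
    BackupJob("file-shares",   2, ["/srv/shares"],      "daily 22:00"),
    BackupJob("user-desktops", 3, ["/clients"],         "weekly Sat 06:00"),
]

for job in sorted(JOBS, key=lambda j: j.priority):
    print(f"{job.priority}: {job.name:13} {job.schedule:17} {job.paths}")
```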
If you back up everything, fantastic, but what happens if disaster strikes and you need to restore everything? Time is precious. Backup is a matter of housekeeping and continual management, and reducing the volume of data you back up will dramatically increase your chances of success.
Every so often a new, faster technology comes along that hopefully alleviates some of the issues we are experiencing with backup.
These technologies only provide a short-term fix to a long-term problem.
A backup agent connects your servers and/or desktops to the backup system so that backups can be performed. These agents might be application-based (Exchange, SQL, Oracle, VMware, etc.) or open-file agents that can back up databases without the need to shut down the applications. Ensuring a successful backup night after night is nigh on impossible. There will always be something that goes wrong: permissions, a missing file, volumes offline and so on. As the number of applications grows, so does the complexity.
Data deduplication reduces the disk space required by using algorithms to cleverly remove duplicate data blocks. Let’s explain how data deduplication works.
Data deduplication is a technology that initially creates a full backup of your data, but stores it as hashed blocks referenced by hash tables rather than as files. Deduplication can be performed pre-process, post-process or in-band; let me explain.
Whichever deduplication process we deploy, they all effectively do the same job. We now have a full backup and know which repeated blocks it consists of. The next time we perform the backup, we only need to back up the changed blocks and update the hash tables accordingly. The diagram below will hopefully explain how this works in more detail.
Data deduplication is a great technology, providing your data isn’t video, audio, images or other media that is already compressed. Efficiencies improve the more data you back up: if you back up 5 Windows Server 2016 machines and then another 5, you are backing up the same operating system files across 10 machines and hence achieve much higher reduction ratios.
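To make the hash-table idea concrete, here is a stripped-down, illustrative block deduplicator: it cuts a file into fixed-size blocks, stores each unique block once under its SHA-256 hash, and records the ordered list of hashes needed to rebuild the file. Real products use variable-size chunking and far more sophisticated indexing; this is only the principle.

```python
# Illustrative fixed-block deduplication: each unique block is stored once,
# keyed by its hash; a backed-up file is then just an ordered list of hashes.
import hashlib

BLOCK_SIZE = 4096
block_store: dict[str, bytes] = {}     # hash -> unique block (the "backup")

def dedup_backup(path: str) -> list[str]:
    """Back up a file, returning the hash list that describes its contents."""
    hashes = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            h = hashlib.sha256(block).hexdigest()
            block_store.setdefault(h, block)   # store each unique block once
            hashes.append(h)
    return hashes

def restore(hashes: list[str], path: str) -> None:
    """Rebuild a file from its hash list and the block store."""
    with open(path, "wb") as f:
        for h in hashes:
            f.write(block_store[h])
```

Back up ten near-identical Windows servers through something like this and the block store barely grows after the first one, which is exactly why the reduction ratios improve as more similar machines are added.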
This depends on your corporate compliance and governance legislation.
Encrypting data will introduce a backup and restore overhead. There are many stories of encryption keys being lost or stolen, and just as many about businesses that did encrypt their data and now can’t recover it.
The cause could be lost encryption keys, faulty systems and so on. Police forces do not like encryption, as it adds a layer of complexity that causes more headaches than it solves. If you have procedures in place to protect your encryption keys and systems, then encrypting is a good idea. If, however, you change systems or software, or lose the encryption keys, what will you do with all your historical backup data?
When synchronising data across wide area links, an important consideration is finding out how much bandwidth is actually available for synchronisation. Below is a table of typical transfer speeds. Also consider when the synchronisation will take place (day, night or weekends), the quality of the link and the bandwidth genuinely available once normal network traffic is excluded. All of this can have a dramatic effect on our ability to store data offsite.
Below is a simple table showing how much data can be transferred across a WAN link.
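If the table isn’t to hand, the arithmetic behind it is straightforward: take the usable fraction of the link, convert bits to bytes and multiply by the length of the synchronisation window. The link speeds, 70% utilisation figure and 8-hour window below are assumptions for illustration:

```python
# Rough guide to how much data fits across a WAN link in a given window.
# Assumes 70% of the nominal bandwidth is usable for replication traffic.
LINKS_MBPS = {"10 Mb/s": 10, "100 Mb/s": 100, "1 Gb/s": 1_000, "10 Gb/s": 10_000}
UTILISATION = 0.70
WINDOW_HOURS = 8                     # e.g. an overnight synchronisation window

for name, mbps in LINKS_MBPS.items():
    gigabytes = mbps * UTILISATION / 8 * 3600 * WINDOW_HOURS / 1000
    print(f"{name:8} -> ~{gigabytes:,.0f} GB in {WINDOW_HOURS} hours")
```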
No matter what backup software a company chooses, daily management reports are run to ensure the backups completed successfully. The more data, servers and applications we add, the more time-consuming and complex our backups become. Why do companies employ staff to manage their backups full-time? It’s all down to paranoia: we add more of everything, and our worries grow that something, somewhere, is going to fail.
When we founded Fortuna Data 27 years ago, staff would spend 1-2 hours a day checking data backups. So what went wrong; surely systems and technology are more reliable today than they were? Yes, systems and software are more reliable and advanced, but we now have far more data and far more servers performing tasks. Businesses that virtualised their infrastructures have 50-300 servers running on much bigger and more powerful hardware; if someone pulls the plug on this, we lose a large number of virtual machines, not just one! So the worry here is how to make the virtualised infrastructure highly available against such an event.
Today more and more businesses face legislation requiring them to secure their data for regulatory compliance, GDPR or corporate governance. Identifying what should be kept or deleted, and for how long, becomes more difficult every day. The problem is that backup software isn’t designed to identify aged, changed or modified data; all it is designed to do is reset the archive bit ready for the next backup. The challenge companies are going to face is where and how to store this legally required information.
Data retention is defined by corporate or legal requirements; these rules and policies dictate how long we must archive and retain our data.
How long do we need to keep our backups? Typically, an organisation would create a backup schedule as follows:
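The exact schedule differs for every organisation, but a common pattern is grandfather-father-son retention: daily backups kept for weeks, weeklies for months, monthlies and yearlies for years. The periods below are purely illustrative, not recommendations:

```python
# Example grandfather-father-son retention policy. Every period here is
# illustrative; your compliance requirements dictate the real values.
RETENTION = {
    "daily":   {"runs": "Mon-Fri",           "keep": "14 days"},
    "weekly":  {"runs": "every Saturday",    "keep": "8 weeks"},
    "monthly": {"runs": "last day of month", "keep": "12 months"},
    "yearly":  {"runs": "31 December",       "keep": "7 years"},
}

for tier, rule in RETENTION.items():
    print(f"{tier:8} runs {rule['runs']:18} retained for {rule['keep']}")
```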
A huge emphasis is placed on backup with little thought for its primary purpose, and that is RESTORE. You may have vaults full of backup tapes, but they are no good if you changed your backup software 18 months ago! How many companies that keep their backup tapes ever bother to check, after implementing new hardware or software, that they can still restore their legacy data? The answer is not that many, and who is going to admit to the oversight?
A test restore of your information should be performed at least quarterly, covering a variety of data types backed up from various servers and not the same server every time. That way, should something serious occur, you can be confident the restores will work as planned.
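A simple way to keep those tests honest is to rotate the target server each quarter and pick a random sample of files every time, rather than restoring the same well-behaved server over and over. A minimal sketch (the server names are placeholders, and `catalogue` is assumed to map each server to its backed-up file list):

```python
# Rotate quarterly restore tests across servers and sample random files,
# so the same server isn't "proved" every time. Names are placeholders.
import random

SERVERS = ["fileserver01", "sql01", "exchange01", "web01"]

def pick_restore_test(quarter: int, catalogue: dict[str, list[str]]) -> tuple[str, list[str]]:
    server = SERVERS[quarter % len(SERVERS)]     # a different server each quarter
    files = random.sample(catalogue[server], k=min(20, len(catalogue[server])))
    return server, files
```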
Over the centuries humans have tried to keep a history of events, discoveries and tragedies. Initially man carved etchings into stone or wood; then man discovered paper, and for almost 2,000 years this was the way knowledge was passed on to future generations. Now, in the 21st century, the need to keep our digital history is becoming a huge headache for many businesses.
In the world we live in today we are constantly generating information: video, sound, e-mails, texts, music, databases, images, office documents, PDFs and many other varieties of data are created daily. Some of this information is kept for a day, whilst much of it needs to be kept for a month, a year or more than 50 years. We keep this information for several reasons; it could be legislation, regulation, insurance, assurance, research, intellectual property, personal, historical, medical or many others.
I have been involved in the data storage industry for 40 years. During that time I have seen numerous technologies succeed and then fail, either because their development roadmap was too ambitious or because someone designed, developed or invented something that appeared better, cheaper, faster, easier to use and more reliable than what was currently available.
We are human, and because of this we are seduced into purchasing these new revolutionary technologies. Unfortunately we do not always consider the long-term downsides of adopting them, and it isn’t until we come to upgrade, maintain and enhance the product that we realise it is not as revolutionary as first thought and that we are locked into a solution that cannot evolve with our day-to-day business demands.
As mentioned throughout this site the issues of backup are becoming increasingly problematic. A failure to address the fundamental problem that data volumes are overwhelming the primary storage systems will inevitably result in data loss.
A large proportion of stored data is inactive after 60 days
Numerous businesses have a huge ongoing investment in purchasing data storage systems, and this investment increases year on year as the demand to store more information grows. After 3-5 years this equipment is then replaced.
With the implementation of an Active Data Archive solution we can actually extend the life of this investment by moving data to a secure active archive, thereby freeing up valuable disk space on high-performing storage, slowing the necessary ongoing investment in additional capacity and delivering a huge ROI benefit. An additional benefit of a tiered data archive is that you might be able to utilise your existing, older storage systems to hold the archived data.
As a consequence, the size of online data volumes is spiralling out of control and storage management has become an ever-increasing challenge.
Buying bigger and faster storage systems does not solve the problem of data identification. How we classify our data is going to become more of an issue than people first thought. Tools exist to move data by date, last-modified time, file size, extension, user, file type and so on, but how many companies are actually using this technology?
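As a flavour of what such tools do, the sketch below walks a file share and totals up, per file extension, the data that hasn’t been touched for a given number of days, i.e. the obvious archive candidates. The share path and 180-day threshold are assumptions:

```python
# Find archive candidates: total the bytes of data untouched for AGE_DAYS,
# grouped by file extension. Path and threshold are illustrative only.
import time
from collections import defaultdict
from pathlib import Path

SHARE = Path("/srv/shares")
AGE_DAYS = 180

def archive_candidates() -> dict[str, int]:
    cutoff = time.time() - AGE_DAYS * 86400
    stale_bytes: dict[str, int] = defaultdict(int)
    for f in SHARE.rglob("*"):
        if f.is_file() and f.stat().st_mtime < cutoff:
            stale_bytes[f.suffix.lower() or "(none)"] += f.stat().st_size
    return dict(stale_bytes)       # bytes of inactive data per extension
```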
Once we have an archive, whether static, near-line or offline, we need to consider whether it will remain on the same site as the primary data, be relocated to another office, or even be replicated regularly to the DR site. In the event of a disaster your primary concern should be to restore your most recent information first and then restore the archive if required.
All the information we generate has a cost. Companies often say that the latest data, residing on tier 1 storage, has the most value. I do not agree with this statement, for the simple reason that important architectural and engineering drawings, films, health and legal documents and the like remain important long after they were initially created and, if anything, are more valuable than when first stored.
Archiving will achieve the following:
Below is a list of differing storage technologies which we could utilise to provide an archive. When comparing them, consider:
The storage technology and the archive life of its media.
The access speeds and energy consumption of the devices.
The reliance on other technologies to read the data.
No matter what type of archiving technology you choose it is important to remember that the media is more important than the device. If you need to keep media for a considerable time, always buy branded media from a known entity and ask for a certificate of conformance. These are sometimes difficult to obtain but should state that the media is certified for “x” years archive life. It might cost you a few pence more in the beginning, but it could save you thousands over the lifetime of the archive.
To ensure a long-term archive you should ideally make three copies of the media and store it in three separate locations, if possible on three differing types of technology in the hope that in 50 years one of the technologies is still around to read the stored information.
When purchasing any media for backup or archive, it is important to remember that the technology is only as good as the amount of money you have available to protect the data. For example, a backup tape might have a manufacturer’s rated life of 30 years; such claims are made because the tape has been tested in perfect conditions using accelerated tests. If the tape fails after 29 years and 364 days you will be sent a new tape by the manufacturer, but the warranty only covers the tape and not the contents, i.e. the data!
Unless you are prepared to keep your tape in perfectly controlled conditions, you will never get a 30-year life out of any tape.
Many years ago I worked for a company that sold tape drives and media to the Middle East, and for some unknown reason these drives started to fail dramatically. The request was made to return 50 of the drives to the UK for investigation, and another 50 were promptly dispatched. When we received the drives back and examined the problem, we saw the issue was down to rust! The tape drives and media were being used in an environment with high humidity and no air conditioning.
The cost of storing information increases yearly, whether that is storage space, media costs, energy consumption or replacement costs. Clearly the archive media we choose needs to provide the greatest long-term cost savings for preserving the information, and this is the challenge.
If you want to know more about Active Archiving click here.
With any technology you should perform the following:
Originally this document was written in 2010, and it is surprising how much is still relevant today. In 2010 the cloud didn’t exist, virtualisation was taking shape and all-flash storage was a pipe dream.
We spoke quite a bit about tape drives being the main means of backing up large volumes of data; today this can be handled with 16 TB disk drives. Object storage, high-speed 32 Gb/s Fibre Channel networks and 10/25/40/100 GbE Ethernet networks are appearing, increasing data throughput and offering almost unlimited storage space!
Deduplication now utilises flash storage to perform near-instant restores of data. The problem with combining deduplication and encryption is that it simply doesn’t work: data encrypted before deduplication looks random, so the duplicate blocks the deduplication engine relies on can no longer be found.
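A two-minute experiment shows why. Identical plaintext blocks hash identically, so they deduplicate; encrypt the same block twice (any sensible cipher uses a fresh random IV) and the copies share nothing, so the deduplication engine finds no duplicates. The “encryption” below is simulated with a random nonce purely to illustrate the effect; it is not a real cipher.

```python
# Why deduplicating data that has already been encrypted doesn't work:
# identical plaintext blocks hash identically, but each encrypted copy is
# unique. The "encrypt" here is a stand-in (random nonce + hash), not a cipher.
import hashlib
import os

block = b"A" * 4096                            # the same block stored twice

plain_1 = hashlib.sha256(block).hexdigest()
plain_2 = hashlib.sha256(block).hexdigest()
print(plain_1 == plain_2)                      # True  -> deduplicates

enc_1 = hashlib.sha256(os.urandom(16) + block).hexdigest()
enc_2 = hashlib.sha256(os.urandom(16) + block).hexdigest()
print(enc_1 == enc_2)                          # False -> nothing to deduplicate
```

In practice the order matters: deduplicate first and encrypt the deduplicated store afterwards, rather than feeding pre-encrypted data into the deduplication engine.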
Most backup software today has agent-less backup, although for some physical servers and applications a physical backup agent may still be required.
Replicating backup data to the cloud makes perfect sense, providing cloud cost guidelines are adhered to.
Archiving systems can now migrate data directly to the cloud and no longer need to leave a stub.
The main change in backup software over the last 10 years is in management and simplicity, especially if your environment is virtualised!
Backup software today is far easier to manage, deploy and use. No longer should someone have to monitor backups for hours; once set up, the software should provide easy-to-read reports, and the graphical user interface makes it simple to create a backup job, set the backup frequency and apply it to a policy that can automatically add machines as they are created.
Software licensing in the past was based on a number of factors, e.g. capacity under licence, number of VMs, sockets, agents and so on.
Depending on your backup software vendor, your software licensing could utilise some or all of the models below.
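Which model works out cheapest depends entirely on the shape of your estate, and a quick back-of-envelope comparison makes that obvious. Every price and count below is invented purely for illustration:

```python
# Back-of-envelope comparison of common backup licensing models.
# All prices and estate sizes are hypothetical, for illustration only.
ESTATE = {"vms": 120, "sockets": 16, "front_end_tb": 80}

PRICING = {                                  # hypothetical list prices per year
    "per VM":     lambda e: e["vms"] * 95,
    "per socket": lambda e: e["sockets"] * 750,
    "per TB":     lambda e: e["front_end_tb"] * 140,
}

for model, cost in PRICING.items():
    print(f"{model:10} -> £{cost(ESTATE):,} per year")
```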
We work with backup software vendors that can provide a single price for your entire enterprise estate!
There is no silver bullet for the issue of data backup or data archiving. With careful planning a happy medium can be achieved, ensuring your most recent data is backed up daily and archived information is retained for “X” days, months or years.
Data archiving needn’t be permanent; it could be a SATA disk array or cloud storage for data that only needs to be backed up and archived once a month. Deploying this affordable storage would free up your tier 1 disk space, while removing non-corporate data from the network would free up further disk space, reduce network traffic and shrink your backup window. These simple strategies can be achieved with a little time, thought and money. We can provide a solution that really will Backup Less and Archive More.