All Data is not Equal

A business today generates vast amounts of data daily, and it needs to act upon this information.  The generated data has a value, whether it is a technical drawing or an invoice.  This data is normally stored on the fastest performing storage, and 9 times out of 10 this is where it remains.  Gradually, new storage is bought and added to meet the constant flow of data, and IT purchases a larger backup solution to back it all up.  After 90 days the data may never be accessed again, yet it still remains on expensive, high-performing storage.  Why?  Because businesses do not have the correct data management systems in place to analyse the data and determine what to do with the information or where to store it.

What to keep and what to delete

Performing a backup is hard enough, and now we have to archive as well.  Why bother, when backup is good enough?  Technically, any business generating large amounts of data should have 100% data protection in place, including storage, backup and archiving.  Sadly, this isn’t always the case: archiving is normally the last part to be considered, and yet it could save a business many thousands of pounds annually.

There are compelling articles from IDC, Gartner, Forrester and others informing us that we will be producing more data during the next 5 years than at any other time in history, and that storing this information will create problems.  If we are to believe these reports, two things are going to make life difficult for any IT department: backups will need to run 24×7, and the available backup window will disappear unless we start doing something now.  I mentioned in my “Let’s get tough on data” article that the network bandwidth businesses need to send data to the cloud will not exist in the UK for 10+ years.

Active Data Archiving

So, what’s the answer?  Quite simply, we need to get smarter with data management, and this involves the creation of an “Active Archive”.  Archiving used to be sold as a method of replacing the file data with a “stub” that pointed to an alternate location.  This worked for some, but when the storage was replaced or a system crashed, restoring the stubs caused problems.  An active archive, on the other hand, allows users to create policies based on a number of rules.  A rule could be, for example, “all financial data should be retained for 7 years – then delete”.  The policy could also include a rule such as “only archive financial data older than 12 months”.  The data in an active archive is fully indexed and searchable, supports legal search and eDiscovery, and returns results in seconds.
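To make the two example rules concrete, here is a minimal Python sketch of how such a policy might be evaluated.  It is an illustration only, not how any particular archiving product works: the `ArchivePolicy` class, its field names and the `decide` function are all hypothetical, and ages are measured in whole calendar months.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivePolicy:
    """One hypothetical policy rule; the field names are illustrative only."""
    category: str               # e.g. "financial"
    archive_after_months: int   # only archive data at least this old
    retain_years: int           # delete once the data reaches this age


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def decide(policy: ArchivePolicy, created: date, today: date) -> str:
    """Return 'keep-live', 'archive' or 'delete' for one item of data."""
    age = months_between(created, today)
    if age >= policy.retain_years * 12:
        return "delete"
    if age >= policy.archive_after_months:
        return "archive"
    return "keep-live"


# The two example rules from the text, combined into one policy.
financial = ArchivePolicy("financial", archive_after_months=12, retain_years=7)
today = date(2019, 12, 18)

print(decide(financial, date(2019, 6, 1), today))   # keep-live (6 months old)
print(decide(financial, date(2017, 3, 15), today))  # archive (33 months old)
print(decide(financial, date(2012, 1, 10), today))  # delete (95 months old)
```

A real policy would need agreed definitions for when the clock starts (creation date, last modification, or financial year end), which is exactly what the stakeholder discussions should settle.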

What can you store in Active Archive?

  • Video – This can be searched for audio or video sequences
  • Scanned documents – Perform full text searches
  • Over 1,000 file types, including Microsoft Office, Office 365, Google Docs etc.
  • Audio – Full search on words or phrases
  • Translate documents from foreign languages
  • Email – Scan and identify messages and attachments

Questions to ask when considering active archiving

  1. Is the data legally required for compliance, legislation or governance?
  2. Can the data cause the business embarrassment and fines?
  3. Does the data have a value to the business but can’t be identified?
  4. Can we archive scanned documents going back decades?
  5. Do we need to ingest email PST files containing years of conversations?
  6. How much unstructured data do we have?
  7. How many copies of the same file do we have?
  8. On which systems and data storage platforms does the information reside?
  9. When was it created?
  10. When was it last accessed?
  11. What size is the file data?
  12. Who owns the files?
  13. When was it last modified?
  14. Is the data relevant to the business?
  15. How many copies do we have?
  16. Do the files need to be archived?
  17. Should the data be restricted?
  18. Who is generating this data?
  19. Is the data ours?
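Several of these questions (file size, owner, last modified, last accessed, number of copies) can be answered by scanning the file system.  The following is a rough Python sketch, assuming the data sits on a file system the script can read.  The function names and the 90-day threshold are my own choices for illustration.  Note that last-accessed times are only as reliable as the platform makes them, since some systems are configured not to update them, and the owner is reported as a numeric user ID, which is only meaningful on Unix-like systems.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(root: Path, stale_days: int = 90) -> list:
    """Collect basic facts about every file under `root`."""
    now = time.time()
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()  # read the metadata before the content is opened
        records.append({
            "path": str(path),
            "size_bytes": info.st_size,
            "modified": info.st_mtime,
            "accessed": info.st_atime,
            "owner_uid": info.st_uid,
            # Flag files not accessed within the threshold (assumed 90 days).
            "stale": (now - info.st_atime) > stale_days * 86400,
            "sha256": sha256_of(path),
        })
    return records


def duplicates(records: list) -> dict:
    """Group paths by content hash and keep only hashes seen more than once."""
    by_hash = defaultdict(list)
    for record in records:
        by_hash[record["sha256"]].append(record["path"])
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

Calling `inventory(Path("/some/share"))` and then `duplicates(...)` on the result gives a first answer to “how many copies of the same file do we have?”.  Questions about legal value, relevance and ownership of the data still need a human.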

Backup vs Archiving

Firstly, these are two very different technologies, although they are often talked about as if they were the same.  It’s a bit like copper and brass: they are both metals but have completely different use cases.

Backup – The ability to restore lost data to a point in time when the last backup was taken.  The backup software may write this data in a proprietary format.

Archive – The ability to restore data that the business deems important to any point in the past.  This data is normally read-only, because it may be used in court and the original file must remain unedited.  This data should remain in its native file format.

How to store archive data

This is an active archive that is going to be used for eDiscovery and GDPR, and it may be used as evidence in court.  Therefore, any data residing in the archive should be in its native file format and retain all the original file information, including the created, last modified and last accessed dates.  There is a danger that if this information is compressed or deduplicated to save space, the original file format and file information could be lost when the file is recreated.  Some archiving vendors do exactly this, and should you want to move your archive later, it could prove problematic and costly.
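As a small illustration of the idea, the Python sketch below copies a file into an archive folder in its native format, checks that the copy is byte-for-byte identical, and marks it read-only.  It is a sketch under assumptions, not a description of any vendor’s product: the `archive_native` function is hypothetical, and `shutil.copy2` preserves the modification and access times where the platform allows, but not necessarily the creation date.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's content, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_native(source: Path, archive_dir: Path) -> Path:
    """Copy `source` unchanged into `archive_dir` and make the copy read-only."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    # copy2 copies the content plus timestamps and permission bits where it can.
    shutil.copy2(source, target)
    # Verify the archived copy matches the original exactly.
    if file_digest(source) != file_digest(target):
        raise IOError(f"archived copy of {source} does not match the original")
    target.chmod(0o444)  # read-only for everyone
    return target
```

Because the file is stored as-is rather than inside a proprietary container, it can later be moved to another archive with an ordinary copy.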

Stakeholder Discussions

Any data can be archived, but doing so will involve various discussions with the business stakeholders to understand how long each department’s data needs to be retained.  From these planning meetings you will be able to create a detailed analysis like the one below:

Accounting – Keep records of financial information for 7 years.  This could be backup copies of your accounting package, payroll, bank statements, document scans and email, along with created by, modified by and sent by details.  This is where businesses tend to fall into a trap: they want to keep all the information relating to accounting, including weekly, monthly and yearly backups.  In an archive it needs to be treated differently, as explained below:

Example Archive rules

  1. Accounting Data – Month 1: Retain for 12 months – then delete
  2. Accounting Data – Month 2: Retain for 11 months – then delete
  3. Accounting Data – Month 3: Retain for 10 months – then delete
  ...  (Months 4 to 11 follow the same pattern, each retained for one month less)
  12. Accounting Data – Month 12: Retain for 1 month – then delete
  13. Accounting Data – Year End: Retain for 7 years – then delete

After 1 year you would have 13 archive copies of your data.  Depending on how you created your archive, you could simplify it even more if your accounting data was rolled up into the following month.

  1. Accounting Data – Month 1: Retain for 1 month – then delete
  2. Accounting Data – Month 2: Retain for 1 month – then delete
  3. Accounting Data – Month 3: Retain for 1 month – then delete
  ...  (Months 4 to 11 follow the same pattern)
  12. Accounting Data – Month 12: Retain for 1 month – then delete
  13. Accounting Data – Year End: Retain for 7 years – then delete

After 1 year you would have 2 archive copies of your data.
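The copy counts for both sets of rules can be checked with a few lines of Python.  This is only a sketch of the arithmetic, and it assumes that each monthly copy is created at the end of its month, that the year-end copy is created at the end of month 12, and that a copy is held from the month it is created until its retention period runs out.  The function names are mine.

```python
from typing import Dict, Tuple

# label -> (month the copy is created, number of months it is retained)
Schedule = Dict[str, Tuple[int, int]]


def staggered_schedule() -> Schedule:
    """First rule set: Month 1 kept 12 months, Month 2 kept 11, and so on."""
    schedule = {f"Month {m}": (m, 13 - m) for m in range(1, 13)}
    schedule["Year End"] = (12, 7 * 12)
    return schedule


def rolled_up_schedule() -> Schedule:
    """Second rule set: every monthly copy is kept for 1 month only."""
    schedule = {f"Month {m}": (m, 1) for m in range(1, 13)}
    schedule["Year End"] = (12, 7 * 12)
    return schedule


def copies_held(schedule: Schedule, at_month: int) -> int:
    """Count the copies that exist at the end of `at_month`."""
    return sum(
        1
        for created, keep in schedule.values()
        if created <= at_month < created + keep
    )


print(copies_held(staggered_schedule(), 12))  # 13
print(copies_held(rolled_up_schedule(), 12))  # 2
```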

Remember, this isn’t a backup; it’s an archive of information that should only have “READ” access.  By following the above, you should be able to access any financial data created over 7 years and find out which transactions occurred on any given day.

This type of approach needs to be taken with every department, including marketing, legal, manufacturing, sales, HR, R&D etc.  Once you have written up the requirements for the departments, get the stakeholders to sign off on the data retention times for the archived information, and review this every quarter.

Why use an active data archive?

It isn’t so much about why you need to archive; it is more about understanding the types of information you are archiving and then turning that data into something useful using “Data Analytics”.  Many businesses have hundreds of terabytes of unstructured data.  Unstructured data is everything that isn’t stored in a database.  Because this data is random information, it isn’t easy to monitor, move or control, and yet it has a value to the business.  Or does it?

Due to the sheer number of file types that make up unstructured data, it is difficult for businesses to make informed decisions on what to do with it.  Any legal case requires many hours of searching through files, scans, email, voice recordings, CCTV etc. in order to put a case together ready to go to court.  Lawyers charge upwards of £250 per hour to carry out these searches in order to find specific information relating to a case.

If the business had an active archive, it could find all the information relevant to a case in minutes and save thousands of pounds in legal fees.  This is one example of how deploying an active archive to bring structure to unstructured data could save a business huge sums of money.

Decreasing backup window

As businesses generate more data, protecting it takes longer, and eventually the time available to perform a backup will disappear.  If deploying an active archive reduces your backups by 80%, this has to be a good thing from an OPEX perspective, for example:

  1. Sweat your assets for longer.
  2. Reduced licensing and running costs of backup hardware and software.
  3. Reduced hardware maintenance due to less “wear and tear” on equipment.
  4. Faster restore of systems and data in the event of a failure.
  5. Reduced backup complexity.
  6. Provides the business with greater data insights.
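To put a rough number on the backup window, the Python sketch below works out how long a full backup would take before and after an 80% reduction.  The 100 TB data set and the 500 MB/s sustained throughput are assumptions picked purely for illustration, so substitute your own figures.

```python
def backup_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to back up `data_tb` terabytes at a sustained rate.

    Uses decimal units: 1 TB = 1,000,000 MB.
    """
    seconds = data_tb * 1_000_000 / throughput_mb_per_s
    return seconds / 3600


data_tb = 100.0      # assumed size of the data being backed up
throughput = 500.0   # assumed sustained backup throughput in MB/s
reduction = 0.80     # the 80% reduction discussed above

before = backup_hours(data_tb, throughput)
after = backup_hours(data_tb * (1 - reduction), throughput)

print(f"Before archiving: {before:.1f} hours")  # Before archiving: 55.6 hours
print(f"After archiving:  {after:.1f} hours")   # After archiving:  11.1 hours
```

With these assumed figures, a backup that could not finish over a weekend fits comfortably into a single night.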

Summary

Let’s start: download and complete this simple form and ask us to help you instigate an active archive, call us on 01256 331614, or talk to us about some other infrastructure issue.  You could remain comfortable, in which case don’t bother doing anything.  Alternatively, you could feel uncomfortable, make the call and take the first steps towards building a better infrastructure for your business.  It’s your choice.  Thanks for reading.

Written by: Ray Quattromini – 18/12/19

Email: solutions@data-storage.uk