Getting Data Out of the Cloud Before Disaster
Cloud data rains back down at a drizzle, when you really need a downpour.
Editorial By Olin Coles
As the volume of information stored in the cloud grows, IT managers must plan for getting data out of the cloud when it’s critically needed during disaster recovery. For some businesses, the cloud is a place to deposit a second copy of data already retained locally. For others, the cloud is primary storage, where unique data is created and modified. Problems arise in both cases: when local data is lost due to fire, flood, or theft; when the data is too large for a timely transfer across limited Internet bandwidth; or when a cloud provider shuts down. This all raises the question: is redundant data in place?
This point was driven home in 2014 when British hosting company CodeSpaces.com was forced out of business after losing most of its cloud data. An anonymous hacker gained access to the company’s Amazon Web Services control panel and locked administrators out of their own data. The hacker then tried to extort the company, but rather than pay, Code Spaces attempted to regain control itself. In response, the hacker deleted most of the company’s data, including the copies stored in redundant data centers. According to Code Spaces officials: “he had removed all EBS snapshots, S3 buckets, all AMI’s, some EBS instances and several machine instances. In summary, most of our data, backups, machine configurations and offsite backups were either partially or completely deleted.”
A few cloud providers offer redundant data security, albeit for a premium. So long as time is not a factor, this policy could work for those using the cloud for secondary storage. However, when the aforementioned problems arise, getting data out of the cloud quickly becomes a necessity. Retrieving full data sets can take weeks, penalizing profit and productivity at a time when recovery delays are unacceptable. This is especially true when getting big data out of the cloud. Some keen planners have taken to ‘reverse cloud backup’ strategies to solve this problem, storing big data with their cloud provider and then backing it up locally for quicker access during recovery. The approach matters most for data that is created in the cloud.
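To put the "weeks" figure in perspective, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: the function name, the 10 TB data set, the 100 Mbps link, and the 80% link-efficiency factor are all illustrative assumptions, not measurements from any provider.

```python
def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate days needed to download size_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention, throttling).
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    bits_to_move = size_tb * 1e12 * 8             # decimal terabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    seconds = bits_to_move / usable_bits_per_second
    return seconds / 86_400                       # seconds in a day


if __name__ == "__main__":
    # Hypothetical scenario: 10 TB restored over a 100 Mbps connection.
    print(f"{transfer_days(10, 100):.1f} days")
```

Under these assumptions a 10 TB restore needs more than eleven days of uninterrupted transfer, and a slower or shared connection stretches that further.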
Hosted email, CMS websites, and services such as Salesforce are all obvious examples of primary data that is created and lives in the cloud. It is well documented that Salesforce backs up its server data, but not in a way that lets an end user restore files when needed. If a customer mistakenly deletes data on Salesforce, some claim it costs $10,000 or more to restore. Even then, the entire data set would be rolled back to the last good backup, eliminating all data added since. The lesson here is that you should always treat data as your responsibility, even when it’s located with a trusted cloud vendor. I recently discussed this topic in a separate post: Reverse Backup from Cloud to Local Disks, which explains the process for Amazon S3, Google Cloud Storage, DreamHost DreamCloud, and Dropbox.
A leading strategy has been to use a local server backup appliance to both push data to the cloud and pull it back. An online service like Dropbox may seem to be self-replicating because its copies are synchronized to every device with access to the account, but instant synchronization is not always beneficial. It can make sense to pull data from the cloud on a slower schedule, say once per week, and to pull this delayed Dropbox data from multiple accounts in the enterprise to one central location. Scheduled replication gives data corruption and viruses time to be discovered before they have propagated everywhere. Another reason to use a reverse cloud backup appliance is to improve the Recovery Time Objective (RTO) of a full restore.
In the server backup space it’s well known that retrieving terabytes of data can take weeks, which explains why retrieving big data from the cloud is viewed as a last resort. But if a server backup appliance is configured to regularly pull data out of the cloud, then restores can happen much more quickly. One such server backup appliance is the Netswap Plus, which can be easily configured to automatically replicate and retrieve data out of the cloud. It uses an incremental approach to minimize bandwidth use and stores data redundantly on removable local disks. These removable disks provide an extra level of security, referred to as an “air gap”, that allows data to be stored off-line and out of reach of hackers.
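The incremental idea can be illustrated with a short Python sketch. This is not how the Netswap Plus or any particular product works internally; it simply assumes a hypothetical manifest that maps each cloud file name to a SHA-256 checksum, and picks out the files whose local copy is missing or different, so that only those need to be downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a local file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_pull(remote_manifest: dict[str, str], local_dir: Path) -> list[str]:
    """Return the names of files that must be downloaded on this run.

    remote_manifest is a hypothetical {file name: SHA-256 hex digest} listing
    of what the cloud currently holds; how it is obtained depends on the provider.
    """
    stale = []
    for name, remote_hash in sorted(remote_manifest.items()):
        local_path = local_dir / name
        # Pull when the local copy is missing or its contents have changed.
        if not local_path.is_file() or sha256_of(local_path) != remote_hash:
            stale.append(name)
    return stale
```

After the first full copy, each scheduled run transfers only the changed files, which is what keeps bandwidth use low and the local copy ready for a fast restore.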