How do you get rid of cloud data?
OK. You've made the leap to the cloud. As time goes by, you decide to move to another cloud provider. Migration of your data was a snap - good job on planning and execution!
But how do you get rid of the data that's left behind at your previous provider? There are lots of techniques and options to erase and "crypto shred" data when it resides on your own hardware. But what about data that resides on a service provider's infrastructure? How do you ensure it's no longer accessible?
What steps should you take to avoid this problem?
Are some data strategies/architectures better than others when considering this problem?
Are there tools out there to address this problem?
What should you expect (demand) of your cloud provider with respect to "data residue"?
Thoughts?
9 Answers
GREAT question. There are a few strategies that we recommend to ServiceMesh customers:
(1) Some cloud service providers (CSPs) have a data destruction policy. When a customer moves on, all the disk data previously used by that customer is overwritten with random bit patterns (typically following a NIST media sanitization guideline such as SP 800-88). This is the best option if your CSP supports it.
(2) If your CSP doesn't support it, you can try to do this yourself (for IaaS) by overwriting your own data. There are Linux and Windows programs that will follow the NIST protocols for doing so. One problem is that, with all the indirection that happens in a modern storage array, there isn't any guarantee that overwriting a given disk block will actually result in the physical media being overwritten. The new disk block at any given logical position in the virtual disk may be written on a completely separate drive in a totally different part of the array. If you're really worried about data remnants, this solution should make you nervous. Sometimes, when a service provider claims to be doing data destruction, they are just doing this with a script or some such. It's important to ask them how they *know* that they are in fact overwriting the data. Sometimes, they don't even realize that what they are doing isn't having the effect they want. Note that there is really no good way to do this for PaaS or SaaS, so you'll have to rely on #1 in that case.
(3) Many of our customers use strong encryption (e.g., AES256 or better) for all disk activity in all cases. The theory here is that you're protecting your data when it's at rest while you're using the provider, and then you can try to do whatever data destruction you want yourself (overwriting disk blocks). If, for whatever reason, the overwrites don't deliver what you want, at least your data is protected with strong encryption and is unlikely to be useful to anybody before another customer plops his data down. The ServiceMesh Agility Platform makes this trivial. Just click a single checkbox when creating instances and you're good to go. You can even enforce this with a high-level policy for all external clouds.
(4) For the truly paranoid, you'll have to use an internal cloud and control all the variables yourself.
To learn more about the ServiceMesh Agility Platform, see http://www.servicemesh.com/
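For anyone who wants to see what strategy (2) looks like in practice, here is a minimal sketch in Python of overwriting a file in place before deleting it. The function names `overwrite_file` and `shred_file` and the default of three passes are my own assumptions for illustration, not something a NIST document prescribes. As noted above, on virtualized or copy-on-write storage this only rewrites the logical blocks, so treat it as best effort rather than proof of destruction.

```python
import os
import secrets


def overwrite_file(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file in place with random bytes, `passes` times.

    Caveat: this rewrites the file's logical blocks only. The storage
    layer underneath may have placed older copies somewhere else.
    """
    size = os.path.getsize(path)
    # "r+b" opens for update without truncating, so we write over the same extent
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(secrets.token_bytes(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the writes to the device


def shred_file(path, passes=3):
    """Overwrite the file, then remove its directory entry."""
    overwrite_file(path, passes)
    os.remove(path)
```

Calling `shred_file("/data/export.csv")` (a made-up path) would overwrite and then delete that one file. Snapshots and backups of it are untouched, which is why the questions to the provider still matter.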
I second what Dave is saying
1) Read the contract and insist on putting their data destruction policy on it.
2) Encrypt, Encrypt, Encrypt - Mantra for storing your data on the cloud. For starters, keep your encryption key locally :-)
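To make the "keep your key locally" point concrete, here is a toy sketch of crypto-shredding in Python. Big assumption: to stay within the standard library it uses a one-time pad (XOR with a random key as long as the data) as a stand-in for AES-256; in real life you would use a vetted AES implementation. `LocalKeyStore` is a hypothetical in-memory stand-in for whatever on-premises key management you run. The idea it shows is that the provider only ever holds ciphertext, so destroying your local key makes whatever residue they keep unreadable.

```python
import secrets


def encrypt(plaintext: bytes):
    """Return (key, ciphertext). The key is a random pad as long as the data.

    Stand-in for AES-256: a one-time pad is impractical for real volumes.
    """
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR the ciphertext with the same pad to recover the plaintext."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


class LocalKeyStore:
    """Hypothetical on-premises key store; keys never go to the provider."""

    def __init__(self):
        self._keys = {}

    def put(self, name: str, key: bytes) -> None:
        self._keys[name] = key

    def get(self, name: str) -> bytes:
        return self._keys[name]

    def shred(self, name: str) -> None:
        # Dropping the only copy of the key is the "crypto shred".
        # (Python cannot guarantee the bytes are wiped from memory.)
        del self._keys[name]


store = LocalKeyStore()
key, blob = encrypt(b"acct=12345")   # only `blob` is uploaded to the cloud
store.put("volume-1", key)
recovered = decrypt(store.get("volume-1"), blob)  # works while the key exists
store.shred("volume-1")              # after this, `blob` is just noise
```

This only helps if the key really was the only copy and never lived on the provider's side.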
It is a great question and one that is of major concern in the US Federal government.
Here's a Wikipedia page on data remanence: http://en.wikipedia.org/wiki/Data_remanence
Here's the really nasty goo of cloud storage and storage virtualization. The implementations of redundancy (e.g. RAID5/6/10) and snapshots mean that your data doesn't exist just once in one place. It exists multiple times in multiple locations across multiple arrays, in pieces and as concurrent chunks.
If this is a major concern, you will need to select cloud vendors that will allow you to allocate an entire subset of a storage area network for your own use, so that the drives can be pulled when you are done. Most cloud providers will use the overwriting features of the underlying storage system, but that only gets rid of the original copy. There are still the backups, the snapshots and, if the overwrite is not implemented properly, the redundant bits spread across the backup drives.
It's important to note that, in the cloud environment, it's not plausible that the next tenant, once allocated your drive space, will be able to recover any of your data. So, this really comes down to a trust issue between you and your cloud service provider, which in my opinion dramatically reduces the risk.
Besides the excellent points made so far, there's the added problem of backups. Depending on the type of cloud application, the provider may be legitimately creating some sort of scheduled backup on an aggregate level, rather than a customer level.
Depending on the data retention policy, this data might hang around for quite some time.
To the extent that you can (depending on the cloud service involved), data encryption provides some of the best mitigation against residual data, but it is incumbent on each customer to work this out with a provider right from the beginning of the relationship such that proper support for this vital concern can be baked into the system.
Otherwise, all the customer will have to rely on is trust and the idea that the provider doesn't want to use up any unnecessary space if they don't have to.
-ASB: http://XeeMe.com/AndrewBaker
=======
It's important to note that, in the cloud environment, it's not plausible that the next tenant, once allocated your drive space, will be able to recover any of your data.
=======
A bold claim! That the "new tenant" has read access to this disk is enough for them to try and read whatever was there previously. Looking for bit faults over multiple reads is an interesting technique, but only one of many.
Bottom line: if you are worried about someone accessing your data afterwards, do not put it "in the cloud". Period. That way you do not need to be concerned about data leakage afterwards.
The best method for data destruction is physical destruction of the media (i.e. grinding it to a bazillion pieces, or melting it down to a liquid). This is the only recognized method for any serious data in government.
This is an important issue which has been discussed in many forums. The solutions I can suggest are:
a. Every CSP must have a technique to produce IMMUTABLE logs of all activities. This would ensure that any data movement not performed by the owner is explicitly known.
b. The data residue problem could be solved by keeping the storage metadata with the user, on their premises.
c. We need a solution that gives "proof of data destruction and residue", much as we already have solutions for "proof of data possession" by a cloud service provider.
Merely encrypting the data will not solve the residue problem.
d. Data residue can only be checked by a hash method. The real problem would be if the CSP has made a copy of it, inadvertently or otherwise.
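One way to read point (d), and this is my interpretation rather than anything spelled out above, is: record hashes of your data blocks while you still own them, then scan a raw image of the released storage for blocks with the same hashes. A rough Python sketch follows. The 4096-byte block size, the function names, and the premise that you (or an auditor) can obtain a raw image at all are assumptions. It also only catches unencrypted, block-aligned leftovers.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; match it to the real volume


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """SHA-256 digests of each full block of the original data.

    Blocks made of one repeated byte (e.g. all zeros) are skipped, since
    they would match freshly wiped space and cause false alarms.
    """
    digests = set()
    for off in range(0, len(data) - block_size + 1, block_size):
        block = data[off:off + block_size]
        if len(set(block)) > 1:
            digests.add(hashlib.sha256(block).hexdigest())
    return digests


def find_residue(image: bytes, known: set, block_size: int = BLOCK_SIZE) -> list:
    """Return offsets in a raw image whose blocks match a known digest."""
    hits = []
    for off in range(0, len(image) - block_size + 1, block_size):
        digest = hashlib.sha256(image[off:off + block_size]).hexdigest()
        if digest in known:
            hits.append(off)
    return hits
```

An empty result from `find_residue` is weak evidence, not proof: it says nothing about copies on other drives, in backups, or at other offsets.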
Robin is correct. That is the only plausible way to be confident your data is managed properly. To say otherwise is a stretch. Be afraid of new technologies to a certain point. A vendor will not insure the integrity of your company. Be aware and educate yourself up-front.
I agree with Robin Goodchild about recovery of data after being allocated to another client. Just wiping metadata or whatnot is **NOT** enough for any serious enterprise app.
Even if you can't recover files, there still might be enough information in a single disk block to make the exercise interesting (account numbers, social security numbers, credit card numbers -- think "row in a DB table"). Even things like swap space associated with a sensitive application could hold interesting information. It's trivial to write a program to sweep through all the disk space looking for credit card numbers (fixed formats with check-digits, etc.).
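To show just how trivial that sweep is, here is a short Python sketch. It looks for runs of 13 to 16 ASCII digits in raw bytes and keeps the ones that pass the Luhn check-digit test. The names `luhn_valid` and `scan_for_card_numbers` are mine, and the sketch assumes the numbers are stored as plain ASCII text without separators, so binary-packed or formatted numbers would be missed.

```python
import re

# Runs of 13 to 16 ASCII digits that are not part of a longer digit run
CANDIDATE = re.compile(rb"(?<!\d)\d{13,16}(?!\d)")


def luhn_valid(digits: bytes) -> bool:
    """Luhn check: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = ch - 48  # ASCII code of '0' is 48
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_for_card_numbers(raw: bytes) -> list:
    """Return (offset, number) pairs for Luhn-valid digit runs in raw bytes."""
    return [
        (m.start(), m.group().decode("ascii"))
        for m in CANDIDATE.finditer(raw)
        if luhn_valid(m.group())
    ]
```

Feed it the contents of a reallocated disk block or a swap file and any surviving "row in a DB table" with a card number in it will surface, no filesystem metadata required.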
Good points Dave. Those exposures are there.