Data data everywhere, but how much to keep?

Azure Blob Storage, Google Cloud Storage, Wasabi, Backblaze B2 and of course, AWS S3. These are the big names in cloud storage, right? However much data you need stored, one of them has your back. Need help to get it there? Just find any ole cloud consultant and they can get your data moved quickly and easily, although you really should talk to us first 😉. But how much data is too much data, and how can you design a data retention strategy in the cloud?

What are you storing?

This is an actual question you should ask yourself. Do you actually know what your data is? If you don’t, what value does that data hold for you?

Assuming you do know what your data is, there are lots of categories of data, and here are just some of the more common ones:

  • Customer data
  • Archives
  • Logs and reports
  • IoT generated data

Once you understand what data you have, you can start to plan out where to store it all. After all, each storage platform has many different products to suit any of the above broad use cases.

How long are you keeping the data?

This is a more delicate question than you might think. As we have already said, you’re not going to run out of space in the cloud, so just keep everything forever, right?

No. Please don’t do that. For one thing, your bill will get pretty silly, pretty quickly, although most of the big providers do give you suggestions to cut your bill based on assets you don’t use.

Beaty Consultancy is based here in the UK, so now we’re into the realms of data protection law, which says you must only keep data for as long as it takes to do the business you collected the data for. Data protection is a huge subject, but Wikipedia did a great job of simplifying it here.

We haven’t even started to talk about GDPR either, and frankly if we did, I think my brain would melt.

But yes, assuming you’re allowed to keep the data for as long as you need to, you will want to look at the right product to support that. Keep in mind things like:

  • Price per GB per month
  • Data access fees
  • Geo-redundancy
  • Time to first byte
  • Content distribution
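
The first two items on that list tend to pull against each other: the tiers that are cheapest per GB usually charge you more to read the data back. Here is a minimal Python sketch of that trade-off. The tier names and every price in it are made-up placeholders, not any provider's real rates, so swap in numbers from your own provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A storage tier described by its two headline prices."""
    name: str
    price_per_gb_month: float  # storage charge, per GB per month
    retrieval_per_gb: float    # data access fee, per GB read back


def monthly_cost(tier: Tier, stored_gb: float, read_gb: float) -> float:
    """Storage charge plus data access charge for one month."""
    return stored_gb * tier.price_per_gb_month + read_gb * tier.retrieval_per_gb


def cheapest(tiers, stored_gb: float, read_gb: float) -> Tier:
    """Pick the tier with the lowest monthly cost for this usage pattern."""
    return min(tiers, key=lambda t: monthly_cost(t, stored_gb, read_gb))


# Hypothetical prices, for illustration only.
HOT = Tier("hot", price_per_gb_month=0.02, retrieval_per_gb=0.0)
COLD = Tier("cold", price_per_gb_month=0.004, retrieval_per_gb=0.03)

if __name__ == "__main__":
    # 1 TB stored, compared for a rarely-read and a heavily-read month.
    for read_gb in (50, 1000):
        best = cheapest([HOT, COLD], stored_gb=1000, read_gb=read_gb)
        print(f"reading {read_gb} GB a month: {best.name} tier is cheaper")
```

With these placeholder prices the cold tier wins while you rarely touch the data, and the hot tier wins once you read most of it back every month. Geo-redundancy, time to first byte and content distribution are left out of the sum on purpose, because they are requirements to check rather than numbers to add up.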

Why do you need the data?

Another one to really have a think about. Presumably you’re not in the business of storing data, rather you’re in another business vertical and data might drive your company. So think about the reasons for all of your data to exist, and what benefits it brings to your overall business objectives.

If you can’t think of a clear reason for some data, I’m not saying delete it, but you might want to archive it off to super-cheap storage like Amazon S3 Glacier.

How will you access the data?

This ties into why you need the data. If your cyber security certifications mean you need to be able to forensically reconstruct an incident within a given timeline, keep all your server logs in hot storage for a couple of months.

But if you’re storing bid documents from five years ago because they took a couple of months to pull together and you just can’t consider purging them, maybe keep those in a less expensive and less heavily replicated storage tier.
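
If your data lives in S3, decisions like the two above can be written down as lifecycle rules so nobody has to remember to move anything by hand. The sketch below only builds the configuration as a plain Python dictionary in the shape S3 lifecycle configurations use. The prefixes, rule IDs, bucket name, the 30-day figure for bids and the 365-day deletion for logs are all assumptions for illustration, so pick values that match your own retention decisions.

```python
def lifecycle_rule(rule_id, prefix, archive_after_days, delete_after_days=None):
    """Build one rule in the shape used by S3 lifecycle configurations.

    Objects under `prefix` move to the GLACIER storage class after
    `archive_after_days`, and are deleted after `delete_after_days` if given.
    """
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if delete_after_days is not None:
        if delete_after_days <= archive_after_days:
            raise ValueError("deletion must come after archiving")
        rule["Expiration"] = {"Days": delete_after_days}
    return rule


# Hypothetical prefixes and periods: logs stay hot for about two months,
# bid documents are archived early and never deleted automatically.
config = {
    "Rules": [
        lifecycle_rule("server-logs", "logs/", archive_after_days=60,
                       delete_after_days=365),
        lifecycle_rule("bid-documents", "bids/", archive_after_days=30),
    ]
}

# Applying it needs boto3 and AWS credentials, so it is left commented out.
# "example-bucket" is a placeholder name.
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```

Other providers have their own equivalents of lifecycle rules, so the same idea carries over even if the exact format doesn't.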

Ask around!

You’re here at a cloud consultancy’s website reading this post, so you know we can help if you need us. However, there are all kinds of forums and business networking groups to attend, where most of the people in the room will be having the same problems. Data retention strategies in the cloud sound like a problem for the techies to sort out, but really, it’s a business process problem, and the technology part only happens right at the end.

Think about your data, and be in charge of it!
