How can you protect important assets and data when using Amazon S3? A feature called versioning is an excellent answer to this question.
By default, when you upload an object to S3, that object is stored redundantly to provide 99.999999999% durability. This means that for 10,000 objects stored on S3, you can expect to lose a single object once every 10,000,000 years (on average). Those are some pretty good odds, so why do we even need to answer this question? Because while the underlying infrastructure powering S3 provides serious durability, it does not protect you from overwriting or deleting your objects. Or does it? Not by default, but it does if we enable versioning.
What is Versioning?
Versioning automatically keeps track of different versions of the same object. For example, say that you have an object (object1) currently stored in a bucket. With default settings, if you upload a new version of object1 to that bucket, object1 will be replaced by the new version. If you then realize that you messed up and want the previous version back, you are out of luck unless you have a backup on your local computer. With versioning enabled, the old version is still stored in your bucket, and it has a unique Version ID so that you can still see it, download it, or use it in your applications.
How to Enable Versioning?
When we set up versioning, we do it at the bucket level. So instead of enabling it for individual objects, we turn it on in a bucket and all objects in that bucket automatically use versioning from that point forward.
We can enable versioning at the bucket level from the AWS console, or from SDKs and API calls.
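With the AWS SDK for Python (boto3), enabling versioning is a single API call. Here is a minimal sketch; the bucket name is a placeholder, and running it requires valid AWS credentials.

```python
# Enable versioning on an existing bucket using boto3 (the AWS SDK for Python).
# "my-example-bucket" is a placeholder; this call requires valid AWS credentials.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Confirm the new state; the response includes "Status": "Enabled"
# once versioning is turned on for the bucket.
response = s3.get_bucket_versioning(Bucket="my-example-bucket")
print(response.get("Status"))
```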
Once we enable versioning, any new object uploaded to that bucket will receive a Version ID. This ID is used to identify that version uniquely, and it is what we can use to retrieve that object at any point in time. If we already had objects in that bucket before enabling versioning, then those objects will simply have a Version ID of “null.”
What about deleting an object? What happens when we do that with versioning? If we try to delete the object, all versions will stay in the bucket, but S3 will insert a delete marker, which becomes the latest version of that object. That means that if we try to retrieve the object, we will get a 404 Not Found error. However, we can still retrieve previous versions by specifying their Version IDs, so they are not totally lost.
If we want to, we do have the option to delete specific versions by specifying the Version ID. If we do that with the latest version (which is the default version), then S3 will automatically set the next most recent version as the default version, instead of giving us a 404 error.
That is only one of your options for restoring a previous version of an object. Say that you upload (i.e., PUT) an object to S3 that already exists. That new version becomes the default version. Now say you want to make the previous version the default again. You can delete the specific Version ID of the latest version (because, remember, that will not give us a 404, whereas deleting the object itself will). Alternatively, you can COPY the version that you want back into that same bucket. Copying an object performs a GET request and then a PUT request. Any time a PUT request happens in an S3 bucket that has versioning enabled, the uploaded object becomes the latest version because it receives a new Version ID.
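To make these rules concrete, here is a small in-memory sketch in plain Python. This is an illustrative model of the behavior described above, not the real S3 API; all names are made up.

```python
import uuid


class VersionedBucket:
    """Toy model of S3 versioning semantics (illustrative only, not the AWS API)."""

    def __init__(self):
        # key -> list of (version_id, body); the last entry is the current version.
        self._versions = {}

    def put(self, key, body):
        """Every PUT creates a new version with a unique Version ID."""
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        """GET without a Version ID returns the current version; if the current
        version is a delete marker, that behaves like 404 Not Found."""
        versions = self._versions.get(key)
        if not versions:
            raise KeyError("404 Not Found")
        if version_id is None:
            _, body = versions[-1]
            if body is None:  # delete marker
                raise KeyError("404 Not Found")
            return body
        for vid, body in versions:
            if vid == version_id:
                return body
        raise KeyError("404 Not Found")

    def delete(self, key):
        """A plain DELETE does not remove any data; it adds a delete marker."""
        return self.put(key, None)

    def delete_version(self, key, version_id):
        """Deleting a specific version (or a delete marker) removes just that
        entry, so the next most recent version becomes the current one again."""
        self._versions[key] = [
            (vid, body) for vid, body in self._versions[key] if vid != version_id
        ]


bucket = VersionedBucket()
v1 = bucket.put("object1", "first draft")
v2 = bucket.put("object1", "second draft")   # overwrites "current", keeps v1
marker = bucket.delete("object1")            # plain delete: adds a delete marker
# bucket.get("object1") would now raise "404 Not Found",
# but old versions are still retrievable by their Version IDs:
print(bucket.get("object1", version_id=v1))  # first draft
bucket.delete_version("object1", marker)     # remove the marker: restores v2
print(bucket.get("object1"))                 # second draft
```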
So those are some of the benefits we can get by enabling versioning. We can protect our data from being deleted and also from being overwritten accidentally. We can also use this to keep different versions of logs for our own records.
There are a few things you should know before enabling versioning.
First of all, once you enable versioning, you cannot completely disable it. However, you can put the bucket in a “versioning-suspended” state. If you do that, then new objects will receive a Version ID of null, while objects that already have versions will keep those versions.
Secondly, because you are storing multiple versions of the same object, your bill might go up. Those versions take up space, and S3 charges for every GB stored per month.
There is a way to help counteract this: another feature called Lifecycle Management. Lifecycle Management lets you decide what happens to versions after a certain amount of time. For example, if you’re storing important logs and you keep multiple versions of them, those versions could take up a lot of space. Instead of storing all of those versions (some of which could be months old), you can keep logs up to a certain age and then move them to Amazon Glacier. Amazon Glacier is much cheaper but limits how quickly you can access data, so it’s best suited for data that you probably won’t use, but still need to store in case you do need it one day. By implementing this kind of policy, you can really cut back on costs if you have a lot of objects.
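A lifecycle rule for old versions can also be set with boto3. The sketch below transitions noncurrent (old) versions of objects under a prefix to Glacier after 30 days; the bucket name, prefix, and day count are all illustrative, and the call needs valid AWS credentials.

```python
# Sketch: a lifecycle rule that moves noncurrent (old) versions to Glacier
# after 30 days. Bucket name, prefix, and day count are placeholders;
# running this requires valid AWS credentials.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},  # apply only to objects under logs/
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"}
                ],
            }
        ]
    },
)
```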
Also, different versions of the same object can have different properties. For example, by specifying a Version ID, we could make that one version publicly accessible to anyone on the Internet, while the other versions stay private.
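For instance, per-version permissions can be set with a PutObjectAcl call that targets a specific Version ID. A minimal sketch with boto3 follows; the bucket, key, and Version ID are placeholders, credentials are required, and the bucket must allow public ACLs for this particular example to succeed.

```python
# Sketch: make one specific version of an object publicly readable while the
# other versions stay private. All names and IDs here are placeholders;
# running this requires valid AWS credentials and a bucket that permits
# public ACLs.
import boto3

s3 = boto3.client("s3")

s3.put_object_acl(
    Bucket="my-example-bucket",
    Key="object1",
    VersionId="example-version-id",  # the one version to expose
    ACL="public-read",
)
```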
At this point, if you have any questions about S3 versioning, feel free to ask in the comments. If you’d like to play around with Amazon S3 versioning, and you’re a member, check out this hands-on lab. It’s at no extra cost to you and you don’t have to worry about messing anything up, because the lab can always be cleaned up and restarted.
If you’re interested in learning more about S3, like how to optimize for performance, how to use bucket policies, how to encrypt data, and much more, check out the AWS Certified Developer course.
And if you’re not currently a member? I’d love to invite you to the Linux Academy!