The Subtle Difference Between Privacy and Security in Cloud Storage—and Why it Matters
The terms data privacy and data security are often used interchangeably. Data privacy regulations can require a certain level of security, and data protection regulations can require a certain level of data privacy. That overlap makes it hard to see what is unique about each, and what it means for developers who need to meet privacy and security regulations in their cloud applications. This article breaks down the differences and explains what they mean for developers, specifically for cloud storage.
Defining data privacy and data security
The Storage Networking Industry Association (SNIA) defines data privacy like this:
Data privacy, sometimes also referred to as information privacy, is an area of data protection that concerns the proper handling of sensitive data including, notably, personal data but also other confidential data, such as certain financial data and intellectual property data, to meet regulatory requirements as well as protecting the confidentiality and immutability of the data.
An important part of this definition is the idea that data privacy is a part of data protection. In fact, SNIA goes on to say that the three concepts that together define data protection are: 1) data privacy, 2) data security, and 3) traditional data protection (like backups). This clearly positions data privacy and data security as two separate things, but equally important in the overall protection of data.
As far as a definition of data security, IBM defines it like this:
Data security is the practice of protecting digital information from unauthorized access, corruption, or theft throughout its entire lifecycle.
Simply put, this is referring to tactics like encryption and access management used to keep data secure. So from these definitions, one could infer that data privacy is about policy and data security is about tactics. But I still don’t think that is clear enough.
I define these terms like this:
Data privacy is about what data and metadata you collect, who it is shared with, and how it is kept protected.
Data security is having a system in place to ensure only approved people have access to data that they are supposed to have access to.
Regulations of course play a big role in making decisions about metadata and data collection, sharing and security measures. But the unique part of my definition is the introduction of metadata in addition to data.
Why metadata is so critically meta
Metadata is data collected about data. That sounds granular and relatively harmless on the surface, but let’s review a few examples.
Let’s think about the data surrounding voice communications on your phone. The data is the information stated in the phone conversation. The metadata is things like who the call was with, how long the call was, and how often you call that person. Most developers would spend most of their efforts protecting the private information shared on the call. But a lot of information can be inferred from the metadata as well. Let’s say you’ve had five calls over the past week with an oncologist. That is very personal information, and exposing it could affect your right to privacy.
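To make that inference concrete, here is a minimal Python sketch. The call records, field names, and the frequent_contacts helper are all hypothetical and made up for illustration. Note that the records contain no conversation content at all, only metadata.

```python
from collections import Counter

# Hypothetical call records: metadata only, no conversation content.
call_log = [
    {"callee": "Oncology clinic", "duration_min": 12},
    {"callee": "Mom", "duration_min": 30},
    {"callee": "Oncology clinic", "duration_min": 8},
    {"callee": "Oncology clinic", "duration_min": 15},
    {"callee": "Pizza place", "duration_min": 2},
    {"callee": "Oncology clinic", "duration_min": 5},
    {"callee": "Oncology clinic", "duration_min": 20},
]


def frequent_contacts(records, min_calls=3):
    """Return the contacts called at least min_calls times, with their call counts."""
    counts = Counter(record["callee"] for record in records)
    return {name: n for name, n in counts.items() if n >= min_calls}


# Anyone holding this metadata can see which contacts dominate the week.
print(frequent_contacts(call_log))
```

A few lines of counting are enough to surface a pattern that most people would consider private, which is why metadata deserves the same scrutiny as the data itself.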
Now let’s look at how metadata collection can be a critical part of an application’s success. Many “free” applications make their money by sharing the metadata they collect with partners. Life360 is a popular application that families or friend groups can subscribe to in order to get location and driving information on each other. It doesn’t take much of a leap to understand how valuable knowing where people go would be to retailers. And when that metadata is specific to an individual, it can be used to target ads and offers.
This is mostly a win-win with app users getting value from the app at no monetary cost, while Life360 gets the monetary value from partners. Well, it’s a win-win until people start to feel it is an invasion of their privacy or until that metadata is taken by malicious actors.
The way metadata is collected, shared, and protected had significant consequences for WhatsApp, a subsidiary of Facebook (now, ironically, called Meta). An Apple iMessage privacy update revealed to iPhone users that WhatsApp was collecting much more metadata than competitors, and that it was harvesting the metadata and sharing it with Facebook. This caused an estimated millions of users to abandon the app for competitors and resulted in a $267 million GDPR fine.
These examples show how metadata actually earns its ‘meta’ descriptor, as it is a more comprehensive, or transcending, type of data. They also show how important it is to understand metadata and treat it separately from data when discussing the security measures that protect it.
How to secure both data and metadata in cloud storage
Cloud storage has come a long way in developing security measures to keep data secure. Data can be encrypted in transit and air-gapped, and immutable copies can be made to ensure backup and recovery. That said, these tools often cost extra, are not included in the entry-level offerings, or are difficult to set up. Even with security measures in place, data breaches via cloud storage are becoming more common. Recently we’ve seen cloud storage breaches happen to notable companies like Facebook, Instagram, Capital One, State Farm, and Autoclerk (see the complete list here). Often the entry point for these breaches is a data sharing partner or a misconfiguration. And in most cases the data was not encrypted.
In order to keep data protected in the cloud, strong encryption and access management security measures need to be taken. Centralized cloud storage, like that offered by hyperscalers such as Google, Amazon, and Microsoft, does not keep data secure “out of the box”. Extra cost and significant configuration are needed for this to happen. For small companies just getting started on building an application, this can be daunting. A great alternative is decentralized cloud storage. With a decentralized architecture, the object storage is designed for zero-trust and has the highest levels of encryption and access management “out of the box”, with no configuration needed by developers. This paper does a great job discussing cloud storage security risks and how decentralized cloud storage mitigates them.
On the data privacy side, centralized cloud storage is actually becoming a concern for developers. They are having a hard time trusting that Google, Amazon, and Microsoft, who have entire business models around the collection and use of our metadata, won’t collect or access that metadata when it is stored with them, or change their policies to collect more in the future. Developers need to first carefully consider and limit the metadata collected and shared with third parties. From there, it is a matter of protecting that data from being shared with unapproved third parties as well as keeping it secure.
Takeaways on data privacy and data security for developers
Data and metadata collection, and more importantly what isn’t collected, is a critical part of a developer’s role in building a cloud application. The more you can minimize what is being collected, the more you can protect your users and as a bonus prevent unnecessary exposure to regulatory fines.
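As a rough illustration of minimizing what is collected, here is a Python sketch that keeps only an allowlist of fields and replaces the user identifier with a keyed hash before anything is stored or shared. The field names, the ALLOWED_FIELDS set, and the minimize helper are assumptions made up for this example, not part of any particular product or regulation.

```python
import hashlib
import hmac

# Hypothetical allowlist: the only fields this application has decided it needs.
ALLOWED_FIELDS = {"user_id", "event", "app_version"}


def minimize(event, secret_key):
    """Drop every field not on the allowlist and pseudonymize the user ID."""
    kept = {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
    if "user_id" in kept:
        # Keyed hash (HMAC-SHA256): the same user maps to the same token,
        # but the raw ID cannot be recovered without the secret key.
        raw_id = str(kept["user_id"]).encode("utf-8")
        kept["user_id"] = hmac.new(secret_key, raw_id, hashlib.sha256).hexdigest()
    return kept


# Example usage with made-up values; a real key would come from a secrets manager.
raw_event = {
    "user_id": 42,
    "event": "login",
    "app_version": "1.4.0",
    "location": "40.7128,-74.0060",
    "contacts": ["alice", "bob"],
}
print(minimize(raw_event, secret_key=b"example-key-not-for-production"))
```

An allowlist is used here rather than a blocklist so that any new field added later is dropped by default until someone decides it is needed. Keep in mind that a keyed hash is pseudonymization rather than anonymization, so the result may still need to be treated as personal data.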
Information sharing is a key part of data privacy and security, and developers need to speak up when the business wants to add a new partner or increase the data being shared. Furthermore, the data security your partners use to protect the data you share with them is also important. Consider requiring them to use the highest level of security tools their cloud storage provider offers, or require them to use a cloud storage vendor that supports end-to-end encryption, which can also be a more cost-effective option.
Most developers I know really care about making a product that protects their customers. As it becomes cheaper and easier for developers to get started with centralized cloud storage, it is critical that data privacy and data security are both considered when selecting a storage provider. That’s why it is so great that decentralized cloud object storage is a viable alternative. It goes far beyond what centralized providers can do to keep metadata and data private and secure.