At Storj, we are relentlessly pursuing the best system we can. Of course, we've still got work to do. This document is intended to address the known limitations of the product (with regard to security, performance, availability, ease of use, economics, and more) that we are busy trying to address.
There are also some instances where we’ve made what we think are thoughtful compromises between competing goals (e.g., completeness vs. time to release, security vs. ease of use, decentralization vs. economics). Still, some users might disagree with those choices.
There are also some circumstances where, to support certain use cases or to work with outside systems (e.g., to support certain common S3 usage patterns), we’ve had to give users options, and some of those options represent compromises.
While we think we’ve made the right trade-offs, we don’t want to substitute our judgments for yours. So:
Read below for some important information on these known limitations, compromises, and options, so you can decide what is right for your use case.
I. PRIVACY
We aim to be the most privacy-respecting cloud storage solution out there. We've designed a system in which we can't see or mine user data, and we collect only the minimum data we need to build and operate a world-class object storage service developers love, including minimizing our use of cookies. We do not sell your data or share it with third parties. Some details:
While the above is an overview, you should acquaint yourself with our official privacy policy, rather than relying solely on summaries like this.
II. SECURITY/ENCRYPTION
We have published extensive documentation on our use of encryption and the security features for access management to enable developers to build more secure and private applications. There are three main architectural choices for users when using Storj DCS Uplink:
1) Libuplink: Our native library and the partner tools that use it (e.g., Rclone, the CLI, FileZilla)
2) Gateway ST: An S3 gateway that you host yourself
3) Gateway MT: A multi-tenant gateway that Storj hosts. Gateway MT also underlies the Web UI Filebrowser and Shared URLs.
There are some important differences in the use of encryption between these options.
Regardless of whether you are using Libuplink, Gateway ST, or Gateway MT, data is encrypted before it goes to the Storage Nodes. Encrypted data is broken into segments, and those segments are divided into erasure-coded pieces. Pieces are distributed to Nodes (unless it is an inline segment; see below). A Node sees only an encrypted piece, and any 29 pieces are enough to recreate an encrypted segment. [See Section 4.11 of the whitepaper for more information on encryption.]
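The durability math behind this can be sketched numerically. The 29-piece requirement comes from the text above; the total of 80 stored pieces is an assumed illustrative default (check the whitepaper for the parameters your Satellite actually uses):

```python
# Illustrative erasure coding numbers: 29 pieces required to rebuild a
# segment (stated above); 80 total stored pieces is an assumed default.
required_pieces = 29
stored_pieces = 80  # assumption for illustration only

# Expansion factor: raw bytes stored on the network per byte of user data.
expansion = stored_pieces / required_pieces

# How many pieces can be lost before a segment becomes unrecoverable.
loss_tolerance = stored_pieces - required_pieces

print(f"expansion factor: {expansion:.2f}x")
print(f"pieces that can be lost per segment: {loss_tolerance}")
```

Because any 29 pieces suffice, dozens of Nodes holding a given segment can fail simultaneously without any data loss.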
We also utilize metadata masking to ensure there is a minimal amount of metadata that anyone (including Storj) can use to compromise or mine your data. However, for usability's sake, certain metadata is not masked (e.g., bucket names, size, created time). [See Section 4.9 of the whitepaper.]
In the case of Libuplink and Gateway ST, you generate the keys used to encrypt your data at rest, and only you hold those keys. Storj never sees them, so there is no way (even if we wanted to do so) for us to see your data or share it with a third party. This is called end-to-end encryption (E2E), and it is by default the only option we support on Libuplink and Gateway ST.
In Gateway MT, we use server-side encryption (SSE). We do this for ease of use and to support certain S3 use cases where end-to-end encryption is not possible with the S3 protocol. However, this means that Storj is also involved in the generation of, and has temporary access to, your encryption keys when using Gateway MT. We take steps to make this a very secure option, and far more secure than centralized services.
For more detail, please see the Gateway MT repo. Please also see the following docs about choosing between E2E and SSE encryption.
This approach is the industry-standard approach to encryption with a cloud-hosted S3 compatible gateway. However, if you want E2E encryption instead of SSE, or if you want a truly trustless system, you may either i) use Libuplink or Gateway ST instead of Gateway MT, or ii) encrypt your files using other tools before sending them to Gateway MT.
We are currently working on an option that will allow you to locally generate encryption keys and encrypt data before sending it to Gateway MT, but that will require you to run some code locally. Stay tuned.
Storj has implemented a very secure and highly flexible mechanism for sharing files, portions of folders, etc., relying on mechanisms such as macaroon-based access grants. An Access Grant is a security envelope that contains a Satellite address, a restricted authorization token, and a restricted path-based encryption key—everything an application needs to locate an object on the network, access that object, and decrypt it.
Access Grants coordinate two parallel constructs—encryption and authorization-- in a way that makes it easy to share data without having to manage access control lists or use complex encryption tools. Both of these constructs work together to provide a client-side access management framework that’s secure and private, as well as extremely flexible for application developers. These mechanisms are described in greater detail in our product documentation.
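To make the attenuation idea concrete, here is a toy sketch of macaroon-style caveat chaining, where each added restriction is folded into an HMAC signature. This is not Storj's actual access grant format; the secret, grant ID, and caveat strings are all hypothetical:

```python
import hashlib
import hmac

def add_caveat(sig: bytes, caveat: bytes) -> bytes:
    # Each caveat is folded into the running signature with HMAC, so a
    # holder of a grant can only ADD restrictions, never remove them.
    return hmac.new(sig, caveat, hashlib.sha256).digest()

root_secret = b"satellite-held secret"  # hypothetical; never leaves the issuer
sig = hmac.new(root_secret, b"grant-id-123", hashlib.sha256).digest()

# Attenuate the grant: read-only, and only under one path prefix.
sig = add_caveat(sig, b"action = read")
sig = add_caveat(sig, b"path-prefix = sj://bucket/photos/")

# A verifier that knows the root secret replays the caveats and checks
# that the recomputed signature matches the one presented.
print(sig.hex()[:16])
```

In the real system, a restricted path-based encryption key travels alongside this authorization chain, which is what lets a single Access Grant both authorize and decrypt only a subtree of your data.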
As you might imagine, there are security implications in the choice of creating access grants in the command line interface vs. the browser interface without Gateway MT vs. the browser interface with Gateway MT. Again, if Gateway MT is involved, the Satellite does have temporary access to the access grant. This topic is addressed more fully here.
For additional information on encryption, see this post as well as our product documentation.
We believe that a decentralized, know-nothing approach to data storage offers far better security than traditional, centralized approaches. However, many established data security and privacy standards (e.g., HIPAA, GDPR) require making specific representations about the physical location of data (e.g., requiring that data only be stored in certain countries or in data centers that meet certain standards). Therefore, if your application is subject to those standards, you should make sure that our solution is right for you. Our roadmap includes a project around geotagging (i.e., making sure that data only goes to nodes in certain countries out of the 100+ where we currently have nodes). As our list of node operators includes both individuals and data centers, once we reach a critical mass of node operators who are operating compliant data centers, we also intend to add an option for customers to only store data on nodes in compliant data centers should they need to meet storage compliance requirements. However, neither of these two important capabilities for compliant storage is live as of the last publication date.
III. SATELLITES AS POINTS OF FAILURE
We designed our system to be extraordinarily resilient to data loss. The decentralized nature of our Nodes provides exceptional resilience against a lot of things that cause data loss in conventional storage systems (Node failure, fire, floods, power outages, etc.). [See Sections 2.5 and 3.2 of the whitepaper]
However, the Satellite peer class, which holds the metadata used to distribute and recover the data located on the Nodes, is currently less decentralized and distributed than our Nodes. [See Section 4.10 of the whitepaper and our product documentation.] Catastrophic Satellite failure could cause data loss or data unavailability.
Any “Satellite” is in fact a collection of multiple servers, including multiple instances of our Satellite API endpoints and multiple instances of our distributed database, CockroachDB. At this point, all production Satellites run multi-region (i.e., every Satellite has redundancy across multiple data centers and geographic regions). We also try to follow best practices in backing up and snapshotting the metadata in Satellites. Nevertheless, the risk of a catastrophic Satellite failure is not yet fully eliminated.
We also try to follow best practices for simulating catastrophic failures and recovering from them. (e.g., simulating Chaos Monkey and Chaos Gorilla). We are also implementing a write-ahead log so that we will not be subject to data loss even between snapshots, but this is not live as of the last publication date of this document.
IV. INLINE SEGMENTS
Very small files (under 4 KB) are still encrypted, but they follow a different storage scheme than normal files: rather than being erasure coded into pieces on the Storage Nodes, an inline segment is stored alongside its metadata on the Satellite. [See Section 4.10 of the whitepaper.]
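A sketch of the inline-versus-remote decision, assuming the 4 KB cutoff described above (the exact threshold and behavior are defined by the Satellite, not by this illustrative code):

```python
INLINE_THRESHOLD = 4 * 1024  # 4 KB cutoff, per the docs above (assumed exact)

def placement(encrypted_size: int) -> str:
    # Tiny objects skip erasure coding entirely: the encrypted bytes are
    # stored "inline" with the segment metadata on the Satellite.
    if encrypted_size < INLINE_THRESHOLD:
        return "inline"
    return "remote"

print(placement(1_000))       # a small config file stays inline
print(placement(5_000_000))   # a 5 MB object is erasure coded to Nodes
```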
V. S3 COMPATIBILITY
An important aspect of Storj DCS is ensuring that we are compatible with the existing, de facto standard for cloud object storage, S3. As S3 is a vast, 15-year-old API, while we have aimed to support the most important capabilities (which cover the overwhelming majority of cloud-storage-based applications), there is a long tail of lesser-used functions that are not yet supported. We have benefited from the efforts of MinIO in building out S3 compatibility. Our next major effort involves supporting a range of server-side S3 functions for Gateway MT users. We listen to the community and users when prioritizing additional S3 functionality.
VI. PRICING
We aim to be the most economical service, with prices that are approximately 20% of the major cloud providers', no hidden fees, and multi-region redundancy at no additional cost. However, you should always make sure that you know our latest pricing, and we recognize that other providers' prices may change as well and/or be less expensive for certain use cases. Our system works best when segments are larger (64 MB). So, if you use the S3 default part size of 5 MB rather than the Storj default segment size of 64 MB, you may incur some extra fees. See pricing for details. Pricing is subject to change.
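To see why segment size matters, count the segments produced for a fixed object size (illustrative math only; actual per-segment fees are set out on the pricing page):

```python
import math

def segment_count(object_size_bytes: int, segment_size_bytes: int) -> int:
    # Objects are split into fixed-size segments; the last one may be short.
    return math.ceil(object_size_bytes / segment_size_bytes)

MiB = 1024 ** 2
one_gib = 1024 * MiB

small_parts = segment_count(one_gib, 5 * MiB)    # S3-style 5 MB parts
large_parts = segment_count(one_gib, 64 * MiB)   # Storj default 64 MB segments

print(f"5 MB parts:   {small_parts} segments")
print(f"64 MB parts:  {large_parts} segments")
```

The same 1 GiB object produces roughly 13x more segments at the smaller part size, and any per-segment fee scales accordingly.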
VII. PERFORMANCE
We have tried to design our system to be very performant. But the decentralized and encrypted nature of our service means that certain use cases are more performant than others. For example, you shouldn't try to store a live database using decentralized storage. (Backups and archives of databases, on the other hand, are good use cases.) Like all services, we are constantly tuning performance. With Gateway MT (which is hosted in multiple Equinix locations), you should see much better throughput when moving data from locations that have peering relationships with Equinix (e.g., most centralized clouds, many enterprises). We also have a major project to reduce latency, which is scheduled to go live in Q2 of 2021.
When performing Reed-Solomon erasure coding client-side, there is an additional throughput cost due to the expansion factor in the amount of data being sent up. With Gateway MT, the Reed-Solomon erasure coding happens server-side. The result is better upload performance, especially in environments with limited bandwidth. We are working on projects to enable Gateway MT performance with client-side encryption to get the best of both solutions, so that customers don't have to choose between performance and trustless security.
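The bandwidth difference can be approximated with back-of-the-envelope numbers, assuming an illustrative 29-of-80 erasure coding scheme and ignoring the long-tail uploads that a client cancels (both are assumptions, not guarantees):

```python
required, stored = 29, 80   # assumed erasure coding parameters (illustrative)
segment_mib = 64            # Storj default segment size

piece_mib = segment_mib / required         # size of one erasure-coded piece
client_upload_mib = piece_mib * stored     # bytes leaving the client when
                                           # Reed-Solomon runs client-side
gateway_upload_mib = segment_mib           # via Gateway MT, the expansion
                                           # happens server-side instead

print(f"client-side RS: ~{client_upload_mib:.0f} MiB on the wire per segment")
print(f"via Gateway MT: {gateway_upload_mib} MiB on the wire per segment")
```

On a constrained uplink, sending roughly 2.76x the payload is the dominant cost, which is why server-side erasure coding helps most in low-bandwidth environments.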
VIII. DATA DISCLOSURE
We know that customers, community members, and Node Operators alike want up-to-date information on things like the size of our network, the performance of our network, durability, availability, and token flows. We've tried to be good about publishing those stats periodically, e.g., in town halls and our quarterly token report. We also know that some community members have started publishing stats, which is awesome (but we can't guarantee their accuracy). We have a project underway to make significantly more data available on a near real-time and programmatic basis. Stay tuned!