Twitter has been in the news a lot lately, but not for good reasons. The social media platform has been plagued with issues and an uptick in deactivations. Businesses and individuals are exploring alternative platforms such as Mastodon. Why Mastodon? Unlike Twitter, Mastodon is not a platform controlled by a single entity. It is made up of numerous servers run by independent administrators, which gives users more control over their content. It is decentralized.
For the last decade or so, “decentralization” has been closely conflated with blockchain-based technology. Blockchains have been game-changing! Before Satoshi Nakamoto published the Bitcoin paper, no one had figured out how to make digital cash work, though many tried and some came close. It’s understandable that blockchains have made such a huge impact.
But blockchains are not the only kind of decentralization. A federated approach (as typified by Mastodon, email, XMPP, and others) has proven remarkably good at combining the best elements of decentralization (no single point of failure, security) with scalability, performance, and other desirable properties.
Mastodon and the broader ActivityPub “Fediverse” (Friendica, PeerTube, Pixelfed, Pleroma, Funkwhale, Diaspora, Nextcloud, etc.) are growing thanks to the Twitter exodus, so it seems to us like a great time to talk about the different kinds of decentralization and where Storj fits in.
Different kinds of decentralization
Every simplification is in some sense wrong, but broadly, we believe most internet-wide decentralized services can be categorized along two spectrums: whether or not the operation of the service is centralized, and whether or not the service’s data structures (ledgers, activity logs, objects, metadata, etc.) are centralized.
Most conversations around decentralization focus on whether the operation of the service is centralized, but for this blog post I’d like to focus on the other dimension: whether the data structures are centralized.
Centralized data structures in a decentralized network
What do I mean when I say a decentralized network has centralized data structures? Perhaps the best example is a blockchain, where the entire network has to agree on the blockchain’s order and structure. Each entity in the network, no matter how many different operators there are, must run software that agrees with every other entity in the network about the structure and content of the blockchain, or the whole system falls apart. In this light, a blockchain is a centralized data structure within a decentralized system. In a decentralized network, centralized data structures can have great uses! They are especially good for something like a shared global ledger, but they are not good for reducing coordination or increasing performance.
It’s worth underscoring this performance cost with some numbers. While Visa can handle 24,000 TPS, or transactions per second, Ethereum can handle 20, and Bitcoin can handle 7. That’s right, the entire global Bitcoin network can handle at most 7 transactions per second! And that’s generous: this chart says it’s been closer to 3 for the last few months. The reason is the difficulty of coordinating permissionless changes to the global shared blockchain. Some blockchains aim for higher throughput, such as Solana, which promises upwards of 3,500 TPS, and breakthroughs such as zero-knowledge rollups promise to improve Ethereum’s transaction throughput significantly. Even so, the best figures at the leading edge of blockchain performance still can’t touch Visa.
Moreover, speed always has two measures: throughput and latency. Transactions per second measures throughput, but what an individual user cares about more is latency, or how long their own transaction takes. On Ethereum it takes more than 10 minutes before a transaction is final (“transaction finality,” a form of latency), whereas on Bitcoin it can take a full hour before you can be certain your transaction is complete!
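To make “everyone must agree on the structure” concrete, here is a toy hash chain in Python. It is a minimal sketch, not how Bitcoin actually encodes blocks: the transaction strings and the all-zero genesis value are made up for illustration. Because each link hashes the previous one, the final hash commits to the entire ordered history, so two nodes only agree if they hold exactly the same entries in exactly the same order.

```python
import hashlib


def chain_tip(transactions):
    """Return the hash at the end of a toy hash chain.

    Each link hashes the previous link's hash together with the next
    transaction, so the tip commits to the whole ordered history.
    """
    prev = b"\x00" * 32  # hypothetical genesis value
    for tx in transactions:
        prev = hashlib.sha256(prev + tx.encode("utf-8")).digest()
    return prev.hex()


history = ["alice->bob:5", "bob->carol:2", "carol->dave:1"]
reordered = ["bob->carol:2", "alice->bob:5", "carol->dave:1"]

# Same transactions in a different order produce a different tip,
# so these two nodes would disagree about the entire chain.
print(chain_tip(history) == chain_tip(list(history)))  # True
print(chain_tip(history) == chain_tip(reordered))      # False
```

Every participant has to converge on one tip, which is exactly the kind of global agreement that costs coordination.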
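A quick back-of-the-envelope comparison of the throughput figures quoted above, as a small Python sketch. The numbers are only the ones already cited in this post (including the roughly 3 TPS recent Bitcoin figure and Solana’s promised, not measured, 3,500 TPS); nothing here is newly measured.

```python
# Peak throughput figures (transactions per second) as quoted above.
TPS = {
    "Visa": 24_000,
    "Solana (promised)": 3_500,
    "Ethereum": 20,
    "Bitcoin": 7,
    "Bitcoin (recent)": 3,
}


def times_slower_than_visa(name):
    """How many times fewer transactions per second than Visa."""
    return TPS["Visa"] / TPS[name]


for name in TPS:
    if name != "Visa":
        print(f"{name}: {times_slower_than_visa(name):,.0f}x below Visa")
```

Even the promised Solana figure is roughly 7x below Visa, and Bitcoin’s quoted peak is more than 3,400x below it.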
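Where do those latency figures come from? A rough sketch, assuming the commonly cited rules of thumb: Bitcoin’s ~10-minute average block interval with the conventional 6 confirmations, and Ethereum proof-of-stake finality after about two epochs of 32 twelve-second slots. These are conventions, not guarantees; real block times vary and “final enough” is ultimately the recipient’s choice.

```python
# Rough time-to-finality estimates from commonly cited protocol parameters.
# Rules of thumb only: block times vary, and the confirmation count a
# recipient waits for is their own choice.

BITCOIN_BLOCK_INTERVAL_S = 10 * 60   # target average block interval
BITCOIN_CONFIRMATIONS = 6            # conventional "safe" confirmation count

ETH_SLOT_S = 12                      # proof-of-stake slot length
ETH_SLOTS_PER_EPOCH = 32
ETH_EPOCHS_TO_FINALITY = 2           # checkpoint finalized ~two epochs later

bitcoin_s = BITCOIN_BLOCK_INTERVAL_S * BITCOIN_CONFIRMATIONS
ethereum_s = ETH_SLOT_S * ETH_SLOTS_PER_EPOCH * ETH_EPOCHS_TO_FINALITY

print(f"Bitcoin:  ~{bitcoin_s / 60:.0f} minutes")   # ~60 minutes
print(f"Ethereum: ~{ethereum_s / 60:.1f} minutes")  # ~12.8 minutes
```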
What do the experts say?
It’s no accident that making your decentralized system coordinate on shared data structures makes it slow: the slowdown is part and parcel of the coordination.
Here’s what James Hamilton, SVP and distinguished engineer at Amazon (with hair that exceeds even Scott Meyers’s), said about scalability:
The first principle of successful scalability is to batter the consistency mechanisms down to a minimum, move them off the critical path, hide them in a rarely visited corner of the system, and then make it as hard as possible for application developers to get permission to use them.
This is a hard-earned lesson from many battles. Coordination is costly: when two processes need to coordinate, at least one of them has to wait. As Adrian Colyer points out, coordination is the dominant term in the Universal Scalability Law. There is no better way to improve the performance of a distributed system than to eliminate coordination. Consider the key-value store Anna, which achieved a two-orders-of-magnitude speed-up through coordination elimination. That’s huge!
If you’re interested in reading more, a gentle introduction to the literature on coordination avoidance is Hellerstein and Alvaro’s 2019 summary of the CALM theorem.
So, if you’re trying to compete with AWS S3, which by default allows a throughput of 3,500 new objects per second, per prefix, per bucket, per region, per customer, with a “transaction finality” time measured in dozens of milliseconds, coordinating on a centralized data structure simply can’t work.
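Neil Gunther’s Universal Scalability Law models the throughput of N nodes as X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where σ is contention and κ is the coherency (pairwise coordination) cost. The Python sketch below uses made-up parameter values purely for illustration. It shows the shape of the argument: with any nonzero κ, throughput peaks and then falls as nodes are added, while with κ = 0 it keeps climbing.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Universal Scalability Law: modeled throughput of n nodes.

    lam   -- throughput of a single node
    sigma -- contention (serialized fraction of the work)
    kappa -- coherency cost (pairwise coordination between nodes)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


LAM, SIGMA = 1_000.0, 0.05  # hypothetical parameters for illustration

for kappa in (0.001, 0.0):
    row = [round(usl_throughput(n, LAM, SIGMA, kappa)) for n in (1, 8, 32, 128)]
    print(f"kappa={kappa}: {row}")

# With kappa > 0, throughput peaks near sqrt((1 - sigma) / kappa) nodes;
# beyond that, adding nodes makes the whole system slower.
print(f"peak near {math.sqrt((1 - SIGMA) / 0.001):.1f} nodes")
```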
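The core CALM intuition is that monotonic computations (ones that only ever grow their state) can be made consistent without coordination. As a swapped-in illustration rather than anything from the paper itself, here is a minimal grow-only counter (a G-Counter CRDT) in Python, built on the same lattice-merge idea Anna uses. Each replica accepts writes on its own, and because the merge is commutative, associative, and idempotent, replicas converge no matter the order or repetition of state exchanges.

```python
from functools import reduce


def increment(counter, replica, amount=1):
    """Return a new G-Counter with `replica`'s own slot increased."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated


def merge(a, b):
    """Join two G-Counters: element-wise max over replica slots.

    Max is commutative, associative, and idempotent, so replicas can
    exchange state in any order, any number of times, and still agree.
    """
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter):
    return sum(counter.values())


# Three replicas accept writes independently, with no coordination.
a = increment({}, "a", 3)
b = increment({}, "b", 2)
c = increment(increment({}, "c"), "c")

left = reduce(merge, [a, b, c])
right = reduce(merge, [c, merge(b, b), a])
print(value(left), value(right), left == right)  # 7 7 True
```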
Decentralized data structures
So, if blockchains aren’t the way to achieve decentralized performance, what is?
To answer that question, we only need to look at other decentralized systems that have been extremely successful: BitTorrent, RSS, VoIP, SMS, email, the web, the internet itself, and so on. These systems are so successful that some don’t even get capital letters anymore. They are so ubiquitous that it’s easy to forget “email” was once a nascent idea. They have achieved complete success.
One fundamental property of each of these systems is that there is no centralized data structure to query! For Bitcoin, we can readily discover the transactions per second for the whole network, because there is a centralized data structure to query. It is not possible to know total emails per second, BitTorrent downloads per second, phone calls per second, or web requests per second, because there is no authoritative source to ask.
Each of these systems is composed of a collection of independent servers operating in tandem, each managing its own little neighborhood of the network, with no one responsible for the whole. This design is sometimes called “federation,” and yes, it’s also what powers Mastodon and the Fediverse.
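Here is a minimal Python sketch of what “each managing its own neighborhood” looks like. It assumes email-style `user@domain` addresses; the `Server` class, `route` function, and `.example` domains are hypothetical, and a plain dict stands in for the DNS lookup real email uses to find a domain’s server. The point is that delivery consults only the server named in the address; no global index of users exists anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One independently operated server: it only knows its own users."""
    domain: str
    inboxes: dict = field(default_factory=dict)

    def deliver(self, user, message):
        self.inboxes.setdefault(user, []).append(message)


def route(servers, address, message):
    """Deliver to 'user@domain' by asking only the server for that domain.

    There is no global directory: the address itself says which server
    is responsible. (A dict stands in for DNS lookup in this sketch.)
    """
    user, _, domain = address.partition("@")
    servers[domain].deliver(user, message)


servers = {d: Server(d) for d in ("mail.example", "social.example")}
route(servers, "alice@mail.example", "hi alice")
route(servers, "bob@social.example", "hi bob")

# Each server holds only its own neighborhood of the network.
print(servers["mail.example"].inboxes)    # {'alice': ['hi alice']}
print(servers["social.example"].inboxes)  # {'bob': ['hi bob']}
```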
So, how’s Mastodon doing right now?
On October 27, Mastodon (one Twitter-like, ActivityPub-supporting app in the growing “Fediverse”) had an estimated 539 thousand active users, and on November 23 it had an estimated 2.5 million. That’s 363% growth in under a month! Individual instances have had to scale up to meet this demand, but the overall network has absorbed this massive growth without breaking a sweat.
As I mentioned, it’s not possible to know exactly how much traffic the Fediverse handles, but I wrote a small tool that asks each Mastodon (or compatible) instance how many posts were made each week over the last few weeks (Mastodon servers have an API for that). The growth I found is astounding.
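Checking that arithmetic with a short Python sketch, using the estimates above and assuming the year is 2022. It also converts the jump into an equivalent compound daily growth rate over the 27 days between the two dates.

```python
from datetime import date

# Estimated Mastodon active users, from the figures above (year assumed 2022).
start_day, start_users = date(2022, 10, 27), 539_000
end_day, end_users = date(2022, 11, 23), 2_500_000

days = (end_day - start_day).days
growth_pct = (end_users - start_users) / start_users * 100
daily_rate = (end_users / start_users) ** (1 / days) - 1  # compound daily growth

print(days)                     # 27
print(f"{growth_pct:.1f}%")     # 363.8%
print(f"{daily_rate:.1%} per day")
```

That works out to the user base growing by roughly 6% every single day for almost four weeks.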
Federation is here to stay.
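A tool like the one described above can be sketched in a few lines of Python. This is not the author’s actual tool, just a minimal sketch assuming Mastodon’s public `GET /api/v1/instance/activity` endpoint, which returns weekly buckets with numeric fields encoded as strings. Some servers disable that endpoint, and the list of domains to survey is assumed to come from elsewhere.

```python
import json
import urllib.request
from collections import Counter


def fetch_activity(domain, timeout=10):
    """Fetch weekly activity buckets from one Mastodon server.

    Uses the public GET /api/v1/instance/activity endpoint; some servers
    disable it, so callers should expect errors.
    """
    url = f"https://{domain}/api/v1/instance/activity"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def weekly_statuses(activity_by_domain):
    """Sum posts per week across servers.

    activity_by_domain maps a domain to its parsed activity list; the API
    returns numbers as strings, e.g. {"week": "1668988800", "statuses": "42"}.
    """
    totals = Counter()
    for buckets in activity_by_domain.values():
        for bucket in buckets:
            totals[int(bucket["week"])] += int(bucket["statuses"])
    return dict(sorted(totals.items()))


def survey(domains):
    """Ask each server in turn; skip any that refuse or time out."""
    results = {}
    for domain in domains:
        try:
            results[domain] = fetch_activity(domain)
        except (OSError, ValueError):
            continue  # unreachable, endpoint disabled, or bad JSON
    return weekly_statuses(results)
```

Note that there is no central place to ask: the totals only ever cover the servers you thought to survey.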
Storj has taken the federated approach
From the beginning, world-class performance has been a design goal for Storj. As outlined in section 2.10, section 4.9, and Appendix A of our 2018 v3 whitepaper, coordination avoidance is a critical strategic decision for us, and as a result, we decided to have each of our metadata servers (called “Satellites”) manage its own little neighborhood of the network, just like email.
Here are the benefits of this approach we wrote about in 2018:
- Control - The user is in complete control of all of their data. There is no organizational single point of failure. The user is free to choose whatever metadata store with whatever trade-offs they prefer and can even run their own. Like Mastodon, this solution is still decentralized. Furthermore, in a catastrophic scenario, this design is no worse than most other technologies or techniques application developers frequently use (databases).
- Simplicity - Other projects have spent multiple years on shaky implementations of Byzantine fault-tolerant consensus metadata storage, with the expected performance and complexity trade-offs. We can get a useful product to market without doing this work at all. This is a considerable advantage.
- Coordination Avoidance - Users only need to coordinate with other users on their Satellite. If a user has high throughput demands, they can set up their own Satellite and avoid coordination overhead from any other user. Because Satellite operators can select their own database, a user can also choose a Satellite with weaker consistency semantics, such as Highly Available Transactions, that reduce coordination overhead within that Satellite and increase performance even further.
In short, our approach is the most performant of the possible decentralized options.
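The coordination-avoidance point can be illustrated with a toy Python model. This is a minimal sketch, not Storj’s actual Satellite implementation: the `Satellite` class, its methods, and the “segments” values are hypothetical. Each Satellite owns its own lock and metadata table, so writers on one Satellite never wait on another, and a high-throughput user on a private Satellite contends with no one else.

```python
import threading


class Satellite:
    """Hypothetical metadata server that coordinates only its own users.

    Each Satellite has its own lock and its own metadata table, so a
    commit on one Satellite never waits on another Satellite.
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._objects = {}

    def commit(self, bucket, key, segments):
        with self._lock:  # the only coordination point for this Satellite
            self._objects[(bucket, key)] = segments

    def lookup(self, bucket, key):
        with self._lock:
            return self._objects.get((bucket, key))


def upload_many(satellite, count):
    for i in range(count):
        satellite.commit("photos", f"img-{i}.jpg", [f"piece-{i}"])


shared = Satellite("shared.example")
private = Satellite("private.example")  # e.g. a high-throughput user's own

threads = [
    threading.Thread(target=upload_many, args=(shared, 1000)),
    threading.Thread(target=upload_many, args=(private, 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.lookup("photos", "img-7.jpg"))  # ['piece-7']
```

The design choice mirrors email: the coordination domain is one server, never the whole network.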
Welcome to the Fediverse
The Fediverse (including services like Mastodon) has recently achieved critical mass and is here to stay. We’re excited to support the Fediverse and intend to share more in the coming weeks and months about our own decentralization journey.
In the meantime, if you’re operating Mastodon, Pixelfed, PeerTube, Funkwhale, or another federated platform and need affordable object storage, you can sign up today and get 25 GB of storage free on Storj! We’ll also give you up to 1 TB-month of storage and 1 TB of egress free if you share your feedback with us at fedi@storj.io!
See you in the Fediverse!