Storj excited to join forces with PetaGene/cunoFS.

Ben Golub
October 9, 2024

Storj recently announced plans to acquire PetaGene, a dynamic company with a talented team of file storage experts based in Cambridge, UK. PetaGene is the creator of the cunoFS distributed file storage mount client. On the heels of our acquisition of Valdi, announced in July, I want to take this opportunity to share more about the PetaGene acquisition in light of:

  • Storj’s evolving strategy as a distributed cloud services provider for distributed workloads
  • What PetaGene and cunoFS provide
  • What this means for distributed storage generally
  • What this means for our customers in the video, media, and entertainment industries
  • What this means for our customers in the AI/ML industries

Storj’s evolving strategy.

For those who know cloud history, Amazon Web Services launched nearly twenty years ago. One of the first services it launched was object storage (S3). AWS then expanded to file storage, elastic compute (EC2), and more. When Storj v3 launched in 2021, it included S3 compatibility. However, there were important differences. First, we built from the outset with distributed infrastructure, efficiently leveraging underutilized drives and servers around the world, rather than building out data centers. By encrypting and distributing shards of data, we offered superior performance, security, cost, and carbon efficiency. Second, we optimized our design for distributed workloads. When AWS launched, workloads were primarily centralized, and thus processing in a large, centralized data center made sense. Today, the largest and fastest-growing workload categories involve data being collected, created, analyzed, and consumed at the edge.

Earlier this year, we began expanding the notion of distributed cloud services for distributed workloads beyond object storage and egress by adding distributed compute and GPU services from Valdi. As Storj had done for storage, Valdi pioneered the model of efficiently using already deployed compute and GPU resources. Like Storj, Valdi focused on the markets that make the most use of distributed workloads: AI, scientific computing, and media.

What PetaGene and cunoFS provide.

In addition to assembling an exceptionally talented team, PetaGene has pioneered some incredible technology. cunoFS is a high-performance file storage mount client that allows customers to interact with object storage as if it were a fast native file system, with POSIX compatibility so that it can run new or existing applications.

cunoFS works with most major object storage systems, including AWS S3 and Azure Blob Storage, as well as on-premises object stores such as MinIO, Dell ECS, and NetApp StorageGRID. Of course, cunoFS also works great with Storj! In addition, cunoFS supports heterogeneous combinations of these services, and we are thrilled with this ability to support heterogeneous workloads. Customers will continue to be able to use cunoFS with or without Storj object storage as a back end. Furthermore, because of cunoFS’s unique design (including choosing not to have a centralized metadata server), cunoFS is extremely performant. In our tests, its speeds beat all alternatives by an order of magnitude, and it achieves up to 50 Gbps per node and over 10 Tbps of aggregate throughput. cunoFS radically changes how object storage is used, turning it into a first-class direct tier for POSIX file access, where both POSIX workloads and object-native workloads can directly access object storage. cunoFS does this without introducing any gateways and without scrambling the data: each file is directly stored as an object and each object is directly accessible as a file.
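To make the idea of POSIX file access concrete, here is a minimal Python sketch. It assumes that a bucket has already been mounted at some directory by a POSIX-compatible client. The function name `summarize_tree` and the temporary directory standing in for the mount point are illustrative assumptions, and no cunoFS-specific API is used or implied. The point is only that ordinary file calls are all an application needs, and that each relative file path lines up with an object key.

```python
import os
import tempfile


def summarize_tree(root):
    """Walk a directory using only ordinary file-system calls.

    Returns a dict mapping each file's path relative to `root`
    (with "/" separators, like an object key) to its size in bytes.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            key = os.path.relpath(full_path, root).replace(os.sep, "/")
            sizes[key] = os.path.getsize(full_path)
    return sizes


if __name__ == "__main__":
    # Assumption for illustration: a temporary local directory stands in
    # for the mount point. A real mount path would be site-specific.
    with tempfile.TemporaryDirectory() as mount_point:
        clip_dir = os.path.join(mount_point, "project", "clips")
        os.makedirs(clip_dir)
        with open(os.path.join(clip_dir, "a.bin"), "wb") as f:
            f.write(b"\x00" * 1024)
        print(summarize_tree(mount_point))
```

Because the function relies only on the standard library, the same code would run against a local disk or a mounted bucket without changes.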

What this means for distributed storage generally.

Generally speaking, there are three main types of storage systems: block storage (primarily for databases), file storage, and object storage. By integrating cunoFS into our offerings, Storj customers across industries will now be able to use Storj for file-storage-based applications in addition to object storage. This vastly expands the use cases and customers for which Storj is a great choice. Because cunoFS works across a heterogeneous set of solutions, and distributed storage is inherently global and cross-data center in nature, this acquisition further expands the usefulness of our distributed storage, compute, and GPU offerings. Finally, cunoFS has Linux and Windows clients (a macOS client is scheduled for later this year). So, customers can have an easy on-ramp to use Storj with a familiar “file and folder” based interface. (As much as we love object storage, most people are more inclined to think of files and folders with names.)

Given

  1. the tremendous performance advantage that cunoFS has relative to other file mounts (see Table 1 below for cunoFS benchmarked running on AWS S3) and
  2. the tremendous global performance advantages that Storj delivers relative to hyperscalers for media workloads (see Table 2 below for Storj storage performance relative to AWS and other hyperscalers),

we are tremendously excited to see what the combination means for performance overall!

Table 1: cunoFS performance on AWS S3 (higher is better)

Table 2: Storj performance relative to AWS S3. Download speed of 1 GB file versus AWS S3 (LAION LLM); lower is better

What this means for customers in the video space.

Media production, media post-production, and video consumption are inherently distributed workloads. So, it is no surprise that the video space has emerged as one of Storj’s two primary vertical markets. These days, remote media production is becoming the norm: any given piece of media is quite likely to be worked on by distributed teams as far-flung as Burbank, Bollywood, and Berlin. Storj’s distributed object storage already meant that all of these teams could performantly access, upload, download, and edit large media files. With cunoFS, they can now do this without introducing any gateways and without a custom interface. Again, each file is directly stored as an object and each object is directly accessible as a file.

Key benefits include:

  • No content jails.
  • No proprietary file formats.
  • Snappy performance, regardless of the size of your video project.
  • Intelligent caching to save you time and money.
  • A single source of truth.

What this means for customers innovating in AI.

Although not often discussed, storage is vital in AI training. (Meta, for example, has written extensively on this topic.) As models grow to include training on large amounts of image, video, and text data, the amount of data to be stored grows significantly. The integration of cunoFS into our ecosystem marks a significant milestone in our goal to revolutionize cloud infrastructure for AI. cunoFS enables performant loading with intelligent prediction of what will be needed in advance. By combining Storj’s distributed storage and GPU capabilities with cunoFS’s high-performance file mount, we’re creating an unparalleled platform for training and deploying large language models like LLaMA, GPT-4, and beyond.

Key benefits of the Storj-cunoFS integration for LLM training include:

  • Enhanced data processing speed, crucial for training large models efficiently
  • Improved scalability to handle the ever-growing datasets required for advanced AI
  • Cost-effective storage and compute solutions for resource-intensive AI workloads
  • Increased data security and privacy, essential for protecting valuable AI training data

More detail on cunoFS and our AI solutions will be in a forthcoming blog post by our CTO, Jacob Willoughby.

Other benefits of the acquisition.

In addition to cunoFS, the PetaGene team also has expertise in managing scientific workloads. Before developing cunoFS, PetaGene developed products for genomic data compression, which can reduce storage costs and data transfer times by 60% to 90%. Storj will continue to support these technologies.
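As a rough illustration of what a 60% to 90% reduction means in practice, the sketch below works through the arithmetic in Python. The 100 TB dataset and the 10 Gbps link are hypothetical figures chosen for the example, not numbers from PetaGene or Storj. The function names are likewise illustrative, and decimal units are assumed (1 TB = 8,000 gigabits).

```python
def reduced_size_tb(original_tb, reduction_percent):
    """Size in TB after compression, given a percentage reduction."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return original_tb * (100 - reduction_percent) / 100


def transfer_hours(size_tb, link_gbps):
    """Hours to move size_tb over a link sustaining link_gbps."""
    gigabits = size_tb * 8000  # decimal units: 1 TB = 8,000 Gb
    return gigabits / link_gbps / 3600


if __name__ == "__main__":
    original_tb = 100  # hypothetical dataset size
    link_gbps = 10     # hypothetical sustained link speed
    for percent in (0, 60, 90):
        size = reduced_size_tb(original_tb, percent)
        hours = transfer_hours(size, link_gbps)
        print(f"{percent}% reduction: {size:.0f} TB, {hours:.1f} h to transfer")
```

At a fixed link speed, transfer time scales with the amount of data, so under these assumptions the same percentage applies to both storage footprint and transfer time.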

PetaGene’s customers include leading research institutions, pharmaceutical companies, and hospitals, which use its products to collectively manage hundreds of petabytes of data. Storj will continue to serve those customers, hopefully extending the suite of services that they use to include distributed storage and distributed compute/GPU.

All employees of PetaGene will become part of Storj. PetaGene itself will continue as a wholly owned subsidiary. No token was used in this transaction.
