In the rapidly evolving world of artificial intelligence, data is the lifeblood that fuels innovation. As AI models grow hungrier for larger, more complex multimodal datasets of text, images, and video, the need for efficient, scalable, and fast storage is often overlooked. That is why I am so excited about cunoFS: a groundbreaking way to provide file system access that is redefining how AI workloads interact with data.
The storage challenge in AI workloads.
Traditional storage solutions have struggled to keep pace with the demands of modern AI applications. High-performance storage systems are often prohibitively expensive and limited in how far they can scale. Below are some per-TB costs for solutions commonly used to store AI data in the cloud.
* EFS pricing based on an AWS-provided ML example: a mix of 100 GB EFS Standard and 400 GB EFS Standard-IA with 75% reads, driving 50 MB/s for 3 hours a day, 3 days a week, idle the remainder, scaled to 1 TB.
EBS volumes can’t be created above 16 TB, and file storage alternatives like AWS EFS or FSx for Lustre offer more scale but fall short on cost efficiency and performance (see this excellent breakdown of price versus performance for these offerings). On top of that, they lock you into their ecosystem. A cost-effective, performant, and scalable way to store large amounts of data already exists, so why isn’t it used?
Object storage is the gold standard for cost efficiency and scalability, but every application must be built specifically to speak object storage protocols and be optimized to use them effectively, and all of that takes developer time. Machine learning practitioners write their training and fine-tuning code locally, in environments they are familiar with, and when it is time to scale up or move to production, they are stuck with more traditional, file-based storage. SDKs exist for rewriting that code against object storage, but the rewrite takes time, introduces differences and bugs between development and production environments, and, unless done correctly, can’t take full advantage of object storage.
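To make the contrast concrete, here is a minimal sketch of the two access patterns (the bucket name, key, and mount path are hypothetical). The SDK path requires new code and a new dependency; the file system path is the code developers already write.

```python
import boto3  # AWS SDK for Python

# Accessing an object through the S3 SDK: new code, a new dependency,
# and new failure modes to handle. Bucket and key are hypothetical.
s3 = boto3.client("s3")
resp = s3.get_object(Bucket="my-training-data", Key="images/0001.jpg")
image_bytes = resp["Body"].read()

# The same object through a POSIX path (e.g., a cunoFS mount point):
# existing file-based code works unchanged. Mount path is hypothetical.
with open("/mnt/my-training-data/images/0001.jpg", "rb") as f:
    image_bytes = f.read()
```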
cunoFS makes accessing any object storage faster.
Before I make the case for AI, it is helpful to understand just how fast cunoFS is compared with competing solutions. As the chart here shows, cunoFS outperforms solutions like AWS EFS and FSx for Lustre in both speed and cost-effectiveness.
This testing was performed using AWS S3, but results are similar with any object storage solution or region: cunoFS is significantly faster. While we are excited to offer cunoFS with Storj distributed object storage, cunoFS will remain available for use with any object storage solution.
13x faster machine learning in Python.
So how fast is it when applied to AI workloads? The improvement from using cunoFS as a high-performance data loader for machine learning is significant. We benchmarked cunoFS against the other popular options for running PyTorch workloads on S3. The chart below compares the performance of cunoFS, Mountpoint for Amazon S3, and TorchData (lower values are better).
The training split and image data (118,287 images totaling 19.3 GB) from the COCO 2017 dataset were used to test loading time. As the graph shows, the cunoFS PyTorch optimizations load data much faster than Mountpoint for Amazon S3 (mount-s3) and TorchData.
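For context, a data loading loop of the kind being measured here might look like the sketch below. This is a generic illustration rather than the benchmark harness itself, and the dataset paths are hypothetical; they assume the COCO files are visible through an ordinary POSIX path, such as a cunoFS mount.

```python
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import CocoDetection

def collate(batch):
    # COCO annotations vary in length per image, so batch them as tuples.
    return tuple(zip(*batch))

# COCO 2017 train split read through an ordinary file system path
# (for example, object storage exposed via a cunoFS mount).
# The paths below are hypothetical.
dataset = CocoDetection(
    root="/mnt/datasets/coco2017/train2017",
    annFile="/mnt/datasets/coco2017/annotations/instances_train2017.json",
    transform=transforms.ToTensor(),
)

# Parallel workers keep the GPU fed; sustained throughput, not IOPS,
# is the bottleneck a loader like this exposes.
loader = DataLoader(dataset, batch_size=64, num_workers=8, collate_fn=collate)

for images, targets in loader:
    pass  # replace with the actual training step
```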
It’s a common misconception that AI workloads need high IOPS. In reality, they need storage that delivers incredibly high throughput to keep GPUs fed. cunoFS uses a number of techniques to accelerate AI workloads, including hooking into Python to accurately predict and prefetch the data GPUs will need, far in advance of use. This transforms what would otherwise appear to be random accesses, which have traditionally relied heavily on flash and NVMe, into a pattern that object storage can fulfill instead, with its incredible throughput, scalability, and cost advantages.
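cunoFS’s prefetcher is internal to the product, but the underlying idea is easy to illustrate: when the access order is predictable, upcoming reads can be issued in the background while the current item is consumed, converting many small waits into a few deep, overlapping streams. The toy sketch below shows the pattern in plain Python; it is not cunoFS’s implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    with open(path, "rb") as f:
        return f.read()

def prefetching_reader(paths, depth=4):
    """Yield file contents in order, keeping up to `depth` reads in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        futures = [pool.submit(read_file, p) for p in paths[:depth]]
        for i in range(len(paths)):
            data = futures.pop(0).result()   # wait for the oldest read
            nxt = i + depth
            if nxt < len(paths):             # queue the next read ahead of use
                futures.append(pool.submit(read_file, paths[nxt]))
            yield data
```

Because the reads overlap, aggregate throughput approaches the sum of the individual streams, which is exactly the regime where object storage excels.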
Adding cunoFS to the distributed cloud is a breakthrough for AI development.
The distributed cloud is a new way to deliver storage and compute resources for distributed use cases. The more your data travels across regions, the more expensive traditional cloud services become: you have to add regions, configure replication, pay for egress, and cover many other fees.
The distributed cloud is an alternative to traditional cloud services, architected to be global by default. Spare storage and compute capacity around the world is leveraged to securely split and distribute data across nodes and regions. Yet it’s incredibly fast from anywhere in the world: the pieces of each object are sent in parallel, and only a portion of them is needed to reconstitute the object. The distributed cloud is also extremely cost- and carbon-efficient, to better support future data growth.
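That “only a portion of the pieces” property comes from erasure coding, where an object is encoded into n pieces, any k of which are enough to rebuild it. The numbers below are illustrative, not any provider’s actual production parameters:

```python
# Hypothetical k-of-n erasure coding: an object is encoded into n pieces,
# any k of which suffice to reconstruct it. Parameters are illustrative only.
k, n = 29, 80
expansion = n / k          # storage overhead versus the raw object size
tolerated = n - k          # pieces that can be lost, slow, or offline
print(f"storage expansion: {expansion:.2f}x, "
      f"survives losing {tolerated} of {n} pieces")
# A client can request many pieces in parallel and keep the first k that
# arrive, so download latency tracks the fastest nodes, not the slowest.
```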
Adding cunoFS to the distributed cloud makes it easier to reap the benefits of object storage while saving development time and money. Specific to AI, here are the key benefits:
- Less development work - cunoFS is a scalable, drop-in solution for developers who are building, training, and refining AI models locally. You can drop in cunoFS during development without any additional work to adopt object storage SDKs, which makes the developer workflow much simpler (see the sketch after this list).
- Low cost storage for local scale - When you work locally, there are limits to how big you can go. You need to add a local storage solution, and almost all of them carry big cost/performance tradeoffs. cunoFS gives you storage in the cloud that acts local, without that tradeoff.
- Access to high-performing GPUs - The distributed cloud has thousands of hard-to-find GPUs located around the world. You can spin up a GPU in seconds, paying only for what you use, and cunoFS accelerates your data transfer to those GPUs.
- Low risk innovation - cunoFS doesn’t reconfigure your data, so you can keep using the object storage tools you are used to. There are no proprietary formats, which means no lock-in. The same is true for storage and compute, where you pay as you go. If you decide you are done, you can simply walk away, so there is no risk in innovating on the distributed cloud.
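As promised in the first bullet, here is what “drop-in” looks like in practice. The environment variable and paths are hypothetical; the point is that the same code runs unchanged whether it reads from a local directory or from object storage exposed through a cunoFS mount.

```python
import os
from pathlib import Path

# Hypothetical switch: a local directory during development, a cunoFS
# mount of an object storage bucket in production. Paths are illustrative.
DATA_ROOT = Path(os.environ.get("DATA_ROOT", "./data"))

def iter_samples(split="train"):
    """Yield raw image bytes; identical code for local disk and object storage."""
    for path in sorted((DATA_ROOT / split).glob("*.jpg")):
        yield path.read_bytes()

# Development:  DATA_ROOT=./data           python train.py
# Production:   DATA_ROOT=/mnt/bucket/data python train.py
for sample in iter_samples():
    pass  # feed into the training pipeline
```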
In summary, cunoFS combines the cost efficiency and limitless scalability of object storage with the convenience of standard POSIX-compatible file system access. This unique blend allows AI applications to interact with data seamlessly, as developers are used to, without the need for data staging, tiering, or adopting proprietary formats.
The distributed cloud will create a sustainable future for AI.
Everything I’ve described above is reason enough to bring cunoFS and the distributed cloud into use for AI workloads. But my excitement doesn’t end there. I believe the distributed cloud will solve both the distributed training problem and the looming energy and cost sustainability problem. Let’s face it: AI models keep getting bigger, your data keeps growing to feed them, and eventually the energy demands and costs of AI will outweigh the benefits. And keeping all the GPUs for training under one roof is simply too limiting and too costly.
For AI to thrive, a single, consistent, fast cloud layer is needed to feed all these GPUs. cunoFS can be the glue that connects globally distributed GPU clusters with globally distributed object storage, supplying the capacity needed at significantly reduced cost and with an 83% reduction in carbon emissions. Imagine a future where a surplus of wind power in Germany can be harnessed simply by scheduling compute to run in that region at that time. For AI to keep growing, we have to start thinking about ways to use existing resources more efficiently, which is why I’m thrilled to have cunoFS join the distributed cloud to help realize a better future for AI.