A team of researchers in the Computer Science Department at Carnegie Mellon University's School of Computer Science currently hosts a large research dataset that is quickly growing beyond the multi-terabyte range and generates over 80 TB of bandwidth usage each month.
For Carnegie Mellon, releasing this dataset will facilitate future research in related areas. The dataset needs to be both preserved for long periods of time and highly available in a public, read-only format that researchers around the world can access.
"Tardigrade is an ideal solution for sharing the dataset because the data stored is globally available by default," said Juncheng Yang, a CMU Ph.D. student. "Tardigrade helped us to solve our data storage and global sharing problem. The parallel architecture removes the bandwidth bottlenecks we had been facing, enabling us to upload and share our data more efficiently and quickly."
The goal of this project with CMU was to showcase an easy way to back up and share academic datasets so that they persist into the future and can be readily shared or retrieved from anywhere in the world, making it simpler to replicate published results from the academic literature.
How Carnegie Mellon University uses Tardigrade
To back up and share read-only access to their output, the team at Carnegie Mellon University uses two native integrations with Tardigrade