I saw Jason Goldschmidt (LinkedIn), DELL Distinguished Engineer, co-present at the SNIA StorageAI conference on the work of the Accelerated Object (AKA S3 over RDMA) TWG (Technical Working Group) and thought it would be great to have him and the other TWG Co-chair, Nick Connolly (LinkedIn), ARM Principal Engineer, on the show to discuss what they are up to.
Further, S3 and object storage in general are growing by leaps and bounds. Some AWS S3 statistics: 100s of Exabytes of objects stored, 500T objects, over 123 AZs, and currently AWS is doing about 200M S3 object requests/sec or ~1PB/sec of S3 object data transfer, worldwide.
S3 objects have become the go-to store for training data, model checkpoints, RAGs and just about anything else that’s big, sequential, and needs to be stored for AI processing. As such, there’s been a ton of interest in providing a higher performing S3 object store to maintain and improve accelerator (GPU) utilization during AI processing. SNIA’s Accelerated Object TWG was formed last fall to address this pressing industry need. Listen to the podcast to learn more.
Objects have been around seemingly forever. I first learned of them at a Mass Storage Conference in the early 2000s, and at the time they were called BLOBs. And for a long time object storage was in search of a killer app. I thought a while back that data lakes would be that. But then AI emerged over the last decade and object storage just exploded.
Nowadays just about anything to do with AI models, i.e., training data, checkpoints, RAG databases, log files, etc., is stored as objects. And as the S3 stats above show, there is an enormous amount of it, even though the average object is modest in size: about 200KB (assuming 100EB spread over 500T objects). So the number one problem has become how to get that data from storage to GPU and back again.
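As a rough sanity check on those figures, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 EB = 10^18 bytes), takes 100 EB as the floor of "100s of Exabytes", and uses the round request and bandwidth numbers quoted above. The variable names are mine and purely illustrative.

```python
# Back-of-the-envelope arithmetic on the AWS S3 figures quoted above.
# Assumptions: decimal units, and 100 EB as the lower bound of "100s of Exabytes".
EB = 10**18
PB = 10**15

stored_bytes = 100 * EB          # total object data stored (lower bound)
object_count = 500 * 10**12      # 500T objects
requests_per_sec = 200 * 10**6   # ~200M S3 requests/sec
bytes_per_sec = 1 * PB           # ~1 PB/sec of S3 data transfer

avg_object_bytes = stored_bytes / object_count
avg_bytes_per_request = bytes_per_sec / requests_per_sec

print(f"average object size: {avg_object_bytes / 1e3:.0f} KB")
print(f"average bytes per request: {avg_bytes_per_request / 1e6:.0f} MB")
```

On these inputs the average stored object works out to about 200 KB, while the average request moves about 5 MB, which suggests that the traffic skews toward objects much larger than the typical stored one.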
Enter S3 over RDMA, or as SNIA calls it, Accelerated Object Storage. For a TWG that only formed last fall, they have been making amazing progress, which, to tell the truth, is indicative of industry interest.
I was at another conference a couple of weeks back where a major storage vendor announced (pre-standardization) support for S3 over RDMA. And I find it hard to believe they’re the only one.
Keith asked about how accelerated objects would help inferencing and agentic AI activity in particular. It turns out that a while back AWS and others offered support for tables (or databases) on object storage. So RAGs are a prime example of where accelerated object can help.
RAG databases can be huge, and searching them for the enterprise-specific context needed by a prompt or query can slow down agentic inferencing activity. Having a zero-copy, direct storage-to-GPU object data transfer can shave a lot of latency off a RAG request, as well as considerably speed up the transfer of large chunks of data.
Then there’s KV (Key:Value) cache offload. KV caches are programmatically accessible context stores, built during the inference PreFill phase and used throughout the inference Decode phase, and they grow as agent/query context grows.
But with today’s GPU memory limited to somewhere between 32GB and 192GB, there’s just not enough room to keep every prompt’s and every agentic tool’s context stored. This causes KV caches to be offloaded to CPU memory, but even CPU memory has limits.
As servers start to support 100s, if not 1000s, of agentic tool requests, prompts and queries, even CPU memory space can run out. Prior to KV cache offload to storage, when HBM and CPU memory ran out, prompt KV caches would be jettisoned. And so when the prompt started up again, the GPU would have to do another PreFill phase to recalculate its KV store. But with KV cache offload to storage, a prompt’s KV store can be written to local SSD or object storage and read back when needed.
As Nick and Jason mentioned, it’s a timing tradeoff: how quickly can you retrieve a prompt’s KV cache from local SSD or accelerated object vs. how quickly can the GPU recompute the KV cache for the prompt from context tokens?
One would think this would be no contest: computation always beats IO. But as KV stores become larger, and GPU activity becomes more multi-threaded, reading back an offloaded KV cache (directly to GPU) can be faster, sometimes much faster.
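To make the reload vs. recompute tradeoff a bit more concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the model shape, the 40 GB/sec transfer rate, the 5 ms setup cost and the 10,000 tokens/sec prefill rate are all made-up illustrative numbers, not measurements and not anything Nick or Jason quoted. It also treats PreFill as a flat tokens/sec rate, which is a simplification. The cache size formula is the usual one: keys plus values, per layer, per KV head, per token.

```python
# Toy model of the reload-vs-recompute tradeoff for an offloaded KV cache.
# Every number below is an illustrative assumption, not a measurement.
from dataclasses import dataclass


@dataclass
class ModelShape:
    layers: int           # transformer layers
    kv_heads: int         # key/value heads per layer
    head_dim: int         # dimension of each head
    bytes_per_value: int  # e.g. 2 for FP16/BF16


def kv_cache_bytes(shape: ModelShape, tokens: int) -> int:
    # Keys and values (the factor of 2) are kept per layer, per KV head, per token.
    return (2 * shape.layers * shape.kv_heads * shape.head_dim
            * shape.bytes_per_value * tokens)


def reload_seconds(cache_bytes: int, bandwidth_bytes_per_sec: float,
                   setup_seconds: float) -> float:
    # Fixed setup cost (authentication, addressing, etc.) plus pure transfer time.
    return setup_seconds + cache_bytes / bandwidth_bytes_per_sec


def recompute_seconds(tokens: int, prefill_tokens_per_sec: float) -> float:
    # Simplification: PreFill modeled as a constant tokens/sec rate.
    return tokens / prefill_tokens_per_sec


shape = ModelShape(layers=80, kv_heads=8, head_dim=128, bytes_per_value=2)  # hypothetical model
tokens = 100_000                                                            # hypothetical context length

cache = kv_cache_bytes(shape, tokens)
reload_t = reload_seconds(cache, bandwidth_bytes_per_sec=40e9, setup_seconds=0.005)
recompute_t = recompute_seconds(tokens, prefill_tokens_per_sec=10_000)

print(f"KV cache: {cache / 1e9:.1f} GB, reload: {reload_t:.2f} s, "
      f"recompute: {recompute_t:.2f} s")
```

With these particular assumptions the reload wins comfortably, but the point of the sketch is the shape of the comparison: shrink the context, slow the link, or give the prompt a less busy GPU, and the answer can flip.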
We didn’t spend as much time on AI training, but the advantages of accelerated object are even more obvious here. All we need mention is that training data may no longer need to be staged to local SSD if it can be retrieved at RDMA speeds from local object storage. And all that checkpointing that goes on during LLM training over a gaggle of GPUs can be done a lot faster using RDMA writes to local object.
Even so, S3 over RDMA really only helps data transfer speeds, by eliminating one copy (old way: read from object store to CPU memory, copy to GPU memory; new way: read directly from object store into GPU memory, hence zero-copy). There’s still a lot of non-data-transfer, or setup, activity to be done by the CPU, such as authentication, which storage, which GPU memory location, etc., but once all that’s in place the CPU can effectively step out of the picture and let (Ethernet RDMA) hardware take over.
And of course S3 over RDMA, by eliminating one copy, frees up CPU cycles as well as speeding up data transfer.
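The old way vs. new way distinction can be illustrated with a small host-memory analogy in Python. To be clear, this is not RDMA and no NIC or GPU is involved; it just contrasts landing data in an intermediate buffer and then copying it, against having the source place the data straight into the destination buffer. The function and buffer names are hypothetical.

```python
# Host-memory analogy only: no RDMA, NIC, or GPU is involved here.
import io


def staged_read(source: io.BufferedIOBase, dest: bytearray) -> int:
    """Old way: land the data in an intermediate buffer, then copy it to dest."""
    bounce = source.read(len(dest))   # copy 1: source -> intermediate buffer
    dest[:len(bounce)] = bounce       # copy 2: intermediate buffer -> destination
    return len(bounce)


def direct_read(source: io.BufferedIOBase, dest: bytearray) -> int:
    """New way: the source places the data straight into the destination buffer."""
    return source.readinto(dest)


payload = b"checkpoint-shard-0001"
gpu_like_buffer_a = bytearray(len(payload))  # stands in for GPU memory
gpu_like_buffer_b = bytearray(len(payload))

staged_read(io.BytesIO(payload), gpu_like_buffer_a)
direct_read(io.BytesIO(payload), gpu_like_buffer_b)

# Both paths deliver identical bytes; only the number of copies differs.
print(gpu_like_buffer_a == gpu_like_buffer_b == bytearray(payload))
```

Both paths end with the same bytes in the destination. The direct path simply skips the intermediate buffer, which is the copy (and the CPU work that goes with it) that S3 over RDMA is meant to remove.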
S3 over RDMA is intended to work over RoCEv1. RoCEv2 also exists, which supports routed (internet) access to data over Ethernet. But at the moment accelerated object is a RoCEv1-targeted solution.
But how can one define a standard to enhance a protocol when the underlying protocol is not a standard? As we all know, S3 is not a standard, but rather is something AWS invented for their object access. Nick and Jason said it’s not impossible, but yes, they have to deal with that as part of their work.
In the meantime they and their team are quickly moving forward on standardizing Accelerated Object, and the AI world is taking notice.
And there will be more info on Accelerated Object at the next SNIA Storage Developer Conference in the USA, coming up this September in CA.
Jason Goldschmidt, DELL Distinguished Engineer, SNIA Accelerated Object TWG, Co-Chair

Jason is a Distinguished Engineer at Dell Technologies, working in the Storage Chief Technology Office. His role involves leading the development of next-generation storage and network protocols for both Public and Private Clouds.
Jason also serves as Co-Chair of the SNIA Accelerated Object I/O Technical Working Group.
Throughout his career, Jason has focused on enterprise storage and meeting the needs of customers who want to expand beyond the traditional on-premises data center. He enjoys collaborating with customers and partners to understand their requirements when transforming their business.
In his personal life, Jason is passionate about running marathons and baking artisan bread. He lives in Newton, Massachusetts, with his wife and their three children.
Nick Connolly, ARM Principal Engineer, SNIA Accelerated Object TWG, Co-Chair

Nick Connolly is a pioneer of software-defined storage and a Principal Software Engineer at Arm. He holds patents ranging from highly scalable algorithms through to data protection techniques and has delivered products generating over $1bn in software revenue.
Nick serves as Co-Chair of the SNIA Accelerated Object I/O Technical Working Group and has previously been a Technical Lead for the CNCF Storage Technical Advisory Group.
He is based in London.







