167: GreyBeards talk Distributed S3 storage with Enrico Signoretti, VP Product & Partnerships, Cubbit

Long-time friend Enrico Signoretti (LinkedIn), VP Product and Partnerships, Cubbit, was a frequent participant at Storage Field Day (SFD) events, and I’ve known him since we first met there. Since then, he’s worked for a startup and a prominent analyst firm. Now he’s back at another startup, and this one looks like it’s got legs.

Cubbit offers distributed, S3-compatible object storage with geo-distribution and geo-fencing for object data, in which the organization owns the hardware and Cubbit supplies the software. There’s a management component, the Coordinator, which can run on your hardware or as a SaaS service they provide, but other than that, IT controls the rest of the system hardware. Listen to the podcast to learn more.

Cubbit comes in 3 components:

  • One or more Storage nodes, which run Cubbit’s agent software on top of a Linux system with direct-attached storage.
  • One or more Gateway nodes, which provide S3 protocol access to the objects stored on storage nodes. A typical S3 access point, e.g. https://s3.company_name.com/…, points to either a load balancer/front end or to one or more Gateway nodes. Gateway nodes provide the mapping between the bucket name/object identifier and where the data currently resides or will reside.
  • One Coordinator node, which provides the metadata to locate object data, manages the storage nodes and gateways, and monitors the service. The Coordinator node can be a SaaS service supplied by Cubbit or a VM/bare-metal node running Cubbit Coordinator software. Metadata is protected internally within the Coordinator node.

With these three components one can stand up a complete, geo-distributed/geo-fenced, S3 object storage system which the organization controls.
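
Because the gateways speak standard S3, any S3 SDK or tool should work against them by pointing at the gateway (or the load balancer fronting several gateways). Here’s a minimal sketch using Python’s boto3; the endpoint URL, credentials, and bucket name are placeholders of mine, not actual Cubbit values:

```python
import boto3

# Point a standard S3 client at a Cubbit gateway (or at the load balancer /
# front end sitting in front of several gateways). Endpoint and credentials
# below are placeholders for illustration only.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.company_name.com",
    aws_access_key_id="CUBBIT_ACCESS_KEY",
    aws_secret_access_key="CUBBIT_SECRET_KEY",
)

# Ordinary S3 calls; the gateway maps bucket/key to wherever the data resides.
s3.put_object(Bucket="backups", Key="db/2024-08-01.dump", Body=b"...")
resp = s3.get_object(Bucket="backups", Key="db/2024-08-01.dump")
print(resp["Body"].read()[:16])
```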

Cubbit encrypts data as it’s ingested at the gateway and decrypts it when accessed. Sign-on to the system uses standard security offerings. Encryption keys can be managed by Cubbit or by standard key management systems.

All data for an object is protected by nested erasure codes: 1) erasure coding within a data center/location, across its storage drives, and 2) erasure coding across geographical locations/data centers.

With erasure coding across locations, a customer with, say, 10 data center locations can have their data stored in such a fashion that, as long as at least 8 data centers are online, they still have access to their data; that is, the Cubbit storage system can still provide data availability.

Similarly, for erasure coding within the data center/location, across storage drives, with say 12 drives per stripe, one could configure 9+3 erasure coding, where as long as any 9 of the drives still operate, data remains available.

Please note the customer decides the number of locations to stripe across for erasure coding, and likewise the number of storage drives per stripe.
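
To make the nested erasure-coding arithmetic concrete, here’s a quick back-of-the-envelope calculation using the example numbers above (8-of-10 across sites, 9+3 across drives); the figures are mine, not output from a Cubbit sizing tool:

```python
# Toy math for nested erasure coding, using the examples above:
# geo stripe: 10 locations, data survives with any 8 online -> 8 data + 2 parity
# local stripe: 12 drives per stripe, 9 data + 3 parity
geo_data, geo_parity = 8, 2
loc_data, loc_parity = 9, 3

geo_overhead = (geo_data + geo_parity) / geo_data     # 1.25x
loc_overhead = (loc_data + loc_parity) / loc_data     # ~1.33x
total_overhead = geo_overhead * loc_overhead          # ~1.67x raw per usable byte

print(f"geo overhead:   {geo_overhead:.2f}x (tolerates {geo_parity} site failures)")
print(f"local overhead: {loc_overhead:.2f}x (tolerates {loc_parity} drive failures per stripe)")
print(f"combined:       {total_overhead:.2f}x raw capacity per usable byte")
```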

The customer supplies all the storage node hardware. Some customers start with re-purposed servers/drives for their original configuration and then upgrade to higher-performing storage, servers, and networking as performance needs change. Storage nodes can be on prem, in the cloud, or at the edge.

For adequate performance, gateways and storage nodes (and Coordinator nodes) should be located close to one another. Although Coordinator nodes are not in the data path, they are critical to initial object access.

Gateways can provide a cache for faster local data access. Cubbit has recommendations for Gateway server hardware. And similar to storage nodes, Gateways can operate at the edge, in the cloud, or on prem.

Use cases for the Distributed S3 storage include:

  • As a backup target for data elsewhere
  • As a geographically distributed/fenced object store.
  • As a locally controlled object storage to feed AI training/inferencing activity.

Most backup solutions support S3 object storage as a target for backups.

Geographically distributed S3 storage means that customers control where object data is located. This could be split across a number of physical locations, the cloud or at the edge.

Geographically fenced S3 storage means that the customer controls which of its many locations store an object. For organizations subject to GDPR with data centers in multiple nations, this can help meet compliance requirements to keep customer data within country.

Cubbit’s distributed S3 object storage is strongly consistent, in that an object loaded into the system at any location is immediately available to any user accessing it through any other gateway. Access times vary, but the data will be the same regardless of where you access it from.
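
A quick way to picture that consistency model is a write through one gateway followed by an immediate read through another. The sketch below assumes two hypothetical gateway endpoints and placeholder credentials:

```python
import boto3

def gateway_client(endpoint):
    # Placeholder credentials; each endpoint is a different Cubbit gateway.
    return boto3.client("s3", endpoint_url=endpoint,
                        aws_access_key_id="CUBBIT_ACCESS_KEY",
                        aws_secret_access_key="CUBBIT_SECRET_KEY")

milan = gateway_client("https://s3-milan.company_name.com")
berlin = gateway_client("https://s3-berlin.company_name.com")

# Write through one gateway...
milan.put_object(Bucket="shared", Key="report.pdf", Body=b"v1")

# ...and, with strong consistency, a read through any other gateway
# immediately returns the object just written (access time aside).
obj = berlin.get_object(Bucket="shared", Key="report.pdf")
assert obj["Body"].read() == b"v1"
```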

The system starts up through an Ansible playbook which asks a bunch of questions and then loads and sets up the agent software for storage nodes, gateway nodes and, where applicable, the Coordinator node.

At any time, customers can add more gateways or storage nodes or retire them. The system doesn’t perform automatic load balancing for new nodes, but customers can migrate data off storage nodes and onto others through API calls/UI requests to the Coordinator.

Cubbit storage supports multi-tenancy, so MSPs can offer their customers isolated access.

Cubbit charges for their service based on data storage under management. Note there are no egress charges, and you don’t pay for redundancy. But you do supply all the hardware used by the system. They offer a discount for media & entertainment (M&E) customers, as the metadata-to-data ratio is much smaller (lots of large files) than for most other S3 object stores (a mix of small and large files).

Cubbit is presently available only in Europe but will be coming to the USA next year. So, if you are interested in geo-distributed/geo-fenced S3 object storage that you control, and that can be had for much less than hyperscaler object storage, check it out.

Enrico Signoretti, VP Products & Partnerships

Enrico Signoretti has over 30 years of experience in the IT industry, having held various roles including IT manager, consultant, head of product strategy, IT analyst, and advisor.

He is an internationally renowned visionary author, blogger, and speaker on next-generation technologies. Over the past four years, Enrico has kept his finger on the pulse of the evolving storage industry as the Head of Research Product Strategy at GigaOm. He has worked closely and built relationships with top visionaries, CTOs, and IT decision makers worldwide.

Enrico has also contributed to leading global online sites (with over 40 million readers) for enterprise technology news.

164: GreyBeards talk FMS24 Wrap-up with Jim Handy, General Dir., Objective Analysis

Jim Handy, General Director, Objective Analysis, is our long-time go-to guy on SSD and memory technologies, and we were both at the FMS (Future of Memory and Storage – new name, broader focus) 2024 conference last week in Santa Clara, CA. Lots of new SSD technology both on and off the show floor, as well as new memory offerings and more.

Jim helps Jason and me understand what’s happening with NAND and other storage/memory technologies that matter to today’s IT infrastructure. Listen to the podcast to learn more.

First off, I heard at the show that the race for more (3D NAND) layers is over. According to Jim, companies are finding it’s more expensive to add layers than it is just to do a lateral (2D, planar) shrink (adding more capacity per layer).

One vendor mentioned that CapEx efficiency degrades as they add more layers. Nonetheless, I saw more than one slide at the show with a “3xx” layers column.

Kioxia and WDC introduced a 218-layer BiCS8 NAND technology with 1Tb TLC and up to 2Tb QLC NAND per chip. Micron announced a 233-layer Gen 9 NAND chip.

One vendor showed a 128TB (QLC) SSD. The challenge with PCIe Gen 5 is that it’s limited to 4GB/sec per lane, so for 16 lanes that’s 64GB/s of bandwidth, and Gen 4 is half that. Jim likened using Gen 4/Gen 5 interfaces for a 128TB SSD to using a soda straw to get to the data.

The latest Kioxia 2Tb QLC chip is capable of 3.6Gbps (source: Kioxia America). With 512 of these 2Tb chips (4 per TB × 128TB) needed to create a 128TB drive, that’s ~230GB/s of bandwidth coming off the chips, funneled down to the 64GB/s of a 16-lane PCIe Gen 5 link, wasting roughly 3/4ths of the chip bandwidth.

Of course, they’d need more than 512 chips (~1.3x?) to make a durable/functioning 128TB drive, which would only make this problem worse. And I saw one slide that showed a 240TB SSD!
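
The back-of-the-envelope math behind the soda-straw comment works out roughly as follows (using the figures quoted above and ignoring the extra over-provisioned chips):

```python
# Rough chip-vs-interface bandwidth math for a 128TB QLC SSD (figures from above).
drive_tb = 128
chip_tb = 2 / 8                        # a 2Tb chip is 0.25TB
chips = int(drive_tb / chip_tb)        # 512 chips
chip_gbps = 3.6                        # per-chip interface speed (Kioxia figure)

nand_bw_gBps = chips * chip_gbps / 8   # ~230 GB/s coming off the NAND
pcie5_x16_gBps = 16 * 4                # 4 GB/s per lane * 16 lanes = 64 GB/s

print(f"chips needed:      {chips}")
print(f"aggregate NAND bw: {nand_bw_gBps:.0f} GB/s")
print(f"PCIe Gen5 x16 bw:  {pcie5_x16_gBps} GB/s")
print(f"unused NAND bw:    {1 - pcie5_x16_gBps / nand_bw_gBps:.0%}")   # ~72%, i.e. ~3/4
```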

Enough on bandwidth, let’s talk data growth. Jason’s been doing some research and had current numbers on data growth. According to his research, the world’s data (perhaps data transmitted over the internet) in 2010 was 2ZB (ZB, zettabytes = 10^21 bytes), in 2023 it was 120ZB, and by 2025 it should be 180ZB. For 2023, that’s over 328 million TB/day or 328EB/day (EB, exabytes = 10^18 bytes).
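
The per-day figure is just the annual total divided out (a quick sanity check, assuming the 120ZB is a full-year total):

```python
# Convert 120 ZB/year into per-day terms (1 ZB = 1,000 EB = 10^9 TB).
zb_per_year = 120
eb_per_day = zb_per_year * 1_000 / 365      # ~329 EB/day
tb_per_day = eb_per_day * 1_000_000         # ~329 million TB/day
print(f"{eb_per_day:.0f} EB/day (~{tb_per_day/1e6:.0f} million TB/day)")
```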

Jason said ~54% of this is video. He attributes the major data growth spurt since 2010 mainly to social media video.

Jason also mentioned that the USA currently (2023?) had 5,388 data centers, Germany 522, UK 517, and China 448. That last number seems way low to all of us but they could just be very, very big data centers.

There was no mention of average data center size (square meters, # of servers, # of GPUs, storage capacity, etc.). But we know, because of AI, they are getting bigger and more power hungry.

There were more FMS 2024 topics discussed, like the continuing interest in TLC SSDs, new memory offerings, computational storage/memory, etc.

Jim Handy, General Director, Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry, including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.

He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media. 

He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com

163: GreyBeards talk Ultra Ethernet with Dr J Metz, Chair of UEC steering committee, Chair of SNIA BoD, & Tech. Dir. AMD

Dr J Metz (@drjmetz, blog) has been on our podcast before, mostly in his role as SNIA spokesperson and BoD Chair, but this time he’s here discussing some of his latest work with the Ultra Ethernet Consortium (UEC) (LinkedIn: @ultraethernet, X: @ultraethernet).

The UEC is a full stack re-think of what Ethernet could do for large single application environments. UEC was originally focused on HPC, with 400-800 Gbps networks and single applications like simulating a hypersonic missile or airplane. But with the emergence of GenAI and LLMs, UEC could also be very effective for large AI model training with massive clusters doing a single LLM training job over months. Listen to the podcast to learn more.

The UEC is outside the realm of normal enterprise environments. But as AI training becomes more ubiquitous, who knows, UEC may yet find a place in the enterprise. However, it’s not intended for mixed network environments with multiple applications. It’s a single-application network.

One wouldn’t think HPC was a big user of Ethernet for its main network. But Dr J pointed out that the top 3 systems on the HPC Top500 all use Ethernet, and more are looking to use it in the future.

UEC is essentially an optimized software stack and hardware for networking used by single-application environments. These types of workloads are constantly pushing the networking envelope. And by taking advantage of the “special networking personalities” of these workloads, UEC can significantly reduce networking overheads, boosting bandwidth and speeding workload execution.

The scale of these networks is extreme. The UEC is targeting up to a million endpoints, across >100K servers, with each network link >100Gbps and more likely 400-800Gbps. With new networking cards (from AMD and others) coming out that support four 400/800Gbps network ports, a pair of these in each server of a 100K-server cluster gives you 800K endpoints. A million is not that far away when you think of it at that scale.
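
The endpoint arithmetic is simple but worth spelling out (my numbers, following the example above):

```python
# 100K servers, 2 NICs per server, 4 ports per NIC -> endpoints on the fabric.
servers, nics_per_server, ports_per_nic = 100_000, 2, 4
endpoints = servers * nics_per_server * ports_per_nic
print(f"{endpoints:,} endpoints")   # 800,000 -- within sight of a million
```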

Moreover, LLM training and HPC work are starting to look more alike these days. Yes, there are differences, but the scale of their clusters is similar, and the way work is sometimes fed to them is similar, which leads to similar networking requirements.

UEC is attempting to handle a 5% problem. That is, 95% of users will not have 1M endpoints in their LAN, but maybe 5% will, and for those 5%, a more mixed networking workload is unnecessary. In fact, a mixed network becomes a burden, slowing down packet transmission.

UEC is finding that with a few select networking parameters, almost like workload fingerprints, network stacks can be much more optimized than current Ethernet, and thereby support reduced packet overheads and more bandwidth.

AI and HPC networks share a very limited set of characteristics which can be used as fingerprints. These characteristics include things like reliable or unreliable transport, ordered or unordered delivery, multi-path packet spraying or not, etc. With a set of these parameters selected for an environment, UEC can optimize the network stack to better support a million networking endpoints.
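
Purely as an illustration (my own sketch, not anything taken from the UEC specification), you can think of such a fingerprint as a small profile of transport knobs that the stack gets tuned around:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransportProfile:
    """Illustrative only -- a 'fingerprint' of a workload's networking needs."""
    reliable_delivery: bool   # must the fabric guarantee delivery?
    ordered_delivery: bool    # must packets arrive in order, or can the app reorder?
    packet_spraying: bool     # spray a flow across many paths, or pin it to one?

# A hypothetical profile for collective-heavy AI training traffic:
ai_training = TransportProfile(
    reliable_delivery=True,
    ordered_delivery=False,   # completions can be reordered by the endpoint
    packet_spraying=True,     # use every available path for maximum bandwidth
)
print(ai_training)
```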

We asked where CXL fits in with UEC. Dr J said it could potentially be an entity on the network, but he sees CXL more as a within-server, or tightly coupled (limited) cluster-of-servers, solution rather than something on a UEC network.

Just 12 months ago the UEC had 10 members or so and this past week they were up to 60. UEC seems to have struck a chord.

The UEC plans to release a 1.0 specification near the end of this year. UEC 1.0 is intended to operate on current (>100Gbps) networking equipment with firmware/software changes.

Considering the UEC was just founded in 2023, putting out their 1.0 technical spec within 1.5 years is astonishing, and it speaks volumes to the interest in the technology.

The UEC has a blog post which talks more about UEC 1.0 specification and the technology behind it.

Dr J Metz, Chair of UEC Steering Committee, Chair of SNIA BoD, Technical Director of Systems Design, AMD

J works to coordinate and lead strategy on various industry initiatives related to systems architecture. Recognized as a leading storage networking expert, J is an evangelist for all storage-related technology and has a unique ability to dissect and explain complex concepts and strategies. He is passionate about the innerworkings and application of emerging technologies.

J has previously held roles in both startups and Fortune 100 companies as a Field CTO,  R&D Engineer, Solutions Architect, and Systems Engineer. He has been a leader in several key industry standards groups, sitting on the Board of Directors for the SNIA, Fibre Channel Industry Association (FCIA), and Non-Volatile Memory Express (NVMe). A popular blogger and active on Twitter, his areas of expertise include NVMe, SANs, Fibre Channel, and computational storage.

J is an entertaining presenter and prolific writer. He has won multiple awards as a speaker and author, writing over 300 articles and giving presentations and webinars attended by over 10,000 people. He earned his PhD from the University of Georgia.

162: GreyBeards talk cold storage with Steffen Hellmold, Dir. Cerabyte Inc.

Steffen Hellmold, Director, Cerabyte Inc., is extremely knowledgeable about the storage device business. He has worked for WDC in storage technology and possesses an in-depth understanding of tape and disk storage technology trends.

Cerabyte, a German startup, is developing cold storage. Steffen likened Cerabyte storage to ceramic punch cards – a modern take on the punch cards that dominated IT and pre-IT over much of the last century. Once cards were punched, they were near-WORM storage that could be obliterated or shredded but was very hard to modify. Listen to the podcast to learn more.

Cerabyte uses a unique combination of semiconductor (lithographic) technology, ceramic coated glass, LTO tape (form factor) cartridge and LTO automation in their solution. So, for the most part, their critical technologies all come from somewhere else.

Their main technology uses a laser-lithographic process to imprint onto a sheet (ceramic coated glass) a data page (block?). There are multiple sheets in each cartridge.

Their intent is to offer a robotic system (based on LTO technology) to retrieve and replace their multi-sheet cartridges and mount them in their read-write drive.

As mentioned above, the write operation is akin to a lithographic data encoded mask that is laser imprinted on the glass. Once written, the data cannot be erased. But it can be obliterated, by something akin to writing all ones or it can be shredded and recycled as glass.

The read operation uses a microscope and camera to take scans of the sheet’s imprint and convert that into data.

Cerabyte’s solution is cold or ultra-cold (frozen) storage. If LTO robotics are any indication, a Cerabyte cartridge with multiple sheets can be presented to a read-write drive in a matter of seconds. However, extracting the appropriate sheet from a cartridge and mounting it in a read-write drive will take more time. But this may be similar in time to an LTO tape leader being threaded through a tape drive, again a matter of seconds.

Steffen didn’t supply any specifications on how much data could be stored per sheet other than to say it’s on the order of many GB. He did say that both sides of a Cerabyte sheet could be recording surfaces.

With their current prototype, an LTO form factor cartridge holds fewer than 5 sheets of media, but they are hoping they can get this to 100 or more in time.

We talked about the history of disk and tape storage technology. Steffen is convinced (as are many in the industry) that disk-tape capacity increases have slowed over time and that this is unlikely to change. I happen to believe that storage density increases tend to happen in spurts, as new technology is adopted and then trails off as that technology is built up. We agreed to disagree on this point.

Steffen predicted that Cerabyte will be able to cross over disk cost/capacity this decade and LTO cost/capacity sometime in the next decade.

We discussed the market for cold and frozen storage. Steffen mentioned that the Office of the Director of National Intelligence (ODNI) has tasked the National Academies of Sciences, Engineering, and Medicine to conduct a rapid expert consultation on large-scale cold storage archives. And that most hyperscalers have use for cold and frozen storage in their environments and some even sell this (Glacier storage) to their customers.

The Library of Congress and similar entities in other nations are also interested in digital preservation that cold and frozen technology could provide. He also thinks that medical is a prime market that is required to retain information for the life of a patient. IBM, Cerabyte, and Fujifilm co-sponsored a report on sustainable digital preservation.

And of course, the media libraries of some entertainment companies represent a significant asset that, if on tape, has to be re-hosted every 5 years or so. Steffen and much of the industry are convinced that a sizeable market for cold and frozen storage exists.

I mentioned that long archives suffer from data format drift (data formats are no longer supported). Steffen mentioned there’s also software version drift (software that processed that data is no longer available/runnable on current OSs). And of course the current problem with tape is media drift (LTO media formats can be read only 2 versions back).

Steffen seemed to think format and software drift are industry-wide problems that are being worked on. Cerabyte seems to have a great solution for media drift, as its media can be read with a microscope, and the (ceramic-coated glass) media has a predicted life of 100 years or more.

I mentioned the “new technology R&D” problem. Historically, as new storage technologies have emerged, they have always ended up being left behind (in capacity) because disk, tape, and NAND R&D ($Bs each) outspends them. Steffen said it’s certainly NOT $Bs of R&D for tape and disk.

Steffen countered by saying that all storage technology R&D spending pales in comparison to semiconductor R&D spending focused on reducing feature size. And as Cerabyte uses semiconductor technologies to write data, sheet capacity is directly a function of semiconductor technology. So, Cerabyte’s R&D technology budget should not be a problem. And in fact they have been able to develop their prototype, with just $7M in funding.

Steffen mentioned there is an upcoming Storage Technology Showcase conference in early March, which Cerabyte will be attending.

Steffen Hellmold, Director, Cerabyte Inc.

Steffen has more than 25 years of industry experience in product, technology, business & corporate development as well as strategy roles in semiconductor, memory, data storage and life sciences.

He served as Senior Vice President, Business Development, Data Storage at Twist Bioscience and held executive management positions at Western Digital, Everspin, SandForce, Seagate Technology, Lexar Media/Micron, Samsung Semiconductor, SMART Modular and Fujitsu.

He has been deeply engaged in various industry trade associations and standards organizations including co-founding the DNA Data Storage Alliance in 2020 as well as the USB Flash Drive Alliance, serving as their president from 2003 to 2007.

He holds an economic electrical engineering degree (EEE) from the Technical University of Darmstadt, Germany.

161: Greybeards talk AWS S3 storage with Andy Warfield, VP Distinguished Engineer, Amazon

We talked with Andy Warfield (@AndyWarfield), VP Distinguished Engineer, Amazon, about 10 years ago, when he was at Coho Data (see our (005:) Greybeards talk scale out storage … podcast). Andy has been a good friend for a long time, and he’s been with Amazon S3 for over 5 years now. Since the recent S3 announcements at AWS re:Invent, we thought it a good time to have him back on the show. Andy has a great knack for explaining technology, I suppose that comes from his time as a professor, but whatever the reason, he was great to have on the show again.

Lately, Andy’s been working on S3 Express One Zone storage, announced last November, a new version of S3 object storage with lower response time. We talked about this later in the podcast, but first we touched on S3’s history and other advances. S3 and its ancillary services have advanced considerably over the years. Listen to the podcast to learn more.

S3 is ~18 years old now and was one of the first AWS offerings. It was originally intended to be the internet’s file system which is why it was based on HTTP protocols.

Andy said that S3 was designed for 11-9s durability and offers high availability options. AWS constantly monitors server and storage failures/performance to ensure that they can maintain this level of durability. The problem with durability is that when a drive/server goes down, the data needs to be rebuilt onto another drive before another drive fails. One way to do this is to have more replicas of the data. Another way is to speed up rebuild times. I’m sure AWS does both.
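
As a toy illustration of why rebuild speed matters for durability (a back-of-the-envelope model of my own, not AWS’s actual durability engineering): with exponentially distributed drive failures, the chance that another drive in the same redundancy group fails during the rebuild window shrinks roughly in proportion to the rebuild time.

```python
import math

def p_second_failure(drives_in_group, rebuild_hours, mtbf_hours=1_500_000):
    """Toy model: probability that at least one more drive in the group
    fails while the first failed drive is still being rebuilt."""
    group_failure_rate = drives_in_group / mtbf_hours   # failures per hour
    return 1 - math.exp(-group_failure_rate * rebuild_hours)

# Halving the rebuild time roughly halves the exposure window.
for hours in (24, 12, 6):
    p = p_second_failure(drives_in_group=20, rebuild_hours=hours)
    print(f"rebuild in {hours:>2}h -> P(another failure during rebuild) ~ {p:.1e}")
```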

S3 high availability requires replicas across availability zones (AZs). AWS availability zone data centers are carefully located so that they are power- and network-isolated from other data centers in the region. Further, AZ site locations are deliberately selected with an eye toward ensuring they are not susceptible to similar physical disasters.

Andy discussed other AWS file data services such as their FSx systems (Amazon FSx for Lustre, for OpenZFS, for Windows File Server, & for NetApp ONTAP) as well as Elastic File System (EFS). Andy said they sped up one of these FSx services by 3-5X over the last year.

Andy mentioned that one of the guiding principles for a lot of AWS storage is to try to eliminate hard decisions for enterprise developers. By offering FSx file services, S3 objects, and their other storage and data services, customers already using similar systems in house can just migrate apps to AWS without having to modify code.

Andy said one thing that struck him as he came on the S3 team was the careful deliberation that occurred whenever they considered S3 API changes. He said the team is focused on the long term future of S3 and any API changes go through a long and deliberate review before implementation.

One workload that drove early S3 adoption was data analytics. Hadoop and BigTable have significant data requirements. Early on, someone wrote an HDFS interface to S3 and over time lots of data analytics activity moved to S3 object hosted data.

Databases have also changed over the last decade or so. Keith mentioned that many customers are foregoing traditional databases to use open source database solutions with S3 as their backend storage. It turns out that open table format offerings such as Apache Iceberg, Apache Hudi, and Delta Lake are all available on AWS and use S3 objects as their storage.

We talked a bit about Lambda serverless processing triggered by S3 objects. This was a new paradigm for computing when it came out, and many customers have adopted Lambda to reduce cloud compute spend.
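
For anyone who hasn’t seen the pattern, a minimal S3-triggered Lambda handler (Python runtime) looks something like the sketch below; the bucket and key come from the event S3 delivers on each object upload:

```python
import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Invoked by S3 when an object is created in the configured bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        head = s3.head_object(Bucket=bucket, Key=key)
        print(f"New object s3://{bucket}/{key} ({head['ContentLength']} bytes)")
    return {"status": "ok"}
```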

Recently, Amazon introduced Mountpoint for Amazon S3, a file system client for S3 storage. Customers can now mount an S3 bucket as a local file system and access its objects through standard file operations.

Amazon also supports the Registry of Open Data on AWS, which holds just about every canonical data set (stored as S3 objects) used for AI training.

At the last re:Invent, Amazon announced S3 Express One Zone, which is a high-performance, low-latency version of S3 storage. The goal for S3 Express was to get latency down from 40-60 msec to less than 10 msec.

They ended up making a number of changes to S3 such as:

  • Redesigned/redeveloped some S3 micro services to reduce latency
  • Restricted S3 Express storage to a single zone reducing replication requirements, but maintained 11-9s durability
  • Used higher performing storage
  • Re-designed the S3 API to move some authentication/verification to the start of a session rather than performing it on every object access call (see the sketch after this list).
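
From the application’s point of view the calls look much like regular S3; the visible differences are the zone-suffixed directory-bucket name and the session-based authentication, which current AWS SDKs handle behind the scenes. A rough sketch (the bucket name below is a made-up example of the format):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# S3 Express One Zone uses "directory buckets" whose names carry an AZ suffix;
# this one is illustrative only. Recent SDK versions obtain and reuse the
# short-lived session credentials (the up-front auth mentioned above) for you.
bucket = "my-fast-data--use1-az4--x-s3"

s3.put_object(Bucket=bucket, Key="features/batch-001.parquet", Body=b"...")
obj = s3.get_object(Bucket=bucket, Key="features/batch-001.parquet")
print(len(obj["Body"].read()))
```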

Somewhere during our talk, Andy said that, in aggregate, S3 is providing 100TBytes/sec of data bandwidth. How’s that for scale-out storage?

Andy Warfield, VP Distinguished Engineer, Amazon

Andy is a Vice President and Distinguished Engineer in Amazon Web Services. He focusses primarily on data storage and analytics.

Andy holds a PhD from the University of Cambridge, where he was one of the authors of the Xen hypervisor. Xen is an open source hypervisor that was used as the initial virtualization layer in AWS, among multiple other early cloud companies. Andy was a founder at XenSource, a startup based on Xen that was subsequently acquired by Citrix Systems for $500M.

Following XenSource, Andy was a professor at the University of British Columbia (UBC), where he was awarded a Canada Research Chair and a Sloan Research Fellowship. As a professor, Andy did systems research in areas including operating systems, networking, security, and storage.

Andy’s second startup, Coho Data, was a scale-out enterprise storage array that integrated NVMe SSDs with programmable networks. It raised over $80M in funding from VCs including Andreessen Horowitz, Intel Capital, and Ignition Partners.