NVMe SSDs – Grey Beards on Systems

April 13, 2023

146: GreyBeards talk K8s cloud storage with Brian Carmody, Field CTO, Volumez

We’ve known Brian Carmody (@initzero), Field CTO, Volumez for over a decade now and he’s always been very technically astute. He moved to Volumez earlier this year and has once again joined a storage startup. Volumez is a cloud K8s storage provider with a new twist, K8s persistent volumes hosted on ephemeral storage.

Volumez currently works in public clouds (AWS and Azure (soft launch), with GCP coming soon) and is all about supplying high performing, enterprise class data services to K8s container apps, but doing so using transient (Azure ephemeral and AWS instance) storage and standard Linux. Hyperscalers offer transient storage as almost an afterthought with customer compute instances. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 48:38 — 66.8MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

It turns out that over the last decade or so, there has been a lot of time and effort devoted to maturing Linux’s storage stack and nowadays, with appropriate configuration, Linux can offer enterprise class data services and performance using direct attached NVMe SSDs. These services include thin provisioning, encryption, RAID/erasure coding, snapshots, etc., which on top of NVMe SSDs, provide IOPS, bandwidth and latency performance that boggles the mind.

However, configuring Linux for sophisticated, high performing data services is a hard problem to solve.

Enter Volumez. They have a SaaS control plane, client software and CSI drivers that will configure Linux with ephemeral storage to support any performance and data service that can be obtained from NVMe SSDs.

Once installed on your K8s cluster, Volumez software profiles all ephemeral storage and supplies that information to their SaaS control plane. Once that’s done, your platform engineers can define specific storage class policies or profiles usable by DevOps to consume ephemeral storage.

These policies identify volume [IOPS, Bandwidth, Latency] X [read, write] performance specifications as well as data protection, resiliency and other data service requirements. DevOps engineers consume this storage using PVCs that call for these storage classes at some capacity. When it sees the PVC, the Volumez SaaS control plane will carve out slices of ephemeral storage that can support the performance and other storage requirements defined in the storage class.
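To make the storage class and PVC relationship a bit more concrete, here is a minimal sketch in Python that builds the two Kubernetes objects as plain dictionaries. The StorageClass and PersistentVolumeClaim field names are standard Kubernetes, but the provisioner name and every key under parameters are hypothetical placeholders of ours. We don’t know what the Volumez CSI driver actually expects.

```python
import json


def make_storage_class(name, read_iops, write_iops, max_latency_us, protection):
    """Build a StorageClass manifest carrying a performance/protection policy."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Placeholder CSI driver name, NOT the real Volumez provisioner.
        "provisioner": "csi.example.invalid",
        # Hypothetical policy keys; StorageClass parameter values are strings.
        "parameters": {
            "readIops": str(read_iops),
            "writeIops": str(write_iops),
            "maxLatencyMicroseconds": str(max_latency_us),
            "protection": protection,
        },
    }


def make_pvc(name, storage_class_name, capacity):
    """Build a PVC that asks for `capacity` from the named storage class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": capacity}},
        },
    }


# Platform engineering defines the policy once...
storage_class = make_storage_class("fast-db", 100_000, 50_000, 300, "mirror")
# ...and DevOps only names the class and a capacity.
pvc = make_pvc("orders-db-data", storage_class["metadata"]["name"], "500Gi")

print(json.dumps(storage_class, indent=2))
print(json.dumps(pvc, indent=2))
```

The point of the split is that the PVC carries no performance detail at all. It only names a class and a size, and the policy behind the class can change without touching the application manifests.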

Once that’s done, their control plane next creates a network path from the compute instances with ephemeral storage to the worker nodes running container apps. After that it steps out of the picture and the container apps have a direct (network) data path to the storage they requested. Note, Volumez’s SaaS control plane is not in the container app storage data path at all.

Volumez supports multi-AZ data resiliency for PVCs. In this case, another mirror K8s cluster would reside in another AZ, with Volumez software active and similar if not equivalent ephemeral storage. Volumez will configure the container volume to mirror data between AZs. Similarly, if the policy requests erasure coding, Volumez SaaS software configures the ephemeral storage to provide erasure coding for that container volume.

Brian said they’ve done some amazing work to increase the speed of Linux snapshotting and restoring.

As noted above, the Volumez control plane SaaS software is outside the data path, so even if the K8s cluster running Volumez enabled storage loses access to the control plane, container apps continue to run and perform IO to their storage. This can continue until there’s a new PVC request that requires access to their control plane.

Ephemeral storage is accessed through special compute instances. These are not K8s worker nodes and they essentially act as a pass-through or network attachment between worker nodes running apps with PVCs and the Volumez configured Linux Logical Volumes hosted on slices of ephemeral storage.

Volumez is gaining customer traction with data platform clients, DBaaS companies, and some HPC environments. But just about anyone needing high performing data services for cloud K8s container apps should give Volumez a try.

I looked at AWS to see how they price instance store capacities and found out it’s not priced separately, but rather instance storage is bundled into the cost of EC2 compute instances.

Volumez is priced based on the number of media devices (instance/ephemeral stores) and performance (IOPS) available. They also have different tiers depending on support level requirements (e.g., community, business hours, 24x7), which also offer different levels of enterprise security functionality.

Brian said they have a free tier that customers can easily sign up for and try out by going to their web site (see link above), or if you would like a guided demo, just contact him directly.

Brian Carmody, Field CTO, Volumez

Brian Carmody is Field CTO at Volumez. Prior to joining Volumez, he served as Chief Technology Officer of data storage company Infinidat where he drove the company’s technology vision and strategy as it ramped from pre-revenue to market leadership.

Before joining Infinidat, Brian worked in the Systems and Technology Group at IBM where he held senior roles in product management and solutions engineering focusing on distributed storage system technologies.

Prior to IBM, Brian served as a technology executive at MTV Networks Viacom, and at Novus Consulting Group as a Principal in the Media & Entertainment and Banking practices.

February 2, 2023

142: GreyBeards talk scale-out, software defined storage with Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Software defined storage is a pretty full segment of the market these days. So, it’s surprising when a new entrant comes along. We saw a story on Quobyte in Blocks and Files and thought it would be great to talk with Bjorn Kolbeck (LinkedIn), Co-Founder & CEO, Quobyte. Bjorn got his PhD in scale out storage and went to work at Google on anything but storage. While there, he was amazed by Google’s vast infrastructure being managed by only a few people and thought this should be commercialized, so Quobyte was born. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 45:48 — 62.9MB) | Embed


Quobyte is a scale out file and object storage system with mirrored metadata and data that is either 3-way mirrored or erasure coded (EC). The minimum cluster is 4 nodes (fault tolerant for a single node failure). Quobyte has current customers with ~250 nodes and ~20K clients accessing a storage cluster.

Although they support NFSv3 and NFSv4 for file (and object) access, their solution is typically deployed using host client and storage services software, accessing files with POSIX or objects via S3. Objects can also be accessed as files within the file system directories.

Host client software runs on Linux, Mac or Windows machines. Storage server software runs on Linux systems, on bare metal or under VMs, in user space. Quobyte also supports containerized storage server software for K8s but their bare metal/VM storage server software option doesn’t require containers.

Quobyte is also available in the GCP marketplace and can run in AWS, Azure and Oracle Cloud.

Their metadata service is a mirrored key-value store distributed across any number of (customer configured, I believe) storage nodes. Metadata resides on flash and distribution is designed to eliminate the metadata service as a performance bottleneck.

Their data service supports (any number of) storage tiers. Storage policies determine how tiering is used for files, directories, objects, etc. For example, with 3 tiers (NVMe Flash, SSD, and disk), file data could first land on NVMe Flash, but as it grows, it gets moved off to SSD, and as it grows even more, it’s moved to disk. Tier movement could also be triggered using time since last access.
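A size and idle-time driven policy like the one described above might be sketched as follows. This is not Quobyte’s policy engine or syntax. The tier names, the thresholds and the pick_tier function are all assumptions made up for illustration.

```python
from dataclasses import dataclass

MiB = 1024 ** 2
GiB = 1024 ** 3
DAY = 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class TierRule:
    tier: str
    max_size_bytes: int    # largest file allowed to stay on this tier
    max_idle_seconds: int  # longest time since last access allowed on this tier


# Hypothetical thresholds, ordered from fastest tier to slowest.
RULES = [
    TierRule("nvme", 64 * MiB, 7 * DAY),
    TierRule("ssd", 10 * GiB, 90 * DAY),
]
FALLBACK_TIER = "disk"  # anything too big or too cold ends up here


def pick_tier(size_bytes, idle_seconds):
    """Return the fastest tier whose size and idle limits the file satisfies."""
    for rule in RULES:
        if size_bytes <= rule.max_size_bytes and idle_seconds <= rule.max_idle_seconds:
            return rule.tier
    return FALLBACK_TIER


print(pick_tier(1 * MiB, 0))          # small, just written
print(pick_tier(1 * GiB, 0))          # has grown past the NVMe limit
print(pick_tier(1 * MiB, 30 * DAY))   # small but not touched for a month
print(pick_tier(100 * GiB, 0))        # very large
```

Because the rules are just predicates over file attributes, adding another trigger (owner, file extension, directory) would only mean adding another field to the rule, which matches the idea that any file metadata can drive placement.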

Bjorn said anything in file system metadata could be used to trigger data movement across tiers. Each tier could be defined with different data protection policies, like mirroring or EC 8+3.
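To see why one might pick EC 8+3 over mirroring for a capacity tier, here is some back-of-the-envelope arithmetic in Python. The function names are ours, and the sketch only covers raw capacity overhead. It says nothing about rebuild time or performance, which also matter when choosing a protection scheme.

```python
def raw_needed_mirror(usable_tb, copies):
    """Raw capacity needed to store usable_tb with N-way mirroring."""
    return usable_tb * copies


def raw_needed_ec(usable_tb, data, parity):
    """Raw capacity needed to store usable_tb with data+parity erasure coding."""
    return usable_tb * (data + parity) / data


def usable_fraction_ec(data, parity):
    """Share of raw capacity that holds user data under data+parity EC."""
    return data / (data + parity)


usable_tb = 100
print("3-way mirror raw TB:", raw_needed_mirror(usable_tb, 3))
print("EC 8+3 raw TB:", raw_needed_ec(usable_tb, 8, 3))
print("3-way mirror usable fraction: %.1f%%" % (100 / 3))
print("EC 8+3 usable fraction: %.1f%%" % (100 * usable_fraction_ec(8, 3)))
```

For 100TB of user data, 3-way mirroring needs 300TB of raw capacity while EC 8+3 needs 137.5TB. The mirror survives the loss of two copies, and an 8+3 code of the usual Reed-Solomon kind survives the loss of any three fragments.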

Backend storage is split up into Volumes. They also support thinly provisioned volumes for file creation.

It’s unclear how tiering and thin provisioning apply to objects with much richer metadata options, but as they can be mapped to files, we suppose that anything in the object file metadata could conceivably be used to trigger tiering as a bare minimum.

As for security,

Quobyte supports end to end data encryption. This is done once and the customer owns the keys. They do support external key servers. I believe this is another option that is enabled by file based policy management. It seems like different files can have different keys to encrypt them.
Quobyte supports TLS. Depending on customer requirements, data may go across open networks and this is where TLS could very well be used. And Quobyte supports X.509 certificates for user, device and system authentication.
Quobyte supports file access controls. They support a subset of Windows capabilities but have full support for Linux and Mac access controls.

Quobyte also supports two forms of cluster to cluster replication. One is event driven, where an event occurrence (e.g., file close) signals data replication, and the other is time driven (e.g., every 5 minutes), but both are asynchronous.

Quobyte was designed from the start to be completely API driven. But they do support CLI and a GUI for those customers that want them.

They have a Free (forever) edition, a downloadable version of the software without 24/7 support and minus some enterprise capabilities (think encryption). This is gated at 150TB disk/30TB flash with a limited number of clients and volumes.

The Infrastructure edition is their full featured solution with 24/7 enterprise support. It comes with a yearly service fee, priced by capacity with volume discounts.

Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Bjorn Kolbeck, Co-Founder and CEO of Quobyte attended the Technical University of Berlin and Humboldt University of Berlin.

His PhD thesis dealt with fault-tolerant replication, but he gained several years’ experience in distributed and storage systems while developing the distributed research file system XtreemFS at the Zuse Institute Berlin.

He then spent time at Google working as a Software Engineer before he and fellow Co-Founder Felix Hupfeld decided to combine the innovative research from XtreemFS and the operations experience from Google to build a highly reliable and scalable enterprise-grade storage system now known as Quobyte.

August 20, 2022

136: Flash Memory Summit 2022 wrap-up with Tom Coughlin, President, Coughlin Assoc.

We have known Tom Coughlin (@thomascoughlin), President, Coughlin Associates for a very long time now. He’s been an industry heavyweight almost as long as Ray (maybe even longer). Tom has always been very active in storage media, storage drives, storage systems and memory as well as active in the semiconductor space. All this made him a natural to perform as Program Chair at Flash Memory Summit (FMS) 2022, so it’s great to have him on the show to talk about the conference.

Podcast: Play in new window | Download (Duration: 47:29 — 65.2MB) | Embed


Just prior to the show, Micron announced that they had achieved 232 layer 3D NAND (in sampling, methinks), which would be a major step on the roadmap to higher density NAND. Micron was not at the show, but held an event at Levi’s Stadium, not far from the conference center.

During a keynote, SK Hynix announced they had achieved 238 layer NAND, just exceeding Micron’s layer count. Other vendors at the show promised more layers as well but also discussed different ways other than layer counts to scale capacity, such as shrinking holes, moving logic, logical (more bits/cell) scaling, etc. PLC (5 bits/cell) was discussed and at least one vendor mentioned 6LC (not sure there’s a name yet but HxLC maybe?). Just about any 3D NAND is capable of logical scaling in bits/cell. So 200+ layers will mean more capacity SSDs over time.
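As a quick illustration of what logical scaling buys (and costs), the short Python sketch below works out the arithmetic per cell. Every extra bit per cell doubles the number of charge levels the cell has to distinguish, while the capacity gain is only linear. The helper names are ours, and TLC is used as the baseline purely for comparison.

```python
# Common names for 1 to 5 bits per cell.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}


def voltage_levels(bits_per_cell):
    """Distinct charge levels a cell must resolve to store this many bits."""
    return 2 ** bits_per_cell


def capacity_gain(bits_per_cell, baseline_bits=3):
    """Capacity relative to a baseline cell type (TLC by default)."""
    return bits_per_cell / baseline_bits


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bits/cell, {voltage_levels(bits)} levels, "
          f"{capacity_gain(bits):.2f}x TLC capacity")

# The 6 bits/cell case mentioned at the show:
print(f"6 bits/cell: {voltage_levels(6)} levels, {capacity_gain(6):.2f}x TLC capacity")
```

So going from TLC to PLC is a 1.67x capacity gain for the same number of cells, but the cell has to resolve 32 levels instead of 8, which is why bits/cell scaling gets harder with every step.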

The FMS conference seems to be expanding beyond Flash into more storage technologies as well as memory systems. In fact they had a session on DNA storage at the show.

In addition, there was a lot of talk at FMS2022 about CXL, the new standard which supports shared memory over PCIe. PCIe is becoming a near universal connection protocol and is being used for 2D scaling of chips as a chip to chip interconnect as well as a distributed storage and shared memory interconnect.

The CXL vision is that servers will still have DDR DRAM memory but they can share external memory systems. With shared memory systems in place, memory could be pooled and aggregated into one large repository which could then be carved up and parceled out to servers to support the workload du jour. And once those workloads are done, it can be recarved for the next workload to come. It’s almost like network attached storage, only in this world it’s network attached memory.

Tom mentioned that CXL is starting to adopt other memory standards such as the Open Memory Interface (OMI), which has also been going on for a while now.

Moreover, CXL can support a memory hierarchy, which includes different speed memories such as DRAM, SCM, and SSDs. If the memory system has enough smarts to keep highly active data in the highest speed devices, an auto-tiering, shared memory pool could provide substantial capacities (10s-100sTB) of memory at a much reduced cost. This sounds a lot like what was promised by Optane.

Another topic at the show was Software Enabled/Defined Flash. There are a few enterprise storage vendors (e.g., IBM, Pure Storage and Hitachi) that design their own proprietary flash devices, but with SSD vendors coming out with software enabled flash, this should allow anyone to do something similar. Much more to come on this. Presumably, the hyper-scalers are driving this but having software enabled flash should benefit the entire IT industry.

The elephant in the room at FMS was Intel’s winding down of Optane. There were a couple of the NAND/SSD vendors talking about their “almost” storage class memory using SLC and other NAND tricks to provide Optane like performance/endurance using NAND storage.

Keith mentioned a YouTube clip he saw where somebody talked about a Radeon Pro SSG (an AMD GPU that had M.2 SSDs attached to it) and tried to show how it improved performance for some workloads (mostly 8K video using native SSG APIs). He replaced the old M.2 SSDs with newer ones with more capacity, which increased the memory, but it still had many inefficiencies and was much slower than HBM2 memory or VRAM. Keith thought this had some potential, seeing as how in-memory databases seriously increase performance, but as far as I could see the SSG and its modded brethren died before it reached that potential.

As part of the NAND scaling discussion, Tom said one vendor (I believe Samsung) mentioned that by 2030, with die stacking and other tricks, they will be selling an SSD with 1PB of storage behind it. Can’t wait to see that.

By the way, if you are an IEEE member and are based in the USA, Tom is running for IEEE USA president this year, so please vote for him. It would be nice having a storage person in charge at IEEE.

Thomas Coughlin, President Coughlin Associates

Tom Coughlin, President, Coughlin Associates is a digital storage analyst and business and technology consultant. He has over 40 years in the data storage industry with engineering and senior management positions at several companies. Coughlin Associates consults, publishes books and market and technology reports (including The Media and Entertainment Storage Report and an Emerging Memory Report), and puts on digital storage-oriented events.

He is a regular storage and memory contributor for forbes.com and M&E organization websites. He is an IEEE Fellow, Past-President of IEEE-USA, Past Director of IEEE Region 6 and Past Chair of the Santa Clara Valley IEEE Section, Chair of the Consultants Network of Silicon Valley and is also active with SNIA and SMPTE.

For more information on Tom Coughlin and his publications and activities go to

June 10, 2022

133: GreyBeards talk trillion row databases/data lakes with Ocient CEO & Co-founder, Chris Gladwin

We saw a recent article in Blocks and Files (Storage facing trillion-row db apocalypse), about a couple of companies which were trying to deal with trillion row database queries without taking weeks to respond. One of those companies was Ocient (@Ocient), a Chicago startup, whose CEO and Co-Founder, Chris Gladwin, was an old friend from Cleversafe (now IBM Cloud Object Storage).

Chris and team have been busy creating a new way to perform data analytics on massive data lakes. It has a lot to do with extreme parallelism, high core counts, NVMe SSDs, and sophisticated network and compute flow control. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 43:44 — 60.1MB) | Embed


The key to Ocient’s approach involves NVMe SSDs, which have become ubiquitous over the last couple of years and can be deployed to deal with large data problems. Another key to Ocient is multi-core CPUs, which again seem to be everywhere, with core counts almost doubling with every new generation of CPU chip.

We let Chris wax a little too long on the SSD revolution in IOPS, especially as pertains to random 4K reads. Put 20 or so NVMe SSDs in a server with dual 50 core CPU chips and you have one fast random IO machine.

Another key to Ocient is very sophisticated network and bus data flow management. With all this data, running any query on it involves consuming lots of data that all has to be brought into the CPU. PCIe bandwidth helps, as do NVMe SSDs, but you still need to ensure that nothing gets bottlenecked moving all that data around a system/server.

Yet another key to Ocient is parallelism. With one 20 NVMe SSD server and two 50 core CPUs you’ve got a lot of capability, but when you are talking about trillion row databases you need more. So in order to respond to queries in anything like a second or so, they throw a lot of NVMe servers at the problem.
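The general pattern behind throwing many servers at one query is scatter-gather: split the rows into shards, have each shard compute a partial result, and combine the partials at the end. Below is a toy Python sketch of that pattern using threads in place of servers. It is a generic illustration only, not Ocient’s design. The round-robin sharding, the row layout and all function names are our own assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_rows(rows, n_shards):
    """Round-robin the rows across n_shards (a stand-in for real data placement)."""
    return [rows[i::n_shards] for i in range(n_shards)]


def partial_aggregate(shard):
    """Per-shard work: filter rows, then return (count, sum) rather than a mean."""
    matching = [row["latency_ms"] for row in shard if row["region"] == "eu"]
    return len(matching), sum(matching)


def parallel_mean_latency(rows, n_shards=4):
    """Scatter the filter/aggregate to every shard, then gather and combine."""
    shards = shard_rows(rows, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else None


# Synthetic table: 1000 rows, every other one tagged "eu".
rows = [{"region": "eu" if i % 2 == 0 else "us", "latency_ms": i} for i in range(1000)]
result = parallel_mean_latency(rows)
print(result)
```

Note that each shard returns a count and a sum, not an average. Partial results have to be combinable, which is one reason query planning for this kind of system is harder than just running the same SQL on every node.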

I asked how they split the data across all these servers and Chris mentioned that at the moment that’s part of their secret sauce and involves professional services.

Ocient supports full ANSI SQL queries against trillion row databases and replies to those queries in a matter of seconds. And we aren’t just talking about SQL selects, Ocient can do splits, joins and updates to this trillion row database at the same time as the SQL selects are going on. Chris mentioned that Ocient can be loading 100K JSON files each second, while still performing SQL queries in near real time against the trillion row database.

Ocient supports Reed-Solomon error correction on database data as well as data compression and encryption.

In addition to SQL queries, Chris mentioned that Ocient supports data load and transform activities. He said that most of this data is being generated from IoT applications and often needs to be cleaned up before it can be processed. Doing this in real time, while handling queries to the database is part of their secret sauce.

Chris said there are probably not that many organizations that need trillion row databases. But ad auctions, telecom routers and financial services already use trillion row databases and they all want to be able to process queries faster on this data. Ocient is betting that there will be plenty more like this over time.

Ocient is available on AWS and GCP as a cloud service, can also be used in their own Ocient Cloud or can be deployed on premises. Ocient services are billed on a per core pack (500 cores, I think) subscription model.

Chris Gladwin, CEO and Co-founder, Ocient

Chris is the CEO and Co-Founder of Ocient whose mission is to provide the leading platform the world uses to transform, store, and analyze its largest datasets.

In 2004, Chris founded Cleversafe which became the largest and most strategic object storage vendor in the world (according to IDC.) He raised $100M and then led the company to over a $1.3B exit in 2015 when IBM acquired the company. The technology Cleversafe created is used by most people in the U.S. every day and generated over 1,000 patents granted or filed, creating one of the ten most powerful patent portfolios in the world.

Prior to Cleversafe, Chris was the Founding CEO of startups MusicNow and Cruise Technologies and led product strategy for Zenith Data Systems. He started his career at Lockheed Martin as a database programmer and holds an engineering degree from MIT.

June 15, 2021

120: GreyBeards talk CEPH storage with Phil Straw, Co-Founder & CEO, SoftIron

GreyBeards talk universal CEPH storage solutions with Phil Straw (@SoftIronCEO), CEO of SoftIron. Phil’s been around IT and electronics technology for a long time and has gone from scuba diving electronics, to DARPA/DOD researcher, to networking, and is now doing storage. He’s also their former CTO and co-founder of the company. SoftIron makes hardware storage appliances for CEPH, an open source, software defined storage system.

CEPH storage includes file (CEPHFS, POSIX), object (S3) and block (RBD, RADOS block device, Kernel/librbd) services and has been out since 2006. CEPH storage also offers redundancy, mirroring, encryption, thin provisioning, snapshots, and a host of other storage options. CEPH is available as an open source solution, downloadable at CEPH.io, but it’s also offered as a licensed option from Red Hat, SUSE and others. For SoftIron, it’s bundled into their HyperDrive storage appliances. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 46:15 — 63.5MB) | Embed


SoftIron uses the open source version of CEPH and incorporates this into their own, HyperDrive storage appliances, purpose built to support CEPH storage.

There are two challenges to using open source solutions:

Support is generally non-existent. Yes, the open source community behind the (CEPH) project supplies bug fixes and can possibly answer some questions, but this is not considered enterprise support, where customers require 7x24x365 support for a product.
Usability is typically abysmal. Yes, open source systems can do anything that anyone could possibly want (if not, code it yourself), but trying to figure out how to use any of that often requires a PhD or two.

SoftIron has taken both of these on to offer a CEPH commercial product offering.

Take support. SoftIron offers enterprise level support that customers can contract for on its own, even if they don’t use SoftIron hardware. Phil said they often get kudos for their expert support of CEPH and have often been asked to offer this as a standalone CEPH service. Needless to say, their support of SoftIron appliances is also excellent.

As for ease of operations, SoftIron makes the HyperDrive Storage Manager appliance, which offers a standalone GUI that takes the PhD out of managing CEPH. Anything one can do with the CEPH CLI can be done with SoftIron’s Storage Manager. It’s also a very popular offering with SoftIron customers. Similar to SoftIron’s CEPH support above, customers are requesting that their Storage Manager be offered as a standalone solution for CEPH users as well.

HyperDrive hardware appliances are storage media boxes that offer extremely low-power storage for CEPH. Their appliances range from high density (120TB/1U) to high performance NVMe SSDs (26TB/1U) to just about everything in between. On their website, I count 8 different storage appliance offerings with various spinning disk, hybrid (disk-SSD), SATA and NVMe SSDs (SSD only) systems.

SoftIron designs, develops and manufactures all their own appliance hardware. Manufacturing is entirely in the US and design and development takes place in the US and Europe only. This provides a secure provenance for HyperDrive appliances that other storage companies can only dream about. Defense, intelligence and other security conscious organizations/industries are increasingly concerned about where electronic systems come from and want assurances that there are no security compromises inside them. SoftIron puts this concern to rest.

Yes, they use CPUs, DRAMs and other standardized chips as well as storage media manufactured by others, but SoftIron has gone out of their way to source all of these other parts and media from secure, trusted suppliers.

All other major storage companies use storage servers, shelves and media that come from anywhere, usually sourced from manufacturers anywhere in the world.

Moreover, such off the shelf hardware usually comes with added hardware that increases cost and complexity, such as graphics memory/interfaces, cables, over configured power supplies, etc., none of which is required for storage. Phil mentioned that each HyperDrive appliance has been reduced to just what’s required to support their CEPH storage appliance.

Each appliance has a 6Tbps network that connects all the components, which means no cabling in the box. Also, each storage appliance has CPUs matched to its performance requirements: ARM cores for low performance appliances, AMD EPYC CPUs for high performance appliances. All HyperDrive appliances support wire speed IO, i.e., if a box is configured to support 1GbE or 100GbE, it transfers data at that speed, across all ports connected to it.

Because of their minimalist hardware design approach, HyperDrive appliances run much cooler and use less power than other storage appliances. They only consume 100W or 200W for high performance storage per appliance, where most other storage systems come in at around 1500W or more.
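To put those wattage numbers in perspective, here is a rough Python calculation of yearly energy use per appliance. It assumes each box draws its quoted power continuously all year (8,760 hours), which is a simplification of ours, and it ignores cooling overhead and the fact that the boxes being compared may not hold the same capacity.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_kwh(watts):
    """Energy used in a year by a device drawing `watts` continuously."""
    return watts * HOURS_PER_YEAR / 1000


hyperdrive_high_perf_w = 200  # high performance HyperDrive figure quoted above
typical_system_w = 1500       # "around 1500W or more" figure quoted above

saved = annual_kwh(typical_system_w) - annual_kwh(hyperdrive_high_perf_w)
print(f"200W appliance:  {annual_kwh(hyperdrive_high_perf_w):.0f} kWh/year")
print(f"1500W appliance: {annual_kwh(typical_system_w):.0f} kWh/year")
print(f"Difference:      {saved:.0f} kWh/year per appliance")
```

Under those assumptions a 200W box uses 1,752 kWh a year against 13,140 kWh for a 1,500W system, a difference of 11,388 kWh per appliance before any cooling savings are counted.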

In fact, SoftIron HyperDrive boxes run so cold that they don’t need fans for CPUs, they just redirect air flow from storage media over CPUs. And running colder improves reliability of disk and SSD drives. Phil said they are seeing field results that are 2X better reliability than the drives normally see in the field.

They also offer a HyperDrive Storage Router that provides an NFS/SMB/iSCSI gateway to CEPH. With their Storage Router, customers using VMware, Hyper-V and other systems that depend on NFS/SMB/iSCSI for storage can just plug and play with SoftIron CEPH storage. With the Storage Router, the only storage interface HyperDrive appliances can’t support is FC.

Although we didn’t discuss this on the podcast, in addition to HyperDrive CEPH storage appliances, SoftIron also provides HyperCast, transcoding hardware designed for real time transcoding of one or more video streams and HyperSwitch networking hardware, which supplies a secure provenance, SONiC (Software for Open Networking in [the Azure] Cloud) SDN switch for 1GbE up to 100GbE networks.

Standing up PB of (CEPH) storage should always be this easy.

Phil Straw, Co-founder & CEO SoftIron

The technical visionary co-founder behind SoftIron, Phil Straw initially served as the company’s CTO before stepping into the role as CEO.

Previously Phil served as CEO of Heliox Technologies, co-founder and CTO of dotFX, VP of Engineering at Securify and worked in both technical and product roles at both Cisco and 3Com.

Phil holds a degree in Computer Science from UMIST.
