146: GreyBeards talk K8s cloud storage with Brian Carmody, Field CTO, Volumez

We’ve known Brian Carmody (@initzero), Field CTO, Volumez for over a decade now and he’s always been very technically astute. He moved to Volumez earlier this year and has once again joined a storage startup. Volumez is a cloud K8s storage provider with a new twist, K8s persistent volumes hosted on ephemeral storage.

Volumes currently works in public clouds (AWS & Azure( soft launch), with GCP coming soon) and is all about supplying high performing, enterprise class data services to K8s container apps. But doing this using transient (Azure ephemeral &AWS instance) storage and standard Linux. Hyperscalers offer transient storage as almost an afterthought with customer compute instances. Listen to the podcast to learn more.

It turns out that over the last decade or so, there has been a lot of time and effort devoted to maturing Linux’s storage stack and nowadays, with appropriate configuration, Linux can offer enterprise class data services and performance using direct attached NVMe SSDs. These services include thin provisioning, encryption, RAID/erasure coding, snapshots, etc., which on top of NVMe SSDs, provide IOPS, bandwidth and latency performance that boggles the mind.

However, configuring Linux sophisticated and high performing data services is a hard problem to solve..

Enter Volumez, they have a SaaS control plane, client software plus CSI drivers that will configure Linux with ephemeral storage to support any performance and data service that can be obtained from NVMe SSDs.

Once installed on your K8s cluster, Volumez software profiles all ephemeral storage, and supplies that information to their SaaS control plane. Once that’s done your platform engineers can define specific storage class policies or profiles useable by DevOps to consume ephemeral storage. .

These policies identify volume [IOPs, Bandwidth, Latency] X [read, write] performance specifications as well as data protection, resiliency and other data service requirements. DevOps engineers consume this storage using PVCs that call for these storage classes at some capacity. When it sees the PVC claim, Volumez SaaS control plane will carve out slices of ephemeral storage that can support the performance and other storage requirements defined in the storage class.

Once that’s done, their control plane next creates a network path from the compute instances with ephemeral storage to the worker nodes running container apps. After that it steps out of the picture and the container apps have a direct (network) data path to the storage they requested. Note, Volumez’s SaaS control plane is not in the container app storage data path at all.

Volumez supports multi-AZ data resiliency for PVCs. In this case, another mirror K8s cluster would reside in another AZ, with Volumez software active and similar if not equivalent ephemeral storage. Volumez will configure the container volume to mirror data between AZs. Similarly, if the policy requests erasure coding, Volumez SaaS software configures the ephemeral storage to provide erasure coding for that container volume.

Brian said they’ve done some amazing work to increase the speed of Linux snapshotting and restoring.

As noted above, the Volumez control plane SaaS software is outside the data path, so even if the K8s cluster running Volumez enabled storage loses access to the control plane, container apps continue to run and perform IO to their storage. This can continue until there’s a new PVC request that requires access to their control plane.

Ephemeral storage is accessed through special compute instances. These are not K8s worker nodes and they essentially act as a passthru or network attachment between worker nodes running apps with PVC’s and the Volumez configured Linux Logical Volumes hosted on slices of ephemeral storage.

Volumez is gaining customer traction with data platform clients, DBaaS companies, and some HPC environments. But just about anyone needing high performing data services for cloud K8s container apps should give Volumez a try.

I looked at AWS to see how they price instance store capacities and found out it’s not priced separately, but rather instance storage is bundled into the cost of EC2 compute instances.

Volumez is priced based on the number of media devices (instance/ephemeral stores) and performance (IOPs) available. They also have different tiers depending on support level requirements (e.g., community, Business hrs, 7X24) which also offers different levels of enterprise security functionality.

Brian said they have a free tier that customers can easily signup for and try out by going to their web site (see link above), or if you would like a guided demo, just contact him directly.

Brian Carmody, Field CTO, Volumez

Brian Carmody is Field CTO at Volumez. Prior to joining Volumez, he served as Chief Technology Officer of data storage company Infinidat where he drove the company’s technology vision and strategy as it ramped from pre-revenue to market leadership.

Before joining Infinidat, Brian worked in the Systems and Technology Group at IBM where he held senior roles in product management and solutions engineering focusing on distributed storage system technologies.

Prior to IBM, Brian served as a technology executive at MTV Networks Viacom, and at Novus Consulting Group as a Principal in the Media & Entertainment and Banking practices.

142: GreyBeards talk scale-out, software defined storage with Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Software defined storage is a pretty full segment of the market these days. So, it’s surprising when a new entrant comes along. We saw a story on Quobyte in Blocks and Files and thought it would be great to talk with Bjorn Kolbeck (LinkedIn), Co-Founder & CEO, Quobyte. Bjorn got his PhD in scale out storage and went to work at Google on anything but storage. While there, he was amazed by Goodle’s vast infrastructure being managed by only a few people and thought this could should be commercialized, so Quobyte was born. Listen to the podcast to learn more.

Quobyte is a scale out file and object storage system with mirrored metadata and data which is 3-way mirrored or erasure coded (EC). Minimum cluster is 4 nodes (fault tolerant for a single node failure.). Quobyte has current customers with ~250 nodes and ~20K clients accessing a storage cluster.

Although they support NFSv3 and NFSv4 for file (and object) access, their solution is typically deployed using host client and storage services software accessing the files with Posix or objects via S3. Objects can also be accessed as file within the file system directories.

Host client software runs on Linux, Mac or Windows machines. Storage server software runs on Linux systems bare metal or under VMs in user space. Quobyte also support containerized storage server software for K8s but their bare metal/VM storage server software option doesn’t require containers.

Quobyte is also available in the GCP marketplace and can run in AWS, Azure and Oracle Cloud.

Their metadata service is a mirrored key-value store distributed across any number of (customer configured, I believe) storage nodes. Metadata resides on flash and distribution is designed to eliminate the metadata service as a performance bottleneck.

Their data services supports (any number of) storage tiers. Storage policies determine how tiering is used for files, directories, objects, etc. For example, with 3 tiers (NVMe Flash, SSD, and disk), file data could be first landed on NVMe Flash, but as it grows, it gets moved off to SSD, and as it grows, even more, it’s moved to disk. This could also be triggered using time since last access.

Bjorn said anything in file system metadata could be used to trigger data movement across tiers. Each tier could be defined with different data protection policies, like mirroring or EC 8+3.

Backend storage is split up into Volumes. They also support thinly provisioned volumes for file creation.

Unclear how tiering and thin provisioning applies to objects with much richer metadata options but as they can be mapped to files, we suppose that anything in the object file metadata could conceivably used to trigger tiering as a bare minimum.

As for security, 

  1. Quobyte supports end to end data encryption. This is done once and the customer owns the keys. They do support external key servers.  I believe this is another option that is enabled by file based policy management. It seems like different files can have different keys to encrypt them.
  2. Quobyte supports TLS. Depending on customer requirements data may go across open networks and this is where TLS could very well be used. And Quobyte supports user X.509 certificates for users, devices and systems authentication. 
  3. Quobyte supports file access controls. They support a subset of Windows capabilities but have full support for Linux and Mac access controls.

Quobyte also supports two forms of cluster to cluster replication. One is event driven where event occurrence (i.e. file close) signals data replication and another which is time driven (i.e., every 5 minutes) but both are asynchronous.

Quobyte was designed from the start to be completely API driven. But they do support CLI and a GUI for those customers that want them. 

They have a Free (forever) edition, a downloadable version of the software without 24/7 support and minus some enterprise capabilities (think encryption). This is gated at 150TB disk/30TB flash with limited number of clients and volumes.

The Infrastructure edition is their full featured solution with 7/24 enterprise support. It’s comes with a yearly service fee, priced by capacity with volume discounts.

Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Bjorn Kolbeck, Co-Founder and CEO of Quobyte attended the Technical University of Berlin and Humboldt University of Berlin.

His PhD thesis dealt with fault-tolerant replication, but he gained several years’ experience in distributed and storage systems while developing the distributed research file system XtreemFS at the Zuse Institute Berlin.

He then spent time at Google working as a Software Engineer before he and fellow Co-Founder Felix Hupfield decided to combine the innovative research from XtreemFS and the operations experience from Google to build a highly reliable and scalable enterprise-grade storage system now known as Quobyte.

141: GreyBeards annual 2022 wrap-up podcast

Well it has been another year and time for our annual year end wrap up. Since Covid hit, every year has certainly been interesting. This year we have seen the start of back in person conferences which was a welcome change from the covid lockdown. We are very glad to start seeing everybody again.

From the tech standpoint, the big news this year was CXL. As everyone should recall, CXL is a new-ish PCIe hardware and protocol that supports larger memory sitting out on a PCIe bus and in the future shared memory between servers. All this is to enable a new wave of memory based computing. We spent probably half our time discussing CXL and it’s impact on IT.

The other major topic was the Cloud Native ecosystem. In the past all we talked about was K8s but nowadays the ecosystem that surrounds it is almost as important as K8s itself. The final topic was a bit of a shock earlier this year and yes it was the Broadcom’s acquisition of VMware. Jason and I spend our Explore podcast talking about it (see our 137: VMware Explore wrap-up). Keith has high hopes that the EU will shut it down but the jury’s still out on that one. Listen to the podcast to learn more.

As for CXL, it turns out that AMD have just released full support for CXL hardware and protocols with their latest round of CPU chips. But the new AMD CPUs only support DDR5 memory, (something about there’s only so much logic one can fit on a chip…) which means all those DDR4 DIMs out in the wild need somewhere to land. CXL could supply a new lease on life for DDR4 DIMs.

And it’s not just about shared memory or increased memory sizes, CXL can also provide a tiered memory hierarchy, with gobs of flash behind memory DIMs (see: 136: FMS2022 wrap up …) So, now its no longer a TB or ten of server memory but potentially 100s of TBs. What this means for SAP HANNA, AWS Aurora and other heavy-memory solutions has yet to play out.

Cloud Native won. We see this in the increasing adoption of containers and K8s in the enterprise, cloud and just about anywhere IT happens these days. But the ecosystem surrounding K8s is chaos.

Over time, many of these ecosystem solutions will die off, be purchased, or consolidated but in the mean time, it’s entirely too confusing. Red Hat’s OpenShift is one answer and VMware’s Tanzu is another. And of course all the clouds have their own K8s packaged solution. But just to cover their bets, everyone also supports native K8s and just about every software package that works with it. So, K8s’s ecosystem is in a state of flux and may take time to become a stable set of tools useable by the enterprise IT.

Finally, Broadcom’s acquisition of VMware has everyone up in arms. Customers are concerned the R&D juggernaut that VMware has been, since its very beginning, will be jettisoned in favor of profits. And HCI vendors that always felt Dell EMC had an unfair advantage will all look at Broadcom in a similar light.

Keith says there’s a major difference in how USA regulators view an acquisition and how EU regulators view one. According to Keith, EU views acquisitions in how they help or hurt the customer. USA regulators view acquisitions on show they help or hurt the competition. Will have to wait and see how this all plays for Broadcom-VMware.

On the other hand, speaking of competition, Nutanix seems to be feeling the heat as well. Rumors are it’s up for sale. Who will want it and how the regulators view both of these acquisitions may be as interesting story for 2023

2023 looks to be another year of transition for enterprise IT. The cloud players all seem to be coming around to the view that they can’t be all things to all (IT) people. And the enterprise vendors are finally seeing some modicum of staying power in the face of a relentless push to the cloud. How this plays out over the next few years will be of major interest to everybody.

Happy New Year from the GreyBeards!

Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is a IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIN.

Jason Collier, Principal Member of Technical Staff, AMD

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology. He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years. He’s on LinkedIN.

137: GreyBeards talk VMware Explore 2022 Wrap-up

Jason Collier Principle Member of Technical Staff, AMD (@bocanuts), a current GreyBeardsOnStorage co-host and I both attended VMware Explore 2022 this past week and we recorded a podcast discussing VMware’s announcements on the show floor. It turns out that Keith Townsend, TheCTOAdvisor (@thectoadvisor) had brought his Airstream &studio and was exhibiting on the show floor. Keith kindly offered the use of his studio to record the podcast.

This one is a video. Let us know what you think. I clearly need a cowboy hat and Jason said (off camera) that I’m showing more grey in my beard than before. I take that as a compliment here.

Here’s the news as we saw it:

  • vSphere 8 – has a number of new features but the ones we thought important were the GA of Project Monterey. This supports new DPUs that now run ESXi out board from the CPU. They are able to offload lot’s of the CPU networking cycles to the DPU freeing up these for other (more important) work. vSphere 8 supports 2 DPUs now, the NVIDIA (Mellanox) BlueField(-2?) DPU and the AMD (Pensando) DPU. AMD recently purchased Pensando and Jason seemed to know an awful lot about this tech. VMware also announced support for concurrent ESXi upgrades which can now allow upgrading ESXi running in DPUs while hosts and clusters continue to operate. Finally, the other item of interest was vSphere is now more API driven. I guess it’s only a matter of time before all VMware functionality is API driven to make it even more cloud-like
  • vSAN 8 – also has a number of new features. The first we discussed was is a faster data path. This means more IOPS, more bandwidth and lower latency for IOs. Next, vSAN 8 now supports single tier storage pools . These will no longer require a caching layer. This should also speed up IO operations (as long as the single tier is at least as fast as the old caching layer). They also announced faster snapshots. Apparently this has been a problem in the past and they’ve done the work to speed this up considerably. Jason mentioned an AMD open source VM migration tool (from somebody else’s X86 CPUs to AMDs) that depends a lot on vSAN snapshots.
  • Cloud Flex Storage – mentioned at the show but not well explained, Jason and I speculated that this was an internal storage service available on for Cloud Foundation users on AWS where customers could subscribe to storage as-a-service in much lower increments (maybe even GB/month) than standing up more vSAN hosts to increase storage.
  • NetApp FsX (ONTAP) storage – along the same line, VMware announced support for NetApp’s FsX as yet another storage option for Cloud Foundation users on AWS. Supplying yet another storage-as-a-service option for this environment.
  • Cloud Flex Compute – also mentioned at the show was their new Compute-As-A-Service for Cloud Foundation users on AWS. This way users could subscribe to more or less compute, on an as needed basis rather than having to spin up new ESXi hosts. I later found out this allows users to run a single VM and pay for it on a subscription basis.
  • Tanzu Application Platform (TAP) – is a new VMware supplied (and supported) “development experience” for K8s on vSphere. Note, it doesn’t include any advanced Tanzu services such as Tanzu K8s Grid (TKG) so it’s a true DevOps bare bones environment.
  • Tanzu K8S Operations (TKO) – another new Tanzu based service which offers operations complete control over the Tanzu services running on vSphere. Note Tanzu Mission Control (TMC) is not part of TKO.
  • Aria management – VMware rebranded vRealize and CloudHealth, which now comes in 3 bundles, Aria Cost (CloudHealth+), Aria Operations and Aria Automation. Which are all built onto of Aria Graph that graphs all the nodes in your VMware clusters with all their connections so that Aria management can traverse this graph to find out what’s where. On top of Aria Graph are Aria Hub, Aria Insights, and Aria Guardrails (sort of like providing boundary’s where services can be deployed).

They also announced Ransomware Recovery [changed 7Sep22, the Eds] as a Service which builds on VMware’s DR-aaS announced last year and Tanzu now works with Red Hat OpenShift

We also discussed the show. I heard somewhere there were 10K people there, Jason heard somewhere between 6K and 9K. In any case much smaller than VMworlds prior to Covid (25kish). And of course the rebranding of the show seemed counter-intuitive at best.

The show floor was much smaller than usual, (not withstanding Keith’s Airstream RV exhibit). And there were a number of storage vendors not at the show?? There was less hardware on the show floor, this could be a Covid thing but there were just as many mini-white boards/class rooms per large exhibiter, so don’t think it was because of Covid.

But the elephant in the room was Broadcom’s acquisition of VMware. At one of the analyst briefings I asked an exec about attrition. He made a couple of comments but in the end said VMware has been bought and sold before and has always come out of it in better shape. This will be no different.

That’s about all from the show.

And Thanks again to Keith and his crew, for lending us his studio to record the show. It’s been a while since I’ve seen an RV on a show floor. Keith seemed to have a ball with it

Tell us how you like our video. If everyone is for it we could do something like this with a Zoom (in this case Zencastr) recording, Or just try this at the next joint conference. .

Jason Collier, Principle Member of Technical Staff at AMD

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology.

He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years.

He’s on LinkedIN. He’s currently working with AMD on new technology and he has been a GreyBeards on Storage co-host since the beginning of 2022

134: GreyBeards talk (storage) standards with Dr. J Metz, SNIA Chair & Technical Director AMD

We have known Dr. J Metz (@drjmetz, blog), Chair of SNIA (Storage Networking Industry Association) BoD, for over a decade now and he has always been an intelligent industry evangelist. DrJ was elected Chair of SNIA BoD in 2020.

SNIA has been instrumental in the evolution of storage over the years working to help define storage networking, storage form factors, storage protocols, etc. Over the years it’s been crucial to the high adoption of storage systems in the enterprise and still is.. Listen to the podcast to learn more.

SNIA started out helping to define and foster storage networking before people even knew what it was. They were early proponents of plugfests to verify/validate compatibility of all the hardware, software and systems in a storage network solution.

One principal that SNIA has upheld, since the very beginning, is strict vendor and technology neutrality. SNIA goes out of it’s way to insure that all their publications,  media and technical working group (TWGs) committees maintain strict vendors and technology neutrality.

The challenge with any evolving technology arena is that new capabilities come and go with a regular cadence and one cannot promote one without impacting another. Ditto for vendors, although vendors seem to stick around a bit longer.

One SNIA artifact that has stood well the test of time is the SNIA dictionary.  Free to download and free copies available at every conference that SNIA attends. The dictionary covers just about every relevant acronym, buzzword and technology present in the storage networking industry today as well as across its long history.

SNIA also presents and pushes the storage networking point of view at every technical alliance in the IT industry. .

In addition, SNIA holds storage conferences around the world, as well as plugfests and  hackathons focused on the needs of the storage industry. Their Storage Developer Conference (SDC), coming up in September in the USA, is a highly technical conference specifically targeted at storage system developers. 

SDC presenters include many technology inventors driving the leading edge of storage (and memory, see below) industries. So, if you are developing storage systems, SDC is a must attend conference.

As for plugfests, SNIA has held FC storage networking plugfests over the years which have been instrumental in helping storage networking adoption.

We also talked about SNIA hackathons. Apparently a decade or so back, SNIA held a hackathon on SMB (the file protocol formerly known as CIFS) where most of the industry experts and partners doing work on SAMBA (open source SMB implementation) and SMB proprietary software were present.

At the time, Jason was working for another company, developing an SMB protocol. While attending the hackathon, Jason found that he was able to develop 1-1 relationships with many of the lead SMB/SAMBA developers and was able to solve problems in days that would have taken months before.

SNIA also has technology alliances with just about every other standards body involved in IT infrastructure, software and hardware today. As an indicator of where they are headed, SNIA recently joined with CNCF (Cloud Native Computing Foundation) to push for better storage under K8s.

SNIA has TWGs focused on technological areas that impact storage access. One TWG that has been going on now, for a long time, is Swordfish, an extension to the DMTF Redfish that focuses on managing storage.

Swordfish has struggled over the years to achieve industry adoption. We spent time discussing some of the issues with Swordfish, but honestly,  IMHO, it may be too late to change course.

Given the recent SNIA alliance with CNCF, we started discussing the state of storage under K8s and containers. DrJ and Jason mentioned that storage access under K8s goes through so many layers of abstraction that IO performance is almost smothered in overhead. The thinking at SNIA is we need to come up with a better API that bypasses all this software overhead to  directly access hardware.

 SNIA’s been working on SDXI (Smart Data Acceleration Interface), a new hardware memory to memory, direct path protocol. Apparently, this is a new byte level, (storage?) protocol for moving data between memories. I believe SDXI assumes that at least one memory device is shared. The other could be in a storage server, smartNIC, GPU, server, etc. If SDXI were running in your shared memory and server, one could use the API to strip away all of the software abstraction layers that have built up over the years to accessi shared memory at near hardware speeds

DrJ mentioned was NVMe as another protocol that strips away software abstractions to allow direct access to (storage) hardware. The performance of Optane and SSDs (and it turns out disks) was being smothered by SCSI device protocols/abstrations that were the only way to talk to storage devices in the past. But NVM and NVMe came along, and stripped all the non-essential abstractions and protocol overhead away and all of a sudden sub 100 microsecond IO’s were possible. 

Dr. J Metz,  SNIA Chair & Technical Director AMD

J is the Chair of SNIA’s (Storage Networking Industry Association) Board of Directors and Technical Director for Systems Design for AMD where he works to coordinate and lead strategy on various industry initiatives related to systems architecture. Recognized as a leading storage networking expert, J is an evangelist for all storage-related technology and has a unique ability to dissect and explain complex concepts and strategies. He is passionate about the innerworkings and application of emerging technologies.

J has previously held roles in both startups and Fortune 100 companies as a Field CTO,  R&D Engineer, Solutions Architect, and Systems Engineer. He has been a leader in several key industry standards groups, sitting on the Board of Directors for the SNIA, Fibre Channel Industry Association (FCIA), and Non-Volatile Memory Express (NVMe). A popular blogger and active on Twitter, his areas of expertise include NVMe, SANs, Fibre Channel, and computational storage.

J is an entertaining presenter and prolific writer. He has won multiple awards as a speaker and author, writing over 300 articles and giving presentations and webinars attended by over 10,000 people. He earned his PhD from the University of Georgia.