Software defined storage – Grey Beards on Systems

November 4, 2024November 4, 2024

167: GreyBeards talk Distributed S3 storage with Enrico Signoretti, VP Product & Partnerships, Cubbit

Long time friend, Enrico Signoretti (LinkedIn), VP Product and Partnerships, Cubbit, used to be a common participant at Storage Field Day (SFD) events and I’ve known him since we first met there. Since then, he’s worked for a startup and a prominent analyst firms. But he’s back at another startup and this one looks like it’s got legs.

Cubbit offers Distributed S3 compatible object storage that offers geo-distribution and geo-fencing for object data, in which the organization owns the hardware and Cubbit supplies the software. There’s a management component, the Coordinator, which could run on your hardware or as a SaaS service they provide but other than that, IT controls the rest of the system hardware. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 52:27 — 72.0MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Cubbit comes in 3 components:

One or more Storage nodes which includes their agent software running ontop of a linux system with direct attached storage.
One or more Gateway nodes which provides S3 protocol acces to the objects stored on storage nodes. Typical S3 access points https://S3.company_name, com/… points to either a load balancer, front end or one or more Gateway nodes. Gateway nodes provide the mapping between the bucket name/object identifier and where the data currently resides or will reside.
One Coordinator node which provides the metadata to locate the data for objects, manage the storage nodes, gateways and monitor the service. The Coordinator node can be a SaaS service supplied by Cubbit or a VM/bare metal node running Cubbit Coordinator software. Metadata is protected internally within the Coordinator node.

With these three components one can stand up a complete, geo-distributed/geo-fenced, S3 object storage system which the organization controls.

Cubbit encrypts data as it at the gateway and decrypts data when accessed. Sign-on to the system uses standard security offerings. Security keys can be managed by Cubbit or by standard key management systems.

All data for an object is protected by nested erasure codes. That is 1) erasure code within a data center/location over its storage drives and 2) erasure code across geographical locations/data centers..

With erasure coding across locations, customer with say 10 data center locations can have their data stored in such a fashion that as long as at least 8 data centers are online they still have access to their data, that is the Cubbit storage system can still provide data availability.

Similarly for erasure coding within the data center/location or across storage drives, say with 12 drives per stripe, one could configure lets say 9+3 erasure coding, where as long as 9 of the drives still operate, data will be available.

Please note the customer decides the number of locations to stripe across for erasure coding, and diet for the number of storage drives.

The customer supplies all the storage node hardware. Some customers start with re-purposed servers/drives for their original configuration and then upgrade to higher performing storage-servers-networking as performance needs change. Storage nodes can be on prem, in the cloud or at the edge.

For adequate performance gateways and storage nodes (and coordinator nodes) should be located close to one another. Although Coordinator nodes are not in the data path they are critical to initial object access.

Gateways can provide a cache for faster local data access.. Cubbit has recommendations for Gateway server hardware. And similar to storage nodes, Gateways can operate at the edge, in the cloud or on prem.

Use cases for the Distributed S3 storage include:

As a backup target for data elsewhere
As a geographically distributed/fenced object store.
As a locally controlled object storage to feed AI training/inferencing activity.

Most backup solutions support S3 object storage as a target for backups.

Geographically distributed S3 storage means that customers control where object data is located. This could be split across a number of physical locations, the cloud or at the edge.

Geographically fenced S3 storage means that the customer controls which of its many locations to store an object. For GDPR countries with multi-nation data center locations this could provide the compliance requirements to keep customer data within country.

Cubbit’s distributed S3 objects storage is strongly consistent in that an object loaded into the system at any location is immediately available to any user accessing it through any other gateway. Access times vary but the data will be the same regardless of where you access it from.

The system starts up through an Ansible playbook which asks a bunch of questions and loads and sets up the agent software for storage nodes, gateway nodes and where applicable, the coordinator node.

At any time, customers can add more gateways or storage nodes or retire them. The system doesn’t perform automatic load balancing for new nodes but customers can migrate data off storage nodes and onto other ones through api calls/UI requests to the Coordinator.

Cubbit storage supports multi-tenancy, so MSPs can offer their customers isolated access.

Cubbit charges for their service on data storage under management. Note it has no egress charges, and you don’t pay for redundancy. But you do supply all the hardware used by the system. They offer a discount for M&E customers as the metadata to data ratio is much smaller (lots of large files) than most other S3 object stores (mix of small and large files).

Cubbit is presently available only in Europe but will be coming to USA next year. So, if you are interested in geo-distributed/geo-fenced S3 object storage that you control and can be had for much cheaper than hyperscalar object storage, check it out.

Enrico Signoretti, VP Products & Partnerships

Enrico Signoretti has over 30 years of experience in the IT industry, having held various roles including IT manager, consultant, head of product strategy, IT analyst, and advisor.

He is an internationally renowned visionary author, blogger, and speaker on next-generation technologies. Over the past four years, Enrico has kept his finger on the pulse of the evolving storage industry as the Head of Research Product Strategy at GigaOm. He has worked closely and built relationships with top visionaries, CTOs, and IT decision makers worldwide.

Enrico has also contributed to leading global online sites (with over 40 million readers) for enterprise technology news.

December 7, 2023December 6, 2023

158: GreyBeards talk software defined storage with Brian Dean, Tech. Mkt., Dell PowerFlex

Brian Dean, Dell PowerFlex Technical Marketing

Brian is a 16+ year veteran of the technology industry, and before that spent a decade in higher education. Brian has worked at EMC and Dell for 7 years, first as Solutions Architect and then as TME, focusing primarily on PowerFlex and software-defined storage ecosystems.

Prior to joining EMC, Brian was on the consumer/buyer side of large storage systems, directing operations for two Internet-based digital video surveillance startups.

When he’s not wrestling with computer systems, he might be found hiking and climbing in the mountains of North Carolina.

May 18, 2023May 18, 2023

148: GreyBeards talk software defined infrastructure with Anthony Cinelli and Brian Dean, Dell PowerFlex

Sponsored By:

This is one of a series of podcasts the GreyBeards are doing with Dell PowerFlex software defined infrastructure. Today, we talked with Anthony Cinelli, Sr. Director Dell Technologies and Brian Dean, Technical Marketing for PowerFlex. We have talked with Brian before but this is the first time we’ve met Anthony. They were both very knowledgeable about PowerFlex and the challenges large enterprises have today with their storage environments.

The key to PowerFlex’s software defined solution is its extreme flexibility, which comes mainly from its architecture which offers scale-out deployment options ranging from HCI solutions to a fully disaggregated compute-storage environment, in seemingly any combination (see technical resources for more info). With this sophistication, PowerFlex can help consolidate enterprise storage across just about any environment from virtualized workloads, to standalone databases, big data analytics, as well as containerized environments and of course, the cloud. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 1:02:00 — 85.1MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

To support this extreme flexibility, PowerFlex uses both client and storage software that can be configured together on a server (HCI) or apart, across compute and storage nodes to offer block storage. PowerFlex client software runs on any modern bare-metal or virtualized environment.

Anthony mentioned that one common problem to enterprises today is storage sprawl. Most large customers have an IT environment with sizable hypervisor based workloads, a dedicated database workload, a big data/analytics workload, a modern container based workload stack, an AI/ML/DL workload and more often than not, a vertical specific workload.

Each workload usually has their own storage system. And the problem with 4-7 different storage systems is cost, e.g., cost of underutilized storage. Typical to these environments, each storage system could be used at say, 60% utilization on average, but this will vary a lot between silos, leading to stranded capacity.

The main reason customers haven’t consolidated yet is because each silo has different performance characteristics. As a result, they end up purchasing excess capacity which increases cost and complexity, as a standard part of doing business.

To consolidate storage across these disparate environments requires a no-holds barred approach to IO performance, second to none, which PowerFlex can deliver. The secret to to its high levels of IO performance is RAID 10, deployed across a scale-out cluster. And PowerFlex clusters can range from 4 to 1000 or more nodes.

RAIID 10 mirrors data and spreads mirrored data across all drives and servers in a cluster or some subset. As a result, as you add storage nodes, IO performance scales up, almost linearly.

Yes, there can be other bottlenecks in clusters like this, most often networking, but with PowerFlex storage, IO need not be one of them. Anthony mentioned that PowerFlex will perform as fast as your infrastructure will support. So if your environment has 25 Gig Ethernet, it will perform IO at that speed, if you use 100 Gig Ethernet, it will perform at that speed.

In addition, PowerFlex offers automated LifeCycle Management (LCM), which can make having a 1000 node PowerFlex cluster almost as easy as a 10 node cluster. However to make use this automated LCM, one must run its storage server software on Dell PowerEdge servers.

Brian said adding or decommissioning PowerFlex nodes is a painless process. Because data is always mirrored, customers can remove any node, at any time and PowerFlex will automatically rebuild data across other nodes and drives. When you add nodes, those drives become immediately available to support more IO activity. Another item to note, because of RAID 10, PowerFlex mirror rebuilds happen very fast, as just about every other drive and node in the cluster (or subset) participates in the rebuild process.

PowerFlex supports Storage Pools. This partitions PowerFlex storage nodes and devices into multiple pools of storage used to host volume IO and data Storage pools can be used to segregate higher performing storage nodes from lower performing ones so that some volumes can exclusively reside on higher (or lower) performing hardware.

Although customers can configure PowerFlex to use all nodes and drives in a system or storage pool for volume data mirroring, PowerFlex offers other data placement alternatives to support high availability.

PowerFlex supports Protection Domains which are subsets or collections of storage servers and drives in a cluster where volume data will reside. This will allow one protection domain to go down while others continue to operate. Realize that because volume data is mirrored across all devices in a protection domain, it will take lots of nodes or devices to go down before a protection domain is out of action.

PowerFlex also uses Fault Sets, which are a collection of storage servers and their devices within a Protection Domain, that will contain one half of a volume’s data mirror. PowerFlex will insure that a primary and its mirror copy of volume’s data will not both reside on the same fault set. A fault set could be a rack of servers, multiple racks, all PowerFlex storage servers in an AZ, etc. With fault sets, customer data will always reside across a minimum of two fault sets, and if any one goes down, data is still available.

PowerFlex also operates in the cloud. In this case, customers bring their own PowerFlex software and deploy it over cloud compute and storage.

Brian mentioned that anything PowerFlex can do such as reconfiguring servers, can be done through RESTful/API calls. This can be particularly useful in cloud deployments as above, if customers want to scale up or down IO performance automatically.

Besides block services, PowerFlex also offers NFS/CIFS-SMB native file services using a File Node Controller. This frontends PowerFlex storage nodes to support customer NFS/SMB file access to PowerFlex data.

Anthony Cinelli, Sr. Director Global PowerFlex Software Defined & MultiCloud Solutions

Anthony Cinelli is a key leader for Dell Technologies helping drive the success of our software defined and multicloud solutions portfolio across the customer landscape. Anthony has been with Dell for 13 years and in that time has helped launch our HCI and Software Defined businesses from startup to the multi-billion dollar lines of business they now represent for Dell.

Anthony has a wealth of experience helping some of the largest organizations in the world achieve their IT transformation and multicloud initiatives through the use of software defined technologies.

Brian Dean, Dell PowerFlex Technical Marketing

Prior to joining EMC, Brian was on the consumer/buyer side of large storage systems, directing operations for two Internet-based digital video surveillance startups.

When he’s not wrestling with computer systems, he might be found hiking and climbing in the mountains of North Carolina.

December 7, 2021December 7, 2021

126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat

Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a kubernetes storage provider. They have a high performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need the run stateful container apps. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 48:39 — 66.8MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Ondat-21-12-transcript Download

We started by asking Alex how Ondats different from all the other k8s storage solutions out there today (which we’ve been talking with lately). He mentioned three crucial capabilities:

Ondat was developed from the ground up to run as k8s containers. Doing this would allow any k8s distribution to run their storage to support stateful container apps. .
Ondat was designed to allow developers to run any possible container app. Ondat supports both block as well as file storage volumes.
Ondat provides consistent, superior performance, at scale, with no compromises. Sophisticated data placement insures that data is located where it is consumed and their highly optimized data path provides low-latency access that data storage.

Ondat creates a data mesh (storage pool) out of all storage cluster nodes. Container volumes are carved out of this data mesh and at creation time, data and the apps that use them are co-located on the same cluster nodes.

At volume creation, Dev can specify the number of replicas (mirrors) to be maintained by the system. Alex mentioned that Ondat uses synchronous replication between replica clusters nodes to make sure that all active replica’s are up to date with the last IO that occurred to primary storage.

Ondat compresses all data that goes over the network as well as encrypts data in flight. Dev can easily specify that the data-at-rest also be compressed and/or encrypted. Compressing data in flight helps supply consistent performance where networks are shared.

Alex also mentioned that they support both the 1 reader/writer, k8s block storage volumes as well as multi-reader/multi-writer, k8s file storage volumes for containers.

In Ondat each storage volume includes a mini-brain used to determine primary and replica data placement. Ondat also uses desegregated consensus to decide what happens to primary and replica data after a k8s split cluster occurs. After a split cluster, isolated replica’s are invalidated and replicas are recreate, where possible, in the surviving nodes of the cluster portion that holds the primary copy of the data.

Also replica’s can optionally be located across AZs if available in your k8s cluster. Ondat doesn’t currentlysupport replication across k8s clusters.

Ondat storage works on any hyperscaler k8s solution as well as any onprem k8s system. I asked if Ondat supports VMware TKG and Alex said yes but when pushed mentioned that they have not tested it yet.

Keith asked what happens when things go south, i.e., an application starts to suffer worse performance. Alex said that Ondat supplies system telemetry to k8s logging systems which can be used to understand what’s going on. But he also mentioned they are working on a cloud based, Management-aaS offering, to provide multi-cluster operational views of Ondat storage in operation to help understand, isolate and fix problems like this.

Keith mentioned he had attended a talk by Google engineers that developed kubernetes and they said stateful containers don’t belong under kubernetes. So why are stateful containers becoming so ubiquitous now.

Alex said that may have been the case originally but k8s has come a long way from then and nowadays as many enterprises shift left enterprise applications from their old system environment to run as containers they all require state for processing. Having that stateful information or storage volumes accessible directly under k8s makes application re-implementation much easier.

What’s a typical Ondat configuration? Alex said there doesn’t appear to be one. Current Ondat deployments range from a few 100 to 1000s of k8s cluster nodes and 10 to 100s of TB of usable data storage.

Ondat has a simple pricing model, licensing costs are determined by the number of nodes in your k8s cluster. There’s different node pricing depending on deployment options but other than that it’s pretty straightforward.

Alex Chircop, CEO Ondat

Alex Chircop is the founder and CEO of Ondat (formerly StorageOS), which makes it possible to easily deploy and manage stateful Kubernetes applications with persistent data volumes. He also serves as co-chair of the CNCF (Cloud Native Computing Foundation) Storage Technical Advisory Group.

Alex comes from a technical background working in IT that includes more than 10 years with Nomura and Goldman Sachs.

November 8, 2021November 9, 2021

125: GreyBeard talk K8s storage with Tad Lebeck, US CTO for ionir

We had some technical difficulties with Matt getting on the podcast so, Ray had to fly solo. This month we continue our investigations into K8s storage with a discussion with Tad Lebeck (@TadLebeck) US CTO, ionir, a software defined storage system that only runs under K8s. ionir Kubernetes Data Services platform is an outgrowth of Reduxio a “tin-wrapped” software defined storage system which pivoted to K8s as the environment to target and left the tin behind.

ionir offers a deduplicating, continuous data protection storage system for PVs (persistent volumes) under K8s that uses 3 way mirroring, across data nodes for data protection. Their solution offers a number of unique services that we haven’t seen in other K8s storage systems. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 41:38 — 57.2MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Ionir-21-11-transcript Download

Tad opened with a long spiel on what ionir is and we spent the next 40 minutes unpacking that to understand what exactly they were doing.

Let’s start with why stateful containers are all the rage these days. Tad had a slightly different rationale than we’ve heard before. From his perspective, it all comes from current enterprise applications that used database servers/machines. As these apps are re-factored to run as K8s containerized micro services, developers need and want their data be containerized right along with the application.

ionir constructs a block storage system across K8s data nodes or K8s worker nodes with direct attached storage. In the cloud, this storage can be ephemeral (storage that only exists as long as the compute instance operates) or normal block storage (e.g., EBS in AWS). It’s unclear how ephemeral works on-prem. But in any case, they cluster together a set of data nodes into one massive block storage and map PVs onto that. K8s data nodes can be added to the ionir cluster while it’s operating.

As mentioned earlier, they use 3-way mirroring for data protection and ionir insures the 3 copies are stored on different data nodes. As such, when one data node goes down, copies of PV data are available from the other 2 nodes and the data can then be rewritten elsewhere to insure 3-way mirroring continues. We suppose this means a minimum configuration requires at least 3 data nodes.

ionir also provides deduplicating block storage, which should theoretically reduce physical storage footprint for any PV. Data blocks are deduplicated across the cluster. ionir also has a metadata service (also 3 way replicated, to different data space) that records the manifest for all blocks associated with a PV, their hashes and (logical/physical) locations.

There was no mention of data compression or encryption so those are probably not present. We find deduplication very effective for backup storage but less effective for primary storage. Any deduplication ratio for ionir primary storage is likely specific to data being stored, i.e. columnar database, row database, text, office files, etc. Each of these would likely have different dedupe ratios for primary storage.

Furthermore, ionir supplies continuous data protection (CDP) for PV data. PV data written to ionir is immutable, i.e., never modified AND they keep previous versions of PV blocks in storage until they age out. This allows ionir to provide any prior version (well most recent ones) of a PV. ionic uses a timestamp to distinguish different PV versions. So, if ransomware attacked your site, users could ask for a PV version just prior to the time of the attack and you’d have that version of the PV to restart operations. Customer’s can limit how far back ionir saves prior versions of blocks for PVs.

Having CDP for PVs, makes DevOps qualification and testing significantly faster. Normally DevOps would need to copy production data to test environments in order to validate new app code. But ionir can easily instantiate a separate copy of any PV (at any time in their saved set) in a matter of seconds. This can take DevOps deployment testing down from days to minutes or less.

In addition, ionir can teleport PV data to other, remote K8s clusters running ionir. Essentially, this copies PV metadata and it’s “hot” blocks over to any remote ionir cluster. During teleportation, the remote cluster can access PV data as soon as all PV metadata has been copied. The remote site accesses this PV data from the originating cluster (albeit much slower than accesses within the cluster) while “hot” blocks are being copied. Any writes, at the remote site, to PV data would be considered new data, deduplicated at the remote site, and only available at the remote site. Somewhat surprisingly, all of the PV’s data is never copied to the remote system, leaving the PV in a permanent teleported access mode.

Not sure we like the implications of teleporting PVs, from a data integrity perspective. It does make for near-instant access to PV data from other clusters and offers a solution to data gravity (it takes forever to move TB of data across the web), it’s incomplete, as the data is never fully copied to the remote site. Once hot blocks have been copied, remote cluster PV access should run faster. But If there’s 20% of the requested blocks, not in the heat map, those IOs will take 100s mseclonger, depending on wire distance between the sites, to perform. And the write’s at the remote site cause the two copies (one at source site and one at remote site) of the PV to diverge.

Their storage system is priced on a per data node basis which makes it easy to price out their various deployment options. And it works on any K8s standard environment, although Tad admits they haven’t tested VMware Tanzu yet, but they have tested it on GCP, Microsoft Azure, AWS, and Red Hat OpenShift.

They offer a fully functional free trial of ionir storage, only capped at the number of data nodes in use. So, if you only need a small amount of storage (ok 3 data nodes with 24 14TB SSDs each make for large amount of storage) for your K8s environment, you can probably run forever on the free version.

Tad Lebeck, US CTO, ionir

Tad Lebeck is a global technology executive with over two decades of experience in startups and large vendors. Prior to ionir, he founded and led Nuvoloso, an innovator in Kubernetes data services. Earlier, Lebeck served as CTO at Huawei Symantec Technologies, Vice President at Symantec/Veritas, co-founder/CTO at Invio, and CTO at Legato Systems, where he helped create the modern enterprise data-protection market.

Tad was a founding member of the SNIA Technical Council. He earned an MS/CS from the University of Wisconsin, and a combined MBA from the Columbia, London, and HKU Schools of Business.

	174: GreyBeards talk… on 174: GreyBeards talk SDN chips…
	GreyBeards talk Agen… on 169: GreyBeards talk AgenticAI…
	Computational (DNA)… on 155: GreyBeards SDC23 wrap up…
	155: GreyBeards SDC2… on 155: GreyBeards SDC23 wrap up…
	J Metz on 134: GreyBeards talk (storage)…