148: GreyBeards talk software defined infrastructure with Anthony Cinelli and Brian Dean, Dell PowerFlex

Sponsored By:

This is one of a series of podcasts the GreyBeards are doing with Dell PowerFlex software defined infrastructure. Today, we talked with Anthony Cinelli, Sr. Director Dell Technologies and Brian Dean, Technical Marketing for PowerFlex. We have talked with Brian before but this is the first time we’ve met Anthony. They were both very knowledgeable about PowerFlex and the challenges large enterprises have today with their storage environments.

The key to PowerFlex’s software defined solution is its extreme flexibility, which comes mainly from its architecture which offers scale-out deployment options ranging from HCI solutions to a fully disaggregated compute-storage environment, in seemingly any combination (see technical resources for more info). With this sophistication, PowerFlex can help consolidate enterprise storage across just about any environment from virtualized workloads, to standalone databases, big data analytics, as well as containerized environments and of course, the cloud. Listen to the podcast to learn more.

To support this extreme flexibility, PowerFlex uses both client and storage software that can be configured together on a server (HCI) or apart, across compute and storage nodes to offer block storage. PowerFlex client software runs on any modern bare-metal or virtualized environment.

Anthony mentioned that one common problem to enterprises today is storage sprawl. Most large customers have an IT environment with sizable hypervisor based workloads, a dedicated database workload, a big data/analytics workload, a modern container based workload stack, an AI/ML/DL workload and more often than not, a vertical specific workload.

Each workload usually has their own storage system. And the problem with 4-7 different storage systems is cost, e.g., cost of underutilized storage. Typical to these environments, each storage system could be used at say, 60% utilization on average, but this will vary a lot between silos, leading to stranded capacity.

The main reason customers haven’t consolidated yet is because each silo has different performance characteristics. As a result, they end up purchasing excess capacity which increases cost and complexity, as a standard part of doing business.

To consolidate storage across these disparate environments requires a no-holds barred approach to IO performance, second to none, which PowerFlex can deliver. The secret to to its high levels of IO performance is RAID 10, deployed across a scale-out cluster. And PowerFlex clusters can range from 4 to 1000 or more nodes.

RAIID 10 mirrors data and spreads mirrored data across all drives and servers in a cluster or some subset. As a result, as you add storage nodes, IO performance scales up, almost linearly.

Yes, there can be other bottlenecks in clusters like this, most often networking, but with PowerFlex storage, IO need not be one of them. Anthony mentioned that PowerFlex will perform as fast as your infrastructure will support. So if your environment has 25 Gig Ethernet, it will perform IO at that speed, if you use 100 Gig Ethernet, it will perform at that speed.

In addition, PowerFlex offers automated LifeCycle Management (LCM), which can make having a 1000 node PowerFlex cluster almost as easy as a 10 node cluster. However to make use this automated LCM, one must run its storage server software on Dell PowerEdge servers.

Brian said adding or decommissioning PowerFlex nodes is a painless process. Because data is always mirrored, customers can remove any node, at any time and PowerFlex will automatically rebuild data across other nodes and drives. When you add nodes, those drives become immediately available to support more IO activity. Another item to note, because of RAID 10, PowerFlex mirror rebuilds happen very fast, as just about every other drive and node in the cluster (or subset) participates in the rebuild process.

PowerFlex supports Storage Pools. This partitions PowerFlex storage nodes and devices into multiple pools of storage used to host volume IO and data Storage pools can be used to segregate higher performing storage nodes from lower performing ones so that some volumes can exclusively reside on higher (or lower) performing hardware.

Although customers can configure PowerFlex to use all nodes and drives in a system or storage pool for volume data mirroring, PowerFlex offers other data placement alternatives to support high availability.

PowerFlex supports Protection Domains which are subsets or collections of storage servers and drives in a cluster where volume data will reside. This will allow one protection domain to go down while others continue to operate. Realize that because volume data is mirrored across all devices in a protection domain, it will take lots of nodes or devices to go down before a protection domain is out of action.

PowerFlex also uses Fault Sets, which are a collection of storage servers and their devices within a Protection Domain, that will contain one half of a volume’s data mirror. PowerFlex will insure that a primary and its mirror copy of volume’s data will not both reside on the same fault set. A fault set could be a rack of servers, multiple racks, all PowerFlex storage servers in an AZ, etc. With fault sets, customer data will always reside across a minimum of two fault sets, and if any one goes down, data is still available.

PowerFlex also operates in the cloud. In this case, customers bring their own PowerFlex software and deploy it over cloud compute and storage.

Brian mentioned that anything PowerFlex can do such as reconfiguring servers, can be done through RESTful/API calls. This can be particularly useful in cloud deployments as above, if customers want to scale up or down IO performance automatically.

Besides block services, PowerFlex also offers NFS/CIFS-SMB native file services using a File Node Controller. This frontends PowerFlex storage nodes to support customer NFS/SMB file access to PowerFlex data.

Anthony Cinelli, Sr. Director Global PowerFlex Software Defined & MultiCloud Solutions

Anthony Cinelli is a key leader for Dell Technologies helping drive the success of our software defined and multicloud solutions portfolio across the customer landscape. Anthony has been with Dell for 13 years and in that time has helped launch our HCI and Software Defined businesses from startup to the multi-billion dollar lines of business they now represent for Dell.

Anthony has a wealth of experience helping some of the largest organizations in the world achieve their IT transformation and multicloud initiatives through the use of software defined technologies.

Brian Dean, Dell PowerFlex Technical Marketing

Brian is a 16+ year veteran of the technology industry, and before that spent a decade in higher education. Brian has worked at EMC and Dell for 7 years, first as Solutions Architect and then as TME, focusing primarily on PowerFlex and software-defined storage ecosystems.

Prior to joining EMC, Brian was on the consumer/buyer side of large storage systems, directing operations for two Internet-based digital video surveillance startups.

When he’s not wrestling with computer systems, he might be found hiking and climbing in the mountains of North Carolina.

142: GreyBeards talk scale-out, software defined storage with Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Software defined storage is a pretty full segment of the market these days. So, it’s surprising when a new entrant comes along. We saw a story on Quobyte in Blocks and Files and thought it would be great to talk with Bjorn Kolbeck (LinkedIn), Co-Founder & CEO, Quobyte. Bjorn got his PhD in scale out storage and went to work at Google on anything but storage. While there, he was amazed by Goodle’s vast infrastructure being managed by only a few people and thought this could should be commercialized, so Quobyte was born. Listen to the podcast to learn more.

Quobyte is a scale out file and object storage system with mirrored metadata and data which is 3-way mirrored or erasure coded (EC). Minimum cluster is 4 nodes (fault tolerant for a single node failure.). Quobyte has current customers with ~250 nodes and ~20K clients accessing a storage cluster.

Although they support NFSv3 and NFSv4 for file (and object) access, their solution is typically deployed using host client and storage services software accessing the files with Posix or objects via S3. Objects can also be accessed as file within the file system directories.

Host client software runs on Linux, Mac or Windows machines. Storage server software runs on Linux systems bare metal or under VMs in user space. Quobyte also support containerized storage server software for K8s but their bare metal/VM storage server software option doesn’t require containers.

Quobyte is also available in the GCP marketplace and can run in AWS, Azure and Oracle Cloud.

Their metadata service is a mirrored key-value store distributed across any number of (customer configured, I believe) storage nodes. Metadata resides on flash and distribution is designed to eliminate the metadata service as a performance bottleneck.

Their data services supports (any number of) storage tiers. Storage policies determine how tiering is used for files, directories, objects, etc. For example, with 3 tiers (NVMe Flash, SSD, and disk), file data could be first landed on NVMe Flash, but as it grows, it gets moved off to SSD, and as it grows, even more, it’s moved to disk. This could also be triggered using time since last access.

Bjorn said anything in file system metadata could be used to trigger data movement across tiers. Each tier could be defined with different data protection policies, like mirroring or EC 8+3.

Backend storage is split up into Volumes. They also support thinly provisioned volumes for file creation.

Unclear how tiering and thin provisioning applies to objects with much richer metadata options but as they can be mapped to files, we suppose that anything in the object file metadata could conceivably used to trigger tiering as a bare minimum.

As for security, 

  1. Quobyte supports end to end data encryption. This is done once and the customer owns the keys. They do support external key servers.  I believe this is another option that is enabled by file based policy management. It seems like different files can have different keys to encrypt them.
  2. Quobyte supports TLS. Depending on customer requirements data may go across open networks and this is where TLS could very well be used. And Quobyte supports user X.509 certificates for users, devices and systems authentication. 
  3. Quobyte supports file access controls. They support a subset of Windows capabilities but have full support for Linux and Mac access controls.

Quobyte also supports two forms of cluster to cluster replication. One is event driven where event occurrence (i.e. file close) signals data replication and another which is time driven (i.e., every 5 minutes) but both are asynchronous.

Quobyte was designed from the start to be completely API driven. But they do support CLI and a GUI for those customers that want them. 

They have a Free (forever) edition, a downloadable version of the software without 24/7 support and minus some enterprise capabilities (think encryption). This is gated at 150TB disk/30TB flash with limited number of clients and volumes.

The Infrastructure edition is their full featured solution with 7/24 enterprise support. It’s comes with a yearly service fee, priced by capacity with volume discounts.

Bjorn Kolbeck, Co-Founder & CEO, Quobyte

Bjorn Kolbeck, Co-Founder and CEO of Quobyte attended the Technical University of Berlin and Humboldt University of Berlin.

His PhD thesis dealt with fault-tolerant replication, but he gained several years’ experience in distributed and storage systems while developing the distributed research file system XtreemFS at the Zuse Institute Berlin.

He then spent time at Google working as a Software Engineer before he and fellow Co-Founder Felix Hupfield decided to combine the innovative research from XtreemFS and the operations experience from Google to build a highly reliable and scalable enterprise-grade storage system now known as Quobyte.

126: GreyBeards talk k8s storage with Alex Chircop, CEO, Ondat

Keith and I had an interesting discussion with Alex Chircop (@chira001), CEO of Ondat, a kubernetes storage provider. They have a high performing system, laser focused on providing storage for k8s stateful container applications. Their storage is entirely containerized and has a number of advanced features for data availability, performance and security that developers need the run stateful container apps. Listen to the podcast to learn more.

We started by asking Alex how Ondats different from all the other k8s storage solutions out there today (which we’ve been talking with lately). He mentioned three crucial capabilities:

  • Ondat was developed from the ground up to run as k8s containers. Doing this would allow any k8s distribution to run their storage to support stateful container apps. .
  • Ondat was designed to allow developers to run any possible container app. Ondat supports both block as well as file storage volumes.
  • Ondat provides consistent, superior performance, at scale, with no compromises. Sophisticated data placement insures that data is located where it is consumed and their highly optimized data path provides low-latency access that data storage.

Ondat creates a data mesh (storage pool) out of all storage cluster nodes. Container volumes are carved out of this data mesh and at creation time, data and the apps that use them are co-located on the same cluster nodes.

At volume creation, Dev can specify the number of replicas (mirrors) to be maintained by the system. Alex mentioned that Ondat uses synchronous replication between replica clusters nodes to make sure that all active replica’s are up to date with the last IO that occurred to primary storage.

Ondat compresses all data that goes over the network as well as encrypts data in flight. Dev can easily specify that the data-at-rest also be compressed and/or encrypted. Compressing data in flight helps supply consistent performance where networks are shared.

Alex also mentioned that they support both the 1 reader/writer, k8s block storage volumes as well as multi-reader/multi-writer, k8s file storage volumes for containers.

In Ondat each storage volume includes a mini-brain used to determine primary and replica data placement. Ondat also uses desegregated consensus to decide what happens to primary and replica data after a k8s split cluster occurs. After a split cluster, isolated replica’s are invalidated and replicas are recreate, where possible, in the surviving nodes of the cluster portion that holds the primary copy of the data.

Also replica’s can optionally be located across AZs if available in your k8s cluster. Ondat doesn’t currentlysupport replication across k8s clusters.

Ondat storage works on any hyperscaler k8s solution as well as any onprem k8s system. I asked if Ondat supports VMware TKG and Alex said yes but when pushed mentioned that they have not tested it yet.

Keith asked what happens when things go south, i.e., an application starts to suffer worse performance. Alex said that Ondat supplies system telemetry to k8s logging systems which can be used to understand what’s going on. But he also mentioned they are working on a cloud based, Management-aaS offering, to provide multi-cluster operational views of Ondat storage in operation to help understand, isolate and fix problems like this.

Keith mentioned he had attended a talk by Google engineers that developed kubernetes and they said stateful containers don’t belong under kubernetes. So why are stateful containers becoming so ubiquitous now.

Alex said that may have been the case originally but k8s has come a long way from then and nowadays as many enterprises shift left enterprise applications from their old system environment to run as containers they all require state for processing. Having that stateful information or storage volumes accessible directly under k8s makes application re-implementation much easier.

What’s a typical Ondat configuration? Alex said there doesn’t appear to be one. Current Ondat deployments range from a few 100 to 1000s of k8s cluster nodes and 10 to 100s of TB of usable data storage.

Ondat has a simple pricing model, licensing costs are determined by the number of nodes in your k8s cluster. There’s different node pricing depending on deployment options but other than that it’s pretty straightforward.

Alex Chircop, CEO Ondat

Alex Chircop is the founder and CEO of Ondat (formerly StorageOS), which makes it possible to easily deploy and manage stateful Kubernetes applications with persistent data volumes. He also serves as co-chair of the CNCF (Cloud Native Computing Foundation) Storage Technical Advisory Group.

Alex comes from a technical background working in IT that includes more than 10 years with Nomura and Goldman Sachs.

125: GreyBeard talk K8s storage with Tad Lebeck, US CTO for ionir

We had some technical difficulties with Matt getting on the podcast so, Ray had to fly solo. This month we continue our investigations into K8s storage with a discussion with Tad Lebeck (@TadLebeck) US CTO, ionir, a software defined storage system that only runs under K8s. ionir Kubernetes Data Services platform is an outgrowth of Reduxio a “tin-wrapped” software defined storage system which pivoted to K8s as the environment to target and left the tin behind.

ionir offers a deduplicating, continuous data protection storage system for PVs (persistent volumes) under K8s that uses 3 way mirroring, across data nodes for data protection. Their solution offers a number of unique services that we haven’t seen in other K8s storage systems. Listen to the podcast to learn more.

Tad opened with a long spiel on what ionir is and we spent the next 40 minutes unpacking that to understand what exactly they were doing.

Let’s start with why stateful containers are all the rage these days. Tad had a slightly different rationale than we’ve heard before. From his perspective, it all comes from current enterprise applications that used database servers/machines. As these apps are re-factored to run as K8s containerized micro services, developers need and want their data be containerized right along with the application.

ionir constructs a block storage system across K8s data nodes or K8s worker nodes with direct attached storage. In the cloud, this storage can be ephemeral (storage that only exists as long as the compute instance operates) or normal block storage (e.g., EBS in AWS). It’s unclear how ephemeral works on-prem. But in any case, they cluster together a set of data nodes into one massive block storage and map PVs onto that. K8s data nodes can be added to the ionir cluster while it’s operating.

As mentioned earlier, they use 3-way mirroring for data protection and ionir insures the 3 copies are stored on different data nodes. As such, when one data node goes down, copies of PV data are available from the other 2 nodes and the data can then be rewritten elsewhere to insure 3-way mirroring continues. We suppose this means a minimum configuration requires at least 3 data nodes.

ionir also provides deduplicating block storage, which should theoretically reduce physical storage footprint for any PV. Data blocks are deduplicated across the cluster. ionir also has a metadata service (also 3 way replicated, to different data space) that records the manifest for all blocks associated with a PV, their hashes and (logical/physical) locations.

There was no mention of data compression or encryption so those are probably not present. We find deduplication very effective for backup storage but less effective for primary storage. Any deduplication ratio for ionir primary storage is likely specific to data being stored, i.e. columnar database, row database, text, office files, etc. Each of these would likely have different dedupe ratios for primary storage.

Furthermore, ionir supplies continuous data protection (CDP) for PV data. PV data written to ionir is immutable, i.e., never modified AND they keep previous versions of PV blocks in storage until they age out. This allows ionir to provide any prior version (well most recent ones) of a PV. ionic uses a timestamp to distinguish different PV versions. So, if ransomware attacked your site, users could ask for a PV version just prior to the time of the attack and you’d have that version of the PV to restart operations. Customer’s can limit how far back ionir saves prior versions of blocks for PVs.

Having CDP for PVs, makes DevOps qualification and testing significantly faster. Normally DevOps would need to copy production data to test environments in order to validate new app code. But ionir can easily instantiate a separate copy of any PV (at any time in their saved set) in a matter of seconds. This can take DevOps deployment testing down from days to minutes or less.

In addition, ionir can teleport PV data to other, remote K8s clusters running ionir. Essentially, this copies PV metadata and it’s “hot” blocks over to any remote ionir cluster. During teleportation, the remote cluster can access PV data as soon as all PV metadata has been copied. The remote site accesses this PV data from the originating cluster (albeit much slower than accesses within the cluster) while “hot” blocks are being copied. Any writes, at the remote site, to PV data would be considered new data, deduplicated at the remote site, and only available at the remote site. Somewhat surprisingly, all of the PV’s data is never copied to the remote system, leaving the PV in a permanent teleported access mode.

Not sure we like the implications of teleporting PVs, from a data integrity perspective. It does make for near-instant access to PV data from other clusters and offers a solution to data gravity (it takes forever to move TB of data across the web), it’s incomplete, as the data is never fully copied to the remote site. Once hot blocks have been copied, remote cluster PV access should run faster. But If there’s 20% of the requested blocks, not in the heat map, those IOs will take 100s mseclonger, depending on wire distance between the sites, to perform. And the write’s at the remote site cause the two copies (one at source site and one at remote site) of the PV to diverge.

Their storage system is priced on a per data node basis which makes it easy to price out their various deployment options. And it works on any K8s standard environment, although Tad admits they haven’t tested VMware Tanzu yet, but they have tested it on GCP, Microsoft Azure, AWS, and Red Hat OpenShift.

They offer a fully functional free trial of ionir storage, only capped at the number of data nodes in use. So, if you only need a small amount of storage (ok 3 data nodes with 24 14TB SSDs each make for large amount of storage) for your K8s environment, you can probably run forever on the free version.

Tad Lebeck, US CTO, ionir

Tad Lebeck is a global technology executive with over two decades of experience in startups and large vendors. Prior to ionir, he founded and led Nuvoloso, an innovator in Kubernetes data services. Earlier, Lebeck served as CTO at Huawei Symantec Technologies, Vice President at Symantec/Veritas, co-founder/CTO at Invio, and CTO at Legato Systems, where he helped create the modern enterprise data-protection market.

Tad was a founding member of the SNIA Technical Council. He earned an MS/CS from the University of Wisconsin, and a combined MBA from the Columbia, London, and HKU Schools of Business.