148: GreyBeards talk software defined infrastructure with Anthony Cinelli and Brian Dean, Dell PowerFlex

Sponsored By:

This is one of a series of podcasts the GreyBeards are doing with Dell PowerFlex software defined infrastructure. Today, we talked with Anthony Cinelli, Sr. Director Dell Technologies and Brian Dean, Technical Marketing for PowerFlex. We have talked with Brian before but this is the first time we’ve met Anthony. They were both very knowledgeable about PowerFlex and the challenges large enterprises have today with their storage environments.

The key to PowerFlex’s software defined solution is its extreme flexibility, which comes mainly from its architecture which offers scale-out deployment options ranging from HCI solutions to a fully disaggregated compute-storage environment, in seemingly any combination (see technical resources for more info). With this sophistication, PowerFlex can help consolidate enterprise storage across just about any environment from virtualized workloads, to standalone databases, big data analytics, as well as containerized environments and of course, the cloud. Listen to the podcast to learn more.

To support this extreme flexibility, PowerFlex uses both client and storage software that can be configured together on a server (HCI) or apart, across compute and storage nodes to offer block storage. PowerFlex client software runs on any modern bare-metal or virtualized environment.

Anthony mentioned that one common problem for enterprises today is storage sprawl. Most large customers have an IT environment with sizable hypervisor based workloads, a dedicated database workload, a big data/analytics workload, a modern container based workload stack, an AI/ML/DL workload and more often than not, a vertical specific workload.

Each workload usually has its own storage system. And the problem with 4-7 different storage systems is cost, e.g., the cost of underutilized storage. Typically in these environments, each storage system might run at, say, 60% utilization on average, but this will vary a lot between silos, leading to stranded capacity.
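To make the stranded capacity point concrete, here is a back-of-the-envelope sketch. The silo names, capacities and utilization figures below are invented for illustration and did not come from the podcast; only the rough 60% average echoes the discussion above.

```python
# Hypothetical silos: (name, raw capacity in TB, fraction in use).
# All figures are illustrative assumptions, not measured data.
SILOS = [
    ("hypervisor", 500, 0.80),
    ("database", 200, 0.45),
    ("analytics", 800, 0.60),
    ("containers", 100, 0.30),
    ("ai_ml", 400, 0.55),
]


def stranded_capacity(silos):
    """Return (total_tb, used_tb, free_tb, average_utilization)."""
    total = sum(cap for _, cap, _ in silos)
    used = sum(cap * util for _, cap, util in silos)
    return total, used, total - used, used / total


total_tb, used_tb, free_tb, avg_util = stranded_capacity(SILOS)
print(f"total={total_tb} TB used={used_tb:.0f} TB "
      f"free-but-siloed={free_tb:.0f} TB avg utilization={avg_util:.0%}")
```

The free space in one silo cannot absorb growth in another, which is why each silo ends up being over-purchased separately.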

The main reason customers haven’t consolidated yet is because each silo has different performance characteristics. As a result, they end up purchasing excess capacity which increases cost and complexity, as a standard part of doing business.

To consolidate storage across these disparate environments requires a no-holds-barred approach to IO performance, second to none, which PowerFlex can deliver. The secret to its high levels of IO performance is RAID 10, deployed across a scale-out cluster. And PowerFlex clusters can range from 4 to 1000 or more nodes.

RAID 10 mirrors data and spreads mirrored data across all drives and servers in a cluster or some subset. As a result, as you add storage nodes, IO performance scales up, almost linearly.

Yes, there can be other bottlenecks in clusters like this, most often networking, but with PowerFlex storage, IO need not be one of them. Anthony mentioned that PowerFlex will perform as fast as your infrastructure will support. So if your environment has 25 Gig Ethernet, it will perform IO at that speed, if you use 100 Gig Ethernet, it will perform at that speed.

In addition, PowerFlex offers automated LifeCycle Management (LCM), which can make having a 1000 node PowerFlex cluster almost as easy as a 10 node cluster. However, to make use of this automated LCM, one must run its storage server software on Dell PowerEdge servers.

Brian said adding or decommissioning PowerFlex nodes is a painless process. Because data is always mirrored, customers can remove any node, at any time and PowerFlex will automatically rebuild data across other nodes and drives. When you add nodes, those drives become immediately available to support more IO activity. Another item to note, because of RAID 10, PowerFlex mirror rebuilds happen very fast, as just about every other drive and node in the cluster (or subset) participates in the rebuild process.

PowerFlex supports Storage Pools. This partitions PowerFlex storage nodes and devices into multiple pools of storage used to host volume IO and data. Storage pools can be used to segregate higher performing storage nodes from lower performing ones so that some volumes can exclusively reside on higher (or lower) performing hardware.

Although customers can configure PowerFlex to use all nodes and drives in a system or storage pool for volume data mirroring, PowerFlex offers other data placement alternatives to support high availability.

PowerFlex supports Protection Domains which are subsets or collections of storage servers and drives in a cluster where volume data will reside. This will allow one protection domain to go down while others continue to operate. Realize that because volume data is mirrored across all devices in a protection domain, it will take lots of nodes or devices to go down before a protection domain is out of action.

PowerFlex also uses Fault Sets, which are a collection of storage servers and their devices within a Protection Domain, that will contain one half of a volume’s data mirror. PowerFlex will ensure that a primary and its mirror copy of a volume’s data will not both reside on the same fault set. A fault set could be a rack of servers, multiple racks, all PowerFlex storage servers in an AZ, etc. With fault sets, customer data will always reside across a minimum of two fault sets, and if any one goes down, data is still available.
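The fault set rule described above (primary and mirror never share a fault set) can be sketched in a few lines. This is only a toy model under stated assumptions: the node names, rack names and random placement policy are hypothetical, and it is not how PowerFlex actually chooses where to place data.

```python
import random
from collections import defaultdict

# Hypothetical layout: node name -> fault set name (here, one rack each).
NODE_FAULT_SET = {
    "node1": "rackA", "node2": "rackA",
    "node3": "rackB", "node4": "rackB",
    "node5": "rackC", "node6": "rackC",
}


def place_chunk(node_fault_set, rng):
    """Pick (primary, mirror) nodes that live in different fault sets."""
    by_set = defaultdict(list)
    for node, fault_set in node_fault_set.items():
        by_set[fault_set].append(node)
    if len(by_set) < 2:
        raise ValueError("mirroring across fault sets needs at least two of them")
    primary_set, mirror_set = rng.sample(sorted(by_set), 2)
    return rng.choice(by_set[primary_set]), rng.choice(by_set[mirror_set])


def survives_loss(placements, node_fault_set, failed_set):
    """True if every chunk keeps at least one copy outside the failed fault set."""
    return all(
        any(node_fault_set[node] != failed_set for node in pair)
        for pair in placements
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
placements = [place_chunk(NODE_FAULT_SET, rng) for _ in range(1000)]
print(all(survives_loss(placements, NODE_FAULT_SET, fs)
          for fs in ("rackA", "rackB", "rackC")))
```

Because the two copies of every chunk land in different fault sets, losing any single rack in this model still leaves one readable copy of everything.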

PowerFlex also operates in the cloud. In this case, customers bring their own PowerFlex software and deploy it over cloud compute and storage.

Brian mentioned that anything PowerFlex can do such as reconfiguring servers, can be done through RESTful/API calls. This can be particularly useful in cloud deployments as above, if customers want to scale up or down IO performance automatically.

Besides block services, PowerFlex also offers NFS/CIFS-SMB native file services using a File Node Controller. This frontends PowerFlex storage nodes to support customer NFS/SMB file access to PowerFlex data.

Anthony Cinelli, Sr. Director Global PowerFlex Software Defined & MultiCloud Solutions

Anthony Cinelli is a key leader for Dell Technologies helping drive the success of our software defined and multicloud solutions portfolio across the customer landscape. Anthony has been with Dell for 13 years and in that time has helped launch our HCI and Software Defined businesses from startup to the multi-billion dollar lines of business they now represent for Dell.

Anthony has a wealth of experience helping some of the largest organizations in the world achieve their IT transformation and multicloud initiatives through the use of software defined technologies.

Brian Dean, Dell PowerFlex Technical Marketing

Brian is a 16+ year veteran of the technology industry, and before that spent a decade in higher education. Brian has worked at EMC and Dell for 7 years, first as Solutions Architect and then as TME, focusing primarily on PowerFlex and software-defined storage ecosystems.

Prior to joining EMC, Brian was on the consumer/buyer side of large storage systems, directing operations for two Internet-based digital video surveillance startups.

When he’s not wrestling with computer systems, he might be found hiking and climbing in the mountains of North Carolina.

145: GreyBeards talk proactive NAS security with Jonathan Halstuch, CTO & Co-Founder, RackTop Systems

Sponsored By:

We’ve known about RackTop Systems since episode 84, and have been watching them ever since. On this episode we, once again, talk with Jonathan Halstuch (@JAHGT), CTO and Co-Founder, RackTop Systems.

RackTop was always very security oriented but lately they have taken this to the next level. As Jonathan says on the podcast, historically security has been mostly a network problem but since ransomware has emerged, security is now often a data concern too. The intent of proactive NAS security is to identify and thwart bad actors before they impact data, rather than after the fact. Listen to the podcast to learn more.

Proactive security for NAS storage includes monitoring user IO and administrator activity and looking for anomalies. RackTop has the ability (via config options) to halt IO activity when things look wrong, that is, when user/application IO looks different from what has been seen in the past. They also examine admin activity, a popular vector for ransomware attacks. RackTop IO/admin activity scanning is done in real time as IO is processed and admin commands are received.
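As a rough illustration of what "IO looks different from what has been seen in the past" could mean, here is a minimal baseline check using a z-score. This is a generic textbook technique shown under assumptions; the function name, threshold and sample counts are hypothetical, and nothing here describes RackTop's actual detection logic.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it sits more than `threshold` standard deviations
    above the mean of `history` (a simple one-sided z-score test)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical per-minute file-write counts for one user.
baseline = [12, 15, 11, 14, 13, 16, 12, 14]

print(is_anomalous(baseline, 15))   # an ordinary minute
print(is_anomalous(baseline, 400))  # a sudden burst of writes
```

A check this simple would also flag a legitimately busy new application, which is the false positive problem discussed next.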

The customer gets to decide how far to take this. The challenge with automatically halting access is false positives, when say a new application starts taking off. Security admins must have an easy way to see and understand what was anomalous/what not and to quickly let that user/application return to normal activities or take it out.

In addition to just stopping access, they can also just report it to admins/security staff. Moreover, the system can also automatically take snapshots of data when anomalous behavior is detected, to give admins and security a point-in-time view into the data before bad behavior occurs.

RackTop Systems have a number of assessors that look for specific anomalous activity, used to detect and act to thwart malware. For example, an admin assessor is looking at all admin operations to determine if these are considered normal or not.

RackTop also supports special time period access permissions. These provide temporary, time-dependent, unusual access rights to data for admins, users or applications that would normally be considered a breach, such as an admin copying lots of data or moving and deleting data. These are for situations that crop up where mass data deletion, movement or copying would be valid. When the time period access permission elapses, the system goes back into monitoring for anomalous behavior.
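The idea of a time-bound exemption can be sketched as a simple window check. The class, field names and the "bulk_delete" action label below are all hypothetical and chosen for illustration; they are not RackTop's configuration model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessWindow:
    principal: str   # user, admin, or application name
    action: str      # e.g. "bulk_delete"
    start: datetime
    end: datetime

    def allows(self, principal, action, when):
        return (
            principal == self.principal
            and action == self.action
            and self.start <= when < self.end
        )


def is_exempt(windows, principal, action, when):
    """True if any active window exempts this action from anomaly handling."""
    return any(w.allows(principal, action, when) for w in windows)


start = datetime(2023, 1, 7, 22, 0)
windows = [
    AccessWindow("backup_admin", "bulk_delete", start, start + timedelta(hours=4)),
]

print(is_exempt(windows, "backup_admin", "bulk_delete", start + timedelta(hours=1)))
print(is_exempt(windows, "backup_admin", "bulk_delete", start + timedelta(hours=5)))
```

Once the window has elapsed, the same action by the same admin is no longer exempt and would be assessed like any other activity.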

We talked about the overhead of doing all this scanning and detection in real time and how that may impact system IO performance. For other storage vendors, these sorts of activities are often done with standalone appliances, which of course add additional IO to a storage system to do offline scans.

Jonathan said, with recent Intel Xeon multi-core processors, they can readily afford the CPU cycles/cores required to do their scanning during IO processing, without sacrificing IO performance.

RackTop also supports a number of reports to show system configured data/user/application access rights as well as what accesses have occurred over time. Such reports offer admin/security teams visibility into data access rights and usage.

RackTop can be deployed in hybrid disk-flash solutions, as storage software in public clouds, in an HCI solution, or in edge environments that replicate back to core data centers. And they can also be used as a backup/archive data target for backup systems. RackTop Systems NAS supports CIFS 1.0 through SMB 3.1.1, and NFSv3 through v4.2.

RackTop Systems have customers in national government agencies, security sensitive commercial sectors, state gov’t, healthcare, and just about anyone subject to ransomware attacks on a regular basis. Which nowadays, is pretty much every IT organization on the planet.

Jonathan Halstuch, CTO & Co-Founder, RackTop Systems

Jonathan Halstuch is the Chief Technology Officer and co-founder of RackTop Systems. He holds a bachelor’s degree in computer engineering from Georgia Tech as well as a master’s degree in engineering and technology management from George Washington University.

With over 20 years of experience as an engineer, technologist, and manager for the federal government, he provides organizations the most efficient and secure data management solutions to accelerate operations while reducing the burden on admins, users, and executives.

141: GreyBeards annual 2022 wrap-up podcast

Well, it has been another year and time for our annual year end wrap up. Since Covid hit, every year has certainly been interesting. This year we saw the return of in-person conferences, which was a welcome change from the Covid lockdowns. We are very glad to start seeing everybody again.

From the tech standpoint, the big news this year was CXL. As everyone should recall, CXL is a new-ish PCIe hardware and protocol that supports larger memory sitting out on a PCIe bus and in the future shared memory between servers. All this is to enable a new wave of memory based computing. We spent probably half our time discussing CXL and its impact on IT.

The other major topic was the Cloud Native ecosystem. In the past all we talked about was K8s but nowadays the ecosystem that surrounds it is almost as important as K8s itself. The final topic was a bit of a shock earlier this year and yes, it was Broadcom’s acquisition of VMware. Jason and I spent our Explore podcast talking about it (see our 137: VMware Explore wrap-up). Keith has high hopes that the EU will shut it down but the jury’s still out on that one. Listen to the podcast to learn more.

As for CXL, it turns out that AMD have just released full support for CXL hardware and protocols with their latest round of CPU chips. But the new AMD CPUs only support DDR5 memory (something about there’s only so much logic one can fit on a chip…), which means all those DDR4 DIMMs out in the wild need somewhere to land. CXL could supply a new lease on life for DDR4 DIMMs.

And it’s not just about shared memory or increased memory sizes; CXL can also provide a tiered memory hierarchy, with gobs of flash behind memory DIMMs (see: 136: FMS2022 wrap up …). So, now it’s no longer a TB or ten of server memory but potentially 100s of TBs. What this means for SAP HANA, AWS Aurora and other heavy-memory solutions has yet to play out.

Cloud Native won. We see this in the increasing adoption of containers and K8s in the enterprise, cloud and just about anywhere IT happens these days. But the ecosystem surrounding K8s is chaos.

Over time, many of these ecosystem solutions will die off, be purchased, or consolidated but in the meantime, it’s entirely too confusing. Red Hat’s OpenShift is one answer and VMware’s Tanzu is another. And of course all the clouds have their own K8s packaged solution. But just to cover their bets, everyone also supports native K8s and just about every software package that works with it. So, K8s’s ecosystem is in a state of flux and may take time to become a stable set of tools usable by enterprise IT.

Finally, Broadcom’s acquisition of VMware has everyone up in arms. Customers are concerned the R&D juggernaut that VMware has been, since its very beginning, will be jettisoned in favor of profits. And HCI vendors that always felt Dell EMC had an unfair advantage will all look at Broadcom in a similar light.

Keith says there’s a major difference in how USA regulators view an acquisition and how EU regulators view one. According to Keith, EU regulators view acquisitions in terms of how they help or hurt the customer. USA regulators view acquisitions in terms of how they help or hurt the competition. We’ll have to wait and see how this all plays out for Broadcom-VMware.

On the other hand, speaking of competition, Nutanix seems to be feeling the heat as well. Rumors are it’s up for sale. Who will want it and how the regulators view both of these acquisitions may be an interesting story for 2023.

2023 looks to be another year of transition for enterprise IT. The cloud players all seem to be coming around to the view that they can’t be all things to all (IT) people. And the enterprise vendors are finally seeing some modicum of staying power in the face of a relentless push to the cloud. How this plays out over the next few years will be of major interest to everybody.

Happy New Year from the GreyBeards!

Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

Jason Collier, Principal Member of Technical Staff, AMD

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology. He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years. He’s on LinkedIn.

140: Greybeards talk data orchestration with Matt Leib, Product Marketing Manager for IBM Spectrum Fusion

As our listeners should know, Matt Leib (@MBleib) was a GreyBeards co-host. But since then, Matt has joined IBM to become Product Marketing Manager on IBM Spectrum Fusion, a data orchestration solution for Red Hat OpenShift environments. Matt’s been in and around the storage and data management industry for many years, which is why we tapped him for GreyBeards co-host duties.

IBM Fusion, in its previous incarnation, came as an OpenShift software defined storage or as an OpenShift (H)CI solution. But recently, Fusion has taken on more of a data orchestration role for OpenShift stateful containerized applications. Listen to the podcast to learn more.

Fusion can run in any OpenShift deployment, whether in the cloud (currently AWS, Azure, & IBM), under VMware (wherever it runs), or on (x86 or IBM Z) bare metal. It supplies NFS file or S3 compatible object storage for container applications running under OpenShift. But it does more than just storage.

Beyond storage, Fusion includes backup/recovery, site to site DR and global (file & object) data access. It’s almost like someone opened up the IBM Spectrum software pantry, took out the best available functionality and cooked it up into an OpenShift solution. IBM’s Spectrum Fusion current website (linked to above (Dec.’22)) still refers only to the software defined storage and (H)CI solution, but today’s Fusion includes all of the functions identified above.

All Fusion facilities run as containers under OpenShift. Customers can elect to run all Fusion services or pick and choose which ones they want for their environment. IBM Fusion supports an API, an API backed GUI, and CLI for its storage & data management as well as REST access. Fusion is fully compatible with Red Hat Ansible.

IBM Fusion is intended to be storage agnostic. Which means it can support its data management services for any NFS file storage as well as anyone’s S3 compatible, object storage.

Now that Red Hat software defined CEPH and ODF are under IBM product management, CEPH and ODF options will become available under Fusion. And CEPH offers block as well as file and object. We’ve talked about CEPH before, packaged in a hardware appliance, see our SoftIron podcast.

One intriguing part of the Fusion solution is its global data access. With global access, any OpenShift application can access data from any Fusion data store, across clouds, across on prem installations, or just about anywhere OpenShift is running. Matt mentioned that compute could be on AWS OpenShift, Fusion’s data control plane could be running on prem OpenShift and the data storage could be running on Azure OpenShift. All this would be glued together by Fusion global access, so that AWS compute had access to data on Azure.

There’s some sophisticated caching magic to make global access happen seamlessly and with decent levels of performance, but customers no longer have to copy whole file systems over from one cloud to another in order to move compute or data. IBM Fusion would need to run in all those locations for global access.

Keith asked if it was directly available in the AWS marketplace. Matt said not yet but you can deploy OpenShift out of the marketplace and then deploy IBM Fusion onto that.

It took us some time to get our heads wrapped around what Fusion has to offer and throughout it all, Keith and I had a bit of fun with Matt.

Matthew Leib, Product Marketing Manager, IBM Spectrum Fusion

Matt has spent years in IT, from Engineering, to Architecture, from PreSales to analyst work, and finally to Product Marketing at IBM.

He’s spent years trying to achieve credibility in the space as a podcaster, blogger, and community member.

In his spare time, he’s a dad, dog owner, and amateur guitar player.

138: GreyBeards talk big data orchestration with Adit Madan, Dir. of Product, Alluxio

We have never talked with Alluxio before but after coming back last week from Cloud Field Day 15 (CFD15) it seemed a good time to talk with other solution providers attempting to make hybrid cloud easier to use. So we talked with Adit Madan (@madanadit), Director of Product Management, Alluxio. Alluxio is a data orchestration solution that’s available in either a free to download/use, open source, community edition (apparently, Meta is a customer) or a licensed, closed source, enterprise edition.

Alluxio data orchestration is all about supplying local-like IO access to data that resides elsewhere for BI, AI/ML/DL, and just about any other application needing to process data residing elsewhere. Listen to the podcast to learn more.

Alluxio started out at UC Berkeley’s AMPlab, which is focused on big data problems, and was designed to provide local access to massive amounts of distributed data. Alluxio ends up constructing a locally accessible federation of data sources for compute apps running elsewhere.

Alluxio software installs near where compute apps run that need access to remote data. We asked about a typical cloud bursting case where S3 object data needed by an app are sitting on prem, but the apps need to run in a cloud, e.g., AWS.

He said Alluxio software would be deployed in AWS, close to app compute and that’s all there is. There’s no Alluxio software running on prem, as Alluxio just uses normal (remote access) S3 APIs to supply data to the compute apps running in AWS.

Adit mentioned that BI was one of the main applications to take advantage of Alluxio, but AI/ML/DL learning is another that could use data orchestration. It turns out that AI/ML/DL training’s consumption of data is repetitive and highly sequential, so caching, sequential pre-fetch and other Alluxio techniques can work well there to provide local-like access to remote data.

Adit said that enterprises are increasingly looking to avoid vendor lock-in and this applies equally well to the cloud. By supporting data access in one location, say GCP, and accessing that data from another, say Azure, data gravity need no longer limit where work is done.

Adit said what makes their solution so valuable is that instead of duplicating all data from one place to another all that Alluxio moves is just the data required/requested by the apps running there.

Keith asked whether Adit considered Alluxio a data mesh or data fabric. Keith had to explain the terms to me and said data fabrics are pipes and physical infrastructure/functionality that moves data around and data mesh is what gives clients/apps/users access to that data. From that perspective Alluxio is a data mesh.

Alluxio Caching

Adit said that caching is one of the keys to making Alluxio work. Much of the success of their solution depends on applications having a well behaved working set. He also mentioned they use pre-fetching and other techniques to minimize access latency and maximize throughput. However, the first byte of data being accessed may take some time to get to where compute executes.
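To show why a well behaved working set and sequential pre-fetch matter, here is a toy LRU block cache with read-ahead. It is a generic sketch under assumptions: the class name, parameters and the fake remote reader are hypothetical and have nothing to do with Alluxio's real implementation or APIs.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy LRU block cache that reads ahead sequentially on a miss."""

    def __init__(self, fetch_block, capacity, readahead=2):
        self.fetch_block = fetch_block  # callable: block number -> bytes (slow remote read)
        self.capacity = capacity
        self.readahead = readahead
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        self.misses += 1
        data = self.fetch_block(block_no)
        self._store(block_no, data)
        # Sequential read-ahead: pull the next few blocks while we are at it.
        for nxt in range(block_no + 1, block_no + 1 + self.readahead):
            if nxt not in self.blocks:
                self._store(nxt, self.fetch_block(nxt))
        return data


remote_reads = []


def fake_remote(block_no):
    """Stand-in for a remote object store read; records each call."""
    remote_reads.append(block_no)
    return f"block-{block_no}".encode()


cache = PrefetchingCache(fake_remote, capacity=8, readahead=2)
for epoch in range(2):            # a training job re-reads the same data each epoch
    for block_no in range(6):
        cache.read(block_no)

print(cache.hits, cache.misses, len(remote_reads))
```

With these settings the twelve application reads cost only two misses and six remote fetches: the first touch of each run of blocks pays the latency (the "first byte" cost mentioned above), and the second epoch is served entirely from cache because the working set fits.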

Adit said it’s not unusual for them to have a 1/2PB of cache (storage) for an application with multiPBs of source data.

Keith asked how Alluxio’s performance can be managed. Adit said they (we assume enterprise edition) have a solution called Cache Insights which uses Alluxio’s extensive access pattern history to predict application IO performance with larger cache (storage), higher speed networking, higher performing/more compute cores, etc. In this way, customers can see what can be done to improve application IO performance and what it would cost.

Keith asked if Alluxio were available as a SaaS solution. Adit said, although it could be deployed in that fashion, it’s not currently a SaaS solution. When asked how Alluxio (enterprise) was priced, Adit said it’s a function of the total resources consumed by their service, i.e., storage (cache), cores, networking that runs Alluxio software, etc.

As for deployment options, it turns out for Spark, Alluxio is just another lib package installed inside Spark. For K8s, Alluxio is installed as a CSI driver and a set of containers and can be deployed as containers within a cluster that needs access to data or in an external, standalone K8s cluster, servicing IO from other clusters. Alluxio HA is supplied by using multiple nodes to provide IO access.

Alluxio also supports access to multiple data locations. In this case, the applications would just access different mount points.

Data reads are easy, writes can be harder due to data integrity issues. As such, trying to supply IO performance becomes a trade off for data integrity when data updates are supported. Adit said Alluxio offers a couple of different configuration options for write concurrency (data integrity) that customers can select from. We assume this includes write through, write back and perhaps other write consistency options.
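The trade-off between write performance and data integrity can be illustrated with the two classic cache write policies we assumed above. This is a generic sketch, not Alluxio's configuration or API; the class and its names are hypothetical, and a plain dict stands in for the remote store.

```python
class WriteCache:
    """Toy cache in front of a slower backing store (a plain dict here)."""

    def __init__(self, backing, write_through=True):
        self.backing = backing
        self.write_through = write_through
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value
        if self.write_through:
            self.backing[key] = value  # persisted remotely before the write returns
        else:
            self.dirty.add(key)        # acknowledged now, persisted on a later flush

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.backing[key]
        self.cache[key] = value
        return value

    def flush(self):
        """Push all pending write-back data to the backing store."""
        for key in sorted(self.dirty):
            self.backing[key] = self.cache[key]
        self.dirty.clear()


remote_wt, remote_wb = {}, {}
wt = WriteCache(remote_wt, write_through=True)
wb = WriteCache(remote_wb, write_through=False)
wt.write("a", 1)
wb.write("a", 1)

print(remote_wt, remote_wb)  # write-back data has not reached the remote store yet
```

Write-through pays the remote latency on every write but never leaves the remote copy stale; write-back is faster for the application but anything still in the dirty set is lost if the cache node fails before a flush.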

Alluxio supports AWS, Azure and GCP cloud compute accessing HDFS, S3 and Posix protocol access to data residing at remote sites. At remote sites, they currently support MinIO, Cloudian and any other S3 compatible storage solutions as well as NetApp (ONTAP) and Dell (ECS) storage as data sources.

Adit Madan, Director of Product, Alluxio

Adit Madan is the Director of Product Management at Alluxio. Adit has extensive experience in distributed systems, storage systems, and large-scale data analytics.

Adit holds an MS from Carnegie Mellon University and a BS from the Indian Institute of Technology – Delhi.

Adit is also a core maintainer and Project Management Committee (PMC) member of the Alluxio Open Source project.