148: GreyBeards talk software defined infrastructure with Anthony Cinelli and Brian Dean, Dell PowerFlex

Sponsored By:

This is one of a series of podcasts the GreyBeards are doing with Dell PowerFlex software defined infrastructure. Today, we talked with Anthony Cinelli, Sr. Director Dell Technologies and Brian Dean, Technical Marketing for PowerFlex. We have talked with Brian before but this is the first time we’ve met Anthony. They were both very knowledgeable about PowerFlex and the challenges large enterprises have today with their storage environments.

The key to PowerFlex’s software defined solution is its extreme flexibility, which comes mainly from an architecture offering scale-out deployment options ranging from HCI solutions to a fully disaggregated compute-storage environment, in seemingly any combination (see technical resources for more info). With this sophistication, PowerFlex can help consolidate enterprise storage across just about any environment, from virtualized workloads to standalone databases, big data analytics, containerized environments and, of course, the cloud. Listen to the podcast to learn more.

To support this extreme flexibility, PowerFlex uses both client and storage software that can be configured together on a server (HCI) or apart, across compute and storage nodes to offer block storage. PowerFlex client software runs on any modern bare-metal or virtualized environment.

Anthony mentioned that one common problem for enterprises today is storage sprawl. Most large customers have an IT environment with sizable hypervisor-based workloads, a dedicated database workload, a big data/analytics workload, a modern container-based workload stack, an AI/ML/DL workload and, more often than not, a vertical-specific workload.

Each workload usually has its own storage system. And the problem with 4-7 different storage systems is cost, e.g., the cost of underutilized storage. In these environments, each storage system might average, say, 60% utilization, but this varies a lot between silos, leading to stranded capacity.

The main reason customers haven’t consolidated yet is that each silo has different performance characteristics. As a result, they end up purchasing excess capacity, which increases cost and complexity, as a standard part of doing business.

To consolidate storage across these disparate environments requires a no-holds-barred, second-to-none approach to IO performance, which PowerFlex can deliver. The secret to its high levels of IO performance is RAID 10, deployed across a scale-out cluster. And PowerFlex clusters can range from 4 to 1000 or more nodes.

RAID 10 mirrors data and spreads the mirrored data across all drives and servers in a cluster, or some subset of it. As a result, as you add storage nodes, IO performance scales up almost linearly.
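To make the scaling intuition concrete, here is a toy Python sketch of distributed mirroring, not PowerFlex’s actual placement algorithm: each chunk’s two copies land on two different nodes, so when a node fails, nearly every surviving node holds some copy needed for the rebuild and the work parallelizes.

```python
import random

def place_mirrored_chunks(num_chunks, nodes):
    """Spread primary/mirror chunk pairs across all nodes,
    never putting both copies of a chunk on the same node."""
    placement = {}
    for chunk in range(num_chunks):
        primary, mirror = random.sample(nodes, 2)  # two distinct nodes
        placement[chunk] = (primary, mirror)
    return placement

def rebuild_sources(placement, failed_node):
    """When a node fails, the surviving copy of each affected chunk
    lives on a different peer, so the rebuild reads in parallel."""
    sources = set()
    for primary, mirror in placement.values():
        if failed_node == primary:
            sources.add(mirror)
        elif failed_node == mirror:
            sources.add(primary)
    return sources

nodes = [f"node{i}" for i in range(16)]
placement = place_mirrored_chunks(1000, nodes)
peers = rebuild_sources(placement, "node0")
print(f"{len(peers)} surviving nodes participate in the rebuild")
```

With 1000 chunks on 16 nodes, the rebuild draws from almost all 15 surviving peers rather than from a single mirror drive, which is why these rebuilds finish so quickly.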

Yes, there can be other bottlenecks in clusters like this, most often networking, but with PowerFlex storage, IO need not be one of them. Anthony mentioned that PowerFlex will perform as fast as your infrastructure will support. So if your environment has 25 Gig Ethernet, it will perform IO at that speed, if you use 100 Gig Ethernet, it will perform at that speed.

In addition, PowerFlex offers automated Lifecycle Management (LCM), which can make running a 1000-node PowerFlex cluster almost as easy as a 10-node cluster. However, to make use of this automated LCM, one must run its storage server software on Dell PowerEdge servers.

Brian said adding or decommissioning PowerFlex nodes is a painless process. Because data is always mirrored, customers can remove any node, at any time and PowerFlex will automatically rebuild data across other nodes and drives. When you add nodes, those drives become immediately available to support more IO activity. Another item to note, because of RAID 10, PowerFlex mirror rebuilds happen very fast, as just about every other drive and node in the cluster (or subset) participates in the rebuild process.

PowerFlex supports Storage Pools, which partition PowerFlex storage nodes and devices into multiple pools of storage used to host volume IO and data. Storage pools can be used to segregate higher performing storage nodes from lower performing ones, so that some volumes can reside exclusively on higher (or lower) performing hardware.

Although customers can configure PowerFlex to use all nodes and drives in a system or storage pool for volume data mirroring, PowerFlex offers other data placement alternatives to support high availability.

PowerFlex supports Protection Domains, which are subsets or collections of storage servers and drives in a cluster where volume data will reside. This allows one protection domain to go down while others continue to operate. Realize that because volume data is mirrored across all devices in a protection domain, it would take many nodes or devices going down before a protection domain is out of action.

PowerFlex also uses Fault Sets, which are collections of storage servers and their devices within a Protection Domain that will contain one half of a volume’s data mirror. PowerFlex will ensure that a primary and its mirror copy of a volume’s data never both reside in the same fault set. A fault set could be a rack of servers, multiple racks, all PowerFlex storage servers in an AZ, etc. With fault sets, customer data will always reside across a minimum of two fault sets, and if any one goes down, data is still available.
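The fault-set placement constraint can be sketched in a few lines of Python. This is an illustration of the rule, not PowerFlex’s implementation: given a mapping of nodes to fault sets (racks here), a mirror copy is only ever placed in a different fault set than its primary.

```python
import random

# Each storage node belongs to a fault set (e.g., a rack of servers).
fault_set_of = {
    "node0": "rackA", "node1": "rackA",
    "node2": "rackB", "node3": "rackB",
    "node4": "rackC", "node5": "rackC",
}

def pick_mirror_pair(nodes, fault_set_of):
    """Pick primary and mirror nodes from different fault sets,
    so losing an entire rack never loses both copies of the data."""
    primary = random.choice(nodes)
    candidates = [n for n in nodes
                  if fault_set_of[n] != fault_set_of[primary]]
    return primary, random.choice(candidates)

nodes = list(fault_set_of)
for _ in range(100):
    primary, mirror = pick_mirror_pair(nodes, fault_set_of)
    assert fault_set_of[primary] != fault_set_of[mirror]
print("mirror copies always land in different fault sets")
```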

PowerFlex also operates in the cloud. In this case, customers bring their own PowerFlex software and deploy it over cloud compute and storage.

Brian mentioned that anything PowerFlex can do such as reconfiguring servers, can be done through RESTful/API calls. This can be particularly useful in cloud deployments as above, if customers want to scale up or down IO performance automatically.
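As a hedged sketch of what driving such reconfiguration from automation could look like, the snippet below builds (but does not send) a REST call. The endpoint path (`/api/v1/scale`) and payload fields are illustrative placeholders, not PowerFlex’s documented API.

```python
import json
import urllib.request

def build_scale_request(base_url, pool, extra_nodes, token):
    """Build (but don't send) an HTTP request asking a storage
    control plane to add capacity to a pool. Endpoint and payload
    field names here are hypothetical, for illustration only."""
    payload = json.dumps({"storagePool": pool,
                          "addNodes": extra_nodes}).encode()
    return urllib.request.Request(
        f"{base_url}/api/v1/scale",  # hypothetical endpoint
        data=payload,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_scale_request("https://powerflex.example.com",
                          "pool1", 2, "TOKEN")
print(req.get_method(), req.full_url)
```

A monitoring script could build and send such requests on a schedule or in response to load metrics, which is the kind of automated scale-up/scale-down Brian described.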

Besides block services, PowerFlex also offers NFS/CIFS-SMB native file services using a File Node Controller. This frontends PowerFlex storage nodes to support customer NFS/SMB file access to PowerFlex data.

Anthony Cinelli, Sr. Director Global PowerFlex Software Defined & MultiCloud Solutions

Anthony Cinelli is a key leader for Dell Technologies helping drive the success of our software defined and multicloud solutions portfolio across the customer landscape. Anthony has been with Dell for 13 years and in that time has helped launch our HCI and Software Defined businesses from startup to the multi-billion dollar lines of business they now represent for Dell.

Anthony has a wealth of experience helping some of the largest organizations in the world achieve their IT transformation and multicloud initiatives through the use of software defined technologies.

Brian Dean, Dell PowerFlex Technical Marketing

Brian is a 16+ year veteran of the technology industry, and before that spent a decade in higher education. Brian has worked at EMC and Dell for 7 years, first as Solutions Architect and then as TME, focusing primarily on PowerFlex and software-defined storage ecosystems.

Prior to joining EMC, Brian was on the consumer/buyer side of large storage systems, directing operations for two Internet-based digital video surveillance startups.

When he’s not wrestling with computer systems, he might be found hiking and climbing in the mountains of North Carolina.

147: GreyBeards talk ransomware protection with Jonathan Halstuch, Co-Founder and CTO, RackTop Systems

Sponsored By:

This is another in our series of sponsored podcasts with Jonathan Halstuch (@JAHGT), Co-Founder and CTO of RackTop Systems. You can hear more in Episode 145.

We asked Jonathan what was wrong with ransomware protection today. Jonathan started by mentioning that bad actors had been present, on average, 277 days in an environment before being detected. With that much dwell time, they could have easily corrupted most backups and snapshots, stolen copies of all your most sensitive/proprietary data and, of course, encrypted all your storage.

Backup-based ransomware protection works OK if dwell time is a couple of days or even a week, but not multiple months or longer. The only real solution to this level of ransomware sophistication is real-time monitoring of IO, looking for illegal activity. Listen to the podcast to learn more.

Often, any data corruption, when discovered, is just notification to an unsuspecting IT organization that they have been compromised and lost control over their systems. Sort of like having a thief ring the door bell to tell you they stole all your stuff after the fact.

The only real solution to data breaches and ransomware attacks with significant dwell time, one that protects both your data and your reputation, is something like RackTop Systems and their BrickStor SP storage system. BrickStor offers an ongoing, real-time, active defense against ransomware that’s embedded in your data storage, continuously looking for bad actors and their activities during IO activity, all day, every day.

When BrickStor detects ransomware in progress, it shuts it down by halting any further access for that user/application and snapshotting the data, before corruption, to immutable snapshots. That way admins have a known-good copy of the data.

In addition, RackTop BrickStor SP supplies run book like recovery procedures that tell IT how to retrieve good data from snapshots, without wasting valuable time searching for the “last good backup”, which could be months old.

I asked whether data at rest encryption could offer any help. Jonathan said data encryption can thwart only some types of attacks. But it’s not that useful for ransomware, as bad actors who infiltrate your system masquerade as valid users/admins and by doing so, gain access to decrypted data.  

RackTop Systems uses AI in its labs to create ransomware “assessors”, automated routines embedded in their storage data path, which execute continuously, looking for bad-actor IO patterns. It’s these assessors that provide the first line of defense against ransomware.

In addition to assessors, RackTop Systems supplies many reports which depict data access permissions, user/admin access permissions, data being accessed, etc. All of these help IT and security teams better understand how data is being used and provide the visibility needed to support better cyber security.

When ransomware is detected, RackTop BrickStor offers a number of different notification features that range from web-hooks and slack channels to email notices and just about everything in between to notify IT and security teams that a breach is occurring and where.

RackTop Systems BrickStor SP is available in many deployments. One new option, from HPE, uses their block storage to present LUNs to BrickStor SP. Jonathan mentioned that other enterprise class block storage vendors are starting to use BrickStor SP to supply secure NAS services for their customers as well.

Jonathan mentioned that RackTop attended the HIMSS conference in Chicago last week and will be attending many others throughout the year. So check them out at a conference near you if you get a chance.

Jonathan Halstuch, Co-Founder & CTO RackTop Systems

Jonathan Halstuch is the Chief Technology Officer and co-founder of RackTop Systems. He holds a bachelor’s degree in computer engineering from Georgia Tech as well as a master’s degree in engineering and technology management from George Washington University.

With over 20-years of experience as an engineer, technologist, and manager for the federal government he provides organizations the most efficient and secure data management solutions to accelerate operations while reducing the burden on admins, users, and executives.

146: GreyBeards talk K8s cloud storage with Brian Carmody, Field CTO, Volumez

We’ve known Brian Carmody (@initzero), Field CTO, Volumez for over a decade now and he’s always been very technically astute. He moved to Volumez earlier this year and has once again joined a storage startup. Volumez is a cloud K8s storage provider with a new twist, K8s persistent volumes hosted on ephemeral storage.

Volumez currently works in public clouds (AWS and Azure (soft launch), with GCP coming soon) and is all about supplying high performing, enterprise-class data services to K8s container apps. But it does this using transient (Azure ephemeral and AWS instance) storage and standard Linux. Hyperscalers offer transient storage almost as an afterthought with customer compute instances. Listen to the podcast to learn more.

It turns out that over the last decade or so, there has been a lot of time and effort devoted to maturing Linux’s storage stack and nowadays, with appropriate configuration, Linux can offer enterprise class data services and performance using direct attached NVMe SSDs. These services include thin provisioning, encryption, RAID/erasure coding, snapshots, etc., which on top of NVMe SSDs, provide IOPS, bandwidth and latency performance that boggles the mind.

However, configuring Linux’s sophisticated, high performing data services is a hard problem to solve.

Enter Volumez. They have a SaaS control plane, client software and CSI drivers that will configure Linux with ephemeral storage to support any performance and data service that can be obtained from NVMe SSDs.

Once installed on your K8s cluster, Volumez software profiles all ephemeral storage and supplies that information to their SaaS control plane. Once that’s done, your platform engineers can define specific storage class policies or profiles usable by DevOps to consume ephemeral storage.

These policies identify volume [IOPS, bandwidth, latency] X [read, write] performance specifications as well as data protection, resiliency and other data service requirements. DevOps engineers consume this storage using PVCs that call for these storage classes at some capacity. When it sees the PVC, the Volumez SaaS control plane carves out slices of ephemeral storage that can support the performance and other storage requirements defined in the storage class.
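As a sketch, a DevOps engineer’s PVC might look like the following standard Kubernetes manifest, where the `storageClassName` is a hypothetical policy name that a platform engineer would have defined, not an actual Volumez class:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-db-volume
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: volumez-high-iops   # hypothetical class defined by platform engineers
  resources:
    requests:
      storage: 100Gi
```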

Once that’s done, their control plane next creates a network path from the compute instances with ephemeral storage to the worker nodes running container apps. After that it steps out of the picture and the container apps have a direct (network) data path to the storage they requested. Note, Volumez’s SaaS control plane is not in the container app storage data path at all.

Volumez supports multi-AZ data resiliency for PVCs. In this case, another mirror K8s cluster would reside in another AZ, with Volumez software active and similar if not equivalent ephemeral storage. Volumez will configure the container volume to mirror data between AZs. Similarly, if the policy requests erasure coding, Volumez SaaS software configures the ephemeral storage to provide erasure coding for that container volume.

Brian said they’ve done some amazing work to increase the speed of Linux snapshotting and restoring.

As noted above, the Volumez control plane SaaS software is outside the data path, so even if the K8s cluster running Volumez enabled storage loses access to the control plane, container apps continue to run and perform IO to their storage. This can continue until there’s a new PVC request that requires access to their control plane.

Ephemeral storage is accessed through special compute instances. These are not K8s worker nodes and they essentially act as a passthru or network attachment between worker nodes running apps with PVC’s and the Volumez configured Linux Logical Volumes hosted on slices of ephemeral storage.

Volumez is gaining customer traction with data platform clients, DBaaS companies, and some HPC environments. But just about anyone needing high performing data services for cloud K8s container apps should give Volumez a try.

I looked at AWS to see how they price instance store capacities and found out it’s not priced separately, but rather instance storage is bundled into the cost of EC2 compute instances.

Volumez is priced based on the number of media devices (instance/ephemeral stores) and performance (IOPS) available. They also have different tiers depending on support-level requirements (e.g., community, business hours, 24X7), which also offer different levels of enterprise security functionality.

Brian said they have a free tier that customers can easily signup for and try out by going to their web site (see link above), or if you would like a guided demo, just contact him directly.

Brian Carmody, Field CTO, Volumez

Brian Carmody is Field CTO at Volumez. Prior to joining Volumez, he served as Chief Technology Officer of data storage company Infinidat where he drove the company’s technology vision and strategy as it ramped from pre-revenue to market leadership.

Before joining Infinidat, Brian worked in the Systems and Technology Group at IBM where he held senior roles in product management and solutions engineering focusing on distributed storage system technologies.

Prior to IBM, Brian served as a technology executive at MTV Networks Viacom, and at Novus Consulting Group as a Principal in the Media & Entertainment and Banking practices.

145: GreyBeards talk proactive NAS security with Jonathan Halstuch, CTO & Co-Founder, RackTop Systems

Sponsored By:

We’ve known about RackTop Systems since episode 84 and have been watching them ever since. On this episode we, once again, talk with Jonathan Halstuch (@JAHGT), CTO and Co-Founder, RackTop Systems.

RackTop was always very security oriented but lately they have taken this to the next level. As Jonathan says on the podcast, historically security has been mostly a network problem but since ransomware has emerged, security is now often a data concern too. The intent of proactive NAS security is to identify and thwart bad actors before they impact data, rather than after the fact. Listen to the podcast to learn more.

Proactive security for NAS storage includes monitoring user IO and administrator activity and looking for anomalies. RackTop has the ability (via config options) to halt IO activity when things look wrong, that is, when user/application IO looks different than what has been seen in the past. They also examine admin activity, a popular vector for ransomware attacks. RackTop IO/admin activity scanning is done in real time, as IO is processed and admin commands are received.

The customer gets to decide how far to take this. The challenge with automatically halting access is false positives, when, say, a new application starts taking off. Security admins must have an easy way to see and understand what was anomalous and what was not, and to quickly let that user/application return to normal activities, or take it out.

In addition to just stopping access, they can also just report it to admins/security staff. Moreover, the system can also automatically take snapshots of data when anomalous behavior is detected, to give admins and security a point-in-time view into the data before bad behavior occurs.

RackTop Systems has a number of assessors that look for specific anomalous activity, used to detect and act to thwart malware. For example, an admin assessor looks at all admin operations to determine whether they are considered normal or not.
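To illustrate the general shape of such an assessor, here is a deliberately simple Python sketch that flags a user whose write rate jumps far above their recent baseline. This is a toy illustration of the concept only; RackTop’s real assessors are AI-derived and far more sophisticated.

```python
from collections import deque

class WriteRateAssessor:
    """Toy anomaly assessor: flag a user whose write rate jumps far
    above their own recent baseline (illustrative only)."""
    def __init__(self, window=10, threshold=5.0):
        self.history = deque(maxlen=window)  # recent writes/sec samples
        self.threshold = threshold           # multiple of baseline to flag

    def observe(self, writes_per_sec):
        anomalous = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            anomalous = writes_per_sec > self.threshold * max(baseline, 1.0)
        self.history.append(writes_per_sec)
        return anomalous

a = WriteRateAssessor()
# Ten samples of ordinary activity establish a baseline...
normal = [a.observe(r) for r in [10, 12, 9, 11, 10, 13, 9, 10, 11, 12]]
# ...then a ransomware-style mass-encryption burst stands out.
burst = a.observe(500)
print(any(normal), burst)  # prints: False True
```

A production system would run many such assessors in the IO path, each tuned to a different signal (entropy of written data, admin command patterns, deletion rates, etc.).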

RackTop also supports special time-period access permissions. These provide temporary, time-dependent, unusual access rights to data for admins, users or applications that would normally be considered a breach, such as an admin copying lots of data or moving and deleting data. These are for situations that crop up where mass data deletion, movement or copying would be valid. When the time-period access permission elapses, the system goes back to monitoring for anomalous behavior.

We talked about the overhead of doing all this scanning and detection in real time and how that may impact system IO performance. For other storage vendors, these sorts of activities are often done with standalone appliances, which of course add additional IO to a storage system to do offline scans.

Jonathan said, with recent Intel Xeon multi-core processors, they can readily afford the CPU cycles/cores required to do their scanning during IO processing, without sacrificing IO performance.

RackTop also supports a number of reports to show system configured data/user/application access rights as well as what accesses have occurred over time. Such reports offer admin/security teams visibility into data access rights and usage.

RackTop can be deployed in hybrid disk-flash solutions, as storage software in public clouds, in an HCI solution, or in edge environments that replicate back to core data centers. And they can also be used as a backup/archive data target for backup systems. RackTop Systems NAS supports CIFS 1.0 through SMB 3.1.1, and NFSv3 through NFSv4.2.

RackTop Systems have customers in national government agencies, security sensitive commercial sectors, state gov’t, healthcare, and just about anyone subject to ransomware attacks on a regular basis. Which nowadays, is pretty much every IT organization on the planet.

Jonathan Halstuch, CTO & Co-Founder, RackTop Systems

Jonathan Halstuch is the Chief Technology Officer and co-founder of RackTop Systems. He holds a bachelor’s degree in computer engineering from Georgia Tech as well as a master’s degree in engineering and technology management from George Washington University.

With over 20-years of experience as an engineer, technologist, and manager for the federal government he provides organizations the most efficient and secure data management solutions to accelerate operations while reducing the burden on admins, users, and executives.

143: GreyBeards talk Chia crypto with Jonmichael Hands, VP Storage at Chia Project

Today we interview Jonmichael Hands (@LebanonJon, LinkedIn), VP Storage at Chia Project, who has been in and around the storage business forever, mostly with Intel and their SSD team, before it was sold. He did technical marketing for NVMe. He also ran the security and crypto track at FMS2022. He recently worked on sustainability, helping to create a circular economy for disk and SSD storage. Moreover, he assisted IEEE with their new (media) sanitization standard to make reusing/recycling storage easier.

Chia was born to provide a way to take advantage of storage media for blockchains in a government compliant way so that it could be spun off as a public company someday. Chia is a crypto currency that depends on proof of space (storage space exists) and proof of time (storage space is reserved for a period of time). There have been many crypto coins based on proof of work (running hard cryptographic algorithms to come up with some specific bit pattern). And ETH was forked last year to support proof of stake (where one stakes some amount of ETH for a defined period). But few, if any, have been based on proof of space and time.
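A grossly simplified Python sketch of the proof-of-space idea follows. It is illustrative only: real Chia plots and proofs are far more elaborate, and proof of time involves verifiable delay functions not modeled here. The essence is that a farmer spends disk space up front ("plotting"), then answers network challenges with cheap lookups into that stored data.

```python
import hashlib

def make_plot(seed, size):
    """'Plotting': precompute and store many hashes up front.
    This consumes disk space, which is the resource being proven."""
    return sorted(
        hashlib.sha256(seed + i.to_bytes(4, "big")).hexdigest()
        for i in range(size)
    )

def best_proof(plot, challenge):
    """Answer a challenge with the stored hash closest to it.
    A bigger plot means better odds of holding a winning answer."""
    return min(plot, key=lambda h: abs(int(h, 16) - int(challenge, 16)))

seed = b"farmer-key"
plot = make_plot(seed, 10_000)          # cheap one-time work, kept on disk
challenge = hashlib.sha256(b"block-42").hexdigest()
proof = best_proof(plot, challenge)     # cheap lookup, no heavy compute
print(proof in plot)  # the proof demonstrably came from stored space
```

Unlike proof of work, answering a challenge here costs almost no energy; the scarce resource is the storage capacity holding the plot.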

Disk and SSD commands already exist to provide “Secure Erase” (multiple passes of different bit patterns overwriting the same blocks) and cryptographic erasure (for encrypted drives, the encryption key is changed). Both approaches ensure that customer/organization data is no longer retained on media leaving an organization’s control. And yet, many companies use secure erase/cryptographic erasure and still shred disk drives and SSDs, just to be sure that no data is retained. This is a vast waste of energy and resources.
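The cryptographic-erasure idea can be shown in a small Python sketch. This is conceptual only: a self-encrypting drive uses AES in hardware, whereas here a SHA-256-based keystream stands in for the cipher. Once the drive’s internal media key is replaced, every block of old ciphertext becomes unrecoverable without a single overwrite pass.

```python
import hashlib
import secrets

def keystream(key, n):
    """Illustrative keystream derived from SHA-256 (a real self-
    encrypting drive uses AES in hardware, not this construction)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key, data):
    """XOR stream cipher: the same operation encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

media_key = secrets.token_bytes(32)          # drive's internal media key
on_disk = encrypt(media_key, b"customer records")
assert encrypt(media_key, on_disk) == b"customer records"  # readable

# Cryptographic erase: replace the media key. The old ciphertext is
# instantly unrecoverable, with no need to overwrite any data blocks.
media_key = secrets.token_bytes(32)
garbage = encrypt(media_key, on_disk)
print(garbage != b"customer records")  # prints: True
```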

Jonmichael said that both disk and SSD drives typically have another 5 years beyond their guaranteed (5-year) production life where they can function perfectly well as storage devices (though performance may not match current drives). And after being used for another 5 years, they are much easier to recycle if left un-shredded and returned to manufacturers, who can dismantle them to reuse expensive components and rare earth materials.

We didn’t spend much time on the technical underpinnings of Chia so if you are interested in that we suggest you check out Jonmichael’s FMS2022 presentation video.

But if you’re interested in a high level understanding of Chia and what one can do with it we did cover that. For example, Chia has farmers (not miners). Farmers create (~100GB) Chia plot files and store these on media.

Plot files take some amount of CPU power and memory to create, but once created can stay on storage forever. What makes Chia work is that the network periodically checks whether you have a certain plot file, and if you do, you get rewarded for it. Jonmichael said that with a typical Chia crypto setup, one could make $0.50/TB/month farming Chia.

The Chia project currently has about 24EB of plots online and at their peak had over 300EB. They also have 130K farmers in their current network. Bitcoin, at its peak, had about 60K miners. Jonmichael thinks Chia crypto coin may be the most distributed crypto coin in existence today.

A couple of years back Chia accounted for a significant amount of new disk drive purchases but that has died down considerably since then. As discussed earlier, Jonmichael is working to create a circular economy for storage that could lead to media reuse for Chia farming.

Jonmichael mentioned that Chia has matured significantly since peak use. It used to be that creating Chia plot files required high end CPUs and lots of technical skills, but today Jonmichael said you can be a farmer with an RPi. He did say that they have moved to making better use of available memory in the plotting process and have reduced the write load on the storage media.

Another aspect to Chia’s maturation is that they now support Chia smart coins or smart contracts. They have created ChiaLisp, a Turing complete language, as their language to implement Chia smart coins. It turns out that Lisp and other functional languages provide a natural way to implement secure code. Jonmichael mentioned that other crypto coins are starting to move towards using ChiaLisp.

Some recent innovations in Chia smart coins include:

  • Chia Offer Management – anything you wish to trade can be digitally tracked and traded using Chia Offer Management smart coins.
  • Chia NFT (non-fungible token) Management – NFTs have been used by other blockchains to sell digital rights to assets; Chia’s support for NFTs opens Chia up to this as well. The reference implementation for Chia’s NFT management is Chia Friends, where all proceeds are being donated to the Marmot Recovery Foundation.
  • Chia Data Layer Management, a federated database – here the Chia blockchain is being used to support a K-V store, where the blockchain stores the key and a hash of the value. Users can use the Chia Data Layer to store any key-hash(value) database they wish. It’s important to realize that the actual data, or value, is stored external to the Chia blockchain.
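The Data Layer pattern above, on-chain hash, off-chain value, can be modeled in a few lines of Python. This is a toy stand-in (two dictionaries, not a blockchain), but it shows why storing only the hash on-chain still lets anyone verify the off-chain value.

```python
import hashlib

on_chain = {}    # stand-in for blockchain state: key -> hash(value)
off_chain = {}   # external storage holding the actual values

def put(key, value):
    """Store the value off-chain, and only its hash on-chain."""
    off_chain[key] = value
    on_chain[key] = hashlib.sha256(value).hexdigest()

def get_verified(key):
    """Fetch the off-chain value and verify it against the on-chain hash."""
    value = off_chain[key]
    if hashlib.sha256(value).hexdigest() != on_chain[key]:
        raise ValueError("off-chain value was tampered with")
    return value

put("carbon-credit-123", b'{"tons_co2": 10}')
print(get_verified("carbon-credit-123"))

off_chain["carbon-credit-123"] = b'{"tons_co2": 9999}'  # tampering
try:
    get_verified("carbon-credit-123")
except ValueError:
    print("tamper detected")  # the on-chain hash exposes the change
```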

The Data Layer solution is currently being used to develop a way to track carbon credits by the World Bank (see: the Climate Action Data Trust).

Chia has come a long way. In its heyday it was a significant consumer of new disk media, but what Jonmichael and others have planned for it now is to take advantage of the longer-term life of storage media and use it for the benefit of all humanity.

Jonmichael Hands, VP Storage at Chia Project

Jonmichael Hands partners with the storage vendors for Chia optimized product development, market modeling, and Chia blockchain integration.

Jonmichael spent the last ten years at Intel in the Non-Volatile Memory Solutions group working on product line management, strategic planning, and technical marketing for the Intel data center SSDs.

In addition, he served as the chair for NVM Express (NVMe), SNIA (Storage Networking Industry Association) SSD special interest group, and Open Compute Project for open storage hardware innovation.

Jonmichael started his storage career at Sun Microsystems designing storage arrays (JBODs) and holds an electrical engineering degree from the Colorado School of Mines.