Data Protection – Grey Beards on Systems

November 4, 2024November 4, 2024

167: GreyBeards talk Distributed S3 storage with Enrico Signoretti, VP Product & Partnerships, Cubbit

Long time friend, Enrico Signoretti (LinkedIn), VP Product and Partnerships, Cubbit, used to be a common participant at Storage Field Day (SFD) events and I’ve known him since we first met there. Since then, he’s worked for a startup and a prominent analyst firms. But he’s back at another startup and this one looks like it’s got legs.

Cubbit offers Distributed S3 compatible object storage that offers geo-distribution and geo-fencing for object data, in which the organization owns the hardware and Cubbit supplies the software. There’s a management component, the Coordinator, which could run on your hardware or as a SaaS service they provide but other than that, IT controls the rest of the system hardware. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 52:27 — 72.0MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Cubbit comes in 3 components:

One or more Storage nodes which includes their agent software running ontop of a linux system with direct attached storage.
One or more Gateway nodes which provides S3 protocol acces to the objects stored on storage nodes. Typical S3 access points https://S3.company_name, com/… points to either a load balancer, front end or one or more Gateway nodes. Gateway nodes provide the mapping between the bucket name/object identifier and where the data currently resides or will reside.
One Coordinator node which provides the metadata to locate the data for objects, manage the storage nodes, gateways and monitor the service. The Coordinator node can be a SaaS service supplied by Cubbit or a VM/bare metal node running Cubbit Coordinator software. Metadata is protected internally within the Coordinator node.

With these three components one can stand up a complete, geo-distributed/geo-fenced, S3 object storage system which the organization controls.

Cubbit encrypts data as it at the gateway and decrypts data when accessed. Sign-on to the system uses standard security offerings. Security keys can be managed by Cubbit or by standard key management systems.

All data for an object is protected by nested erasure codes. That is 1) erasure code within a data center/location over its storage drives and 2) erasure code across geographical locations/data centers..

With erasure coding across locations, customer with say 10 data center locations can have their data stored in such a fashion that as long as at least 8 data centers are online they still have access to their data, that is the Cubbit storage system can still provide data availability.

Similarly for erasure coding within the data center/location or across storage drives, say with 12 drives per stripe, one could configure lets say 9+3 erasure coding, where as long as 9 of the drives still operate, data will be available.

Please note the customer decides the number of locations to stripe across for erasure coding, and diet for the number of storage drives.

The customer supplies all the storage node hardware. Some customers start with re-purposed servers/drives for their original configuration and then upgrade to higher performing storage-servers-networking as performance needs change. Storage nodes can be on prem, in the cloud or at the edge.

For adequate performance gateways and storage nodes (and coordinator nodes) should be located close to one another. Although Coordinator nodes are not in the data path they are critical to initial object access.

Gateways can provide a cache for faster local data access.. Cubbit has recommendations for Gateway server hardware. And similar to storage nodes, Gateways can operate at the edge, in the cloud or on prem.

Use cases for the Distributed S3 storage include:

As a backup target for data elsewhere
As a geographically distributed/fenced object store.
As a locally controlled object storage to feed AI training/inferencing activity.

Most backup solutions support S3 object storage as a target for backups.

Geographically distributed S3 storage means that customers control where object data is located. This could be split across a number of physical locations, the cloud or at the edge.

Geographically fenced S3 storage means that the customer controls which of its many locations to store an object. For GDPR countries with multi-nation data center locations this could provide the compliance requirements to keep customer data within country.

Cubbit’s distributed S3 objects storage is strongly consistent in that an object loaded into the system at any location is immediately available to any user accessing it through any other gateway. Access times vary but the data will be the same regardless of where you access it from.

The system starts up through an Ansible playbook which asks a bunch of questions and loads and sets up the agent software for storage nodes, gateway nodes and where applicable, the coordinator node.

At any time, customers can add more gateways or storage nodes or retire them. The system doesn’t perform automatic load balancing for new nodes but customers can migrate data off storage nodes and onto other ones through api calls/UI requests to the Coordinator.

Cubbit storage supports multi-tenancy, so MSPs can offer their customers isolated access.

Cubbit charges for their service on data storage under management. Note it has no egress charges, and you don’t pay for redundancy. But you do supply all the hardware used by the system. They offer a discount for M&E customers as the metadata to data ratio is much smaller (lots of large files) than most other S3 object stores (mix of small and large files).

Cubbit is presently available only in Europe but will be coming to USA next year. So, if you are interested in geo-distributed/geo-fenced S3 object storage that you control and can be had for much cheaper than hyperscalar object storage, check it out.

Enrico Signoretti, VP Products & Partnerships

Enrico Signoretti has over 30 years of experience in the IT industry, having held various roles including IT manager, consultant, head of product strategy, IT analyst, and advisor.

He is an internationally renowned visionary author, blogger, and speaker on next-generation technologies. Over the past four years, Enrico has kept his finger on the pulse of the evolving storage industry as the Head of Research Product Strategy at GigaOm. He has worked closely and built relationships with top visionaries, CTOs, and IT decision makers worldwide.

Enrico has also contributed to leading global online sites (with over 40 million readers) for enterprise technology news.

April 25, 2023April 25, 2023

147: GreyBeards talk ransomware protection with Jonathan Halstuch, Co-Founder and CTO, RackTop Systems

Jonathan Halstuch, Co-Founder & CTO RackTop Systems

Jonathan Halstuch is the Chief Technology Officer and co-founder of RackTop Systems. He holds a bachelor’s degree in computer engineering from Georgia Tech as well as a master’s degree in engineering and technology management from George Washington University.

With over 20-years of experience as an engineer, technologist, and manager for the federal government he provides organizations the most efficient and secure data management solutions to accelerate operations while reducing the burden on admins, users, and executives.

December 12, 2022

140: Greybeards talk data orchestration with Matt Leib, Product Marketing Manager for IBM Spectrum Fusion

As our listeners should know, Matt Leib (@MBleib) was a GreyBeards co-host But since then, Matt has joined IBM to become Product Marketing Manager on IBM Spectrum Fusion, a data orchestration solution for Red Hat OpenShift environments. Matt’s been in and around the storage and data management industry for many years which is why we tapped him for GreyBeards co-host duties.

IBM Fusion, in its previous incarnation, came as an OpenShift software defined storage or as an OpenShift (H)CI solution. But recently, Fusion has taken on more of a data orchestration role for OpenShift stateful containerized applications. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 46:59 — 64.5MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Fusion can run in any OpenShift deployment whether (currently AWS, Azure, & IBM) clouds, under VMware (wherever it runs), or on (x86 or IBM Z) bare metal. It supplies NFS file or S3 compatible object storage for container applications running under OpenShift. But it does more than just storage.

Beyond storage, Fusion includes backup/recovery, site to site DR and global (file & object) data access. It’s almost like someone opened up the IBM Spectrum software pantry and took out the best available functionality and cooked it up in to an OpenShift solution. IBM’s Spectrum Fusion current website (linked to above (Dec.’22)) still refers only to the software defined storage and (H)CI solution, but today’s Fusion includes all of the functions identified above.

All Fusion facilities run as containers under OpenShift. Customers can elect to run all Fusion services or pick and chose which ones they want for their environment. IBM Fusion supports an API, an API backed GUI, and CLI for its storage & data management as well as REST access. Fusion is fully compatible with Red Hat Ansible.

IBM Fusion is intended to be storage agnostic. Which means it can support its data management services for any NFS file storage as well as anyone’s S3 compatible, object storage.

Now that Red Hat software defined CEPH and ODF are under IBM product management, CEPH and ODF options will become available under Fusion. And CEPH offers block as well as file and object. We’ve talked about CEPH before, packaged in a hardware appliance, see our SoftIron podcast.

One intriguing part of the Fusion solution is its global data access. With global access, any OpenShift application can access data from any Fusion data store, across clouds, across on prem installations, or just about anywhere OpenShift is running. Matt mentioned that compute could be on AWS OpenShift, Fusion’s data control plane could be running on prem OpenShift and the data storage could be running on Azure OpenShift. All this would be glued together by Fusion global access, so that AWS compute had access to data on Azure.

There’s some sophisticated caching magic to make global access happen seamlessly and with decent levels of performance, but customers no longer have to copy whole file systems over from one cloud to another in order to move compute or data. IBM Fusion would need to run in all those locations for global access.

Keith asked if it was directly available in the AWS marketplace. Matt said not yet but you can deploy OpenShift out of the marketplace and then deploy IBM Fusion onto that.

It took us sometime to get our heads wrapped around what Fusion has to offer and throughout it all, Keith and I had a bit of fun with Matt.

Matthew Leib, Product Marketing Manager, IBM Spectrum Fusion

Matt has spent years in IT, from Engineering, to Architecture, from PreSales to analyst work, and finally to Product Marketing at IBM.

He’s spent years trying to achieve both credibility in the space, as a podcaster, blogger, and community member.

In his spare time, he’s a dad, dog owner, and amateur guitar player..

August 1, 2022August 27, 2022

135: Greybeard(s) talk file and object challenges with Theresa Miller & David Jayanathan, Cohesity

Theresa Miller, Director, Technology Advocacy Group, Cohesity

Theresa Miller is the Director, Technology Advocacy Group at Cohesity. She is an IT professional that has worked as a technical expert in IT for over 25 years and has her MBA.

She is uniquely industry recognized as a Microsoft MVP, Citrix CTP, and VMware vExpert. Her areas of expertise include Cloud, Hybrid-cloud, Microsoft 365, VMware, and Citrix.

David Jayanathan, Solutions Architect, Cohesity

David Jayanathan is a Solutions Architect at Cohesity, currently working on SmartFiles.

DJ is an IT professional that has specialized in all things related to enterprise storage and data protection for over 15 years.

April 12, 2022April 12, 2022

131: GreyBeards talk native K8s data protection using Veritas NetBackup with Reneé Carlisle

The GreyBeards have been discussing K8s storage services a lot over the last year or so and it was time to understand how container apps and data could be protected. Recently, we saw an article about a Veritas funded survey, discussing the need for data protection in K8s. As such, it seemed a good time to have a talk with Reneé Carlisle (@VeritasTechLLC), Staff Product Manager for NetBackup (K8S), Veritas.

It turns out that Veritas NetBackup (NBU) has just released their 2nd version of K8s data protection. It’s gone completely (K8s) native. That is, Veritas have completely re-implemented all 3 tiers of NBU as K8s micro services. Moreover, the new release still supports all other NBU infrastructure implementations, such as bare metal or VM NBU primary server/media server services. It’s almost like you have all the data protection offered by NBU for the enterprise over the years, now also available for K8s container apps. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 48:22 — 66.4MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Veritas-22-04-transcript Download

To make use of NBU K8s, backup admins establish named gold, silver, bronze backup policies selecting frequency of backups, retention periods, backup storage, etc. Then DevOps would tag a namespace, pods, containers, or PVs with those data protection policy names. Once this is done, NBU K8S will start protecting that namespace, pod, container, or PV.

In addition, backup admins can include or exclude specific K8s namespace(s), pod(s), container(s), labels (tags), or PVs to be backed up with a specific policy. When that policy is triggered it will go out into the cluster to see if those K8s elements are active and start protecting them or excluding them from protection as requested.

NBU K8s has an Operator service, Data Mover services and other micro services that execute in the cluster. That is, at least one Operator service must be deployed in the cluster (recommended to be in a separate namespace but this is optional). The Operator service is the control plane for NBU K8S services. It will spin up data movers when needed and spin them down when done.

The Operator service supports a CLI but more importantly to DevOps, a complete implemented RESTful API service. Turns out the CLI is implemented ontop of the NBU (Operator) API. With the NBU API DevOps CI/CD tools or other automation can perform all the data protection services to protect K8s.

One historical issue with backup processing is that it can consume every ounce of network/storage and sometimes compute power in an environment. The enterprise class data movers (or maybe the Operator control plane) has various mechanisms to constrain or limit NBU K8S resource consumption so that this doesn’t become a problem.

But as the Operator and its Data Mover are just micro services, if there’s need for more throughput, more can be spun up or if there’s a need to reduce bandwidth, some of them can be spun down, all with no manual intervention whatsoever.

Furthermore, NBU K8s can be used to restore/recover PVs, containers, applications or namespaces to other, CNCF compliant K8s infrastructure. So, if you wanted to say, move your K8s namespace from AKS to GKE or onprem to RedHat OpenShift, it becomes a simple matter of moving the last NBU backup to the target environment, deploying NBU K8s in that environment and restoring the namespace.

NBU K8s can also operate in the cloud just as well as on prem and works in any CNCF compatible K8s environment which includes AKS, EKS, GKE, VMware Tanzu and OpenShift.

In the latest NBU K8s they implemented new, enterprise class Data Movers as micro services in order to more efficiently protect and recover K8S resources. Enterprise class Data Movers can perform virus-scanning/ransomware detection, encryption, data compression, and other services that enterprise customers have come to expect from NBU data protection.

NBU K8S accesses PV data, container, pod and namespace data and metadata using standard CSI storage provider and normal K8s API services.

As mentioned earlier, in the latest iteration of NBU K8s, they have completely implemented their NBU infrastructure, natively as containers. That adds, K8s auto-scaling, full CI/CD automation via APIs, to all the rest of NBU infrastructure operating completely in the K8s cluster.

So, now backup admins can run NBU completely in K8s or run just the Operator and its data mover services connecting to other NBU infrastructure (primary server and media servers) executing elsewhere in the data center.

NBU K8s supports all the various, disk, dedicated backup appliances, object/cloud storage or other backup media options that NBU uses. So that means you can store your K8s backup data on the cloud, in secondary storage appliances, or anyplace else that’s supported by NBU.

Licensing for NBU K8s follows the currently available Veritas licensing such as front end TB protected, subscription and term licensing options are available.

Reneé Carlisle, Staff Product Manager, Veritas NetBackup (K8S)

Reneé (LinkedIn) has been with Veritas Technologies for eleven years in various focus areas within the NetBackup Product Management Team. In her current role she is the Product Manager responsible for the NetBackup strategic direction of Modern Platforms including Kubernetes and OpenStack. She has a significant technical background into many of the NetBackup features including Kubernetes, virtualization, Accelerator, and cloud.

Prior to working for Veritas, she was a customer running a large-scale NetBackup operation as well as a partner implementing, designing, and integrating NetBackup in many different companies.

	GreyBeards talk Agen… on 169: GreyBeards talk AgenticAI…
	Computational (DNA)… on 155: GreyBeards SDC23 wrap up…
	155: GreyBeards SDC2… on 155: GreyBeards SDC23 wrap up…
	J Metz on 134: GreyBeards talk (storage)…
	Administrator on 68: GreyBeards talk NVMeoF/TCP…

Category: Data Protection

167: GreyBeards talk Distributed S3 storage with Enrico Signoretti, VP Product & Partnerships, Cubbit

Enrico Signoretti, VP Products & Partnerships

147: GreyBeards talk ransomware protection with Jonathan Halstuch, Co-Founder and CTO, RackTop Systems

Sponsored By:

Jonathan Halstuch, Co-Founder & CTO RackTop Systems

140: Greybeards talk data orchestration with Matt Leib, Product Marketing Manager for IBM Spectrum Fusion

Matthew Leib, Product Marketing Manager, IBM Spectrum Fusion

135: Greybeard(s) talk file and object challenges with Theresa Miller & David Jayanathan, Cohesity

Sponsored By:

Theresa Miller, Director, Technology Advocacy Group, Cohesity

David Jayanathan, Solutions Architect, Cohesity

131: GreyBeards talk native K8s data protection using Veritas NetBackup with Reneé Carlisle

Reneé Carlisle, Staff Product Manager, Veritas NetBackup (K8S)