138: GreyBeards talk big data orchestration with Adit Madan, Dir. of Product, Alluxio

We have never talked with Alluxio before, but after coming back last week from Cloud Field Day 15 (CFD15), it seemed a good time to talk with other solution providers attempting to make hybrid cloud easier to use. So this month we talk with Adit Madan (@madanadit), Director of Product Management at Alluxio. Alluxio offers a data orchestration solution that's available both as a free-to-download/use, open source, community edition (apparently, Meta is a customer) and as a licensed, closed source, enterprise edition.

Alluxio data orchestration is all about supplying local-like IO access to data that resides elsewhere, for BI, AI/ML/DL, and just about any other application that needs to process remote data. Listen to the podcast to learn more.

Alluxio started out at UC Berkeley's AMPLab, which focuses on big data problems, and was designed to provide local access to massive amounts of distributed data. Alluxio ends up constructing a locally accessible federation of data sources for compute apps running elsewhere.

Alluxio software installs near where compute apps run that need access to remote data. We asked about a typical cloud bursting case where S3 object data needed by an app are sitting on prem, but the apps need to run in a cloud, e.g., AWS.

He said Alluxio software would be deployed in AWS, close to the app compute, and that's all there is. There's no Alluxio software running on prem; Alluxio just uses normal (remote access) S3 APIs to supply data to the compute apps running in AWS.

Adit mentioned that BI was one of the main applications to take advantage of Alluxio, but AI/ML/DL training is another that could use data orchestration. It turns out that AI/ML/DL training's consumption of data is repetitive and highly sequential, so caching, sequential pre-fetch and other Alluxio techniques can work well there to provide local-like access to remote data.
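
None of what follows is Alluxio's code, but the caching and prefetch idea is easy to sketch. Here's a minimal Python illustration, assuming a hypothetical fetch_block() that stands in for a remote S3/HDFS read, of an LRU block cache that detects sequential access and reads ahead:

```python
from collections import OrderedDict
from concurrent.futures import ThreadPoolExecutor

READ_AHEAD = 4  # number of blocks to prefetch once a sequential pattern is seen

def fetch_block(block_id):
    """Hypothetical remote read -- stands in for a ranged S3/HDFS GET of one block."""
    return b"..."  # placeholder payload

class PrefetchingCache:
    """LRU block cache with simple sequential read-ahead (a conceptual sketch)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()   # block_id -> data, kept in LRU order
        self.inflight = {}           # block_id -> Future for prefetches in progress
        self.pool = ThreadPoolExecutor(max_workers=READ_AHEAD)
        self.last_block = None

    def _admit(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used block

    def read(self, block_id):
        # Detect a sequential pattern and prefetch the next few blocks in the background.
        if self.last_block is not None and block_id == self.last_block + 1:
            for b in range(block_id + 1, block_id + 1 + READ_AHEAD):
                if b not in self.cache and b not in self.inflight:
                    self.inflight[b] = self.pool.submit(fetch_block, b)
        self.last_block = block_id

        if block_id in self.cache:                       # warm hit: local latency
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        if block_id in self.inflight:                    # prefetch already underway
            data = self.inflight.pop(block_id).result()
        else:                                            # cold miss: full remote latency
            data = fetch_block(block_id)
        self._admit(block_id, data)
        return data
```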

Adit said that enterprises are increasingly looking to avoid vendor lock-in, and this applies equally well to the cloud. By supporting data access in one location, say GCP, and accessing that data from another, say Azure, data gravity need no longer limit where work is done.

Adit said what makes their solution so valuable is that, instead of duplicating all data from one place to another, Alluxio moves only the data required/requested by the apps running there.

Keith asked whether Adit considered Alluxio a data mesh or data fabric. Keith had to explain the terms to me: data fabrics are the pipes and physical infrastructure/functionality that move data around, while a data mesh is what gives clients/apps/users access to that data. From that perspective, Alluxio is a data mesh.

Alluxio Caching

Adit said that caching is one of the keys to making Alluxio work. Much of the success of their solution depends on applications having a well-behaved working set. He also mentioned they use pre-fetching and other techniques to minimize access latency and maximize throughput. However, the first byte of data being accessed may take some time to get to where the compute executes.

Adit said it's not unusual for them to have a half PB of cache (storage) for an application with multiple PBs of source data.

Keith asked how Alluxio's performance can be managed. Adit said they (we assume the enterprise edition) have a solution called Cache Insights, which uses Alluxio's extensive access pattern history to predict application IO performance with a larger cache (storage), higher speed networking, higher performing/more compute cores, etc. In this way, customers can see what can be done to improve application IO performance and what it would cost.
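
Adit didn't describe how Cache Insights works internally, but one common way to turn access-pattern history into a cache-sizing prediction is to replay the recorded trace through a simulated cache at several sizes. A minimal sketch, assuming a simple LRU policy and a made-up trace:

```python
from collections import OrderedDict

def lru_hit_ratio(trace, cache_blocks):
    """Replay a block-access trace through a simulated LRU cache of the given size."""
    cache, hits = OrderedDict(), 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)     # evict the least recently used block
    return hits / len(trace)

# Hypothetical trace: block IDs recorded from an application's past IO.
trace = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 0, 1] * 1000
for size in (2, 4, 8):
    print(f"cache of {size} blocks -> hit ratio {lru_hit_ratio(trace, size):.0%}")
```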

Keith asked if Alluxio were available as a SaaS solution. Adit said, although it could be deployed in that fashion, it's not currently a SaaS solution. When asked how Alluxio (enterprise) was priced, Adit said it's a function of the total resources consumed by their service, i.e., the storage (cache), cores, networking, etc. that run the Alluxio software.

As for deployment options, it turns out that for Spark, Alluxio is just another library package installed inside Spark. For K8s, Alluxio is installed as a CSI driver and a set of containers, and can be deployed as containers within a cluster that needs access to data or in an external, standalone K8s cluster servicing IO from other clusters. Alluxio HA is supplied by using multiple nodes to provide IO access.
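
As a rough illustration of the Spark case, reads simply go through an alluxio:// URI once the Alluxio client library is on Spark's classpath. The jar path, master hostname and dataset path below are all assumptions; check Alluxio's Spark docs for your version:

```python
from pyspark.sql import SparkSession

# The Alluxio client jar must be on Spark's classpath (path and jar name are assumptions).
# Depending on the Alluxio version, fs.alluxio.impl may also need to be set in the Hadoop conf.
spark = (SparkSession.builder
         .appName("read-through-alluxio")
         .config("spark.driver.extraClassPath", "/opt/alluxio/client/alluxio-client.jar")
         .config("spark.executor.extraClassPath", "/opt/alluxio/client/alluxio-client.jar")
         .getOrCreate())

# Instead of an s3:// path, the app reads an alluxio:// URI; Alluxio fetches
# (and caches) the underlying S3 objects on the application's behalf.
df = spark.read.parquet("alluxio://alluxio-master:19998/datasets/events/")
df.groupBy("event_type").count().show()
```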

Alluxio also supports access to multiple data locations. In this case, the applications would just access different mount points.

Data reads are easy; writes can be harder due to data integrity issues. As such, supplying IO performance becomes a trade-off against data integrity when data updates are supported. Adit said Alluxio offers a couple of different configuration options for write concurrency (data integrity) that customers can select from. We assume this includes write-through, write-back and perhaps other write consistency options.
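
The podcast didn't spell out what those options are (hence our assumption above), but the classic write-through vs. write-back trade-off looks roughly like this, where backend is any object with a put(key, value) method:

```python
class WriteThroughCache:
    """Every write hits the backing store before the call returns --
    stronger integrity, higher write latency."""
    def __init__(self, backend):
        self.backend, self.cache = backend, {}

    def write(self, key, value):
        self.backend.put(key, value)   # persist first
        self.cache[key] = value        # then update the local copy


class WriteBackCache:
    """Writes land in the cache and are flushed to the backing store later --
    lower latency, but dirty data can be lost before a flush."""
    def __init__(self, backend):
        self.backend, self.cache, self.dirty = backend, {}, set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)            # remember what still needs persisting

    def flush(self):
        for key in list(self.dirty):
            self.backend.put(key, self.cache[key])
        self.dirty.clear()
```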

Alluxio supports AWS, Azure and GCP cloud compute accessing HDFS, S3 and POSIX protocol access to data residing at remote sites. At remote sites, they currently support MinIO, Cloudian and other S3-compatible storage solutions as well as NetApp (ONTAP) and Dell (ECS) storage as data sources.

Adit Madan, Director of Product, Alluxio

Adit Madan is the Director of Product Management at Alluxio. Adit has extensive experience in distributed systems, storage systems, and large-scale data analytics.

Adit holds an MS from Carnegie Mellon University and a BS from the Indian Institute of Technology – Delhi.

He is also a core maintainer and Project Management Committee (PMC) member of the Alluxio open source project.

137: GreyBeards talk VMware Explore 2022 Wrap-up

Jason Collier, Principal Member of Technical Staff at AMD (@bocanuts) and a current GreyBeards on Storage co-host, and I both attended VMware Explore 2022 this past week, and we recorded a podcast discussing VMware's announcements on the show floor. It turns out that Keith Townsend, TheCTOAdvisor (@thectoadvisor), had brought his Airstream studio and was exhibiting on the show floor. Keith kindly offered the use of his studio to record the podcast.

This one is a video. Let us know what you think. I clearly need a cowboy hat and Jason said (off camera) that I’m showing more grey in my beard than before. I take that as a compliment here.

Here’s the news as we saw it:

  • vSphere 8 – has a number of new features, but the one we thought most important was the GA of Project Monterey. This supports new DPUs that now run ESXi outboard from the CPU. They are able to offload lots of the CPU networking cycles to the DPU, freeing these up for other (more important) work. vSphere 8 supports 2 DPUs now, the NVIDIA (Mellanox) BlueField(-2?) DPU and the AMD (Pensando) DPU. AMD recently purchased Pensando, and Jason seemed to know an awful lot about this tech. VMware also announced support for concurrent ESXi upgrades, which can now allow upgrading ESXi running in DPUs while hosts and clusters continue to operate. Finally, the other item of interest was that vSphere is now more API driven. I guess it's only a matter of time before all VMware functionality is API driven to make it even more cloud-like.
  • vSAN 8 – also has a number of new features. The first we discussed was a faster data path. This means more IOPS, more bandwidth and lower latency for IOs. Next, vSAN 8 now supports single tier storage pools. These will no longer require a caching layer. This should also speed up IO operations (as long as the single tier is at least as fast as the old caching layer). They also announced faster snapshots. Apparently this has been a problem in the past and they've done the work to speed this up considerably. Jason mentioned an AMD open source VM migration tool (from somebody else's x86 CPUs to AMD's) that depends a lot on vSAN snapshots.
  • Cloud Flex Storage – mentioned at the show but not well explained. Jason and I speculated that this was an internal storage service available for Cloud Foundation users on AWS, where customers could subscribe to storage as-a-service in much lower increments (maybe even GB/month) than standing up more vSAN hosts to increase storage.
  • NetApp FSx (ONTAP) storage – along the same line, VMware announced support for NetApp FSx as yet another storage option for Cloud Foundation users on AWS, supplying yet another storage-as-a-service option for this environment.
  • Cloud Flex Compute – also mentioned at the show was their new compute-as-a-service for Cloud Foundation users on AWS. This way users could subscribe to more or less compute on an as-needed basis, rather than having to spin up new ESXi hosts. I later found out this allows users to run a single VM and pay for it on a subscription basis.
  • Tanzu Application Platform (TAP) – is a new VMware supplied (and supported) “development experience” for K8s on vSphere. Note, it doesn't include any advanced Tanzu services such as Tanzu K8s Grid (TKG), so it's a true DevOps bare-bones environment.
  • Tanzu K8S Operations (TKO) – another new Tanzu based service which offers operations complete control over the Tanzu services running on vSphere. Note Tanzu Mission Control (TMC) is not part of TKO.
  • Aria management – VMware rebranded vRealize and CloudHealth, which now come in 3 bundles: Aria Cost (CloudHealth+), Aria Operations and Aria Automation. These are all built on top of Aria Graph, which graphs all the nodes in your VMware clusters with all their connections, so that Aria management can traverse this graph to find out what's where. On top of Aria Graph are Aria Hub, Aria Insights, and Aria Guardrails (sort of like providing boundaries where services can be deployed).

They also announced Ransomware Recovery [changed 7Sep22, the Eds] as a Service, which builds on VMware's DR-aaS announced last year, and that Tanzu now works with Red Hat OpenShift.

We also discussed the show. I heard somewhere there were 10K people there; Jason heard somewhere between 6K and 9K. In any case, it was much smaller than VMworlds prior to Covid (25K-ish). And of course the rebranding of the show seemed counter-intuitive at best.

The show floor was much smaller than usual (notwithstanding Keith's Airstream RV exhibit). And there were a number of storage vendors not at the show. There was less hardware on the show floor; this could be a Covid thing, but there were just as many mini-whiteboards/classrooms per large exhibitor, so we don't think it was because of Covid.

But the elephant in the room was Broadcom’s acquisition of VMware. At one of the analyst briefings I asked an exec about attrition. He made a couple of comments but in the end said VMware has been bought and sold before and has always come out of it in better shape. This will be no different.

That’s about all from the show.

And thanks again to Keith and his crew for lending us his studio to record the show. It's been a while since I've seen an RV on a show floor. Keith seemed to have a ball with it.

Tell us how you like our video. If everyone is for it, we could do something like this with a Zoom (in this case, Zencastr) recording, or just try this again at the next joint conference.

Jason Collier, Principal Member of Technical Staff at AMD

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology.

He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years.

He's on LinkedIn. He's currently working with AMD on new technology, and he has been a GreyBeards on Storage co-host since the beginning of 2022.

121: GreyBeards talk Cloud NAS with Peter Thompson, CEO & George Dochev, CTO LucidLink

GreyBeards had an amazing discussion with Peter Thompson (@Lucid_Link), CEO & co-founder and George Dochev (@GDochev), CTO & co-founder of LucidLink. Both Peter and George were very knowledgeable and easy to talk with.

LucidLink’s Cloud NAS creates a NAS storage system out of cloud (any S3 compatible AND Azure Blob) object storage. LucidLink is made up of client software, LucidLink SaaS (metadata service) and data on object storage. Their client software runs on any Linux, MacOS, or Windows desktop/laptop. LucidLink provides streaming, collaborative access to remote users for (file) data on object storage.

Just when 90% of the workforce was sent home for the pandemic, LucidLink emerged to provide all those users secure file access to any and all corporate data in the cloud. Peter mentioned one M&E customer who had just sent 300 video editors home with laptops and a disk drive, which would last them all of 2 weeks. But they needed an ongoing solution for after that. The customer started with 300 users and ~100TB of file storage on LucidLink and, a few months later, they had 1000 users with a PB+ of LucidLink data and were getting rid of all their NAS boxes. Listen to the podcast to learn more.

They are finding a lot of success in M&E, engineering design, Oil&Gas exploration, geo-spatial design firms and just about anywhere user collaboration on file data is required outside a data center.

LucidLink constructs a FileSpace for customer file (object) data, which represents a drive letter or mount point that remote users can use to access files from the cloud. LucidLink supports a POSIX compliant file service for that data.

LucidLink data and user generated metadata is encrypted, using client owned/stored keys. So, data-at-rest (and -in-flight) can always be secure. They also support LDAP security and other standard SSO solutions to secure user access to data.

The LucidLink SaaS (metadata) service runs in a hyperscaler and links clients to file data on object storage. It also supports distributed, byte-range locking of file data by users.
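
Since the FileSpace is presented as a POSIX file system, clients can use ordinary byte-range locks. Below is a standard POSIX advisory-locking example (Linux/macOS), not LucidLink's internal lock protocol, and the mount point and file name are made up:

```python
import fcntl
import os

# /mnt/filespace is an assumed mount point for a LucidLink FileSpace.
fd = os.open("/mnt/filespace/project/edit.bin", os.O_RDWR)

# Lock the 1MB range starting at offset 1MB for exclusive (write) access;
# other byte ranges of the same file stay available to other collaborators.
fcntl.lockf(fd, fcntl.LOCK_EX, 1024 * 1024, 1024 * 1024)
try:
    os.pwrite(fd, b"new frame data", 1024 * 1024)   # update only our locked range
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 1024 * 1024, 1024 * 1024)
    os.close(fd)
```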

One interesting nuance is that when a client locks a file, the system changes from an eventual to strongly consistent POSIX compliant file system. This ensures that the object storage is always the single source of truth.

The key that differentiates LucidLink from cloud gateways or file sync & share systems is that they 1) are not intended to operate in a data center (yes, object storage can be located on prem, but users are remote) and 2) don't copy files from one user/access point to another.

George said latency is enemy number one. LucidLink’s secret is prefetching. Each client uses a customer configured local persistent cache which can range from 5GB to a TB or more. LucidLink maintains a data and (in the next version) metadata working set for the user in their local cache.

Customer file data is split across multiple objects; that way, LucidLink can stream data from all of them in parallel if needed. And doing so can supply extreme throughput when needed.
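
As a conceptual sketch of why splitting a file across objects helps, here's a parallel fetch using boto3; the bucket name and chunk-naming layout are invented for illustration and are not LucidLink's actual on-object format:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET = "example-filespace-bucket"          # assumed bucket name

def read_chunk(key):
    """Fetch one object (one chunk of the file) from S3-compatible storage."""
    resp = s3.get_object(Bucket=BUCKET, Key=key)
    return resp["Body"].read()

def read_file(prefix, num_chunks, workers=8):
    """Reassemble a file that was split into num_chunks objects named
    <prefix>/chunk-00000, <prefix>/chunk-00001, ... (a made-up layout)."""
    keys = [f"{prefix}/chunk-{i:05d}" for i in range(num_chunks)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(read_chunk, keys))   # chunks fetched in parallel
    return b"".join(chunks)

data = read_file("projects/cut-01/video.mov", num_chunks=64)
```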

As for GDPR and data compliance, the customer controls who has access to the LucidLink SaaS as well as encryption keys.

LucidLink considers their solution “fault tolerant” or DR ready, because customers can load client software on any device and access any LucidLink file data. They also consider themselves “highly available” because their metadata/LucidLink SaaS service runs in a hyperscaler and the object backing storage can be configured as highly available.

As mentioned earlier, LucidLink customers can use any S3 compatible or Azure Blob object storage, on prem or in the cloud. But when using cloud object storage, one pays egress charges. LucidLink’s local caching can minimize but cannot eliminate egress charges.

LucidLink offers two licensing models: 1) a BYO (bring your own) object storage model, where LucidLink provides the software to support your Cloud NAS, or 2) LucidLink supplies both the object storage and the LucidLink service that glues it all together. The latter is a combination of IBM COS and LucidLink that offers less expensive egress charges.

The LucidLink service is billed on a capacity under management and user count basis. Capacity is billed on a GB/day basis, summed over a month. Their minimum solution is 5TB/5 users, but they have customers with 1000s of users and PB+ of data. They offer a free 2-week trial period where customers can try LucidLink out.
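
We don't know LucidLink's exact billing math, but GB/day accounting summed over a month is straightforward to illustrate with made-up numbers:

```python
# Hypothetical daily capacity samples (GB under management) for a 30-day month.
daily_gb = [5_000] * 10 + [7_500] * 10 + [10_000] * 10   # capacity grew twice

gb_days = sum(daily_gb)                      # capacity billed on a GB/day basis
avg_gb_for_month = gb_days / len(daily_gb)   # equivalent average capacity

print(f"GB-days for the month: {gb_days:,}")                   # 225,000 GB-days
print(f"Average capacity billed: {avg_gb_for_month:,.0f} GB")  # 7,500 GB
```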

Peter Thompson, CEO and Co-founder

Peter Thompson, co-founder, and CEO of LucidLink is a passionate and experienced leader and business builder. Thompson has over 30 years of experience in driving business expansion, key programs, and partnerships across regions such as APAC and the Americas mostly in the storage and file system market.

With over 14 years at DataCore Software, most recently as VP of Emerging and Developing Markets, Thompson drove DataCore's expansion into China, working with key industry partners, technology alliances and global teams to develop programs and business focused on emerging markets. Thompson also held the role of Managing Director, APAC, responsible for the bottom line of all Asia operations. He also was President and Representative Director of DataCore Japan, acquiring the majority of ownership and running it as a standalone entity to build a beachhead of marquee customers in Japan.

Thompson studied Japanese, history, and economics at Kansai Gaidai and has a BA in International Management, Psychology, and Japanese from Gustavus Adolphus College. He is a graduate of Stanford University Business School's MSx program, with a focus on entrepreneurial finance, design thinking, and the soft skills required to build and lead world-class, high performing teams.

George Dochev, CTO and Co-Founder

George Dochev, co-founder and CTO of LucidLink, is a storage and file system expert with extensive experience in bringing emerging technologies to market. Dochev has over 20 years of success leading the development of complex virtualization products for the storage industry. He specializes in research and development in the fields of high-performance distributed systems, storage infrastructure software, and cloud technologies. 

Dochev was co-founder and principal member of the engineering team at DataCore Software for nearly 17 years. While at DataCore, Dochev helped transform that company from a start-up into a global leader in software-defined storage. Underscoring Dochev's impact as an entrepreneur is the fact that DataCore Software now powers the data centers of 10,000+ large enterprises around the world.

Dochev holds a degree in Mathematics from Sofia University St. Kliment Ohridski in Bulgaria, and an MS in Computer Science from the University of National and World Economy, in Sofia, Bulgaria.

119: GreyBeards talk distributed cloud file systems with Glen Shok, VP Alliances, Panzura

This month we turn to distributed (cloud) file systems as we talk with Glen Shok (@gshok), VP of Alliances for Panzura. Panzura uses a backend (cloud or on prem, S3-compatible) object store with a ring of software (VM) or hardware (appliance) gateways that provide caching for local files as well as managing and maintaining metadata, which creates a global NFS and SMB file system with near-local access times.

Glen is an industry (without the grey beard) veteran with the knowledge to back that up. He's been in the industry so long that we could probably have spent an hour just talking about where people are that we both know. Listen to the podcast to learn more.

The interesting part about Panzura is their gateway ring. It not only manages local file caching and metadata maintenance/access, but it also provides an out-of-band (outside the data path) file (byte-range) lock coordination service, cache coherency (via delta block changes) and other services. All the metadata (and data) is backed up on backend object storage, but it's the direct access to the metadata and its out-of-band control path, as well as its caching service, that supplies the near-local access times for data.

Panzura supports any public (AWS, Azure, GCP & IBM) cloud object storage for backend data storage as well as a few on prem solutions (I think Glen mentioned IBM COS & Cloudian, and their website mentions Wasabi, Scality and NetApp StorageGRID). Glen said they are on each of the public clouds' marketplaces, and with virtual gateways, it's very easy to spin up and try.

Their system provides global (local, at the gateway) dedupe to reduce backend storage footprint and (both out-of-band and from backend storage) delta block changes for local cache updates. So in the event that an old version of a file happens to be present in their local cache gateway, it only needs to retrieve the changed data from the object storage backend (or another gateway). All this local caching, dedupe and changed block tracking helps to reduce cloud egress charges.
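
This isn't Panzura's implementation, but the changed-block idea can be sketched with fixed-size block hashing: fingerprint each block, compare against the previous version's fingerprints, and ship only the blocks that differ:

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024   # 4MB blocks (block size is an assumption)

def block_hashes(data):
    """Fingerprint a file's fixed-size blocks."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old_hashes, new_data):
    """Return only the (index, bytes) pairs whose content actually changed --
    the delta that would be shipped to the backend or another gateway."""
    deltas = []
    for i, h in enumerate(block_hashes(new_data)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            deltas.append((i, new_data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]))
    return deltas

# Usage with hypothetical file names:
old = block_hashes(open("old_version.bin", "rb").read())
delta = changed_blocks(old, open("new_version.bin", "rb").read())
```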

Data written to backend storage is immutable and versioned. So customers can retrieve any version of any file that was ever destaged to their backend. Glen said they write huge objects, presumably to help reduce storage footprint, IO overhead and API calls.

Glen claimed that with 3-way replication within a cloud region and 1-way replication outside the cloud region, customers no longer have to back up data. I respectfully disagreed. He believes that, over time, customers will come to realize their use of backups for restores becomes so rare that they can reduce backup frequency, if not eliminate it altogether. Some follow-on discussion ensued, but in the end we seemed to agree to disagree on this topic.

Panzura also supports cross cloud mirroring. So, one could have their data mirrored from one cloud to another. One of these clouds will be used as a primary and only in the event that a majority of the gateway rings agree that the primary is DOWN and the secondary is UP, will they all automatically cut over to using the secondary storage cloud. While failover is automated, fail back requires operator intervention.
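
A majority-vote cutover decision like the one Glen described can be sketched as follows (purely illustrative, not Panzura's actual protocol):

```python
def should_fail_over(gateway_views):
    """gateway_views: list of (primary_up, secondary_up) booleans as observed
    by each gateway in the ring. Cut over only when a majority agrees the
    primary is DOWN and the secondary is UP."""
    n = len(gateway_views)
    primary_down_votes = sum(1 for p_up, _ in gateway_views if not p_up)
    secondary_up_votes = sum(1 for _, s_up in gateway_views if s_up)
    return primary_down_votes > n // 2 and secondary_up_votes > n // 2

# Example: 5 gateways, 4 see the primary down and the secondary healthy.
views = [(False, True)] * 4 + [(True, True)]
print(should_fail_over(views))   # True -- failover is automated; failback stays manual
```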

Panzura charges based on managed data capacity. But cloud or on prem object storage is in addition to this and is charged for separately by the object storage provider.

As far as what size file systems they support, Glen mentioned that they are ZFS internally, so any size imaginable. But he did concede that, at some point, metadata management becomes a problem, and that they often suggest splitting a 20PB file system into two 10PB file systems (gateway rings) to deal with this issue.

As for other solutions offered by Panzura, they have a K8s container block storage for persistent volumes that scales in capacity/performance using K8s services/resources.

Glen Shok, VP Alliances, Panzura

Glen Shok has been in the data center and storage industry for over 20 years.

He started his career at Cisco in the late 90s, then moved to a few startups which were acquired by Brocade and Oracle. Glen has held positions in sales, sales leadership, product management and marketing, and the Office of the CTO at Zones, prior to coming to Panzura.

He can’t decide what he likes to do, but at Panzura, he’s the VP of Strategic Alliances.

111: GreyBeards talk data analytics with Matthew Tyrer, Sr. Mgr. Solutions Mkt & Competitive Intelligence, Commvault

Sponsored by: Commvault

I've known Matthew Tyrer, Senior Manager of Solutions Marketing and Competitive Intelligence at Commvault, for quite a while now, and he's always been knowledgeable about the problems the enterprise has in supporting and backing up large file data repositories. But lately he's been focused on Commvault Activate, their data analytics solution.

We had a great talk with Matthew. He was easy to talk to and knew a lot about how data analytics can ease the operational burden of the enterprise's growing file data environments. Remind me not to have two Matthews on the same program ever again. Listen to the podcast to learn more.

Matthew mentioned that Activate was built on the Commvault platform software stack, which has had a rich and long history of development and customer deployments. It seems that Activate data analytics had been an early part of the platform but was recently split out as a separate solution.

One capability that Activate has that many other data analytics solutions do not is the ability to examine both online data as well as data in backups. Most analytics solutions can do one or the other; only a few do both. But if a solution only has access to online or backup data, it is missing half the story.

In addition, Activate can operate across multiple data centers as well as across multiple public cloud environments to provide analytics for an enterprise's file data wherever it may reside.

Given the proliferation of file data these days, data analytics has become a necessity for most large IT shops. In the past, an admin could track some data over time, but with the volumes of file data today, this is no longer tenable. At a PB or more of file data, located in on prem data centers as well as across multiple clouds, there's just too much file data to keep track of manually anymore.

Activate also indexes file content to provide more visibility and tracking of the different types of data under management in the enterprise. This is in addition to the extensive metadata that is collected and analyzed so it can better understand data access rights, copies and physical locations around the enterprise.

Activate can help organizations govern their data flows in support of industry as well as government data compliance requirements. Activate Data Governance, one of the three Activate solutions, is focused exclusively on providing enterprises the tools needed to manage any and all data that exists under compliance regulation environments.

Mat Leib had worked in eDiscovery before, and it had always been a pain to extract “legally relevant” data from online and backup repositories. With the Activate eDiscovery solution and Activate's content indexing of all file data, legal can perform their own relevant data searches to create eDiscovery data sets in support of litigation activities. Self-service legal extracts like this vastly reduce the admin time and cost needed for eDiscovery.

The Activate File Space Optimization solution was deployed in one environment that had ~20PB of data online. By using File Space Optimization, the customer was able to cut 20PB down to 10PB. Any customer could benefit from such a reduction but customers doing data migration would see even more benefit.

At the end of the podcast, Matthew mentioned some videos that show Activate solution use cases.

Matthew Tyrer, Senior Manager, Solutions Marketing and Competitive Intelligence

Having worked at Commvault for over twelve years, after 8 years as a Sales Engineer, Matt took that technical knowledge and transitioned to marketing, where he is currently serving as a Senior Manager on Commvault's Solution Marketing team. He is also heavily involved in Competitive Intelligence initiatives and actively participates in field enablement programs.

He brings over 20 years’ experience in the IT industry, including within the fields of data and information management, cloud, data governance, enterprise storage, disaster recovery, and ultimately both implementing and supporting those projects and endeavours for public and private sector clients across Canada and around the globe.

Matt’s passion, deep product knowledge, and broad field experiences have enabled him to translate Commvault technology and vision such that their value is easily understood in the market and amongst client and partner families.

A self-described geek-dad, Matt is an avid boardgame enthusiast, firmly believes that Han shot first, and enjoys tormenting his girls with bad dad jokes.