173: GreyBeards Year End 2025 podcast

Well this year went fast. Keith, Jason and I sat down to try to make some sense of it all.

AI is still on a tear and shows no end in sight. Questions abound on whether we are seeing signs of a bubble or not; our answer – maybe. We see it in GPU pricing, in AI startup valuations, and in enterprise interest. Some question whether the Enterprise is seeing any return from investments in AI, but there’s no doubt they are investing. Inferencing on prem, with training/fine tuning done in neo-clouds, has become the new norm. I thought we’d be mostly discussing agentic AI, but it’s too early for that yet.

In other news, the real Broadcom VMware play is starting to emerge (if it was ever in doubt). It’s an all-out focus on the (highly profitable) high end enterprises, abandoning the rest. And of course the latest weirdness to hit IT is DRAM pricing, but in reality it’s the price of anything going into AI mega-data centers that’s spiking. Listen to the podcast to learn more.

AI

GPU pricing is still high, although we are starting to see some cracks in NVIDIA’s moat.

AMD GPUs made a decent splash in the latest MLPerf Training results and Google TPUs are starting to garner some interest in the enterprise. And NVIDIA’s latest GPU offerings are becoming less about raw compute growth and more about optimizing for low precision compute (FP2 anyone?), rather than just increasing FLOPS. It seems memory bandwidth (in GPUs) is becoming more of a bottleneck than anything else, IMHO.
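For the curious, here’s a back-of-the-envelope sketch of that bandwidth argument. The peak FLOPS and HBM bandwidth figures below are illustrative assumptions, not any particular GPU’s spec:

```python
# Rough check on why memory bandwidth, not FLOPS, often limits GPU inferencing.
# All numbers below are illustrative assumptions, not vendor specs.

peak_flops = 2.0e15        # assumed peak low-precision throughput, FLOP/s
mem_bandwidth = 8.0e12     # assumed HBM bandwidth, bytes/s

# FLOPs the GPU can do per byte read before memory becomes the bottleneck
balance_point = peak_flops / mem_bandwidth     # ~250 FLOP/byte

# A GEMV-style LLM decode step does roughly 2 FLOPs per weight byte touched
# (at ~1 byte/weight for 8-bit weights) -- far below the balance point.
decode_intensity = 2.0

print(f"GPU balance point: ~{balance_point:.0f} FLOP/byte")
print(f"LLM decode arithmetic intensity: ~{decode_intensity:.0f} FLOP/byte")
print("=> decode is memory-bandwidth bound" if decode_intensity < balance_point
      else "=> decode is compute bound")
```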

But NVIDIA CUDA is still an advantage. Grad students grew up on it, trained on it, and are so familiar with it that it will take a long time to displace. Yeah, ROCm helps, but IT needs more. Open sourcing all the CUDA code and its derivatives could be an answer, if anybody’s listening.

Jason talked about AI rack and data center power requirements going through the roof and mentioned SMRs (small modular [nuclear] reactors) as one solution. When buying a nuclear power plant is just not an option, SMRs can help. They can be trucked in and installed (mostly) anywhere. Keith saw a truckload of SMRs on the highway on one of his road trips.

And last but not least, Apple just announced RDMA over Thunderbolt. And the (YouTube) airwaves have been lighting up with Mac Studios being clustered together with sufficient compute power to rival a DGX. Of course it’s Apple’s MLX running rather than CUDA, and only so many models work on MLX, but it’s a start at democratizing AI.
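For those who haven’t touched MLX, here’s a minimal, single-machine sketch of what its array API looks like. It doesn’t use the new RDMA-over-Thunderbolt clustering (which we haven’t tried ourselves); it just shows the NumPy-like, Metal-backed programming model that stands in for CUDA on Apple silicon:

```python
# Minimal MLX sketch (Apple silicon): NumPy-like arrays executed on the GPU
# via Metal rather than CUDA. Single-machine only; no clustering here.
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

c = a @ b      # builds the computation lazily
mx.eval(c)     # forces evaluation on the default (GPU) device

print(c.shape, c.dtype)
```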

VMware

Broadcom’s moves remind Jason of what IBM did with Z: abandon the low end, milk the high end forever. If you want vSphere, better think about purchasing VCF.

Keith mentioned that if a company has a $100M cloud spend, they could save some serious money (~20%) by going to VCF. But it’s not a lift and shift. Running a cloud on prem requires a different mindset than running apps in the cloud. Welcome to the pre-cloud era, where every IT shop did it all.

Component Pricing

Jason said that DRAM pricing has gone up 600% in a matter of weeks. Our consensus view is it’s all going to AI data centers. With servers having a TB of DRAM, GPUs with 160GB of HBM apiece, and LPDDR being gobbled up for mobile/edge compute everywhere, is there any doubt that critical server (sub-)components are in high demand?

Hopefully, the fabs will start to produce more. But that assumes fabs have spare capacity and DRAM demand is a function of price. There are hints that neither of these is true anymore. Mega data centers are not constrained by capital, yet, and most fabs are operating flat out, producing as many chips as they can. So DRAM pricing may continue to be a problem for some time to come.

Speaking of memory, there was some discussion on memory tiering startups taking off with high priced memory. One enabler for that is the new UALink interconnect. It’s essentially an open, chip-to-chip interconnect standard, running over PCIe or Ethernet. UALink solutions can connect very high speed components beyond the server itself to support a scale-out network of accelerators, memory and CPUs in a single rack. It’s early yet, but Meta’s spec for an OCP wide form factor rack has been released, and the AMD Helios OCP 72-GPU rack uses UALink tech today. More to come, we’re sure.

Keith Townsend, The CTO Advisor, Founder & Executive Strategist | Advisor to CIOs, CTOs & the Vendors Who Serve Them

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

Jason Collier, Principal Member Of Technical Staff at AMD, Data Center Solutions Group

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology.

He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years.

He’s on LinkedIn. He’s currently working with AMD on new technology and has been a GreyBeards on Storage co-host since the beginning of 2022.

171: GreyBeards talk Storage.AI with Dr. J Metz, SNIA Chair and Technical Director, AMD

SNIA’s Storage Developer Conference (SDC) was held last week in CA and although I didn’t attend, I heard it was quite a gathering. Just prior to the show, I was talking with Jason about the challenges of storage for AI and he mentioned that SNIA had a new Storage.AI initiative focused on these issues. I called Dr. J Metz, Chair of SNIA & Technical Director @ AMD (@drjmetz, blog) and asked if he wanted to talk to us about SNIA’s new initiative.

Storage.AI is a SNIA standards development community tasked with addressing the myriad problems AI has with data. Under its umbrella, a number of technical working groups (TWGs) will work on standards to improve AI data access. Just about every IT vendor in the universe is listed as a participating company in the initiative. Listen to the podcast to learn more.

We started by discussing Dr. J’s current roles at SNIA and AMD and how SDC went last week. It turns out it was the best attended SDC ever, and Dr. J’s keynote on Storage.AI was a highlight of the show.

The storage/data needs for AI span a wide spectrum of activities or workloads. Dr. J spoke on the lengthy data pipeline, e.g. ingest, prep/clean, transform, train, checkpoint/reload, RAG upload/update and inference to name just a few. In all these diverse activities, storage’s job is getting the right data bits to the right process (GPU/accelerators for training) throughout the pipeline. Inferencing has somewhat less of a convoluted data journey but is still complex and performance critical.

To take just one component of the data pipeline, checkpointing is a data intensive process. When training a multi-billion parameter model or, dare I say, a multi-trillion parameter model with 10K to millions of GPUs, failures happen, often. Checkpoints are the only way model training can make progress in the face of significant GPU failures. And of course, any checkpoint needs to be reloaded to verify it’s correct.

So checkpointing and reloading is an IO activity that happens constantly when models are trained. Checkpoints essentially save the current model parameters during training. Speeding up checkpoint/reload could increase AI model training throughput considerably.
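For readers who want to see the IO pattern, here’s a minimal PyTorch-style checkpoint/reload sketch. The tiny model, optimizer, and file path are placeholders; real multi-thousand-GPU training shards and parallelizes this, but the storage pattern (dump all state, read it back to verify/resume) is the same:

```python
# Minimal checkpoint/reload sketch in PyTorch. Model, optimizer, and path are
# illustrative placeholders, not anyone's production training code.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)                        # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters())
ckpt_path = "checkpoint_step_1000.pt"                # hypothetical path

# Save: every parameter plus optimizer state gets written to storage
torch.save({"step": 1000,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, ckpt_path)

# Reload: e.g., after a GPU failure, or just to verify the checkpoint is readable
ckpt = torch.load(ckpt_path)
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
print("resumed at step", ckpt["step"])
```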

And of course, GPUs, and the power they consume, are expensive. When one has thousands to millions of GPUs in a data center, having them sit idle is a vast waste of resources. Anything to help speed up accelerator data access could potentially save millions.

In the old days compute, storage and networking were isolated/separate silos of technology. Nowadays, the walls between them have been blown away, mostly by the advent of AI.

Dr. J talks about first principles, such as the speed of light, which determines the time it takes for data to move from one place to another. These limits exist throughout IT infrastructure. But the OS stacks surrounding these activities have spawned layer upon layer of software to do these actions. If one can wipe the slate clean, infrastructure activities can get closer to those first principles and reduce overhead.

SNIA has current TWGs focused on a number of activities that could help speed up AI IO. We talked about SNIA’s Smart Data Accelerator Interface (SDXI), but there are others in process as well. SNIA has also identified a few new ones they plan to fire up, such as GPU direct access bypass and GPU-initiated IO, to address other gaps in Storage.AI.

In today’s performance driven AI environments, proprietary solutions are often developed to address some of these same issues. We ended up discussing the role of standards vs. proprietary solutions in IT in general and in today’s AI infrastructure.

Yes there’s a place for proprietary solutions and there’s also a place for standards. Sometimes they merge, sometimes not, but they can often help inform each other on industry trends and challenges.

I thought that proprietary technologies always seem to emerge early and then transition to standards over time. Dr. J said it’s more of an ebb and flow between proprietary and standards, and mentioned as one example the ESCON-FC-FICON-Fabric proprietary/standards activities from last century.

As always, it was an interesting conversation with Dr. J, and Jason and I look forward to seeing how SNIA’s Storage.AI evolves over time.

Dr. J. Metz, Chair and Chief Executive of SNIA & Technical Director, AMD

J is Technical Director for Systems Design for AMD where he works to coordinate and lead strategy on various industry initiatives related to systems architecture, including advanced networking and storage. He has a unique ability to dissect and explain complex concepts and strategies, and is passionate about the inner workings and application of emerging technologies.

J has previously held roles in both startup and Fortune 100 companies as a Field CTO, R&D Engineer, Solutions Architect, and Systems Engineer. He is and has been a leader in several key industry standards groups, currently Chair of SNIA as well as the Chair of the Ultra Ethernet Consortium (UEC). Previously, he was on the board of the Fibre Channel Industry Association (FCIA) and Non-Volatile Memory Express (NVMe) organizations. A popular blogger and active on Twitter, his areas of expertise include both storage and networking for AI and HPC environments.

Additionally, J is an entertaining presenter and prolific writer. He has won multiple awards as a speaker and author, writing over 300 articles and giving presentations and webinars attended by over 10,000 people. He earned his PhD from the University of Georgia.

170: FMS25 wrap-up with Jim Handy, Objective Analysis

Jim Handy, General Director at Objective Analysis, and I were at FMS25 in Santa Clara last week and there was a lot of news going around. Jim’s been on our show just about every year to discuss FMS news, and with the show’s recent focus beyond flash, it’s even harder for one person to keep up.

Much of the discussion at FMS was on HBM4, new QLC capacity points, UALink/UCIe for chiplets, 100M IOPS SSDs, and more. Listen to the podcast to learn more.

There was not as much on CXL as in past shows, and ditto on increasing layer counts to drive more NAND capacity. A couple of years ago layer counts were all they talked about, and CXL was the major change to hit the data center. Jim’s view (and Jason’s) was that CXL was a way for hyperscalers to make use of DDR4 DRAM, but that need has passed now.

As for layer counts they are still going up but not as fast. And the economics of 3D scaling now have to compete with 2D scaling and “virtual scaling”.

But UALink and UCIe were active topics, both of which can be used to tie together chiplets to build SoCs. SSD vendors are starting to use chiplet architectures to build their massive capacity SSDs, and UALink/UCIe would be a way to architect these.

SLC NAND is back to support very high performance SSDs or as a replacement for SCM (storage class memory, or Optane). One vendor talked about reaching 100M (random 512B read) IOPS for a single SSD. Current SLC flash can do ~10M IOPS, the next gen is spec’d to do ~30M, and the one following would be 100M. One challenge is that current SSDs do 4KB IO, it still takes a msec or so to erase a page, and reading a page isn’t that fast. But the performance is for read only activity.
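Just to put that 100M IOPS number in perspective, the unit conversion (using the figures above) looks like this:

```python
# What 100M random 512B read IOPS implies for a single SSD.
# Numbers are taken from the discussion above; this is just unit conversion.
iops = 100_000_000
io_size_bytes = 512

bandwidth = iops * io_size_bytes
print(f"future: {bandwidth / 1e9:.1f} GB/s of small-block read traffic")  # ~51.2 GB/s

# Compare with today's ~10M IOPS SLC SSDs
print(f"today:  {10_000_000 * io_size_bytes / 1e9:.1f} GB/s")             # ~5.1 GB/s
```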

HBM4 was one topic at the show, but the newest wrinkle was high bandwidth flash (HBF), or putting flash/SSDs behind HBM to support GPU caching (SSD to HBM to GPU). This would allow more data to be quickly accessed by a GPU.

Jim also mentioned that there’s some interest in narrowing the HBM access width, currently 1K bits and increasing to 2K bits with HBM4. This width, and all the pins it requires, limits how many HBM chips can surround a GPU. If HBM had a narrower interface, more HBM chips could surround a GPU, increasing memory size and perhaps memory bandwidth. HBM4 seems to be going the wrong way, but with narrower width HBM, they could easily double the number of HBM chips surrounding a GPU.

They were also showing off a 40-SSD 2U chassis using E2 form factor SSDs. Pretty impressive, and given the capacity on offer, that’s a lot of storage per RU.

Speaking of capacity, one vendor announced a 246TB QLC SSD, roughly 1/4PB in a single SSD. With 24 of these per 2U shelf, one could have over 1/10 of an exabyte (>100 PB) in a 40U rack. It looks like there’s no end in sight for SSD capacities. And we aren’t even talking about PLC yet.
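For the arithmetic-inclined, the rack-level math (using the numbers above) works out like this:

```python
# Rack-level capacity math for the 246TB QLC SSD mentioned above.
ssd_tb = 246
ssds_per_2u_shelf = 24
shelves_per_40u_rack = 40 // 2                    # 20 shelves in a 40U rack

shelf_pb = ssd_tb * ssds_per_2u_shelf / 1000      # ~5.9 PB per 2U shelf
rack_pb = shelf_pb * shelves_per_40u_rack         # ~118 PB per 40U rack

print(f"{shelf_pb:.1f} PB per 2U shelf, ~{rack_pb:.0f} PB (>1/10 EB) per 40U rack")
```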

At the other end of SSD capacity, it appears that M.2 SSDs were getting hotter on one side (the controller side) than the other, throttling performance. So one vendor decided to provide heat pipes (liquid cooling) between the two sides to equalize the thermal load.

Jim Pappas (lately of Intel) won the lifetime achievement award from FMS. Jim’s accomplishments span a wide swath of storage technology, but at the award ceremony he waxed on about his work on the USB connector. He said his will stipulates that once he is interred in the ground, they are to take out the casket, spin it around 180 degrees, and put it back down again. 🙂

There were quite a number of side topics not directly related to FMS25 on the podcast which were interesting in their own right, but I think I’ll leave it here.

Jim Handy, General Director Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry, including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.

He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media. 

He posts blogs at www.TheMemoryGuy.com and www.TheSSDguy.com.

169: GreyBeards talk AgenticAI with Luke Norris, CEO&Co-founder, Kamiwaza AI

Luke Norris (@COentrepreneur), CEO and Co-Founder, Kamiwaza AI, is a serial entrepreneur in Silverthorne CO, where the company is headquartered. They presented at AIFD6 a couple of weeks back and the GreyBeards thought it would be interesting to learn more about what they were doing, especially since we are broadening the scope of the podcast to now be GreyBeards on Systems.

Describing Kamiwaza AI is a bit of a challenge. They settled on “AI orchestration” for the enterprise but it’s much more than that. One of their key capabilities is an inference mesh, which supports accessing data in locations throughout an enterprise, across various data centers, to do inferencing, and then gathering the replies/responses together and aggregating them into one combined response. All this without violating HIPAA, GDPR or other data compliance regulations.

Kamiwaza AI offers an opinionated AI stack, which consists of 155 components today (and growing) and supplies a single API to access any of their AI services. They support multi-node clusters and multiple clusters, located in different data centers, as well as the cloud. For instance, they are in the Azure marketplace and plans are to be in AWS and GCP soon.

Most software vendors provide a proof of concept; Kamiwaza offers a pathway from PoC to production. Companies pre-pay to install their solution and then can use those funds when they purchase a license.

And then there’s their (meta-)data catalogue. It resides in local databases (possibly replicated) throughout the clusters and is used to hold metadata and location information about any data in the enterprise that’s been ingested into their system.

Data can be ingested for enterprise RAG databases and other services. As this is done, location affinity and metadata about that data is registered to the data catalogue. That way Kamiwaza knows where all of an organization’s data is located, which RAG or other database it’s been ingested into and enough about the data to understand where it might be pertinent to answer a customer or service query.

Maybe the easiest way to understand what Kamiwaza is, is to walk through a prompt. 

  • A customer issues a prompt to a Kamiwaza endpoint which triggers,
  • A search through their data catalog to identify what data can be used to answer that prompt.
  • If all the data resides in one data center, the prompt can be handed off to the GenAI model and RAG services at that data center. 
  • But if the prompt requires information from multiple data centers,
  • Separate prompts are then distributed to each data center where RAG information germane to that prompt is located
  • As each of these generate replies, their responses are sent back to an initiating/coordinating cluster
  • Then all these responses are combined into a single reply to the customer’s prompt or service query.

But the key point is that the data located in each data center and used to answer the prompt is NOT moved to other data centers. All prompting is done locally, at the data center where the data resides. Only prompt replies/responses are sent to other data centers and then combined into one comprehensive answer.
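To make the flow concrete, here’s a rough, hypothetical sketch of the fan-out/aggregate pattern described above. None of these functions are Kamiwaza’s actual APIs; they’re just stand-ins to show that prompts travel to where the data lives and only responses travel back:

```python
# Hypothetical sketch of the fan-out/aggregate flow described above. These
# functions are NOT Kamiwaza's APIs; they only illustrate the pattern.
from concurrent.futures import ThreadPoolExecutor

def handle_prompt(prompt: str, catalog: dict) -> str:
    # 1. Catalog lookup: which data centers hold data relevant to this prompt?
    sites = [site for site, topics in catalog.items()
             if any(topic in prompt.lower() for topic in topics)]

    # 2. Fan out: run RAG + inference locally at each site (stubbed out here).
    def run_local_inference(site: str) -> str:
        return f"[{site}] partial answer built from local RAG data"   # placeholder

    with ThreadPoolExecutor() as pool:
        partial_replies = list(pool.map(run_local_inference, sites))

    # 3. Aggregate: combine partial replies into one response. A real system
    #    would hand these to a model for summarization.
    return "\n".join(partial_replies)

catalog = {"eu-dc": ["gdpr", "clinical"], "us-dc": ["claims", "clinical"]}
print(handle_prompt("Summarize clinical trial outcomes", catalog))
```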

Luke mentioned a BioPharma company that had genome sequences located in various data regimes, some under GDPR, some under APAC equivalents, others under USA HIPAA requirements. They wanted to know how frequently a particular gene sequence occurred. They were able to issue this as a prompt at a single location, which spun up separate, distributed prompts for each data center that held appropriate information. All those replies were then transmitted back to the originating prompt location and combined/summarized.

Kamiwaza AI also has an AIaaS offering. Any paying customer is offered one (AI agentic) outcome per month per cluster license. Outcomes could effectively be any AI application they would like to perform.

One outcome he mentioned included:

  • A weather-risk researcher had tons of old weather data in a multitude of formats, over many locations, that had been recorded over time.
  • They wanted to have access to all this data so they can tell when extreme weather events had occurred in the past.
  • Kamiwaza AI assigned one of their partner AI experts to work with the researcher to have an AI agent comb through these archives and transform and clean all the old weather data into HTML data more amenable to analysis.
  • But that was just the start. They really wanted to understand the risk of damage due to the extreme weather events. So the AI application/system was then directed to go and gather, from news and insurance archives, any information that identified the extent of the damage from those weather events.

He said that today’s AgenticAI can perform screen mouse clicks and any function that an application or a human could do on a screen. Agentic AI can also import an API and infer where an API call might be better to use than a screen GUI interaction.

He mentioned that Kamiwaza can be used to generate and replace a lot of what enterprises do today with robotic process automation (RPA). Luke feels that anything an enterprise was doing with RPA can be done better with Kamiwaza AI agents.

SaaS solution tasks are also something AgenticAI can easily displace. Luke said at one customer they went from using SAP APIs to provide information to SAP, to using APIs to extract information from SAP, to completely replacing the use of SAP for this task at the enterprise.

How much of this is fiction or real is the subject of some debate in the industry. But Kamiwaza AI is pushing the envelope on what can and can’t be done. And with their AIaaS offering, customers are making use of AI like they never thought possible before.

Kamiwaza AI has a community edition, a free download that’s functionally restricted, and provides a desktop experience of Kamiwaza AI’s stack. Luke sees this as something a developer could use to develop against Kamiwaza APIs and test functionality before loading on the enterprise cluster.

We asked where they were finding the most success. Luke mentioned anyone that’s heavily regulated, where data movement and access were strictly constrained. And they were focused on large, multi-data center, enterprises.

Luke mentioned that Kamiwaza AI has been doing a number of hackathons with AI Tinkerers around the world. He suggested prospects take a look at what they have done with them and perhaps join them in the next hackathon in their area.

Luke Norris, CEO & Co-Founder, Kamiwaza AI

Luke Norris is the co-founder of Kamiwaza.AI, driving enterprise AI innovation with a focus on secure, scalable GenAI deployments. He has extensive experience raising over $100M in venture capital and leading global AI/ML deployments for Fortune 500 companies.

Luke is passionate about enabling enterprises to unlock the full potential of AI with unmatched flexibility and efficiency.

165: GreyBeards talk VMware Explore’24 wrap-up with Gina Rosenthal, Founder & CEO Digital Sunshine Solutions

I’ve known Gina Rosenthal (@gminks@mas.to), Founder & CEO, Digital Sunshine Solutions, for what seems like forever, and she’s been on the very short list for becoming a GBoS co-host, but she’s got her own Tech Aunties Podcast now. We were both at VMware Explore last week in Vegas. Gina was working in the community hub and I was in their analyst program.

VMware (World) Explore has changed a lot since last year. I found the presentations/sessions to be just as insightful and full of users as last year’s, but it seems like there may have been fewer of them. Gina found the community hub sessions to be just as busy and the Code groups were also very well attended. On the other hand, the Expo was smaller than last year and there were a lot fewer participants (and [maybe] analysts) at the show. Listen to the podcast to learn more.

The really big news was VCF 9.0. Both a new number (for VCF) and an indicator of a major change in direction for how VMware functionality will be released in the future. As one executive told me, VCF has now become the main (release) delivery vehicle for all functionality.

In the past, VCF would generally come out with some VMware functionality downlevel to what was generally available in the market. With VCF 9, that’s going to change. From now on, all individual features/functions of VCF 9.0 will be at the current VMware functionality levels. Gina mentioned this is a major change in how VMware releases functionality, and it signals much better product integration than was available in the past.

Much of VMware’s distinct functionality has been integrated into VCF 9, including SDDC, Aria and other packages. They did, however, create a new class of “advanced services” that runs on top of VCF 9. We believe these are individually charged for, and some of these advanced services include:

  • Private AI Foundation – VMware VCF, with their Partner NVIDIA, using NVIDIA certified servers, can now run NVIDIA Enterprise AI suite of offerings which includes just about anything an enterprise needs to run GenAI in house or any other NVIDIA AI service for that matter. The key here is that all enterprise data stays within the enterprise AND the GenAI runs on enterprise (VCF) infrastructure. So all data remains private.
  • Container Operations – this is a bundling of all the Spring Cloud and other Tanzu container services. It’s important to note that TKG (Tanzu Kubernetes Grid) is still part of the base vSphere release, which allows any VVF (VMware vSphere Foundation) or VCF users to run K8S standalone, but with minimal VMware support services.
  • Advanced Security – includes vDefend firewall/gateway, WAF, malware prevention, etc.

There were others, but we didn’t discuss them on the podcast.

I would have to say that Private AI was of most interest to me and many other analysts at the show. In fact, I heard that it’s VMware’s (and supposedly NVIDIA’s) intent to reach functional parity with GCP Vertex and others with Private AI. This could come as soon as VCF 9.0 is released. I pressed them on this point and they held firm to that release number.

My only doubt is that neither VMware nor NVIDIA has their own LLM. Yes, they can use Meta’s Llama 3.1, OpenAI or any other LLM on the market. But running them in-house on enterprise VCF servers is another question.

The lack of an “owned” LLM should present some challenges with reaching functional parity with organizations that have one. On the other hand, Chris Walsh mentioned that they (we believe VMware internal AI services) have been able to change their LLM 3 times over the last year using Private AI Foundation.

Chris repeated more than once that VMware’s long history with DRS and HA makes VCF 9 Private AI Foundation an ideal solution for enterprises to run AI workloads. He specifically mentioned GPU HA that can take GPUs from data scientists when enterprise inferencing activities suffer GPU failures. It’s unclear whether any other MLOps environment, cloud or otherwise, can do the same.

From a purely storage perspective, I heard a lot about vVols 2.0. This is less a functional enhancement than a new certification to make sure primary storage vendors offer full vVol support in their storage.

Gina mentioned, and it came up in the analyst sessions, that Broadcom has stopped offering discounts for charities and non-profits. This is going to hurt most of those organizations, which are now forced to make a choice: pay full subscription costs or move off VMware.

The other thing of interest was that Broadcom spent some time trying to smooth over the bad feelings of VMware’s partners. There was a special session on “Doing business with Broadcom VMware for partners” but we both missed it, so we can’t report any details.

Finally, Gina and I, given our (lengthy) history in the IT industry and Gina’s recent attendance at IBM Share started hypothesizing on a potential linkup between Broadcom’s CA and VMware offerings.

I mentioned multiple times there wasn’t even a hint of the word “mainframe” during the analyst program. We probably spent more time discussing this than we should have, but it’s hard to take the mainframe out of IT (as most large enterprises no doubt lament).

Gina Rosenthal, Founder & CEO, Digital Sunshine Solutions

As the Founder and CEO of Digital Sunshine Solutions, Gina brings over a decade of expertise in providing marketing services to B2B technology vendors. Her strong technical background in cloud computing, SaaS, and virtualization enables her to offer specialized insights and strategies tailored to the tech industry.

She excels in communication, collaboration, and building communities. These skills help her create product positioning, messaging, and content that educates customers and supports sales teams. Gina breaks down complex technical concepts and turns them into simple, relatable terms that connect with business goals.

She is the co-host of The Tech Aunties podcast, where she shares thoughts on the latest trends in IT, especially the buzz around AI. Her goal is to help organizations tackle the communication and organizational challenges associated with modern datacenter transitions.