Grey Beards on Systems

February 20, 2026February 20, 2026

174: GreyBeards talk SDN chips with Ted Weatherford, VP Bus. Dev. & John Carney. Dist. Eng. at Xsight Labs

Ted Weatherford (Lin), VP Business Development and John Carney (Lin), Distinguished Engineer, SW Architecture of Xsight Labs, presented at a recent AI Infrastructure Field Day (AIIFD4, see video of their session here) I attended and thought they had a great way to solve the need for high speed/software defined networking (SDN) in modern data centers.

Turns out Xsight Labs is a fabless semiconductor company, specializing in SDN ASICs and currently have an X2 Switch ASIC and a E1 DPU ASIC out on the market today. They are the first vendor to have an 800Gbps DPU ASIC and their 12.8T X2 switch chip is focused on low power SDN for ToR, edge and extreme edge environments. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 50:44 — 69.7MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Ted said there are 6 primary chips which make up AI data centers: GPU, CPU, NIC, Scale up networking switch, Scale out networking switch, & DPU chips. And there are maybe 20 semiconductor dev groups around the world capable of developing these. Their current monolithic die chips use the TSMC N5 process. While their networking X2 switch chip has been out in volume for a while now and is in its second generation, their E1 DPU chip is brand new.

We started discussing their X2 SDN switch chip. The X2 supports up to 128 ports/12.8T in 1RU (based on reference switch architecture) at under 200W and any port can configured to support 10G to 400G.

In addition to their switch form factor and power envelope, the X2’s programability is a major strength. They have over 3000 dedicated, “MAP” core CPUs or Harvard (separate instructions and data memory and access paths) core CPUs to process, in parallel, the data flows in and out of the chip plus separate parser cores. The 3072 MAP cores are allocated statically to 64 parallel packet forwarding engines and 64 data plane units and there’s a 64MB packet buffer shared across all ports.

They have had a couple of PoCs where customers managed to code X2 switch support for their application in a matter of weeks (days maybe).

Moving on to the E1 DPU chip, the story gets even more interesting. Traditionally, DPUs have had a hybrid architecture where the data path processing is done in parallel cores and the management processing in a handful of RISC cores requiring clients to code in multiple architectural environments. Xsight labs took a different tack.

The E1 has up to 64 N2 ARM cores and can literally run Linux or other OSs (at the same time) to support DPU processing. (Btw, the latest gen Apple M5 CPU only has 8 fast ARM cores and 4 slow ARM cores). On the server side of the DPU, it supports up to 40 PCIe Gen5 lanes and on the ethernet side 1-800Gbps port, 2-400Gbps ports, 4-200Gbps ports, etc..

The E1 also supports 4 DDR5 5200 MHz memory interfaces which means E1 can support TBs of memory!? John went into the software architecture a bit more on our podcast than at AIIFD4 and said it had 32MB of system cache.

John said main memory would mostly be used to host static/slowly changing databases and tables. Actual instructions to support DPU IO would all reside in ARM l1-l2-l3 instruction caches during processing. The E1 operates the 800G port at line speed within 75W of power.

At AIIFD4, they discussed their SONiC DASH benchmark VNET to VNET scenario results. SONiC DASH is an open source project started by Microsoft, used to assess smart switches under cloud service provider workloads. There are many levels for the SONiC DASH benchmark and their E1 DPU-X2 network (reference architecture) smart switch was the first smart switch to sustain Hero800 performance (800Gbps link support) using only a single DPU. This means their E1 DPU-X2 switch processed in excess of 14M TCP connections/second for over a minute and a half, all while sustaining over 120M TCP&UDP background data flows.

As for customer wins, Ted mentioned each SpaceX Starlink V3 satellite uses multiple X2 chips and that the Open Flash Platform (OFP) organization is currently implementing an NFS server using a single E1 DPU and PBs of flash in a single sled. This means OFP is mapping a Linux file system across all the flash in the sled and presenting it as a NFS storage server out the front end. No server required…

Ted Weatherford, VP Business Development, Xsight Labs

Ted Weatherford is Vice President of Business Development at Xsight Labs, where he was hired to build out a commercial team, discover and close first strategic customer engagements, manages HW and SW partner ecosystems, and contribute to go-to-market activities for the company’s products and engineering services.

Ted brings over 30 years of experience in product line management, business development, and strategic marketing in the semiconductor industry. Over his career, his work has contributed to hundreds of chip design-wins globally and generated more than $2 billion in revenue for leading merchant silicon suppliers including BRCM, INTL, CRDO, MTK, MRVL and many others

John Carney, Distinguished Engineer, Software Architecture, Xsight Labs

John is a Distinguished Engineer at Xsight Labs, focused on software architecture for Xsight’s E-series DPU products.

Prior to Xsight, John held hardware and software technical leadership roles at Broadcom, Cisco and Juniper.

John is an expert in high-performance networking data planes.

December 29, 2025December 29, 2025

173: GreyBeards Year End 2025 podcast

Well this year went fast. Keith, Jason and I sat down to try to make some sense of it all.

AI is still on a tear and shows no end in sight. Questions abound on whether we are seeing signs of a bubble or not, our answer – maybe. We see it in GPU pricing. in AI startup valuations, and in enterprise interest. Some question whether the Enterprise is seeing any return from investments in AI but there’s no doubt they are investing. Inferencing on prem with training/fine tuning done in neo-clouds, has become the new norm. I thought we’d be mostly discussing agentic AI but it’s too early to for that yet.

In other news, the real Broadcom VMware play is starting to emerge (if ever in doubt). It’s an all out focus on the (highly profitable) high end enterprises, and abandon the rest. And of course the latest weirdness to hit IT is DRAM pricing, but in reality it’s the price of anything going into AI mega-data centers that’s spiking. Listen to the podcast to learn more

Podcast: Play in new window | Download (Duration: 41:13 — 56.6MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

AI

GPU pricing is still high, although we are starting to see some cracks in NVIDIA’s moat.

AMD GPUs made a decent splash in the latest MLperf Training results and Google TPUs are starting to garner some in the enterprise. And NVIDIA GPUs are becoming less of a compute monster by focusing more with their latest GPU offerings on optimization for low precision compute, FP2 anyone, rather than just increasing compute. It seems memory bandwidth (in GPUs) is becoming more of a bottleneck than anything else IMHO.

But NVIDIA CUDA is still an advantage. Grad students grew up on it, trained on it and are so familiar with it, it will take a long time to displace. Yeah, RoCM helps but, IT needs more. Open Sourcing all the CUDA code and its derivatives could be an answer, if anybody’s listening.

Jason talked about AI rack and data center power requirements going through the roof and mentioned SMR (small modular [nuclear] reactors) as one solution. When buying a nuclear power plant is just not an option, SMRs can help. They can be trucked and installed (mostly) anywhere. Keith saw a truckload of SMRs on the highway on one of his road trips.

And last but not least, Apple just announced RDMA over Thunderbolt. And the (Youtube) airwaves have been lighting up with studio Macs being clustered together with sufficient compute power to rival a DGX. Of course it’s Apples MLX running rather than CUDA, and only so many models work on MLX, but it’s a start at democratizing AI.

VMware

Broadrcom’s moves remind Jason of what IBM did with Z. Abandoning the low end, milk the high end forever. If you want vSphere better think about purchasing VCF.

Keith mentioned if a company has a $100M cloud spend, they could save some serious money (~20%), going to VCF. But it’s not a lift and shift. Running a cloud on prem requires a different mindset than running apps in the cloud. Welcome to the pre-cloud era, where every IT shop did it all.

Component Pricing

Jason said that DRAM pricing has gone up 600% in a matter of weeks. Our consensus view is it’s all going to AI data centers. With servers having a TB of DRAM, GPUs with 160GB of HBM per, and LPDDR being gobbled up for mobile/edge compute everywhere is it any doubt that critical server (sub-) components are in high demand.

Hopefully, the Fabs will start to produce more. But that assumes Fab’s have spare capacity and DRAM demand is function of price. There are hints that neither of these are true anymore. Mega data centers are not constrained by capital, yet, and most Fabs are operating flat out producing as many chips as they can. So DRAM pricing may continue to be a problem for some time to come.

Speaking of memory, there was some discussion on memory tiering startups taking off with high priced memory. One enabler for that is the new UALink interconnect. It’s essentially an open source, chip-to-chip interconnect technology, over PCIe or Ethernet. UAlink solutions can connect very high speed components beyond the server itself to support a scale out network of accelerators, memory and CPUs in a single rack. It’s early yet but Meta specs for an OCP wide form factor rack was released in the AMD Helios OGP 72GPU rack that uses UALink tech today, More to come we’re sure.

Keith Townsend, The CTO Advisor, Founder & Executive Strategist | Advisor to CIOs, CTOs & the Vendors Who Serve Them

Keith Townsend (@CTOAdvisor) is a IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIN.

Jason Collier, Principal Member Of Technical Staff at AMD, Data Center Solutions Group

Jason Collier (@bocanuts) is a long time friend, technical guru and innovator who has over 25 years of experience as a serial entrepreneur in technology.

He was founder and CTO of Scale Computing and has been an innovator in the field of hyperconvergence and an expert in virtualization, data storage, networking, cloud computing, data centers, and edge computing for years.

He’s on LinkedIN. He’s currently working with AMD on new technology and he has been a GreyBeards on Storage co-host since the beginning of 2022

November 24, 2025November 24, 2025

172: Greybeards talk domain specific AI with Dr. Arun Subramaniyan, Founder & CEO, Articul8 AI

Keith and I attended AIFD7 a couple of weeks back and Articul8 AI presented at one session (see videos of their session here). Given all the press on LLMs and GenAI, there are only a few non-GenAI solutions in the market today. Articul8 is one of these and represents a different way of deploying AI for industry. Dr. Arun Subramaniyan, Founder and CEO of Articul8 (LinkedIn), discussed their approach on AI for industries at their session.

With all the press on GenAI, agentic AI and LLMs, it’s hard to remember that AI has had a long history in helping various verticals address their challenges. Articul8 AI was founded only 2 years ago, but already has significant footprints in a number of industry sectors, such as aerospace, telecom, (electrical) energy, etc. Articul8 AI is all about deploying domain specific models trained to focus on select industry challenges. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 49:03 — 67.4MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Articul8 AI can operate on prem, in your VPC or in their own cloud . The solution is deployed on infrastructure sized to your organizations specific requirements and starts (processing) ingesting corporate data the moment it’s enabled. It can run on something as small as an 8GPU server to large clusters with many (1000s of) GPUs.

Articul8 doesn’t host or store any corporate data on this infrastructure, just metadata describing data and relationships between them. Arun said that within 24 hours, Articul8 AI has enough of an organizations knowledge map, what they call the shape of data, to process up to 95% of an organizations requests.

Their AIFD7 demo shows a sort of 3D visual of a knowledge map. And it’s interesting that every query or request changes the knowledge map in subtle ways.

For the industries it supports, Articul8 AI is typically embedded into “systems of record”. There are very few AI solutions today like this. perhaps coding agents for software development firms and recommendation engines for online retailers, but that’s about it.

For Articul8 AI to support a new domain or vertical, takes significant domain expertise and data. In some cases, they have partnered with industry associations to gain expertise and data. For those organizations that have contributed data or IP to support a new domain, Articul8 AI can share revenue from other organizations that adopt their solutions.

One can see the current verticals Articul8 AI supports. One item of interest is their cross domain models. They have one cross-domain model trained to interpret and understand tables/spreadsheets/”structured image data”, another to understand logs or time series data and a third focused on converting text to database queries. Most GenAI/LLMs struggle to understand tables and spreadsheet data well.

The other thing about tables and spreadsheets is that most corporations could not exist without them. By providing a cross domain table understanding model they have opened up vast troves of corp data which was just too inscrutable for LLM AI to understand and process before.

Finally, Artucul8 AI has two offerings currently available on AWS Marketplace one of which is a LLM evaluation tool and the other a network topology log analyzer tool. The LLM evaluator, when provided a prompt, will return which current LLM could handle that prompt best and is callable via API. The topology service can analyze time series logs from networking and other gear and show network topology from logs alone.

Dr. Arun Subramaniyan, Founder & CEO Articul8 AI

Arun Subramaniyan is the founder & CEO of Articul8, where he is building a domain-specific GenAI Platform. Previously, he led the Cloud & AI Strategy team at Intel where he was responsible for establishing and driving the overall AI strategy globally, and was focused on democratizing AI in a sustainable fashion.

Arun joined Intel from Amazon Web Services (AWS), where he led the Extreme-scale computing solution team spanning Machine Learning, Quantum Computing, High Performance Computing (HPC), Autonomous Vehicles, and Autonomous Computing. His team was responsible for developing solutions across all areas of HPC, quantum computing and large-scale machine learning applications, spanning a $1B+ portfolio, and he grew the businesses 2-3x over two years.

Arun’s primary areas of research focus are Bayesian methods, global optimization, probabilistic deep learning for large scale applications, and distributed computing. He is an Executive Fellow at Harvard Business School, where he teaches courses on Generative AI for Business Leaders. He enjoys working at the intersection of massively parallel computing and modeling large-scale systems.

Arun is a prolific researcher with a Ph.D. in Aerospace Engineering from Purdue University with 34 granted patents (60+ filed), 50+ international publications that have been cited more than 1600 times with a h-index of 16. He is also a recipient of the Hull Award from GE, which honors technologists for their outstanding technical impact.

October 3, 2025October 3, 2025

171: GreyBeards talk Storage.AI with Dr. J Metz, SNIA Chair and Technical Director, AMD

SNIA’s Storage Developer Conference (SDC) was held last week in CA and although I didn’t attend I heard it was quite a gathering. Just prior to the show, I was talking with Jason about the challenges of storage for AI and he mentioned that SNIA had a new Storage.AI initiatdze focused on these issues. I called Dr. J Metz, Chair of SNIA & Technical Director @ AMD (@drjmetz, blog) and asked if the wanted to talk to us about SNIA’s new initiative.

Storage.AI is a SNIA standards development community tasked with addressing the myriad problems AI has with data. Under its umbrella, a number of technical working groups (TWGs) will work on standards to improve AI data access, Just about every IT vendor in the universe is listed as a participating company in the initiative. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 49:55 — 68.6MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

We started discussing Dr. J’s current roles at SNIA and AMD and how SDC went last week. It turns out, it was the best attended SDC ever and Dr. J’s keynote on Storage.AU was a highlight of the show.

The storage/data needs for AI span a wide spectrum of activities or workloads. Dr. J spoke on the lengthy data pipeline, e.g. ingest, prep/clean, transform, train, checkpoint/reload, RAG upload/update and inference to name just a few. In all these diverse activities, storage’s job is getting the right data bits to the right process (GPU/accelerators for training) throughout the pipeline. Inferencing has somewhat less of a convoluted data journey but is still complex and performance critical.

Te take just one component of the data pipeline checkpointing is a data intensive process. When training a multi-billion parameter model or, dare I say, multi-trillion parameter model with 10K to Million’s of GPUs, failure’s happen, often. Checkpoints are the only way model training can make progress in the face of significant GPU failures. And of course, any checkpoint needs to be reloaded to verify it’s correct.

So checkpointing and reloading is an IO activity that happens constantly when models are trained. Checkpoints essentially save the current model parameters during training. Speeding up checkpoint/reload could increase AI model training throughput considerably

And of course, GPUs and the power they consume are an expensive activity. When one has 1000’s to Millions of GPUs in a data center, having them sit idle is a vast waste of resources. Anything to help speed up accelerator data access could potentially save millions.

In the old days compute, storage and networking were isolated/separate silos of technology. Nowadays, the walls between them have been blown away, mostly by the advent of AI.

Dr. J talks about first principles, such as the speed of light that determines the time it takes for data to move from one place to another. These limits exist throughout IT infrastructure. But OS stacks surrounding these activities have spawned layer upon layer of software to do these actions. If one can wipe the slate clean, infrastructure activities can get closer to those first principles and reduce overhead

SNIA has current TWGs focused on a number of activities that could help speed up AI IO. We talked about SNIA’s Smart Data Acceleration Initiative (SDXI), but there are others in process as well. But SNIA has also identified a few new ones they plan to fire up such as GPU direct access bypass and GPU-Initiated IO to address other gaps in Storage.AI.

In today’s performance driven AI environments, proprietary solutions are often developed to address some of these same issues. We ended up discussing the role of standards vs. proprietary solutions in IT in general and in today’s AI infrastructure.

Yes there’s a place for proprietary solutions and there’s also a place for standards. Sometimes they merge, sometimes not, but they can often help inform each other on industry trends and challenges.

I thought that proprietary technologies always seem to emerge early and then transition to standards over time. Dr. J said it’s more of an ebb and flow between proprietary and standards, and mentioned as one example the ESCON-FC-FICON-Fabric proprietary/standards activities from last century.

As always It was an interesting conversation with Dr. J and Jason and I look forward to seeing how SNIA’s Storage.AI evolves over time.

Dr. J. Metz, Chair and Chief Executive of SNIA & Technical Director, AMD

J is Technical Director for Systems Design for AMD where he works to coordinate and lead strategy on various industry initiatives related to systems architecture, including advanced networking and storage. He has a unique ability to dissect and explain complex concepts and strategies, and is passionate about the inner workings and application of emerging technologies.

J has previously held roles in both startup and Fortune 100 companies as a Field CTO, R&D Engineer, Solutions Architect, and Systems Engineer. He is and has been a leader in several key industry standards groups, currently Chair of SNIA as well as the Chair of the Ultra Ethernet Consortium (UEC). Previously, he was on the board of the Fibre Channel Industry Association (FCIA) and Non-Volatile Memory Express (NVMe) organizations. A popular blogger and active on Twitter, his areas of expertise include both storage and networking for AI and HPC environments.

Additionally, J is an entertaining presenter and prolific writer. He has won multiple awards as a speaker and author, writing over 300 articles and giving presentations and webinars attended by over 10,000 people. He earned his PhD from the University of Georgia.

August 20, 2025August 20, 2025

170: FMS25 wrap-up with Jim Handy, Objective Analysis

Jim Handy, General Director at Objective Analysis and I were at FMS25 in Santa Clara last week and there was a lot of news going around. Jim’s been on our show just about every year to discuss FMS news, And with the recent focus beyond flash, it’s even harder for one person to keep up.

Much of the discussion at FMS was on HBM4, new QLC capacity points, UAlink/UCe for chiplets, 100M IOP SSDs, and more. Listen to the podcast to learn more.

Podcast: Play in new window | Download (Duration: 49:20 — 67.7MB) | Embed

Subscribe: Apple Podcasts | Spotify | RSS

Th.ere was not as much on CXL as in past shows. and ditto on increasing layer counts to drive more NAND capacity. A couple of years ago layer counts were all they talked about. And CXL was the major change to hit the data center. Jim’s view (and Jason’s) was that CXL was as a way for hyperscalers to make use of DDR4 DRAM but that need has passed now.

As for layer counts they are still going up but not as fast. And the economics of 3D scaling now have to compete with 2D scaling and “virtual scaling”.

But UAlink and UCe were active topics both of which are used to tie together chiplets in CPUs to build SoCs. SSD vendors are starting to use chiplet architectures to build their massive capacity SSDs and UAlink/UCe would be a way to architect these.

SLC NAND is back to support very high performance SSDs or as a replacement for SCM (storage class memory or Optane). One vendor talked about reaching 100M (random 512B read) IOPS for a single SSD. Current SCL flash can do ~10M IOPS, next gen is speced to do ~30M and the one following would be 100M. One challenge is that current SSDs do 4Kbyte IO and it still takes a msec. or so to erase a page and reading a page isn’t that fast. But the performance is for read only activity.

HBM4 was one topic at the show but the newest wrinkle was HB Flash, or putting SSDs behind HBM to support GPU caching (SSD to HBM to GPU). This would allow more data to be quickly accessed by a GPU.

Jim also mentioned that there’s some interest in narrowing HBM access width, currently 1Kb and increasing to 2Kb with HBM4. This width, and all the pins it requires, limits how many HBM chips one can surround a GPU with. If HBM had a narrower interface more HBM chips could surround a GPU, increasing memory size and perhaps memory bandwidth. HBM4 seems to be going the wrong way but with narrower width HBM, they could easily double the number of HBM chips surrounding a GPU.

They were also showing off a 40 SSD 2U chassis using E.2 form factor SSDs. Pretty impressive and given the capacity on offer a lot of storage per RU.

Speaking of capacity one vendor announced a 246TB QLC SSD, roughly a 1/4PB in a single SSD. With 24 of these per 2U shelf, one could have a >1/10 Exabyte, (>100 PB) in a 40U rack. It looks like no end in sight for SSD capacities. And we aren’t even talking about PLC yet.

At the other end of SSD capacity, it appears that M.2 SSDs were getting hotter on one side (controller side) than the other, throttling performance. So one vender decided to provide heat (liquid cooling) pipes between the two sides to equalize thermal load.

Jim Pappas (lately of Intel) won the lifetime achievement award from FMS. Jim’s accomplishments span a wide swath of storage technology but at the award ceremony he waxed on his work on the USB connector. He said his will stipulates that once he is interned in the ground, they are to take out the casket and spin it around 180 degrees and put it back down again. 🙂

There were quite a number of side topics not directly related to FMS25 on the podcast which were interesting in their own right, but I think i’ll leave it here.

Jim Handy, General Director Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry, including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.

He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media.

He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com

	174: GreyBeards talk… on 174: GreyBeards talk SDN chips…
	GreyBeards talk Agen… on 169: GreyBeards talk AgenticAI…
	Computational (DNA)… on 155: GreyBeards SDC23 wrap up…
	155: GreyBeards SDC2… on 155: GreyBeards SDC23 wrap up…
	J Metz on 134: GreyBeards talk (storage)…