68: GreyBeards talk NVMeoF/TCP with Ahmet Houssein, VP of Marketing & Strategy @ Solarflare Communications

In this episode we talk with Ahmet Houssein, VP of Marketing and Strategic Direction at Solarflare Communications, (@solarflare_comm). Ahmet's been in the industry forever and has a unique view on where NVMeoF needs to go. Howard had talked with Ahmet at last year's FMS. Ahmet will also be speaking at this year's FMS (this week in Santa Clara, CA).

Solarflare Communications sells Ethernet communication gear, mostly to the financial services market, and has developed a software plugin for the standard TCP/IP stack on Linux that supports both target and client mode NVMeoF/TCP. That is, their software plugin provides a complete implementation of NVMeoF over TCP/Ethernet that extends the TCP protocol but doesn't require RDMA (RoCE or iWARP) or data center bridging.

Implementing NVMeoF/TCP

Solarflare's NVMeoF/TCP is a free plugin that, once approved by the NVMe(oF) standards committees, anyone can use to create an NVMeoF storage system and consume that storage from almost anywhere. The standards committee is expected to approve the protocol extension soon, and sometime after that the plugin will be added to the Linux kernel. After standards approval, VMware and Microsoft may adopt it as well, but that will likely take more work.

Over the last year plus, most NVMeoF/Ethernet we encounter requires sophisticated RDMA hardware. When we talked with Pavilion Data Systems, a month or so ago, they had designed a more networking-like approach to NVMeoF using RoCE and TCP [updated 8/8/18 after VR's comment, the ed.]. When we talked with Attala Systems, they had a special purpose FPGA that's used in their RDMA NICs, together with Mellanox switches, to support target & client mode NVMeoF/UDP [updated 8/8/18 after VR's comment, the ed.].

Solarflare is taking a different tack.

One problem with NVMeoF/Ethernet RDMA is compatibility. You can use either RoCE or iWARP RDMA NICs, but at the moment you can't use both. With a TCP/IP plugin there's no hardware compatibility issue. (Yes, there's still software compatibility at both ends of the pipe.)

Solarflare recently measured latencies for their NVMeoF/TCP implementation (using Iometer/FIO), which showed that running the protocol adds about a 5-10% increase in latency versus running RDMA-based NVMeoF (RoCE or iWARP).

Performance measurements were taken using a server running Red Hat Linux plus their TCP plugin, with NVMe SSDs on the storage side, and a similar configuration (without the SSDs) on the client side.

If they add 10% latency to a 10 microsec. IO (e.g., Optane), latency becomes 11 microsec. Similarly, for flash NVMe SSDs it moves from 100 microsec. to 110 microsec.
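As a quick sanity check on that arithmetic, here's a minimal sketch using the ballpark figures quoted above; the 5-10% overhead and the 10/100 microsecond device latencies are the discussion's round numbers, not measured Solarflare results.

```python
# Back-of-the-envelope check of the added-latency claim above. The overhead
# percentages and device latencies are the ballpark numbers from the
# discussion, not measured Solarflare results.

def total_latency_us(device_latency_us: float, overhead_pct: float) -> float:
    """Device latency plus the NVMeoF/TCP protocol overhead."""
    return device_latency_us * (1 + overhead_pct / 100)

for name, dev_lat in [("Optane SSD", 10.0), ("Flash NVMe SSD", 100.0)]:
    for overhead_pct in (5, 10):
        total = total_latency_us(dev_lat, overhead_pct)
        print(f"{name}: {dev_lat:.0f} usec + {overhead_pct}% -> {total:.1f} usec")
```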

Ahmet did mention that their NICs have some hardware optimizations that bring the added latency down closer to 5%. Later we discuss the immense parallelism opportunities of running the TCP stack in user space. Their hardware also better supports many threads doing IO in parallel.

Why TCP

Ahmet's on a mission. He says there's a misbelief that Ethernet RDMA hardware is required to achieve lightning-fast response times with NVMeoF, but it's not true. Standard TCP with the proper protocol enhancements is more than capable of performing at very close to the same latencies as RDMA, without special NICs and DCB switch configurations.

Furthermore, TCP/IP already has multipathing support, so the current high-availability characteristics of TCP are readily applicable to NVMeoF/TCP.

Parallelism through user space

NVMeoF/TCP was the subject of the 1st half of our discussion, but we spent the 2nd half talking about scaling, or parallelism. Even if you can do 11 or 110 microsecond latency, at some point, if you do enough of these IOs, the kernel overhead of processing blocks and transferring control between kernel space and user space will become a bottleneck.

However, there's nothing stopping IT from running the TCP/IP stack in user space and eliminating kernel control transfers altogether. By doing so, data centers could parallelize all this IO across as many cores as are available.

Running the plugin in a user space TCP/IP stack allows you to scale NVMeoF's lightning-fast IO to as many users as you have user spaces or cores, and the kernel doesn't even break a sweat.

Anyone could simply download Solarflare's plugin, configure a white box server with Linux and 24 NVMe SSDs, and support ~8.4M IOPS (350K x 24) at ~110 microsec latency. And with user space scaling, one could easily have 1000s of user spaces connected to it.
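Here's a minimal sketch of that aggregate IOPS arithmetic, assuming the ~350K IOPS per NVMe SSD and the 24-drive white box server quoted above.

```python
# Sketch of the scaling arithmetic above; the per-SSD IOPS figure (~350K)
# and the drive count (24) come from the discussion, everything else is
# illustrative.

IOPS_PER_NVME_SSD = 350_000
SSDS_PER_SERVER = 24

aggregate_iops = IOPS_PER_NVME_SSD * SSDS_PER_SERVER
print(f"One white box target server: ~{aggregate_iops / 1e6:.1f}M IOPS")  # ~8.4M
```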

They're going to need faster pipes!

The podcast runs ~39 minutes. Ahmet was very knowledgeable about NVMe, NVMeoF and TCP.  He was articulate and easy to talk with.  Listen to the podcast to learn more.

Ahmet Houssein, VP of Marketing and Strategic Direction at Solarflare Communications 

Ahmet Houssein is responsible for establishing marketing strategies and implementing programs to drive revenue growth, enter new markets and expand brand awareness to support Solarflare’s continuous development and global expansion.

He has over twenty-five years of experience in the server, storage, data center and networking industry, and has held senior level executive positions in product development, marketing and business development at Intel and Honeywell. Most recently Houssein was SVP/GM at QLogic, where he successfully delivered first-to-market 25Gb Ethernet products, securing design wins at HP and Dell.

One of the key leaders in the creation of the InfiniBand and PCI-Express industry standards, Houssein is a recipient of the Intel Achievement Award and was a founding board member of the Storage Networking Industry Association (SNIA), a global organization of 400 companies in the storage market. He was educated in London, UK and holds the equivalent of an Electrical Engineering degree.

65: GreyBeards talk new FlashSystem storage with Eric Herzog, CMO and VP WW Channels IBM Storage

Sponsored by:

In this episode, we talk with Eric Herzog, Chief Marketing Officer and VP of Worldwide Channels for IBM Storage, about the FlashSystem 9100 storage series.  This is the 2nd time we have had Eric on the show (see Violin podcast) and the 2nd time we have had a guest from IBM on our show (see CryptoCurrency talk). However, it's the first time we have had IBM as a sponsor for a podcast.

Eric's a 32 year storage industry veteran who's worked for many major storage companies, including Seagate, EMC and IBM, and 7 startups over his career. He's been predominantly in marketing but was CFO at one company.

New IBM FlashSystem 9100

IBM is introducing the new FlashSystem 9100 storage series, which uses new NVMe FlashCore Modules (FCM) that have been re-designed to fit a small form factor (SFF, 2.5″) drive slot, but also supports standard NVMe SFF SSDs, all in a 2U appliance package. The new storage has dual active-active RAID controllers running the latest generation of IBM Spectrum Virtualize software, which runs on over 100K storage systems in the field today.

FlashSystem 9100 supports up to 24 NVMe FCMs or SSDs, which can be intermixed. The FCMs offer up to 19.2TB of usable flash and have onboard hardware compression and encryption.

With FCM media, the FlashSystem 9100 can sustain 2.5M IOPS at 100µsec response times with 34GB/sec of data throughput. Spectrum Virtualize is a clustered storage system, so one could cluster together up to 4 FlashSystem 9100s into a single storage system and support 10M IOPS and 136GB/sec of throughput.

Spectrum Virtualize just introduced block data deduplication within a data reduction pool. With thin provisioning, data deduplication, pattern matching, SCSI Unmap support, and data compression, the FlashSystem 9100 can offer up to a 5:1 ratio of effective to usable flash capacity. That means with 24 19.2TB FCMs, a single FlashSystem 9100 offers over 2PB of effective capacity.
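A quick sketch of the capacity and cluster-scaling math from the last few paragraphs, using IBM's "up to" figures; actual data reduction depends on the data being stored.

```python
# Effective capacity and cluster scaling for the FlashSystem 9100, using
# the "up to" figures quoted above; real-world results will vary.

FCM_USABLE_TB = 19.2       # largest FlashCore Module
FCMS_PER_SYSTEM = 24       # NVMe drive slots per 2U appliance
DATA_REDUCTION_RATIO = 5   # "up to 5:1" effective-to-usable capacity
MAX_CLUSTER_SYSTEMS = 4    # Spectrum Virtualize cluster limit

usable_tb = FCM_USABLE_TB * FCMS_PER_SYSTEM               # 460.8 TB usable
effective_pb = usable_tb * DATA_REDUCTION_RATIO / 1000    # ~2.3 PB effective
print(f"Single system: {usable_tb:.1f} TB usable, ~{effective_pb:.1f} PB effective")

cluster_iops = 2.5e6 * MAX_CLUSTER_SYSTEMS                # 2.5M IOPS per system
cluster_gb_per_sec = 34 * MAX_CLUSTER_SYSTEMS             # 34 GB/sec per system
print(f"4-way cluster: {cluster_iops / 1e6:.0f}M IOPS, {cluster_gb_per_sec} GB/sec")
```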

In addition to the appliance's 24 NVMe FCMs or NVMe SSDs, FlashSystem 9100 storage can also attach up to 20 SAS SSD drive shelves for additional capacity. Moreover, Spectrum Virtualize offers storage virtualization, so customers can attach external storage arrays behind a FlashSystem 9100 solution.

With FlashSystem 9100, IBM has bundled additional Spectrum software, including

  • Spectrum Virtualize for Public Cloud – which allows customers to migrate data and workloads from on premises to the cloud and back again. Today this only works for IBM Cloud, but plans are to support other public clouds soon.
  • Spectrum Copy Data Management – which offers a simple way to create and manage copies of data while enabling controlled self-service for test/dev and other users to use snapshots for secondary use cases.
  • Spectrum Protect Plus – which provides data backup and recovery for FlashSystem 9100 storage, tailor made for smaller, virtualized data centers.
  • Spectrum Connect – which allows Docker and Kubernetes container apps to access persistent storage on FlashSystem 9100.

To learn more about the IBM FlashSystem 9100, join the virtual launch experience July 24, 2018 here.

The podcast runs ~43 minutes. Eric has always been knowledgeable on the enterprise storage market, past, present and future. He had a lot to talk about on the FlashSystem 9100 and seems to have mellowed lately. His grey mustache is forcing the GreyBeards to consider a name change – GreyHairsOnStorage anyone? Listen to the podcast to learn more.

Eric Herzog, Chief Marketing Officer and VP of Worldwide Channels for IBM Storage

Eric’s responsibilities include worldwide product marketing and management for IBM’s award-winning family of storage solutions, software defined storage, integrated infrastructure, and software defined computing, as well as responsibility for global storage channels.

Herzog has over 32 years of product management, marketing, business development, alliances, sales, and channels experience in the storage software, storage systems, and storage solutions markets, managing all aspects of marketing, product management, sales, alliances, channels, and business development in both Fortune 500 and start-up storage companies.

Prior to joining IBM, Herzog was Chief Marketing Officer and Senior Vice President of Alliances for all-flash storage provider Violin Memory. Herzog was also Senior Vice President of Product Management and Product Marketing for EMC’s Enterprise & Mid-range Systems Division, where he held global responsibility for product management, product marketing, evangelism, solutions marketing, communications, and technical marketing with a P&L over $10B. Before joining EMC, he was vice president of marketing and sales at Tarmin Technologies. Herzog has also held vice president business line management and vice president of marketing positions at IBM’s Storage Technology Division, where he had P&L responsibility for the over $300M OEM RAID and storage subsystems business, and Maxtor (acquired by Seagate).

Herzog has held vice president positions in marketing, sales, operations, and acting-CFO roles at Asempra (acquired by BakBone Software), ArioData Networks (acquired by Xyratex), Topio (acquired by Network Appliance), Zambeel, and Streamlogic.

Herzog holds a B.A. degree in history from the University of California, Davis, where he graduated cum laude, studied towards a M.A. degree in Chinese history, and was a member of the Phi Alpha Theta honor society.

55: GreyBeards storage and system yearend review with Ray & Howard

In this episode, the Greybeards discuss the year in systems and storage. We kick off the discussion with a long-running IT trend which has taken off over the last couple of years: the industry has taken to buying pre-built appliances rather than building them from the ground up.

We can see this in all the hyper-converged solutions available today, but it goes even deeper than that. It seems to have started with the trend in organizations to get by with fewer people.

This led to a desire to purchase pre-built software applications and now appliances, rather than building from parts. It just takes too long to build, and lead architects have better things to do with their time than checking compatibility lists and testing and verifying that hardware works properly with software. The pre-built appliances are good enough, and doing it yourself doesn't really provide that much of an advantage over the pre-built solutions.

Next, we see the coming systems using NVMe over Fabric storage systems as sort of a countertrend to the previous one. Here we see some customers paying well for special purpose hardware with blazing speed that takes time and effort to get working right, but the advantages are significant. Both Howard and I were at the Excelero SFD12 event and it blew us away. Howard also attended the E8 Storage SFD14 event which was another example along a similar vein.

Finally, the last trend we discussed was the rise of 3D TLC and the failure of 3DX and other storage class memory (SCM) technologies to make a dent in the marketplace. 3D TLC NAND is coming out of just about every fab these days, resulting in huge (but costly) SSDs in the multi-TB range. Combine these with NVMe interfaces and you have msec access to almost a PB of storage without breaking a sweat.

The missing 3DX SCM tsunami some of us predicted is mainly due to the difficulties in bringing new fab technologies to market. We saw some of this in the stumbling with 3D NAND, but the transition to 3DX and other SCM technologies is a much bigger change to new processes and technology. We all believe it will get there someday, but for the moment the industry just needs to wait until the fabs get their yields up.

The podcast runs over 44 minutes. Howard and I could talk for hours on what’s happening in IT today. Listen to the podcast to learn more.

Howard Marks is the Founder and Chief Scientist of DeepStorage, a prominent blogger at Deep Storage Blog and can be found on twitter @DeepStorageNet.

 

Ray Lucchesi is the President and Founder of Silverton Consulting, a prominent blogger at RayOnStorage.com, and can be found on twitter @RayLucchesi.

50: Greybeards wrap up Flash Memory Summit with Jim Handy, Director at Objective Analysis

In this episode we talk with Jim Handy (@thessdguy), Director at Objective Analysis, a semiconductor market research organization. Jim is an old friend and was on last year to discuss Flash Memory Summit (FMS) 2016. Jim, Howard and I all attended FMS 2017 last week in Santa Clara, and Jim and Howard were presenters at the show.

NVMe & NVMeF to the front

Although, unfortunately, the show floor was closed due to a fire, there were plenty of sessions and talks about NVMe and NVMeF (NVMe over fabric). Howard believes NVMe & NVMeF are being adopted much more quickly than anyone had expected. It's already evident inside storage systems like Pure's new FlashArray//X, Kaminario, and E8 Storage, which is already shipping block storage with NVMe and NVMeF.

Last year, PCIe expanders and switches seemed like the wave of the future, but ever since then NVMe and NVMeF have taken off. Historically, there's been a reluctance to add capacity shelves to storage systems because of the complexity of (FC and SAS) cable connections. But with NVMeF, RoCE and RDMA, it's now just a (40GbE or 100GbE) Ethernet connection away, considerably easier and less error prone.

3D NAND take off

Both Samsung and Micron are talking up their 64 layer 3D NAND, with the rest of the industry following. The NAND shortage has led to fewer price reductions, but eventually, when process yields turn up, the shortage will collapse and price reductions should return en masse.

The reason that vertical (3D) NAND is taking over from planar (2D) NAND is that planar NAND can't be shrunk much more, and 15nm is going to be where it stays for a long time to come. So the only way to increase capacity/chip and reduce $/Gb is up.

But as with any new process technology, 3D NAND is having yield problems. Whenever the last yield issue is solved, which seems close, we should see pricing drop precipitously and much more plentiful (3D) NAND storage.

One thing that has made increasing 3D NAND capacity that much easier is string stacking. Jim describes string stacking as creating a unit of, say, 32 layers, which you can fabricate as one piece and then cover with an insulating layer. Now you can start again, stacking another 32 layer block on top, and just add another insulating layer.

The problem with more than 32-48 layers is that you have to create (dig) holes connecting all the layers, which have to be (atomically) very straight and coated with special materials. Anyone who has dug a hole knows that the deeper you go, the harder it is to keep the hole walls straight. With current technology, 32 layers seems to be about as far as they can go.

3DX and similar technologies

There's been quite a lot of talk the last couple of years about 3D XPoint (3DX) and what it means for the storage and server industry. Intel has released Optane client SSDs, but there are no enterprise class 3DX SSDs as of yet.

The problem is similar to 3D NAND above: current yields suck. There's a chicken-and-egg problem with any new chip technology. You need volume to get yields up, and you need yields up to generate the volume you need. And volume with good yields generates the profits to re-invest in the cycle for the next technology.

Intel can afford to subsidize (lose money on) 3DX technology until they get the yields up, knowing full well that when they do, it will become highly profitable.

The key is to price the new technology somewhere between levels in the storage hierarchy; for 3DX that means between NAND and DRAM. This means 3DX will be more of an in-between memory/SSD tier than a replacement for either DRAM or SSDs.

The recent emergence of NVDIMMs has provided the industry a platform (based on NAND and DRAM) where they can create the software and other OS changes needed to support this mid tier as a memory level. So when 3DX comes along as a new memory tier, they will be ready.

NAND shortages, industry globalization & game theory

Jim has an interesting take on how and when the NAND shortage will collapse.

It's a cyclical problem seen before in DRAM, and it's a question of investment. When there's an oversupply of a chip technology (like NAND), suppliers cut investments, or rather don't grow investments as fast as they were. Ultimately this leads to a shortage, which then leads to over-investment to catch up with demand. When that over-investment starts producing chips, the capacity bottleneck will collapse and prices will come down hard.

Jim believes that as 3D NAND suppliers start driving yields up and $/Gb down, 2D NAND fabs will turn to DRAM or other electronic circuitry, which will lead to a price drop there as well.

Jim mentioned game theory as the way the fab industry has globalized over time. As emerging countries build fabs, they must seek partners to provide the technology to produce product. They offer these companies guaranteed supplies of low priced product for years to help get the fabs online. Once this period is over, the fabs never return to home base.

This approach has led to Japan taking over DRAM & other chip production, then Korea, then Taiwan and now China. It will move again. I suppose this is one reason IBM got out of the chip fab business.

The podcast runs ~49 minutes, but Jim is a very knowledgeable chip industry expert and a great friend from multiple events. Howard and I had fun talking with him again. Listen to the podcast to learn more.

Jim Handy, Director at Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication. He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media.  He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com

35: GreyBeards talk Flash Memory Summit wrap-up with Jim Handy, Objective Analysis

In this episode, we talk with Jim Handy (@thessdguy), memory and flash analyst at Objective Analysis. Jim's been on our podcast before, and last time we had a great talk on flash trends. As Jim, Howard and Ray were all at Flash Memory Summit (FMS 2016) last week, we thought it appropriate to get together and discuss what we found interesting at the summit.

Flash is undergoing significant change. We started our discussion with which vendor had the highest density flash device. That's not easy to answer given all the vendors at the show. For example, Micron's shipping a 32GB chip and Samsung announced a 1TB BGA. And as for devices, Seagate announced a monster: a 3.5″ 60TB SSD.

MicroSD cards have 16-17 NAND chips plus a mini-controller. At that level, with a 32GB chip, we could have a ~0.5TB MicroSD card in the near future. There was no discussion on pricing, but Howard's expectation is that they will be expensive.
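For what it's worth, here's the back-of-the-envelope math behind that ~0.5TB estimate, using the chip count and the 32GB die quoted above.

```python
# Rough estimate behind the ~0.5TB MicroSD figure above; the 16-chip stack
# and 32GB-per-chip numbers are the ones quoted in the discussion.

nand_chips_per_card = 16   # MicroSD cards stack roughly 16-17 NAND die
gb_per_nand_chip = 32      # the 32GB NAND chip mentioned above

card_capacity_gb = nand_chips_per_card * gb_per_nand_chip
print(f"~{card_capacity_gb} GB, i.e. ~{card_capacity_gb / 1024:.1f} TB per card")
```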

NVMe over fabric push

One main topic of conversation at FMS was how NVMe over fabric is emerging. There were a few storage vendors at FMS taking advantage of this, including E8 Storage and Mangstor, both showing off NVMe over Ethernet flash storage. But there were plenty of others talking NVMe over fabric and all the major NAND manufacturers couldn’t talk enough about NVMe.

Facebook’s keynote had a couple of surprises. One was their request for WORM (QLC) flash.  It appears that Facebook plans on keeping user data forever. Another item of interest was their Open Compute Project Lightning JBOF (just a bunch of flash) device using NVMe over Ethernet (see Ray’s post on Facebook’s move to JBOF). They were also interested in ganging up M.2 SSDs into a single package. And finally they discussed their need for SCM.

Storage class memory

The other main topic was storage class memory (SCM), and all the vendors talked about it. Sadly, the timeline for Intel-Micron 3D XPoint has them supplying sample chips/devices by the end of next year (YE2017) and releasing SCM devices to market the following year (2018). They did have one (hand built) SSD at the show with remarkable performance.

On the other hand, there are other SCM’s on the market, including EverSpin (MRAM) and CrossBar (ReRAM). Both of these vendors had products on display but their capacities were on the order of Mbits rather than Gbits.

It turns out they’re both using ~90nm fab technology and need to get their volumes up before they can shrink their technologies to hit higher densities. However, now that everyone’s talking about SCM, they are starting to see some product wins.  In fact, Mangstor is using EverSpin as a non-volatile write buffer.

Jim explained that 90nm is where DRAM was in 2005, but EverSpin/CrossBar's bit density is better than DRAM's was at the time. DRAM is now on 15-10nm class technologies, and the industry sells 10B DRAM chips/year. EverSpin and CrossBar (together?) are doing more like 10M chips/year. The cost to shrink to the latest technology is ~$100M just to generate the masks required. So for these vendors, volumes have to go up drastically before capacity can increase significantly.

Also, at the show Toshiba mentioned they’re focusing on ReRAM for their SCM.

As Jim recounted, the whole SCM push has been driven by Intel and their need to keep improving the performance of memory and storage, otherwise they felt their processor sales would stall.

3D NAND is here

Just about every NAND manufacturer talked about their 3D NAND chips, ranging from 32 layers to 64 layers. From Jim's perspective, 3D NAND was inevitable, as it was the only way to continue scaling in density and reducing bit costs for NAND.

Samsung was first to market with 3D NAND as a way to show technological leadership. But now everyone's got it, prompting ongoing discussions on bit density and number of layers. What their yields are is another question. But planar NAND's days are over.

Toshiba’s FlashMatrix

Toshiba's keynote discussed a new flash storage system called the FlashMatrix, but at press time they had yet to share their slides with the FMS team, so information on FlashMatrix was sketchy at best.

However, they had one on the floor, and it looked like a bunch of M.2 flash across an NVMe (over Ethernet?) mesh backplane with compute engines connected at the edge.

We had a hard time understanding why Toshiba would do this. Our best guess is perhaps they want to provide OEMs an alternative to SanDisk’s Infiniflash.

The podcast runs over 50 minutes and covers flash technology on display at the show and the history of SCM. I think Howard and Ray could easily spend a day with Jim and not exhaust his knowledge of Flash and we haven’t really touched on DRAM. Listen to the podcast to learn more.

Jim Handy, Memory and Flash analyst at Objective Analysis.


Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication. He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media.  He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com.