83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs

This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO, and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at Dell Tech World 2019 in Vegas, as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and is based in Israel, with offices in NY and the Valley as well as elsewhere around the world. Kam was previously with (Dell) EMC DSSD, and Muli spent years as a Master Inventor with IBM Research.

[This was Keith Townsend’s (@CTOAdvisor & The CTO Advisor) first time as a GreyBeard co-host, and we had a great time with him on the show.]

I would have to say it was a far-ranging discussion, but it focused on their software-defined, NVMeoF/TCP storage. As you may recall, we talked with Solarflare Communications last year, who were also working on NVMeoF/TCP, only in their case it was an accelerator board. After the recording, Muli said the hardware accelerator they have is their own design.

Why NVMeoF/TCP?

Most NVMeoF today that uses Ethernet requires RoCE or iWARP compatible NICs and switches. Lightbits Labs has long been active in the NVMeoF/RoCE-iWARP marketplace. Early on they noticed that enterprise and cloud service providers were reluctant to adopt NVMeoF technology because of the need to change out all their networking equipment to use it. This is what brought about their focus on NVMeoF/TCP.

The advantage of NVMeoF/TCP is that it can be run on any Ethernet NIC and switch available today. From Muli’s perspective, NVMeoF/TCP is going to become the next SAN of choice for the data center. They were active, early on, in the standards committee to push for NVMeoF/TCP adoption.

How does it work?

Their software-defined solution runs LightOS® storage software, a Linux based package, and uses off-the-shelf server hardware with persistent storage (Optane DC PM/SSDs, NV DIMMs, V-NAND, etc.). They use persistent memory as a fast write buffer and as a place where they can “mold” the written data into something that can be better written to the backend NVMe SSDs.
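
To make that concrete, here’s a rough Python sketch of the general staging pattern, as I understand it: acknowledge writes once they land in persistent memory, then reshape the accumulated data into one large sequential write to the backend SSDs. The class, threshold and names are my illustrative assumptions, not LightOS code.

    # Conceptual sketch of a persistent-memory write buffer that stages small,
    # random writes and later flushes them to backend SSDs as one large,
    # sequential chunk. Names and sizes are illustrative assumptions only.

    FLUSH_THRESHOLD = 1 * 1024 * 1024   # flush once ~1MB has accumulated (made-up value)

    class WriteBuffer:
        def __init__(self, backend):
            self.backend = backend       # any object with append(); stands in for the SSD pool
            self.staged = []             # (lba, data) pairs held in persistent memory
            self.staged_bytes = 0

        def write(self, lba, data):
            # Land the write in persistent memory and acknowledge immediately,
            # which is where the low write latency comes from.
            self.staged.append((lba, data))
            self.staged_bytes += len(data)
            if self.staged_bytes >= FLUSH_THRESHOLD:
                self.flush()
            return "ack"

        def flush(self):
            # "Mold" the staged writes into one big, SSD-friendly sequential write.
            chunk = b"".join(data for _, data in sorted(self.staged))
            self.backend.append(chunk)
            self.staged.clear()
            self.staged_bytes = 0

    # Example: ssd_pool = []; buf = WriteBuffer(ssd_pool); buf.write(0, b"hello")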

One surprise about the Lightbits solution is that it offers a decent set of data services. These include erasure coding, thin provisioning, wire-speed inline compression, QoS and wide striping. It seems like any of these can be disabled at a customer’s request, and they add very little overhead. I think Muli mentioned one Lightbits customer with encrypted data that disabled compression.

Lightbits also offers a global FTL (flash translation layer), which means they control SSD addressing, mapping data to physical/raw NAND locations at the storage system level. If done well, a global FTL can help improve flash endurance and may offer better write performance (through increased parallelism).
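
For readers unfamiliar with FTLs, here’s a toy Python sketch of what a logical-to-physical mapping layer does: every write lands on a fresh NAND page, allocated round-robin across SSDs for parallelism. It’s a generic illustration of the concept, not Lightbits’ design.

    # Toy global FTL: map logical block addresses to (ssd, nand_page) locations.
    # Overwrites always go to a fresh page rather than rewriting in place, and
    # pages are allocated round-robin across SSDs for parallelism. Illustrative only.

    class GlobalFTL:
        def __init__(self, num_ssds, pages_per_ssd):
            self.l2p = {}                       # logical LBA -> (ssd, page)
            self.next_page = [0] * num_ssds     # next free page on each SSD
            self.pages_per_ssd = pages_per_ssd
            self.next_ssd = 0                   # round-robin allocation cursor

        def _allocate(self):
            ssd = self.next_ssd
            self.next_ssd = (self.next_ssd + 1) % len(self.next_page)
            page = self.next_page[ssd]
            if page >= self.pages_per_ssd:
                raise RuntimeError("out of free pages; garbage collection would kick in here")
            self.next_page[ssd] += 1
            return (ssd, page)

        def write(self, lba):
            # Every write, including an overwrite, gets a fresh physical page,
            # spreading wear and avoiding read-modify-write inside the SSDs.
            self.l2p[lba] = self._allocate()

        def read(self, lba):
            return self.l2p[lba]                # physical location for this LBA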

Lightbits’ claim of inline, wire-speed data compression is premised on the use of more current CPUs with high (>=28) core counts in a storage server. If the storage server has older CPUs (<28 cores), they suggest you install their LightField™ hardware accelerator add-in card. LightField offers a number of hardware-based performance accelerations in addition to compression speedups.

LightOS requires no host (client) software. Muli is a long-time Linux kernel contributor and indicated that the only thing hosts need is a current Linux kernel (5.0 or later), which has the NVMeoF/TCP driver software (persistent memory is needed on the storage server). Lightbits believes that it’s only a matter of time until other OSs also implement NVMeoF/TCP drivers.
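
To show what “no host software” means in practice, the sketch below walks the standard in-box path on a recent Linux kernel: load the nvme-tcp module, discover the target, and connect with nvme-cli. The IP address and subsystem NQN are placeholders, not real Lightbits values.

    # Minimal sketch of attaching a Linux host to an NVMeoF/TCP target using only
    # the in-box NVMe/TCP driver and the standard nvme-cli utility. The address
    # and subsystem NQN below are placeholders, not real Lightbits values.
    import subprocess

    TARGET_ADDR = "192.168.0.100"                  # placeholder target IP
    SUBSYS_NQN = "nqn.2019-01.example:subsys1"     # placeholder subsystem NQN

    # Load the NVMe/TCP host driver (in-box since Linux kernel 5.0).
    subprocess.run(["modprobe", "nvme-tcp"], check=True)

    # Discover the subsystems the target exports (8009 is the discovery port).
    subprocess.run(["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", "8009"],
                   check=True)

    # Connect; namespaces then appear as ordinary /dev/nvmeXnY block devices.
    subprocess.run(["nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420",
                    "-n", SUBSYS_NQN], check=True)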

Lightbits business considerations

Long term, Lightbits sees a need for compute-storage disaggregation in hyperscaler and enterprise cloud environments. Early on, it was relatively easy to replicate servers with DAS storage, but as NVMe SSDs came out, the expense of doing this throughout their >>1000 server environments started to become exorbitant. If only they had an easy way to disaggregate their storage from compute and still enjoy all the performance advantages of DAS NVMe SSDs. With LightOS they can do that.

Lightbits can be sold today through Dell, as a partner solution, which means that Dell can integrate, test and validate their servers with the LightField accelerator card and deliver that package to your data center. I believe you still need to purchase and install their LightOS software yourself.

Lightbits charges for LightOS software on a per storage node basis, but they have different charges based on the maximum number of NVMe SSD slots available in a server. There is no capacity charge. They also offer worldwide service and support for LightOS software and LightField hardware.

It’s all about performance

From a performance perspective, one Fortune 500 hyperscaler benchmarked their storage solution against a DAS NVMe server and found it added about 30 µsec to the IO latency as compared to DAS NVMe SSDs. From their perspective, the added data services, better endurance, and disaggregated compute-storage environment provided by LightOS more than made up for the additional overhead.

Finally, I asked whether multiple LightOS storage servers could be clustered together. Muli, after stating the obligatory legal disclaimers, said they were working on the next generation of LightOS and that it will support clustered storage servers, local data replication, as well as distributed (across storage servers) erasure coding.

The podcast is a long one, running ~47 minutes. There was a lot to talk about, and Kam and Muli seem to know it all. It was interesting to hear the history of their pivot to TCP. They seem to have the right technology to address the market. Listen to the podcast to learn more.

Muli Ben-Yehuda, Co-founder and CTO, Lightbits Labs

Muli Ben-Yehuda is the CTO and Co-Founder of Lightbits Labs, where he leads technological developments.

Prior to founding Lightbits, he was chief scientist at Stratoscale and a researcher and Master Inventor at IBM Research.

He holds an M.Sc. in Computer Science (summa cum laude) from the Technion — Israel Institute of Technology and a B.A. (cum laude) from the Open University of Israel.

He is a long time Linux kernel contributor and his code and ideas are most likely included in an operating system or hypervisor running near you. He is also one of the authors of the NVMe/TCP standard and technology. 

Kam Eshghi, VP Strategy & Business Development, Lightbits Labs

Kam joined Lightbits Labs from Dell EMC and has over 20 years of experience in strategic marketing and business development with startups and public companies.

Most recently as VP of strategic alliances at startup DSSD, Kam led business development with technology partners and developed DSSD’s partnership with EMC, leading to EMC’s acquisition of DSSD.

Previously as Sr. Director of Marketing & Business Development at IDT, Kam built their NVMe Controller business from scratch. Previous to that, Kam worked in data center storage, compute and networking markets at HP, Intel, and Crosslayer Networks. 

Kam is a U.C. Berkeley and MIT graduate with a BS and MS in Electrical Engineering and Computer Science and an MBA.

62: GreyBeards talk NVMeoF storage with VR Satish, Founder & CTO Pavilion Data Systems

In this episode, we continue our NVMeoF track by talking with VR Satish (@satish_vr), Founder and CTO of Pavilion Data Systems (@PavilionData). Howard had talked with Pavilion Data over the last year or so, and I just had a briefing with them over the past week.

Pavilion Data is taking a different tack on NVMeoF, innovating in software and hardware design but using merchant silicon for their NVMeoF accelerated array solution. They offer Ethernet based NVMeoF block storage.

VR is a storage “lifer”, having worked at Veritas on their Volume Manager and other products for a long time. Moreover, Pavilion Data has a number of execs from Pure Storage (including their CEO, Gurpreet Singh) and other storage technology companies, and is located in San Jose, CA.

VR says there were 5 overriding principles for Pavilion Data as they were considering a new storage architecture:

  1. The IT industry is moving to rack scale compute and hence, there is a need for rack scale storage.
  2. Great merchant silicon was coming online, so there was less of a need to design their own silicon/ASICs/FPGAs.
  3. Rack scale storage needs to provide “local” (within the rack) resiliency/high availability and let modern applications manage “global” (outside the rack) resiliency/HA.
  4. Rack scale storage needs to support advanced data management services.
  5. Rack scale storage has to be easy to deploy and run.

Pavilion Data’s key insight was that, in order to meet all those principles and deal with high performance NVMe flash and up-and-coming SCM SSDs, storage had to be redesigned to look more like network switches.

Controller cards?

One can see this new networking approach in their bottom-of-rack, 4U storage appliance. Their appliance has up to 20 controller cards creating a heavy compute/high bandwidth cluster, attached via an internal PCIe switch to a backend storage complex made up of up to 72 U.2 NVMe SSDs.

The SSDs fit into an interposer that plugs into their PCIe switch and maps single (or dual ported) SSDs to the appliance’s PCIe bus. Each controller card supports an Intel Xeon D microprocessor and 2 100GbE ports, for up to 40 100GbE ports per appliance. The controller cards are configured in an active-active, auto-failover mode for high availability. They don’t use memory caching or have any NVRAM.

On their website, Pavilion Data shows 117 µsec response times and 114 GB/sec of throughput for IO performance.
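
Putting the port count and the quoted throughput side by side, here’s a quick back-of-the-envelope calculation (my arithmetic, not a Pavilion spec):

    # Back-of-the-envelope aggregate network bandwidth of a fully loaded appliance.
    controllers = 20
    ports_per_controller = 2
    port_gbits = 100                                             # each port is 100GbE

    raw_gbits = controllers * ports_per_controller * port_gbits  # 4,000 Gb/s
    raw_gbytes = raw_gbits / 8                                   # ~500 GB/s of raw link bandwidth
    quoted_gbytes = 114                                          # GB/s, from Pavilion's website
    print(f"raw link bandwidth ~{raw_gbytes:.0f} GB/s vs. quoted throughput {quoted_gbytes} GB/s")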

Data management for NVMeoF storage

Pavilion Data storage supports widely striped/RAID6 data protection (16+2), thin provisioning, space-efficient read-only (redirect on write) snapshots and space-efficient read-write clones. With RAID6, it takes more than 2 drive failures to lose data.
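
A quick worked example of what a 16+2 stripe implies for usable capacity and fault tolerance (simple arithmetic, not vendor-supplied figures):

    # Usable capacity and fault tolerance of a 16+2 (RAID6-style) stripe.
    data_strips = 16
    parity_strips = 2
    stripe_width = data_strips + parity_strips      # 18 drives per stripe

    efficiency = data_strips / stripe_width         # ~0.889 -> ~89% of raw capacity is usable
    overhead = parity_strips / stripe_width         # ~11% spent on protection
    print(f"usable: {efficiency:.1%}, overhead: {overhead:.1%}, "
          f"survives {parity_strips} concurrent drive failures per stripe")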

Like traditional storage, volumes (NVMe namespaces) are assigned to RAID groups. The backend layout appears to be a log-structured file. VR mentioned that they don’t do garbage collection, and with no NVRAM and no memory caching, there’s a bit of secret sauce here.

Pavilion Data storage offers two NVMeoF/Ethernet protocols:

  • A standard, off-the-shelf NVMeoF/RoCE interface that makes use of v1.x of the Linux kernel NVMeoF/RoCE drivers and special NIC/switch hardware
  • A new NVMeoF/TCP interface that doesn’t need special networking hardware and, as such, offers NVMeoF over standard NICs/switches. I assume this takes host software to work.

In addition, Pavilion Data has developed their own Multi-path IO (MPIO) driver for NVMeoF high availability, which they have contributed to the Linux kernel project.

Their management software uses RESTful APIs (documented on their website). They also offer a CLI and GUI, both built using these APIs. Bottom-of-rack storage appliances are managed as separate storage units, so they don’t support clusters of more than one appliance. However, there are only a few clustered storage systems we know of that support 20 controllers for block storage today.
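
As a flavor of what scripting against a REST management API looks like, here’s a hedged Python sketch. The base URL, endpoint path and JSON fields are hypothetical stand-ins I made up; consult Pavilion’s published API documentation for the real interface.

    # Hypothetical sketch of scripting against a REST management API. The host,
    # endpoint path and JSON fields are invented placeholders; see Pavilion's
    # published API documentation for the real interface.
    import requests

    BASE_URL = "https://array-mgmt.example.com/api/v1"   # placeholder management address

    # Create a thin-provisioned volume (illustrative request shape only).
    resp = requests.post(f"{BASE_URL}/volumes",
                         json={"name": "vol01", "size_gb": 1024, "thin": True})
    resp.raise_for_status()
    print(resp.json())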

Market

VR mentioned that they are going after new applications like MongoDB, Cassandra, CouchBase, etc. These applications are designed around rack scaling and provide “global”, off-rack/cross-datacenter availability themselves. But VR also mentioned Oracle and other, more traditional applications. Pavilion Data storage is sold on a capacity ($/GB) basis.

The system comes in a minimum configuration of 5 controller cards and 18 NVMe SSDs, and can be extended in increments of 5 controller cards and 18 NVMe SSDs up to the full 20 controller cards and 72 NVMe SSDs.
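
Working out the arithmetic of that expansion increment, the valid configurations would be (my enumeration from the numbers above, not a vendor table):

    # Valid configurations implied by the 5-controller / 18-SSD expansion increment.
    for step in range(1, 5):
        controllers = 5 * step
        ssds = 18 * step
        print(f"{controllers} controller cards, {ssds} NVMe SSDs")
    # -> 5/18, 10/36, 15/54, 20/72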

The podcast runs ~42 minutes. VR was very knowledgeable about the storage industry, NVMeoF storage protocols, NVMe SSDs and advanced data management capabilities. We had a good talk with VR on what Pavilion Data does and how well it works. Listen to the podcast to learn more.

VR Satish, Founder and CTO, Pavilion Data Systems

VR Satish is the Chief Technology Officer at Pavilion Data Systems and brings more than 20 years of experience in enterprise storage software products.

Prior to joining Pavilion Data, he was an Entrepreneur-in-Residence at Artiman Ventures. Satish was an early employee of Veritas and later served as the Vice President and the Chief Technology Officer for the Information & Availability Group at Symantec Corporation prior to joining Artiman.

His current areas of interest include distributed computing, information-centric storage architectures and virtualization.

Satish holds multiple patents in storage management, and earned his Master’s degree in computer science from the University of Florida.