0102 GreyBeards talk big memory data with Charles Fan, CEO & Co-founder, MemVerge

It’s been a couple of months since we last talked with a startup, so the GreyBeards thought it was time. We reached out to Charles Fan (@CharlesFan14), CEO and Co-Founder of MemVerge, to find out about their big memory solution or, as Charles likes to call it, “software defined (big) memory”. Although neither Matt nor I had ever talked with Charles before, he’s been just about everywhere in the storage industry throughout his career.

If you have been following my RayOnStorage blog, you will have seen a post last year (Need memory, Intel’s Optane DC PM to the rescue) on Intel’s new persistent memory solutions using 3D XPoint, called Optane DC PM (data center, persistent memory). At the announcement, Intel made available a couple of ways customers could use Optane DC PM (PMem).

Optane DC PM primer

Native Optane DC PM access modes include:

  • A Memory Mode, which has PMem emulating a large volatile memory space and uses a defined ratio of DRAM to PMem as a cache to access the Optane DC PM memory behind it.
  • An Application Direct (AppDirect) Mode, which supports two sub-modes: a storage device mode that uses PMem to emulate a persistent, 4KB block storage device; and a byte addressable, persistent memory address space mode that uses PMem to emulate a large, non-volatile memory space. AppDirect memory content persists across boots, power failures and other system crashes.

Native PMem modes are selected in the BIOS and are deployed at boot time. Optane DC PM on a server can be split up across any of the three modes. Currently, with Optane DC PM (Gen 1), a single server can have up to 6TB of DC PM, which will go up to 8TB with Optane DC PM Gen 2 coming out later this year.

MemVerge Memory Machine

MemVerge has written a “software defined memory” service called the Memory Machine that sits above the Intel Optane DC PM in server(s) and provides application access AND data services for PMem.

Charles likens their Memory Machine to what VMware did for CPU cores, i.e., they provide memory virtualization. This, Charles believes, will bring on the age of Big Memory applications. He feels that PMem, with Memory Machine on top of it, will eliminate the need for high performance, tier 0 storage. Tier 0 storage is a ~$10B market today, which he sees shifting from networked storage to PMem solutions.

Memory Machine Data Services

One of the data services that the Memory Machine offers is a PMem snapshot service. Thick or thin PMem snapshots can be taken any number of times (though for thick snapshots, available PMem space may limit their number) and as often as once per minute. Thin snapshots take little time and are very PMem space efficient. Thick snapshots are a PMem-to-PMem copy of the data, which takes longer and takes double the memory of the original PMem being snapshotted.
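To make the thick vs. thin trade-off concrete, here is a minimal Python sketch. It assumes thin snapshots behave like copy-on-write snapshots, which is a common way to get fast, space-efficient snapshots but is not something Charles confirmed on the podcast. The `PagedMemory` class and its methods are hypothetical names for illustration, not MemVerge's API.

```python
class PagedMemory:
    """Toy memory region split into fixed-size pages (hypothetical model)."""

    def __init__(self, num_pages, page_size=4):
        self.pages = [bytearray(page_size) for _ in range(num_pages)]
        # Each thin snapshot is a dict: {page index: page contents at snapshot time}.
        self.thin = []

    def thick_snapshot(self):
        # Thick: copy every page up front, so it costs as much space as the data.
        return [bytes(p) for p in self.pages]

    def thin_snapshot(self):
        # Thin (ASSUMED copy-on-write): nothing is copied at snapshot time.
        saved = {}
        self.thin.append(saved)
        return saved

    def write(self, index, data):
        # Before overwriting a page, preserve its old contents for every
        # thin snapshot that has not saved that page yet.
        for saved in self.thin:
            if index not in saved:
                saved[index] = bytes(self.pages[index])
        self.pages[index][:] = data

    def read_snapshot(self, saved, index):
        # Unchanged pages are read from live memory, changed ones from the snapshot.
        return saved.get(index, bytes(self.pages[index]))


mem = PagedMemory(num_pages=8)
thick = mem.thick_snapshot()   # holds 8 page copies immediately
thin = mem.thin_snapshot()     # holds 0 page copies so far
mem.write(2, b"abcd")          # thin snapshot now holds 1 preserved page
```

In this model the thin snapshot only grows as pages are overwritten, while the thick snapshot pays the full copy cost up front.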

One significant use case for PMem snapshots is checkpoint crash recovery. Charles mentioned that many securities and financial analysis firms use KDB as a streaming database service to monitor/analyze market activity and provide automated trading and other market services. These firms are always trying to gain an advantage through speed and reduced latency, and as a result have moved their time sensitive processing to in memory data structures/databases.

However, because checkpointing for crash recovery takes time, they usually checkpoint in memory databases only once a day (after market close) and maintain a log of database transactions on SSD. If there’s a system crash, they reload the last checkpoint and re-play all the transaction logs since that checkpoint to bring their in memory database back to the point of the crash. Due to the number of transactions these firms do, this sort of crash recovery can take hours.

With Memory Machine, these customers can take in memory checkpoints every minute and, in the event of a crash, only have to re-play a minute’s worth of transaction logs, which could be done in no time to get back up.
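The checkpoint-plus-log-replay idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `InMemoryStore` class and the `run` helper are hypothetical, the log lives in a Python list rather than on SSD, and a checkpoint is just a deep copy. It only shows why checkpointing more often shrinks the amount of log to re-play.

```python
import copy


class InMemoryStore:
    """Toy in-memory key-value store with checkpoint + log-replay recovery."""

    def __init__(self):
        self.data = {}          # live in-memory state (lost in a crash)
        self.checkpoint = {}    # last saved copy of the state
        self.log = []           # transactions applied since the last checkpoint

    def apply(self, key, value):
        self.log.append((key, value))   # log the transaction, then apply it
        self.data[key] = value

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.data)
        self.log.clear()                # older log entries are no longer needed

    def recover(self):
        """Rebuild state from the checkpoint; return (state, entries re-played)."""
        state = copy.deepcopy(self.checkpoint)
        for key, value in self.log:
            state[key] = value
        return state, len(self.log)


def run(num_txns, checkpoint_every):
    """Apply transactions, then simulate a crash and recover."""
    store = InMemoryStore()
    for i in range(num_txns):
        store.apply(f"sym{i % 10}", i)
        if (i + 1) % checkpoint_every == 0:
            store.take_checkpoint()
    state, replayed = store.recover()
    # Compare against the pre-crash state only to verify the recovery.
    return state == store.data, replayed
```

With 995 transactions, `run(995, 1000)` never checkpoints and has to re-play all 995 log entries, while `run(995, 10)` re-plays only the 5 entries after the last checkpoint. Both recover the same state.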

Other environments do similar checkpoint crash recoveries, all of which could also take advantage of PMem snapshots to take more frequent checkpoints. Charles mentioned rendering farms on the podcast, but long scientific simulations (HPC) and others also use checkpoints for crash recovery.

Another data (or application) service offered by Memory Machine is application cloning. Most in memory applications are single threaded, meaning they can only take advantage of a single CPU core (thread). In order to speed up processing, customers must shard (split up) or copy their database and application onto other servers/CPUs/cores to provide more processing power. Memory Machine can use its thick or thin snapshots to clone applications in seconds.

Charles also mentioned that Memory Machine offers PMem dynamic reconfiguration. That is, instead of having to make BIOS changes and re-boot server(s) to re-allocate PMem across different applications, Memory Machine is allocated 100% of the PMem at boot time. Then, on demand, any time it’s operating, operators using MemVerge’s GUI/CLI can carve PMem up into any number of application memory spaces. So as application demand for in memory data changes, operations can use the Memory Machine to re-allocate PMem to keep up.
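As a rough illustration of carving one boot-time pool into application memory spaces on demand, here is a toy Python sketch. The `PMemPool` class, its `resize` method and the application names are hypothetical and say nothing about how MemVerge's GUI/CLI actually works. The point is only that re-allocation becomes bookkeeping against a single pool rather than a BIOS change and re-boot.

```python
class PMemPool:
    """Toy model: all PMem is owned by one pool, then carved up on demand."""

    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.spaces = {}   # application name -> allocated GB

    @property
    def free_gb(self):
        return self.total_gb - sum(self.spaces.values())

    def resize(self, app, size_gb):
        """Create, grow, shrink, or (with size 0) release an app's space."""
        if size_gb < 0:
            raise ValueError("size_gb must be non-negative")
        current = self.spaces.get(app, 0)
        if size_gb - current > self.free_gb:
            raise MemoryError("not enough free PMem in the pool")
        if size_gb == 0:
            self.spaces.pop(app, None)
        else:
            self.spaces[app] = size_gb


pool = PMemPool(total_gb=6144)   # one Gen 1 server's 6TB, expressed in GB
pool.resize("kdb", 4096)
pool.resize("redis", 2048)
# Demand shifts: shrink one space and grow the other, with no re-boot.
pool.resize("kdb", 2048)
pool.resize("redis", 4096)
```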

Memory Machine also supports PMem clustering or scaling across servers. With the current 6TB (and soon 8TB) per server PMem limit, some customer applications still run out of memory. Memory Machine is able to cluster or aggregate PMem across up to 32 servers to support a single, larger PMem address space of 192TB (Gen 1) or 256TB (Gen 2) DC PM. The Memory Machine uses an RDMA (RoCE Ethernet or InfiniBand) cluster interconnect, which adds ~1 microsecond of overhead to access PMem in another server. This comes with automatic PMem data tiering using DRAM, local (on the server) PMem and remote (across the cluster interconnect) PMem.
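The cluster capacity numbers above are just the per-server limit times the server count. A small Python sketch of that arithmetic, using only the figures quoted in this post (the function and constant names are made up for illustration):

```python
MAX_SERVERS = 32                               # cluster limit quoted above
PMEM_PER_SERVER_TB = {"Gen 1": 6, "Gen 2": 8}  # per-server Optane DC PM limits


def cluster_capacity_tb(generation, servers=MAX_SERVERS):
    """Aggregate PMem address space, in TB, for a cluster of the given size."""
    if not 1 <= servers <= MAX_SERVERS:
        raise ValueError(f"servers must be between 1 and {MAX_SERVERS}")
    return servers * PMEM_PER_SERVER_TB[generation]


print(cluster_capacity_tb("Gen 1"))  # 192
print(cluster_capacity_tb("Gen 2"))  # 256
```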

Charles mentioned another data service provided by Memory Machine is (Synch or Asynch) replication. One use case for replication is to create a Pub-Sub service for market data.

Charles believes that in memory databases and data processing workloads are just starting to become popular these days. Besides KDB and rendering, other data processing such as AI training/inferencing, Redis applications, and other database systems are able to take advantage of large, in memory data structures to speed up their data processing.

MemVerge’s EAP (early access program) opened up recently (5/19/2020). Charles suggested anyone using large, in memory data processing, take a look at what the Memory Machine can do and contact them to sign up.

The podcast runs ~45 minutes. Charles was very articulate as well as knowledgeable about the technology and its applications. He was great to talk tech with. Matt and I had a fun time talking Optane DC PM and Memory Machine functionality/applications with him. Listen to the podcast to learn more.

Charles Fan, CEO & Co-founder, MemVerge

Charles Fan is co-founder and CEO of MemVerge. Prior to MemVerge, Charles was a SVP/GM at VMware, founding the storage business unit that developed the Virtual SAN product.

Charles also worked at EMC and was the founder of the EMC China R&D Center. Charles joined EMC via the acquisition of Rainfinity, where he was a co-founder and CTO.

Charles received his Ph.D. and M.S. in Electrical Engineering from the California Institute of Technology, and his B.E. in Electrical Engineering from the Cooper Union.

70: GreyBeards talk FMS18 wrap-up and flash trends with Jim Handy, General Dir. Objective Analysis

In this episode we talk about Flash Memory Summit 2018 (FMS18) and recent trends affecting the flash market with Jim Handy, General Director, Objective Analysis. This is the 4th time Jim’s been on our show, and he has been our go-to guy on flash technology forever.

NAND supply?

Talking with Jim is always a far reaching discussion. We quickly centered on recent spot NAND pricing trends. Jim said the market is seeing a 10 to 12% quarter-over-quarter drop in NAND spot pricing, almost 60% since the year started, which is starting to impact long term contracts. During supply gluts like this, DRAM spot prices typically drop 40-60% Q/Q, so maybe there are more NAND price reductions on the way.

A new player in the NAND fab business was introduced at FMS18, Yangtze Memory Technology from China. Jim said they were one generation behind the leaders which says their product costs ($/NAND bit) are likely 2X the industry. But apparently, China is prepared to lose money until they can catch up.

I asked Jim if they have a hope of catching up, and he said yes. For example, there’s been some shenanigans with DRAM technology and a Chinese DRAM fab. They have (allegedly) stolen technology from Micron’s Taiwan DRAM fab. They in turn have sued Micron for patent infringement and won, locking Micron out of the Chinese DRAM market. With the DRAM market tightening, Micron’s absence will hurt Chinese electronics producers. Others will step in, but Micron will have to focus DRAM sales elsewhere.

3D XPoint/Optane?

There wasn’t much discussion on 3D XPoint. Intel did announce some customers for Optane SSDs and that they are starting to produce 3D XPoint in DIMMs. The Intel-Micron 3D XPoint partnership has dissolved. Intel seems willing to continue to price their Optane and 3D XPoint DIMMs below cost and make it up selling microprocessors.

Jim predicted years back that there would be little to no market for 3D XPoint SSDs. With Optane SSDs at 5X higher cost than NAND SSDs and only 5X faster, it’s not a significant enough advantage to generate the volumes needed to make a profitable product. But in a DIMM form factor, hanging off the memory bus, it’s 1000X faster than NAND, and with that much performance, it shouldn’t have a problem generating sufficient volumes to become profitable.

Other NAND/SCM news

We talked about the emergence of QLC NAND. With 3D NAND, there appear to be enough electrons to make QLC viable. The write speeds are still horrible, ~1000X slower than SLC. But vendors are now adding SLC NAND (as a write cache) in their SSDs to sustain faster writes.

The other new technology from FMS18 was computational storage. Computational storage vendors are putting compute near (inside) an SSD to better perform IO intensive workloads. Some computational storage vendors talked about their technology and how it could speed up select workloads.

There’s SCM beyond 3D XPoint. These vendors have been quietly shipping for some time now, but they just aren’t at the capacities/bit density to challenge NAND. Jim mentioned a few that were in production: EverSpin/MRAM, Adesto/ReRAM and Crossbar/FeRAM.

Jim said IBM was using EverSpin/MRAM technology in their latest FlashCore Modules for their FlashSystem 9100. EverSpin MRAM is also being used in satellites, and Adesto/ReRAM is being used in the medical instrument market.

The podcast runs ~42 minutes. We apologize for the audio quality and promise to do better next time. Jim’s been the GreyBeards memory and flash technology guru since before our hair turned grey and is always enlightening about the flash market and technology trends. Listen to the podcast to learn more.

Jim Handy, General Director, Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication. He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media. He posts blogs at www.TheMemoryGuy.com and www.TheSSDguy.com.