0101: Greybeards talk with Howard Marks, Technologist Extraordinary & Plenipotentiary at VAST

As most of you know, Howard Marks (@deepstoragenet), Technologist Extraordinary & Plenipotentiary at VAST Data used to be a Greybeards co-host and is still on our roster as a co-host emeritus. When I started to schedule this podcast, it was going to be our 100th podcast and we wanted to invite Howard and the rest of the co-hosts to be on the call to discuss our podcast. But alas, the 100th Greybeards podcast came and went, before we could get it done. So we decided to refocus this podcast back on VAST Data.

We talked with Howard last year about VAST and some of this podcast covers the same ground (see last year’s podcast with Howard on VAST Data) but I highlighted below different aspects of their product that we also discussed.

For starters, VAST just finalized a recent round of funding, which if I recall, valued them at over $1B USD, or yet another data storage unicorn.

VAST is a scale out, disaggregated, unstructured data platform that takes advantage of the economics of QLC SSD (from Intel) combined with the speed of 3D XPoint storage class memory (Optane SSD, also from Intel) to support customer data. Intel is an investor in VAST.

VAST uses mutliple front end (controller) servers, with one or more HA NVMe drive module(s) connected via a dual infiniband or 100Gbps Ethernet RDMA cluster interconnect. The HA NVMe drive module has two (IO modules) adapter cards, one for each connection that takes IO and data requests and transfers them across a PCIe bus which connects to QLC and Optane SSDs. They also have a Mellanox (another investor) switch on their backend with a (round robin) DNS router to connect hosts to their storage (front-end) servers.

Each backend HA NVMe drive module has 12 1.5TB Optane U.2 SSDs and 44 15.4TB QLC SSDs, for a total of 56 drives. Customer data is first written to Optane and then destaged to QLC SSD.

QLC has the advantage of being 4 bits per cell (for a lower $/GB stored) but it’s endurance or drive writes/day (dw/d)) is significantly worse than TLC. So VAST has had to work to increase QLC endurance in their system.

Natively, QLC offers ~0.2 dw/d when doing random 4K writes. However, if your system does 128KB sequential writes, it offers 4.0 dw/d. VAST destages data from Optane SSDs to QLC in 1MB chunks which both optimizes endurance and reduces garbage collection write amplification within the drive.

Howard mentioned their frontend servers are stateless, i.e., maintain no state information about any IO activity going on. Any IO state information is maintained by their system in Optane SSDs. Each server maintains a work log (like) structure on Optane that describes what they are doing in support of host IO and other activities. That way, if one front end server goes down, another one can access its log and take over its activity.

Metadata is also maintained only on Optane SSDs. Howard called their metadata structure a V-tree (B-tree). VAST mirrors all meta-data and customer data to two Optane SSDs. So if one Optane SSD goes down, its pair can be used to continue operations.

In last years podcast we talked at length about VAST data protection and data reduction capabilities so we won’t discuss these any further here.

However, one thing worth noting is that VAST has a very large RAID (erasure code protection) stripe. Data is written to the QLC SSDs in a VAST designed, locally decodable erasure coding format.

One problem with large stripes is rebuild time. VAST’s locally decodable parity codes help with this but the other thing that helps is distributing rebuild IO activity to all front end servers in the system.

The other problem with large stripe sizes is garbage collection. VAST segregates customer data by “temporariness” based on their best guess. In this way all data in one stripe should have similar lifetimes. When it’s time for stripe garbage collection, having all temporary data allows VAST to jettison the whole stripe (or most of it) rather than having to collect and re-write old stripe data to another new stripe.

VAST came out supporting NFSv3 and S3 object storage protocols, Their next release adds support for SMB 2.2, data-at-rest encryption and snapshotting to an external S3 store. As you may recall SMB is a stateful protocol. In VAST’s home grown, SMB implementation, front end servers can take over SMB transactions from other failed servers, without having to fail the whole transaction and start over again.

VAST uses a fail in place, maintenance policy. That is failed SSDs are not normally replaced in customer deployments, rather blocks, pages, or SSDs are marked as failed and the spare capacity available in the drive enclosure is used to provide space for any needed rebuilt data.

VAST offers a 10 year maintenance option where the customer keeps the same storage for 10 full years. That way customers don’t have to migrate data from one system to another until their 10 years are up.

The podcast runs a little under 44 minutes. Howard and I can talk forever. He is always a pleasure to talk with as well as extremely knowledgeable about (VAST) storage and other industry solutions.  The co-hosts and I had a great time talking with him again. Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data, Inc.

Howard Marks brings over forty years of experience as a technology architect for hire and Industry observer to his role as VAST Data’s Technologist Extraordinary and Plienopotentary. In this role, Howard demystifies VAST’s technologies for customers and customer requirements for VAST’s engineers.

Before joining VAST, Howard ran DeepStorage an industry test lab and analyst firm. An award-winning speaker, he has appeared at events on three continents including Comdex, Interop and VMworld.

Howard is the author of several books (all gratefully out of print) and hundreds of articles since Bill Machrone taught him journalism at PC Magazine in the 1980s.

Listeners may also remember that Howard was a founding co-Host of the Greybeards-on-Storage Podcast.

0100: GreyBeards talk with Colin Gallagher, VP Dig. Infra. Prod. Mkt. @ Hitachi Vantara

Sponsored By:

We have known Colin Gallagher (@worldc3), VP, Digital Infrastructure Product Marketing at Hitachi Vantara, for a long time and he has always been an all around smart storage guy. Colin’s team at Hitachi Vantara are bringing out a brand new, midrange storage system and we thought it would be a good time to catch up with him and learn about it.

The new Hitachi Vantara VSP E990 Storage System is an all NVMe SSD array for medium sized enterprises that need predictable, high IOPS-low latency performance with enterprise class functionality and world class reliability/availability. We asked Colin why they needed all NVMe levels of performance. Colin replied that many of these data centers are starting to use advanced HPC, AI, and data analytics applications together with their standard Oracle, SAP and Microsoft solutions. These combined workloads have an acute need for predictable, high end performance and enterprise class functionality in order to work well.

The VSP E99O comes from a long heritage of enterprise storage at Hitachi, most recently embodied in the Hitachi VSP 5000. In fact, the VSP E990 uses the same storage OS as the VSP 5000, with changes made to streamline it for use with higher performing, all NVMe storage on a dual controller architecture.

This means all the advanced storage functionality of the high end enterprise VSP 5000 are available on the VSP E990 midrange system, minus some items not pertinent to midrange such as mainframe attach.

Many of the software changes involved cache and cache management. In the VSP E990, cache is now automatically shared and distributed across controllers reducing the performance impact of mirroring. Further, Hitachi has added more cores and higher performing processors as well. As a result, the VSP E990 all NVMe array can provide up to 5.8M IOPS and a best in any networked storage system, IO response time as low as 64 µsec. Colin also mentioned that they have reduced flash drive rebuild times by 80%.

The VSP E990 comes in a 4U base configuration and can offer from ~6TB to up to over 6PB of virtual capacity with drive expansion. In 8U plus controller (on the audio, it was incorrectly stated as 6U, The Eds.), the VSP E990 provides slots for up to 96 NVMe SSDs. Just like all VSP storage, the VSP E990 also offers the Hitachi 100% Data Availability Guarantee, the world’s oldest. Further, the VSP E990 supports 6-9s (99.9999%) reliability.

In addition the VSP E990 also supports Hitachi Adaptive Data Reduction, which compresses and deduplicates data to increase virtual capacity and reduce physical footprint. In the VSP E990, Adaptive Data Reduction uses AI to determine the best time to deduplicate data while at the same time optimizing host IO performance and effective storage capacity.

Hitachi Ops Center

During the last year or so Hitachi Vantara introduced its new Hitachi Ops Center solution to better administer and manage storage and other digital infrastructure. Ops Center now comes with 4 components: Administrator, Protector (copy data management), Automator and Analyzer.

  • Administrator supplies an element manager for VSP, other storage, and digital infrastructure in the data center.
  • Protector provides enterprise class, copy data management to protect, migrate, and archive VSP data storage.
  • Analyzer supports AI analysis of the data center’s storage operations to monitor SLAs, troubleshoot shoot problems, and improve storage performance as well as 3rd party compute, network and storage.
  • Automator supplies a series of templates and services to automate mundane, manual storage and other digital infrastructure tasks required to configure, operate and manage these systems in the data center. Automator provides a number of templates which customers can tailor to automate infrastructure operations such as provisioning an ESXi data store. The templates together with Automator services automatically carry out all the OS, fabric and storage/digital infrastructure tasks and activities required to perform these functions.

Hitachi EverFlex consumption models

Hitachi Vantara is also introducing EverFlex, a new series of consumption models, that any customer can use to provide more financial flexibility in their data center digital infrastructure acquisitions, deployments, and management.

EverFlex offers customers the option to purchase, lease or buy on a pay-as-you-go, cloud-like basis any Hitachi Vantara storage or digital infrastructure. Colin mentioned there were two ways that pay-as-you-go can operate,

  1. Customers pay on pure capacity over time basis. Here the customer would contract for a certain capacity and Hitachi Vantara would install storage/digital infrastructure capacity and would bill them monthly for it.
  2. Customers pay on an SLA over time basis. Here they would contract for a specific SLA, such as IOPS or other performance characteristic and Hitachi Vantara would install and maintain any storage/digital infrastructure to meet that SLA and bill them monthly for it.

Colin said that all Hitachi, world-class services are also now available to be purchased under EverFlex.

The podcast ran ~24 minutes. Colin has always been easy to talk with and very knowledgeable about storage. We were very impressed with the performance and innovation in the VSP E990 as well as Ops Center and EverFlex. Keith and I had fun discussing these solutions with Colin. Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Colin Gallagher, VP Digital Infrastructure Product Marketing at Hitachi Vantara

Colin is Vice President for Digital Infrastructure Product Marketing at Hitachi Vantara where he leads product marketing for storage systems, storage software, and converged/hyper-converged solutions.

Over his 25-year career he has lead marketing and product management team teams at several major storage companies. Colin has a passion for telling compelling stories about technical products that help customers solve both business and personal pain – and he enjoys the challenge of telling them in creative ways.

He holds a bachelor’s degree from Georgetown University and an MBA from Northeastern University. Colin tries to put as many miles on his bike as possible, “hangs out” on twitter as @worldc3, and (unlike the GreyBeards) is team Oxford comma.

099: GreyBeards talk Folding@Home with Mike Harsch, a longtime enthusiast

Microscopic picture of Coronavirus

Mike Harsch (@harschness) is a personal friend, a computer enthusiast with a particular and enduring interest in distributed systems and GPU computing. MIke’s been a longtime user and proponent of Folding@Home, a distributed system focused on protein dynamics that anyone can download and run on their personal computer(s) or gaming devices.

We started the discussion on the history of distributed processing using home computers. Mike apparently first ran accross these systems in college and was using one in his college dorm room, back in 1997. At the time there was a system called, distributed.net, which was attempting to crack the (RC5-56[bit]) encryption keys used for computer security and offered a $10K prize for solving it. That was solved in 250 days (source: wikipedia article on distributed.net). Distributed.net is still up and working but since then they have moved to ever larger keys.

Next came Seti@Home which was a 2nd gen distributed system. SETI @Home sent out slices of recorded radio telescope spectrum and tasked people’s computers (during screen saving) to analyze that spectrum for alien signals. Seti@Home painted a nice image of the analysis. Seti@Home also used some gamification, where users gained points for analyzing spectrum. Over time they had something like a leader board tracking the top users. Recently, Seti@Home shut down their distributed system and changed their focus to analyze all the results they received from their users. I was a SETI@Home user for a while.

Folding@Home

Folding@Home is 3rd generation distributed computing solution built along the same lines but rather than searching for aliens, with Folding@Home you are running a simulation of what a protein molecule does over time. Mike mentioned that a typical Folding@Home work unit is to simulate a few nanoseconds in the life of a protein and this could take an hour or more on a x86 class multi-core CPU (with less time on GPUs).

Mike mentioned that there was a recent Ask Me Anything (AMA) event on Reddit with the team on Folding@Home answering questions. And on March 15th, the team at Folding@Home clarified how they are helping to solve the COVID-19 pandemic.

Keith has used Folding@Home in the past. And my son was an early user as well.

What Folding@Home does

Fold@Home uses idle CPU or GPU time on home gaming platforms/computers/servers or data center servers. Initially, in October of 2000, it was used to understand protein folding. But nowadays it’s gone beyond just folding, to simulate the life of a protein.

Prior to their turn to concentrate on COVID-19, they usually had ~30K active users, supplying ~100PFlops (100 quintillian x86 double precision floating point operations per second) of compute power. 

You get points for doing Folding@Home work. When Folding@Home was launched it was designed to use a single CPU/single core. Sometime in 2006, they released a SMP version of the code ,which could use multi-cores. Later they released a multi-threaded version which worked better on multi-core CPUs. And within the last few years, they have released a GPU support that could take advantage of the massive numbers of GPU cores available today.

Mike said that Folding@Home work unit GPU is generally 10 to 100X faster than what can be done with multi-core/multi-threaded CPU systems. 

Around Feb 27, Folding@Home announced they were going to focus all their efforts on understanding how to combat the COVID-19 coronavirus. After the announcement, their user count went through the roof, to now ~400K active users/day. This led to throttling requests for work and delays in handling responses. Over the ensuing weeks, (as of 3/18), they seem to have added enough resources to support their current levels of users.

The architecture of the old Folding@Home system was 2 tiered, they had a set of Folding@Home front-end servers that handled web traffic and distributed the work requests/responses to a set of backend servers that supplied work requests to users and combined work results. In their latest rush they seemed to have had to add servers, networking and storage to both tiers.

Sometime around March 25th, Folding@Home became the firsth and only ExaFlop supercomputer, achieving 1.56 (x86) ExaFlops (10**18 FLOPS, source: wikipedia article on Folding@Home) and have over 1 million active computing devices (GPUs & CPUs) in their network (see: Greg Bowwan’s status tweet).

Deploying Folding@Home on your systems

Folding@Home operates on any number of endpoint devices OSs and gaming console -systems. It comes in two software packages, one is the software that logs into the Folding@Home server to gather the next slice of work unit to perform and the other is the one that does the simulation work. They have an option to paint a picture of what is happening but most disable this feature to devote 100% of any idle CPU/GPU resources to the simulation. They also have a support forum, if you have any questions or need assistance in deploying their software.

Keith mentioned that some gal at VMware asked VMware users to devote their home server CPUs/GPUs to the project. I checked their website and they have a vSphere appliance (FLING) that will run Folding@Home and will register itself as joining the VMware team. Mike mentioned that GitHub (announced on Twitter) was going to supply up to 60K CPU core hours a day to the project. They recently reported that they are shifting work units from understanding COVID-19 to screening compounds for therapeutic potential against the coronavirus.

The world needs you to help solve the COVID-19 pandemic. So join up with Folding@Home to do your part. Downloading the software and installing it on a Mac was easy. Just don’t forget to reboot afterwards and then run FAHcontrol and FAHviewer in “Applications/Folding@home” folder to see what’s going on.

The podcast runs a little under 40 minutes. Mike was very knowledgeable about the IT side of Folding@Home, but was less knowledgeable about the biological side of what they are doing.  Listen to the podcast to learn more.

This image has an empty alt attribute; its file name is Subscribe_on_iTunes_Badge_US-UK_110x40_0824.png
This image has an empty alt attribute; its file name is play_prism_hlock_2x-300x64.png

Mike Harsch, a computer

Mike is a long time computer enthusiast with particular interests in distributed systems and GPU computing.  He lives in CO and has a basement full of (GPUs &) computers.

Mike and I have co-coached a local high school, FTC robotics team for the last 4 years. And Mike has been involved with FTC robotics for much longer than that.