This month we talk high-performance cluster file systems with Liran Zvibel (@liranzvibel), CEO and Co-Founder of WekaIO, a new software-defined, scale-out file system. I first heard of WekaIO when it showed up on SPEC sfs2014 with a new SWBUILD benchmark submission. They had a 60-node AWS EC2 cluster running the benchmark and achieved, at the time, the highest SWBUILD number (500) of any solution.
At the moment, WekaIO are targeting the HPC and Media & Entertainment verticals for their solution, which is sold on an annual capacity subscription basis.
By the way, a Wekabyte is 2**100 bytes of storage or ~ 1 trillion exabytes (2**60).
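For anyone who wants to check that arithmetic, here is a quick sketch. It assumes binary (power-of-two) units throughout, as in the text; the constant and function names are just for illustration.

```python
# Sizes in bytes, using power-of-two units as in the text above.
EXABYTE = 2**60
WEKABYTE = 2**100


def exabytes_per_wekabyte():
    """Return how many 2**60-byte exabytes fit in one 2**100-byte Wekabyte."""
    return WEKABYTE // EXABYTE


ratio = exabytes_per_wekabyte()
print(ratio)           # 1099511627776, i.e. 2**40
print(ratio / 10**12)  # roughly 1.1, so "~1 trillion" exabytes
```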
High performance file storage
The challenge with HPC file systems is that they need to handle a large number of files and large amounts of storage, with high throughput access to all this data. Where WekaIO comes into the picture is that they do all that and can also support high file IOPS. That is, they can open, read or write a high number of relatively small files at an impressive speed, with low latency. These small file workloads are becoming more popular with AI-machine learning and life sciences/genomic microscopy image processing.
Most file system developers will tell you that they can supply high throughput OR high file IOPS, but doing both is a real challenge. WekaIO is able to do both while at the same time supporting billions of files per directory and trillions of files in a file system.
WekaIO supports up to 64K cluster nodes and has been tested with up to 4000 cluster nodes. Last year WekaIO announced an OEM agreement with HPE and they are starting to build out bigger clusters.
Media & Entertainment file storage requirements are mostly just high throughput with large (media) file sizes. Here WekaIO has more competition from other cluster file systems, but their ability to support extra-large data repositories with great throughput is another advantage.
WekaIO cluster file system
WekaIO is a software-defined storage solution. Whereas many HPC cluster file systems have separate metadata and storage nodes, WekaIO’s cluster nodes are combined metadata and storage nodes. So as one scales capacity (by adding nodes), one not only scales large file throughput (via more IO parallelism) but also scales small file IOPS (via more metadata processing capabilities). There’s also some secret sauce to their metadata sharding (if that’s the right word) that allows WekaIO to support more metadata activity as the cluster grows.
One secret to WekaIO’s ability to support both high throughput and high file IOPS lies in their performance load balancing across the cluster. Apparently, WekaIO can be configured to constantly monitor all cluster nodes for performance and balance all file IO activity (data transfers and metadata services) across the cluster, to ensure that no one node is overburdened with IO.
Liran says that performance load balancing was one reason they were so successful with their EC2 AWS SPEC sfs2014 SWBUILD benchmark. One problem with AWS EC2 nodes is a lot of unpredictability in node performance. When running EC2 instances, “noisy neighbors” impact node performance. With WekaIO’s performance load balancing running on AWS EC2 node instances, they can just redirect IO activity around slower nodes to faster nodes that can handle the work, in real time.
WekaIO performance load balancing is a configurable option. The other alternative is for WekaIO to “cryptographically” spread the workload across all the nodes in a cluster.
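To make the difference between the two approaches concrete, here is a toy sketch. It is not WekaIO's algorithm, which wasn't described on the podcast. The node names, the latency numbers and both function names are hypothetical. The first function spreads work by hashing a key, so placement is static and roughly uniform. The second picks whichever node currently looks fastest, which is the general idea behind steering IO around a noisy neighbor.

```python
import hashlib


def hash_placement(key, nodes):
    """Pick a node from a cryptographic hash of the key (static, roughly uniform)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def latency_aware_placement(latency_ms):
    """Pick the node with the lowest recently observed latency (hypothetical metric)."""
    return min(latency_ms, key=latency_ms.get)


nodes = ["node-a", "node-b", "node-c"]
# Made-up measurements: node-b is suffering from a noisy neighbor.
observed = {"node-a": 0.4, "node-b": 2.7, "node-c": 0.5}

print(hash_placement("/proj/build/obj_0001.o", nodes))  # same node every time for this key
print(latency_aware_placement(observed))                # node-a
```

Hash placement needs no monitoring at all, but it will keep sending work to a slow node. The latency-aware choice avoids that at the cost of continuously measuring every node.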
WekaIO uses a host driver for POSIX access to the cluster. WekaIO’s frontend also natively supports (without a host driver) the NFSv3, SMB3.1, HDFS and AWS S3 protocols.
WekaIO also offers configurable file system data protection that can span 100s of failure domains (racks) supporting from 4 to 16 data stripes with 2 to 4 parity stripes. Liran said this was erasure code like but wouldn’t specifically state what they are doing differently.
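As a rough guide to what those stripe settings mean for capacity, here is a small sketch. It assumes the scheme behaves like a conventional N+P erasure code, where usable capacity is N/(N+P) of raw capacity. Liran didn't confirm the details, so treat this as an approximation; the function name is made up.

```python
def usable_fraction(data_stripes, parity_stripes):
    """Fraction of raw capacity left for data, assuming a conventional N+P layout."""
    if not 4 <= data_stripes <= 16:
        raise ValueError("data stripes must be between 4 and 16")
    if not 2 <= parity_stripes <= 4:
        raise ValueError("parity stripes must be between 2 and 4")
    return data_stripes / (data_stripes + parity_stripes)


for n, p in [(4, 2), (16, 2), (16, 4)]:
    print(f"{n}+{p}: {usable_fraction(n, p):.1%} usable")
# 4+2: 66.7% usable
# 16+2: 88.9% usable
# 16+4: 80.0% usable
```

Wider stripes waste less capacity on parity, while more parity stripes trade some capacity for extra protection.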
They also support high performance storage and inactive storage with automated tiering of inactive data to object storage through policy management.
WekaIO creates a global name space across the cluster, which can be sub-divided into one to thousands of file systems.
Snapshotting, cloning & moving work
WekaIO also has file system snapshots (read-only) and clones (read-write) using a redirect-on-write methodology. After the first snapshot/clone, subsequent snapshots/clones are only differential copies.
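For readers new to redirect-on-write, here is a toy sketch of the general technique, not WekaIO's implementation. The class and method names are made up. New writes always land in a fresh location, and a snapshot only freezes the block pointers, so the old data stays readable without being copied.

```python
class RowVolume:
    """Toy redirect-on-write volume: a block map pointing into an append-only store."""

    def __init__(self):
        self.store = []      # append-only list of block contents
        self.block_map = {}  # logical block number -> index into store
        self.snapshots = {}  # snapshot name -> frozen copy of a block map

    def write(self, lbn, data):
        # Redirect: new data goes to a new location, old data is left untouched.
        self.store.append(data)
        self.block_map[lbn] = len(self.store) - 1

    def read(self, lbn, snapshot=None):
        mapping = self.snapshots[snapshot] if snapshot else self.block_map
        return self.store[mapping[lbn]]

    def snapshot(self, name):
        # Only the pointers are copied, never the data blocks themselves.
        self.snapshots[name] = dict(self.block_map)


vol = RowVolume()
vol.write(0, b"v1")
vol.snapshot("snap1")
vol.write(0, b"v2")
print(vol.read(0))           # b'v2'
print(vol.read(0, "snap1"))  # b'v1'
```

This toy copies the whole block map on every snapshot. A real system would share unchanged parts of the map between snapshots, which is what makes later snapshots differential.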
Another feature Howard and I thought was interesting was their DR as a Service like capability. That is, you use an onprem WekaIO cluster to clone a file system/directory and tier that clone to an S3 storage object. An AWS EC2 WekaIO cluster can then import the object(s) and reconstitute that file system/directory in the cloud. Once on AWS, work can occur in the cloud and the process can be reversed to move any updates back to the onprem cluster.
This way if you had work needing more compute than available onprem, you could move the data and workload to AWS, do the work there and then move the data back down to onprem again.
WekaIO’s RtOS, network stack, & NVMeoF
WekaIO runs under Linux as a user space application. WekaIO has implemented their own Realtime O/S (RtOS) and high performance network stack that runs in user space.
With their own network stack they have also implemented NVMeoF support for (non-RDMA) Ethernet as well as InfiniBand networks. This is probably another reason they can have such low latency file IO operations.
The podcast runs ~42 minutes. Liran has been around data storage systems for 20 years and as a result was very knowledgeable and interesting to talk with. Liran almost qualifies as a Greybeard, if not for the fact that he was clean shaven ;/. Listen to the podcast to learn more.
Liran Zvibel, CEO and Co-Founder, WekaIO
Liran also held principal architectural responsibilities for the hardware platform, clustering infrastructure and overall systems integration for XIV Storage System, acquired by IBM in 2007.
Mr. Zvibel holds a BSc. in Mathematics and Computer Science from Tel Aviv University.