157: GreyBeards talk commercial cloud computer with Bryan Cantrill, CTO, Oxide Computer

Bryan Cantrill (@bcantrill), CTO, Oxide Computer, was a hard man to interrupt once started but the GreyBeards did their best to have a conversation. Nonetheless, this is a long podcast. Oxide are making a huge bet on rack scale computing and have done everything they can to make their rack easy to unbox, set up and deploy VMs on.

They use commodity parts (AMD EPYC CPUs) and package them in their own designed hardware (server) sleds, which blind mate to networking and power in the back of their own designed rack. They use their own OS, Helios (an OpenSolaris derivative), with their own RTOS, Hubris, for system bringup, monitoring and the start of their hardware root of trust. And of course, to make it all connect easier, they designed and developed their own programmable networking switch. Listen to the podcast to learn more.

Oxide essentially provides rack hardware which supports EC2-like compute and EBS-like storage to customers. It also has Terraform plugins to support infrastructure as code. In addition, all their software is completely API driven.

Bryan said time and time again, developing their own hardware and software made everything easier for them and their customers. Customers pay for hardware but there’s absolutely NO SOFTWARE LICENSING FEEs, because all their software is open source.

For example, the problem with AMI BIOS and UEFIs is their opacity. There’s really no way to understand what packages are included in their root of trust because they’re proprietary. Bryan said one company’s UEFI they examined had URLs embedded in firmware. It seemed odd to have another vendor’s web pages linked to their root of trust.

Bryan said they did their own switch to reduce integration and validation test time. The Oxide rack supports all internal networking, compute sled to compute sled, and ToR switch (with no external cabling) and has 32 networking ports to connect the rack to the data center’s core networking.

As for storage, Bryan said each of the 10 U.2 NVMe drives in their compute sled is a separate ZFS file system and customer data is 3 way mirrored across any of them. ZFS also provides end to end checksumming across all customer data for IO integrity.
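The mirroring and checksumming described above can be illustrated with a toy model. This is only a sketch and not Oxide's or ZFS's implementation: the `Sled` class, its `index` table, the random choice of drives and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib
import random

NUM_DRIVES = 10   # drives per compute sled, per the episode
COPIES = 3        # 3 way mirroring, per the episode


class Sled:
    """Toy model: each drive is a dict standing in for its own file system."""

    def __init__(self, seed=0):
        self.drives = [dict() for _ in range(NUM_DRIVES)]
        # block id -> (checksum, drive numbers holding a copy); hypothetical metadata
        self.index = {}
        self.rng = random.Random(seed)

    def write(self, block_id, data):
        # SHA-256 is a stand-in for whichever checksum is really used.
        checksum = hashlib.sha256(data).hexdigest()
        targets = self.rng.sample(range(NUM_DRIVES), COPIES)  # 3 distinct drives
        for d in targets:
            self.drives[d][block_id] = data
        self.index[block_id] = (checksum, targets)

    def read(self, block_id):
        checksum, targets = self.index[block_id]
        for d in targets:
            data = self.drives[d].get(block_id)
            # Verify every copy on read; skip any copy that fails its checksum.
            if data is not None and hashlib.sha256(data).hexdigest() == checksum:
                return data
        raise IOError(f"no intact copy of {block_id!r}")
```

Because the checksum is kept apart from the data it covers, a silently corrupted copy is detected on read and one of the other two mirrors is returned instead.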

Bryan said Oxide Computer rack bring up is 1) plug it in to core networking and power, 2) power it on, 3) attach a laptop to their service processor, 4) SSH into it, 5) run a configuration script and you’re ready to assign VMs. He said that the time from when an Oxide rack hits your dock until you are firing up VMs could be as short as an HOUR.

The Rust programming language is the other secret to Oxide’s success. More to the point, their company is named after Rust (oxide, get it?). Apparently just about all the software they developed is written in Rust.

The question for Oxide and every other computer and storage vendor is: do you believe that on premises computing will continue for the foreseeable future? The GreyBeards and Oxide believe yes, not only for compliance and better latency but also because it often costs less.

Bryan mentioned they have their own podcast, Oxide and Friends. On their podcast, they did a board bring up series (Tales from the Bring-Up Lab) and a series on taking their rack through FCC compliance (Oxide and the Chamber of Mysteries).

Bryan Cantrill, CTO, Oxide Computer

Bryan Cantrill is a software engineer who has spent over a quarter of a century at the hardware/software interface. He is the co-founder and CTO of Oxide Computer Company, the creator of the world’s first commercial cloud computer.

Prior to Oxide, he spent nearly a decade at Joyent, a cloud computing pioneer; prior to Joyent, he spent 14 years at Sun Microsystems.

Bryan received the Sc.B. magna cum laude with honors in Computer Science from Brown University, and is a MIT Technology Review 35 Top Young Innovators alumnus.

You can learn more about his work with Oxide at oxide.computer, or listen in on their weekly live show, Oxide and Friends (link above), on Discord or anywhere you get your podcasts.

138: GreyBeards talk big data orchestration with Adit Madan, Dir. of Product, Alluxio

We have never talked with Alluxio before but after coming back last week from Cloud Field Day 15 (CFD15) it seemed a good time to talk with other solution providers attempting to make hybrid cloud easier to use. We talked with Adit Madan (@madanadit), Director of Product Management at Alluxio. Alluxio is a data orchestration solution that’s available in both a free to download/use, open source, community edition (apparently, Meta is a customer) and a licensed, closed source, enterprise edition.

Alluxio data orchestration is all about supplying local-like IO access to data that resides elsewhere for BI, AI/ML/DL, and just about any other application needing to process that data. Listen to the podcast to learn more.

Alluxio started out at UC Berkeley’s AMPlab, which focused on big data problems, and was designed to provide local access to massive amounts of distributed data. Alluxio ends up constructing a locally accessible federation of data sources for compute apps running elsewhere.

Alluxio software installs near where compute apps run that need access to remote data. We asked about a typical cloud bursting case where S3 object data needed by an app are sitting on prem, but the apps need to run in a cloud, e.g., AWS.

He said Alluxio software would be deployed in AWS, close to app compute and that’s all there is. There’s no Alluxio software running on prem, as Alluxio just uses normal (remote access) S3 APIs to supply data to the compute apps running in AWS.

Adit mentioned that BI was one of the main applications to take advantage of Alluxio, but AI/ML/DL training is another that could use data orchestration. It turns out that AI/ML/DL training’s consumption of data is repetitive and highly sequential, so caching, sequential pre-fetch and other Alluxio techniques can work well there to provide local-like access to remote data.

Adit said that enterprises are increasingly looking to avoid vendor lock-in and this applies equally well to the cloud. By supporting data access in one location, say GCP, and accessing that data from another, say Azure, data gravity need no longer limit where work is done.

Adit said what makes their solution so valuable is that instead of duplicating all data from one place to another, all that Alluxio moves is the data required/requested by the apps running there.

Keith asked whether Adit considered Alluxio a data mesh or data fabric. Keith had to explain the terms to me and said data fabrics are pipes and physical infrastructure/functionality that moves data around and data mesh is what gives clients/apps/users access to that data. From that perspective Alluxio is a data mesh.

Alluxio Caching

Adit said that caching is one of the keys to making Alluxio work. Much of the success of their solution depends on applications having a well behaved working set. He also mentioned they use pre-fetching and other techniques to minimize access latency and maximize throughput. However, the first byte of data being accessed may take some time to get to where compute executes.
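To make the caching and pre-fetching idea concrete, here is a minimal read-through cache sketch. It is not Alluxio code: the `ReadThroughCache` class, the LRU eviction policy and the dict standing in for a remote store are assumptions chosen for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy LRU cache in front of a slow remote store, with sequential pre-fetch."""

    def __init__(self, remote, capacity, prefetch=1):
        self.remote = remote        # block number -> bytes; stand-in for remote storage
        self.capacity = capacity    # number of blocks the local cache can hold
        self.prefetch = prefetch    # how many following blocks to fetch ahead
        self.cache = OrderedDict()
        self.hits = 0
        self.remote_reads = 0

    def _load(self, block):
        # Fetch a block from the remote store unless it is cached or does not exist.
        if block in self.cache or block not in self.remote:
            return
        self.remote_reads += 1
        self.cache[block] = self.remote[block]
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used block

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)    # mark as most recently used
        else:
            self._load(block)                # first access pays the remote latency
        data = self.cache[block]
        for nxt in range(block + 1, block + 1 + self.prefetch):
            self._load(nxt)                  # sequential pre-fetch
        return data
```

In this sketch only the very first block of a sequential scan misses; every later block is already local thanks to pre-fetch, and a second pass over a working set that fits in the cache never touches the remote store.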

Adit said it’s not unusual for them to have a 1/2PB of cache (storage) for an application with multiPBs of source data.

Keith asked how Alluxio’s performance can be managed. Adit said they (we assume enterprise edition) have a solution called Cache Insights which uses Alluxio’s extensive access pattern history to predict application IO performance with larger cache (storage), higher speed networking, higher performing/more compute cores, etc. In this way, customers can see what can be done to improve application IO performance and what it would cost.

Keith asked if Alluxio were available as a SaaS solution. Adit said, although it could be deployed in that fashion, it’s not currently a SaaS solution. When asked how Alluxio (enterprise) was priced, Adit said it’s a function of the total resources consumed by their service, i.e., storage (cache), cores, networking that runs Alluxio software, etc.

As for deployment options, it turns out for Spark, Alluxio is just another lib package installed inside Spark. For K8s, Alluxio is installed as a CSI driver and a set of containers and can be deployed as containers within a cluster that needs access to data or in an external, standalone K8s cluster, servicing IO from other clusters. Alluxio HA is supplied by using multiple nodes to provide IO access.

Alluxio also supports access to multiple data locations. In this case, the applications would just access different mount points.

Data reads are easy, writes can be harder due to data integrity issues. As such, trying to supply IO performance becomes a trade off for data integrity when data updates are supported. Adit said Alluxio offers a couple of different configuration options for write concurrency (data integrity) that customers can select from. We assume this includes write through, write back and perhaps other write consistency options.

Alluxio supports AWS, Azure and GCP cloud compute accessing HDFS, S3 and Posix protocol access to data residing at remote sites. At remote sites, they currently support MinIO, Cloudian and any other S3 compatible storage solutions as well as NetApp (ONTAP) and Dell (ECS) storage as data sources.

Adit Madan, Director of Product, Alluxio

Adit Madan is the Director of Product Management at Alluxio. Adit has extensive experience in distributed systems, storage systems, and large-scale data analytics.

Adit holds an MS from Carnegie Mellon University and a BS from the Indian Institute of Technology – Delhi.

Adit is also a core maintainer and Project Management Committee (PMC) member of the Alluxio Open Source project.

130: GreyBeards talk high-speed database access using Apache Arrow Flight, with James Duong and David Li

We had heard for a while now about Apache Arrow and Arrow Flight as being a high-performing database with access speeds to match, and finally got a chance to hear what it was all about with James Duong, Co-Founder of Bit Quill Technologies/Senior Staff Developer at Dremio, and David Li (@lidavidm), Apache Arrow PMC member and software developer at Voltron Data.

First, Apache Arrow is an open source, in memory database (GitHub repo) for columnar data that enables lightning fast access and processing of data. Apache Arrow Flight is a set of interfaces, protocols, and services that parallelizes access to load and unload Arrow data over the network, from storage to memory and back, very fast. Listen to the podcast to learn more.

Columnar databases are all the rage these days and have more or less taken over from row oriented databases. With a row based database, data is stored (and accessed) row by row. In a columnar database, data is stored in columns, i.e., all data for one column is stored in sequence and then the next column is stored in sequence. Columnar databases can be queried/processed faster than row databases (depending on whether you are looking at/accessing multiple columns per row or not). And columnar data should compress better as all the data in a single column is of the same type.

Also, the fact that columns are located contiguously in memory means that if you process a column at a time, CPU data caches should work better. This is because they can grab a whole vector (a column’s worth of data) with one request.
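The difference between the two layouts can be sketched in a few lines of plain Python. This uses only the standard library `array` module, not the Arrow libraries, and the column names and values are made up for the example.

```python
from array import array

# Row-oriented: each record is stored whole, one after another.
rows = [
    (1, 9.5, 3),    # (id, price, qty)
    (2, 7.25, 1),
    (3, 8.0, 4),
]


def sum_price_rows(rows):
    # Must step over every record, touching id and qty along the way.
    return sum(r[1] for r in rows)


def to_columns(rows):
    # Column-oriented: each column lives in its own contiguous, typed buffer.
    return {
        "id": array("q", (r[0] for r in rows)),
        "price": array("d", (r[1] for r in rows)),
        "qty": array("q", (r[2] for r in rows)),
    }


columns = to_columns(rows)
# Scanning one column reads a single contiguous run of 8-byte floats.
total = sum(columns["price"])
```

Both layouts give the same answer; the columnar one simply keeps all the prices side by side, which is what helps CPU caches and compression.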

Arrow data is processed and accessed in record batches. These are 2D segments which represent all the columns in a sequence/set of rows. And record batches are the unit of parallelism in Arrow and Arrow Flight. So an Arrow client operating on a CPU thread/core/chip or server could be processing one record batch while another CPU thread/core/CPU or server could process a different record batch.
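As a rough illustration of record batches as the unit of parallelism, the sketch below splits columns into batches of whole rows and hands each batch to a separate worker. It uses plain Python lists and a thread pool rather than the Arrow libraries; the function names and the `price`/`qty` columns are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(columns, batch_rows):
    """Split a dict of equal-length columns into batches covering whole rows."""
    n = len(next(iter(columns.values())))
    return [
        {name: col[start:start + batch_rows] for name, col in columns.items()}
        for start in range(0, n, batch_rows)
    ]


def batch_revenue(batch):
    # Each worker only ever sees the rows in its own batch.
    return sum(p * q for p, q in zip(batch["price"], batch["qty"]))


def total_revenue(columns, batch_rows=2, workers=4):
    batches = make_batches(columns, batch_rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Batches are independent, so they can be processed in any order.
        return sum(pool.map(batch_revenue, batches))
```

Python threads here show the structure rather than a real speedup; the point is that every batch carries all the columns for its rows, so no worker needs data held by another.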

Arrow Flight (GitHub RPC format doc repo) is an RPC framework that includes APIs, protocols, standards (for on storage, on wire and in memory) and libraries used to transfer Arrow data and metadata (record batches) across the network. A typical system has both Flight clients and Flight services.

Arrow Flight currently uses Google’s gRPC for data transfers. gRPC is an open source remote procedure call (RPC) service that supports within data center, across data centers and out to the edge processing services. Although Arrow Flight is currently implemented on top of gRPC, other network protocols will be supported in the future.

What makes Arrow Flight so fast is its ability to support parallel transfers. That is, customers can configure Arrow (Flight) clients across clusters of servers and Arrow (Flight) services residing on one or more other servers. Any client can request metadata and record batches from any end point (Flight service) in the data center. And yes, Arrow data can be supplied from multiple end points by being mirrored/replicated. All data transfers can operate in parallel across all Flight clients and services, with no known bottleneck other than the network.

A single stream of Arrow Flight data was able to deliver 20GB/sec. The fact that you can have any (?) number of Arrow Flight data streams in operation at the same time makes that a very interesting number.

Also, Arrow data can be stored on or sourced from typical data lakes such as Azure Data Lake, AWS S3, Google Cloud storage, etc.

Another advantage of Arrow Flight is the ability to use the same format on the wire and in storage. Normally JDBC (and ODBC) have on storage and on wire formats which require format conversion (serialization) to move data from storage/memory to wire and another conversion (deserialization) to move data from on wire format to in storage/memory format. Arrow Flight does away with serialization and deserialization of data altogether and uses the same format for on wire and in storage.
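The cost being avoided can be seen in a small standard library sketch. It does not use the actual Arrow IPC format (which adds metadata and framing); JSON stands in for a conventional wire format and a raw buffer of 8-byte floats stands in for a shared in-memory/on-wire layout. Native byte order on both ends is assumed.

```python
import json
from array import array

prices = array("d", [9.5, 7.25, 8.0])   # one column, held as contiguous 8-byte floats

# Conventional path: convert to a different wire format, then parse it back.
wire_text = json.dumps(list(prices)).encode("utf-8")       # serialize
decoded = array("d", json.loads(wire_text.decode("utf-8")))  # deserialize

# Same-format path: the bytes sent are the bytes already in memory.
wire_bytes = prices.tobytes()
# The receiver reinterprets the buffer in place: no parsing, no per-value conversion.
view = memoryview(wire_bytes).cast("d")
```

On the receiving side `view[1]` reads the second price straight out of the received bytes, whereas the JSON path had to parse text and rebuild every value.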

Arrow Flight SQL allows Arrow processing of SQL database data. My understanding is that customers using non Arrow databases such as Oracle, SQL Server, Postgres, etc. can use Arrow Flight SQL to provide Arrow in-memory database processing/query execution for their data.

Arrow and Arrow Flight are primarily used to process data analytics workloads but Arrow also has a new execution engine, the Arrow Gandiva project, that enables vectorized processing of Arrow data. This is a special execution engine for Arrow that supports X86 cores with AVX instructions, (NVIDIA) GPUs, and FPGAs.

There’s also an open source package, Fletcher, used to create Arrow and Arrow Flight processing HDLs so that customers can add Arrow data processing and Arrow Flight data transfer functionality to custom built FPGAs.

One challenge with open source software is support for problems/bugs that crop up. An active developer community helps, but enterprise customers require professional, on call 7×24 (5×12?) support for all their critical (and most non-critical) software. Voltron Data (David’s company) provides paid for support for Arrow Flight and Arrow data services.

The other major problem with open source software has been complexity of use. The Arrow Flight team is very responsive in clarifying documentation and is trying to make it easier to use. But at the moment Arrow Flight is mostly a set of APIs, libraries and connectors that end users can use to stand up Arrow (Flight) clients and servers to transfer Arrow data between them.

James Duong, Co-Founder Bit Quill Technologies & Sr. Staff Developer at Dremio

An Apache Arrow contributor, cofounder at Bit Quill Technologies, and contributor to Dremio Corporation projects, James Duong has worked with databases for over 15 years, from backend query engines to drivers and protocols. He’s worked with a variety of relational, big data, and cloud databases including Dremio, SQL Server, Redshift, and Hive.

Previously at Simba Technologies, James architected and built connectors for sources, as well as designing the Simba Engine SDK for developing connectivity solutions for any data source.

Bit Quill Technologies, the company James helped co-found, builds back end software in the data and cloud space. Bit Quill has built a name for itself as a producer of high-quality software, a collaborative approach to design and development, and a love for good tech and happy people.

Balancing his passion for the data ecosystem with a young family, James occasionally steps away from it all to go hiking.

David Li, Apache Arrow PMC and software engineer at Voltron Data

David is a PMC member for Apache Arrow and a software engineer at Voltron Data (formerly known as Ursa Computing). Prior to that, he worked on data services and Apache Arrow at Two Sigma.

David holds an M.Eng. in Computer Science from Cornell University.

124: GreyBeards talk k8s storage orchestration using CNCF Rook Project with Sébastien Han & Travis Nielsen, Red Hat

Stateful containers are becoming a hot topic these days so we thought it a good time to talk to the CNCF (Cloud Native Computing Foundation) Rook team about what they are doing to make storage easier to use for k8s container apps. CNCF put us into contact with Sébastien Han (@leseb_), Ceph Storage Architect and Travis Nielsen (@STravisNielsen), both Principal Software Engineers at Red Hat and active on the Rook project. Rook is a CNCF “graduated” open source project just like Kubernetes, Prometheus, ContainerD, etc. This means it’s mature enough to run production workloads.

Rook is used to configure, deploy and manage a Red Hat Ceph(r) Storage cluster under k8s. Rook creates all the k8s deployment scripts to set up a Ceph Storage cluster as containers, start it and monitor its activities. Rook monitoring of Ceph operations can restart any Ceph service container or scale any Ceph services up/down as needed by container apps using its storage. Rook is not in the Ceph data path, but rather provides a k8s based Ceph control or management plane for running Ceph storage under k8s.

Readers may recall we talked to SoftIron, an appliance provider for Ceph Storage in the enterprise, for our 120th episode. Rook has another take on using Ceph storage, only this time running it under k8s. Listen to the podcast to learn more.

The main problem Rook is solving is how to easily incorporate storage services and stateful container apps within k8s control. Containerized apps can scale up or down based on activity and the storage these apps use needs the same capabilities. The other option is to have storage that stands apart from or outside the k8s cluster and its control. But then the container apps and their storage have 2 (maybe more) different control environments. Better to have everything under k8s control or nothing at all.

Red Hat Ceph storage has been available as a standalone storage solution for a long time now and has quite the extensive customer list, many with multiple PB of storage. Rook-Ceph and all of its components run as containers underneath k8s.

Ceph supports replication (mirroring) of data 1 to N ways (typically 3 way) or erasure coding for data protection, and also supports file, block and object protocols or access methods. Ceph normally consumes raw block DAS for its backend but Ceph can also support a file gateway to NFS storage behind it. Similarly, Ceph can offer an object storage gateway option. But with either of these approaches, the (NFS or object) storage exists outside k8s scaling and resiliency capabilities and Rook management.
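The capacity trade-off between the two protection schemes is simple arithmetic, sketched below. The function names and the 4+2 erasure coding profile are illustrative assumptions, not settings mentioned in the episode.

```python
def replication_overhead(copies):
    """Raw bytes stored per usable byte with N-way replication."""
    return float(copies)


def erasure_overhead(k, m):
    """Raw bytes stored per usable byte with k data chunks + m coding chunks."""
    return (k + m) / k


def raw_needed(usable_tb, overhead):
    """Raw capacity (TB) required to offer the given usable capacity (TB)."""
    return usable_tb * overhead


# 3 way replication: survives the loss of 2 copies, costs 3x raw capacity.
three_way = raw_needed(100, replication_overhead(3))
# Hypothetical 4+2 erasure coding: survives the loss of 2 chunks, costs 1.5x.
ec_4_2 = raw_needed(100, erasure_overhead(4, 2))
```

For 100 TB usable, that is 300 TB raw with 3 way replication versus 150 TB with 4+2 erasure coding, at the price of extra compute and network work on writes and rebuilds.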

Ceph uses storage pools that can be defined using storage performance levels, storage data protection levels, system affinity, or any combination of the above. Ceph storage pools are mapped to k8s storage classes using the Ceph CSI. Container apps that want to use storage would issue a persistent volume claim (PVC) request specifying a Ceph storage class which would allocate the Ceph storage from the pool to the container.  
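A PVC request of the kind described above might look like the manifest built in this sketch. The fields are standard Kubernetes PersistentVolumeClaim fields, but the claim name `app-data`, the size, and the storage class name `rook-ceph-block` are assumptions; the real class name is whatever the cluster administrator defined for the Ceph pool.

```python
import json


def make_pvc(name, storage_class, size, access_mode="ReadWriteOnce"):
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # The storage class selects which Ceph pool backs the volume.
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names; substitute the storage class defined in your cluster.
    print(json.dumps(make_pvc("app-data", "rook-ceph-block", "10Gi"), indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied with `kubectl apply -f`.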

Besides configuring, deploying and monitoring/managing your Ceph storage cluster, Rook can also automatically upgrade your Ceph cluster for you. 

We discussed the difference between running Rook-Ceph within k8s and running Ceph outside k8s. Both approaches depend on Ceph CSI but with Rook, Ceph and all its software run under k8s control as containers and Rook manages the Ceph cluster for you. When it’s run outside, 1) you manage the Ceph cluster and 2) Ceph storage scaling and resilience are not automatic.

Sébastien Han, Principal Software Engineer, Ceph Architect, Red Hat

Sébastien Han currently serves as a Senior Principal Software Engineer, Storage Architect for Red Hat. He has been involved with Ceph Storage since 2011 and has built strong expertise around it.

Curious and passionate, he loves working on bleeding edge technologies and identifying opportunities where Ceph can enhance the user experience. He did that with various technologies such as OpenStack and Docker.

Now on a daily basis, he rotates between Ceph, Kubernetes, and Rook in an effort to strengthen the integration between all three. He is one of the maintainers of Rook-Ceph.

Travis Nielsen, Principal Software Engineer, Red Hat

Travis Nielsen is a Senior Principal Software Engineer at Red Hat with the Ceph distributed storage system team. Travis leads the Rook project and is one of the original maintainers, integrating Ceph storage with Kubernetes.

Prior to Rook, Travis was the storage platform tech lead at Symform, a P2P storage startup, and an engineering lead for the Windows Server group at Microsoft.