169: GreyBeards talk AgenticAI with Luke Norris, CEO & Co-founder, Kamiwaza AI

Luke Norris (@COentrepreneur), CEO and Co-Founder, Kamiwaza AI, is a serial entrepreneur in Silverthorne CO, where the company is headquartered. They presented at AIFD6 a couple of weeks back and the GreyBeards thought it would be interesting to learn more about what they were doing, especially since we are broadening the scope of the podcast to now be GreyBeards on Systems.

Describing Kamiwaza AI is a bit of a challenge. They settled on “AI orchestration” for the enterprise but it’s much more than that. One of their key capabilities is an inference mesh, which supports accessing data in locations throughout an enterprise, across various data centers, to do inferencing, and then gathers the replies/responses together, aggregating them into one combined response. All this without violating HIPAA, GDPR or other data compliance regulations.

Kamiwaza AI offers an opinionated AI stack, consisting of 155 components today (and growing), that supplies a single API to access any of their AI services. They support multi-node clusters and multiple clusters located in different data centers, as well as the cloud. For instance, they are in the Azure marketplace, and plans are to be in AWS and GCP soon.

Most software vendors provide a proof of concept; Kamiwaza offers a pathway from PoC to production. Companies pre-pay to install their solution and can then apply those funds when they purchase a license.

And then there’s their (meta-)data catalog. It resides in local databases (possibly replicated) throughout the clusters and holds metadata and location information about any data in the enterprise that’s been ingested into their system.

Data can be ingested for enterprise RAG databases and other services. As this is done, location affinity and metadata about that data are registered in the data catalog. That way Kamiwaza knows where all of an organization’s data is located, which RAG or other database it’s been ingested into, and enough about the data to judge whether it might be pertinent to answering a customer or service query.
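
To make the catalog idea concrete, here’s a minimal sketch of what a registration record and relevance lookup might look like; the schema, field names and functions are our invention, not Kamiwaza’s actual API:

```python
# Hypothetical sketch of a data-catalog entry; the field names and structure
# are our illustration of the idea, not Kamiwaza's actual schema.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    dataset_id: str                       # name of the ingested data set
    location: str                         # data center/cluster holding the data
    rag_store: str                        # RAG or other database it was ingested into
    tags: list[str] = field(default_factory=list)  # enough metadata to judge relevance

catalog: list[CatalogEntry] = []

def register(entry: CatalogEntry) -> None:
    """Record location affinity and metadata at ingest time."""
    catalog.append(entry)

def candidates(query_terms: set[str]) -> list[CatalogEntry]:
    """Return entries whose tags suggest they're pertinent to a query."""
    return [e for e in catalog if query_terms & set(e.tags)]

register(CatalogEntry("genomes-eu", "eu-dc", "rag-eu", ["genome", "sequence"]))
print(candidates({"genome"}))   # tells us the eu-dc data center is pertinent
```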

Maybe the easiest way to understand what Kamiwaza is, is to walk through a prompt (a minimal sketch of this flow follows the list):

  • A customer issues a prompt to a Kamiwaza endpoint, which triggers
  • A search through their data catalog to identify what data can be used to answer that prompt.
  • If all the data resides in one data center, the prompt can be handed off to the GenAI model and RAG services at that data center.
  • But if the prompt requires information from multiple data centers,
  • Separate prompts are distributed to each data center where RAG information germane to that prompt is located.
  • As each of these generates a reply, the responses are sent back to an initiating/coordinating cluster.
  • Then all these responses are combined into a single reply to the customer’s prompt or service query.
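
Here’s that flow as a minimal Python sketch, under our own assumptions; the function names and the aggregation step are illustrative, not Kamiwaza’s API:

```python
# Illustrative fan-out/aggregate flow; query_site() stands in for invoking the
# GenAI model + RAG services local to each site, so data never leaves home.
import asyncio

async def query_site(site: str, prompt: str) -> str:
    await asyncio.sleep(0)  # stand-in for remote inference latency
    return f"[{site}] partial answer to: {prompt}"

async def answer(prompt: str, catalog_hits: dict[str, list[str]]) -> str:
    sites = list(catalog_hits)      # data centers holding pertinent data
    if len(sites) == 1:             # all data in one place: hand off directly
        return await query_site(sites[0], prompt)
    # Otherwise fan out one sub-prompt per site and combine the replies.
    replies = await asyncio.gather(*(query_site(s, prompt) for s in sites))
    return "\n".join(replies)       # stand-in for an LLM summarization step

print(asyncio.run(answer("How frequent is gene sequence X?",
                         {"eu-dc": ["genomes-eu"], "us-dc": ["genomes-us"]})))
```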

But the key point is that the data in each data center used to answer the prompt is NOT moved to other data centers. All prompting is done locally, at the data center where the data resides. Only prompt replies/responses are sent to other data centers and then combined into one comprehensive answer.

Luke mentioned a BioPharma company that had genome sequences located in various data regimes, some under GDPR, some under APAC equivalents, others under USA HIPAA requirements. They wanted to know how frequently a particular gene sequence occurred. They were able to issue this as a prompt at a single location, which spun up separate, distributed prompts for each data center that held appropriate information. All those replies were then transmitted back to the originating prompt location and combined/summarized.

Kamiwaza AI also has an AIaaS offering. Any paying customer is offered one (agentic AI) outcome per month per cluster license. Outcomes can effectively be any AI application they would like performed.

One outcome he mentioned:

  • A weather-risk researcher had tons of old weather data in a multitude of formats, over many locations, that had been recorded over time.
  • They wanted access to all this data so they could tell when extreme weather events had occurred in the past.
  • Kamiwaza AI assigned one of their partner AI experts to work with the researcher to have an AI agent comb through these archives and transform and clean all the old weather data into HTML data more amenable to analysis.
  • But that was just the start. They really wanted to understand the risk of damage due to the extreme weather events. So the AI application/system was then directed to gather, from news and insurance archives, any information that identified the extent of the damage from those weather events.

He said that today’s agentic AI can implement a screen mouse click and perform any function that an application or a human could do on a screen. Agentic AI can also import an API and infer where an API call might be better to use than a screen GUI interaction.
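
As a rough illustration of that API-over-GUI preference, here’s a hedged sketch; the registry, function names and fallback behavior are our assumptions, not Kamiwaza’s implementation:

```python
# Hypothetical agent tool selection: prefer a registered API call when one
# exists, otherwise fall back to driving the screen.
from typing import Callable, Optional

api_registry: dict[str, Callable[..., str]] = {}   # imported API operations

def register_api(action: str, fn: Callable[..., str]) -> None:
    api_registry[action] = fn

def click_through_gui(action: str, **kwargs) -> str:
    # Stand-in for screen automation (mouse clicks, form fills).
    return f"performed '{action}' via GUI with {kwargs}"

def perform(action: str, **kwargs) -> str:
    fn: Optional[Callable[..., str]] = api_registry.get(action)
    if fn is not None:          # an API is usually faster and more reliable
        return fn(**kwargs)
    return click_through_gui(action, **kwargs)

register_api("export_report", lambda month: f"report-{month}.csv via API")
print(perform("export_report", month="2025-01"))
print(perform("approve_invoice", invoice_id=42))   # no API: GUI fallback
```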

He mentioned that Kamiwaza can be used to generate and replace a lot of what enterprises do today with Robotic Process Automation (RPA). Luke feels that anything an enterprise was doing with RPA can be done better with Kamiwaza AI agents.

SaaS solution tasks are also something agentic AI can easily displace. Luke said one customer went from using SAP APIs to feed information into SAP, to using APIs to extract information from SAP, to completely replacing the use of SAP for that task at the enterprise.

How much of this is real versus fiction is the subject of some debate in the industry. But Kamiwaza AI is pushing the envelope on what can and can’t be done. And with their AIaaS offering, customers are making use of AI like they never thought possible before.

Kamiwaza AI has a community edition, a free but functionally restricted download that provides a desktop experience of Kamiwaza AI’s stack. Luke sees this as something a developer could use to develop against Kamiwaza APIs and test functionality before loading it on an enterprise cluster.

We asked where they were finding the most success. Luke mentioned anyone that’s heavily regulated, where data movement and access are strictly constrained. And they are focused on large, multi-data center enterprises.

Luke mentioned that Kamiwaza AI has been doing a number of hackathons with AI Tinkerers around the world. He suggested prospects take a look at what they have done with them and perhaps join them in the next hackathon in their area.

Luke Norris, CEO & Co-Founder, Kamiwaza AI

Luke Norris is the co-founder of Kamiwaza.AI, driving enterprise AI innovation with a focus on secure, scalable GenAI deployments. He has extensive experience raising over $100M in venture capital and leading global AI/ML deployments for Fortune 500 companies.

Luke is passionate about enabling enterprises to unlock the full potential of AI with unmatched flexibility and efficiency.

167: GreyBeards talk Distributed S3 storage with Enrico Signoretti, VP Product & Partnerships, Cubbit

Long time friend, Enrico Signoretti (LinkedIn), VP Product and Partnerships, Cubbit, used to be a common participant at Storage Field Day (SFD) events and I’ve known him since we first met there. Since then, he’s worked for a startup and a prominent analyst firm. But he’s back at another startup and this one looks like it’s got legs.

Cubbit offers distributed S3-compatible object storage with geo-distribution and geo-fencing for object data, in which the organization owns the hardware and Cubbit supplies the software. There’s a management component, the Coordinator, which can run on your hardware or as a SaaS service they provide, but other than that, IT controls the rest of the system hardware. Listen to the podcast to learn more.

Cubbit comes in 3 components:

  • One or more Storage nodes, which run their agent software on top of a Linux system with direct-attached storage.
  • One or more Gateway nodes, which provide S3 protocol access to the objects stored on storage nodes. A typical S3 access point, https://s3.company_name.com/…, points to either a load-balancing front end or one or more Gateway nodes. Gateway nodes provide the mapping between the bucket name/object identifier and where the data currently resides or will reside.
  • One Coordinator node, which provides the metadata to locate object data, manages the storage nodes and gateways, and monitors the service. The Coordinator node can be a SaaS service supplied by Cubbit or a VM/bare-metal node running Cubbit Coordinator software. Metadata is protected internally within the Coordinator node.

With these three components one can stand up a complete, geo-distributed/geo-fenced S3 object storage system that the organization controls.
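
Because the system is S3-compatible, any standard S3 client pointed at one of your gateways should work. A minimal boto3 sketch, assuming a hypothetical gateway endpoint and placeholder credentials:

```python
# Standard S3 client aimed at a self-hosted gateway; the endpoint and keys
# below are placeholders, not real Cubbit values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.company_name.com",  # gateway or load balancer
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

s3.put_object(Bucket="backups", Key="db/2025-01-01.dump", Body=b"...")
obj = s3.get_object(Bucket="backups", Key="db/2025-01-01.dump")
print(obj["Body"].read())
```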

Cubbit encrypts data as it’s ingested at the gateway and decrypts it when accessed. Sign-on to the system uses standard security offerings. Security keys can be managed by Cubbit or by standard key management systems.

All data for an object is protected by nested erasure codes: 1) erasure coding within a data center/location, over its storage drives, and 2) erasure coding across geographical locations/data centers.

With erasure coding across locations, a customer with, say, 10 data center locations can have their data stored in such a fashion that as long as at least 8 data centers are online they still have access to their data; that is, the Cubbit storage system can still provide data availability.

Similarly, for erasure coding within the data center/location, across storage drives: with say 12 drives per stripe, one could configure 9+3 erasure coding, where as long as 9 of the drives still operate, data will be available.
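
To make the capacity math concrete, here’s a back-of-envelope sketch using the example parameters above (the actual numbers are whatever the customer configures):

```python
# Raw-to-usable overhead for data+parity erasure coding.
def ec_overhead(data: int, parity: int) -> float:
    return (data + parity) / data

# Within a site: 9+3 over 12 drives tolerates any 3 drive failures.
print(ec_overhead(9, 3))                    # 1.33x within the data center

# Across sites: 8+2 over 10 data centers survives any 2 sites offline.
print(ec_overhead(8, 2))                    # 1.25x across locations

# Nested, the two multiply: ~1.67x raw capacity per usable byte.
print(ec_overhead(9, 3) * ec_overhead(8, 2))
```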

Please note the customer decides the number of locations to stripe across for erasure coding, and likewise the number of storage drives.

The customer supplies all the storage node hardware. Some customers start with re-purposed servers/drives for their original configuration and then upgrade to higher-performing storage, servers and networking as performance needs change. Storage nodes can be on prem, in the cloud or at the edge.

For adequate performance, gateways and storage nodes (and Coordinator nodes) should be located close to one another. Although Coordinator nodes are not in the data path, they are critical to initial object access.

Gateways can provide a cache for faster local data access. Cubbit has recommendations for Gateway server hardware. And similar to storage nodes, Gateways can operate at the edge, in the cloud or on prem.

Use cases for the Distributed S3 storage include:

  • As a backup target for data elsewhere.
  • As a geographically distributed/fenced object store.
  • As a locally controlled object storage to feed AI training/inferencing activity.

Most backup solutions support S3 object storage as a target for backups.

Geographically distributed S3 storage means that customers control where object data is located. This could be split across a number of physical locations, the cloud or at the edge.

Geographically fenced S3 storage means that the customer controls which of its many locations stores an object. For GDPR countries with multi-nation data center locations, this could satisfy the compliance requirement to keep customer data within country.

Cubbit’s distributed S3 object storage is strongly consistent, in that an object loaded into the system at any location is immediately available to any user accessing it through any other gateway. Access times vary, but the data will be the same regardless of where you access it from.

The system starts up through an Ansible playbook, which asks a bunch of questions and then loads and sets up the agent software for storage nodes, gateway nodes and, where applicable, the Coordinator node.

At any time, customers can add more gateways or storage nodes or retire them. The system doesn’t perform automatic load balancing for new nodes, but customers can migrate data off storage nodes and onto other ones through API calls/UI requests to the Coordinator.

Cubbit storage supports multi-tenancy, so MSPs can offer their customers isolated access.

Cubbit charges for their service based on data storage under management. Note there are no egress charges, and you don’t pay for redundancy; but you do supply all the hardware used by the system. They offer a discount for M&E customers, as the metadata-to-data ratio is much smaller (lots of large files) than for most other S3 object stores (a mix of small and large files).

Cubbit is presently available only in Europe but will be coming to the USA next year. So, if you are interested in geo-distributed/geo-fenced S3 object storage that you control, and that can be had for much less than hyperscaler object storage, check it out.

Enrico Signoretti, VP Products & Partnerships

Enrico Signoretti has over 30 years of experience in the IT industry, having held various roles including IT manager, consultant, head of product strategy, IT analyst, and advisor.

He is an internationally renowned visionary author, blogger, and speaker on next-generation technologies. Over the past four years, Enrico has kept his finger on the pulse of the evolving storage industry as the Head of Research Product Strategy at GigaOm. He has worked closely and built relationships with top visionaries, CTOs, and IT decision makers worldwide.

Enrico has also contributed to leading global online sites (with over 40 million readers) for enterprise technology news.

151: GreyBeards talk AI (ML) performance benchmarks with David Kanter, Exec. Dir. MLCommons

Ray’s known David Kanter (@TheKanter), Executive Director, MLCommons, for quite a while now and has been reporting on MLCommons MLperf AI benchmark results for even longer. MLCommons releases new benchmark results each quarter, and this last week they released new Data Center Training (v3.0) and new Tiny Inferencing (v1.1) results. So, the GreyBeards thought it was time to get a view of what’s new in AI benchmarking and what’s coming later this year.

David’s been around the startup community in the Bay Area for a while now. He sort of started at MLPerf early on as a technical guru working on submissions and other stuff, and worked his way up to being the Executive Director/CEO. The big news this week from MLCommons is that they have introduced a new training benchmark and updated an older one. The new one simulates training GPT-3, and they also updated their Recommendation Engine benchmark. Listen to the podcast to learn more.

MLCommons is an industry association focused on supplying reproducible, verifiable benchmarks for machine learning (ML) and AI, which they call MLperf benchmarks. Their benchmark suite includes a number of different categories such as data center training, HPC training, data center inferencing, edge inferencing, mobile inferencing and finally tiny (IoT device) inferencing. David likes to say MLperf benchmarks range from systems consuming megawatts (HPC, literally a supercomputer) to microwatts (Tiny) solutions.

The challenge holding AI benchmarking back early on was that a few industry players had done their own thing, but there was no way to compare one to another. MLCommons was born out of that chaos and sought to create a benchmarking regimen that any industry player could use to submit AI work activity, one that would allow customers to compare their solution to any other submission on a representative sample of ML model training and inferencing activity.

MLCommons has both an Open and a Closed class of submissions. The Closed class has very strict criteria for submission. These include known open-source AI models and data, accuracy metrics that training and inferencing need to hit, and reporting a standard set of metrics for the benchmark. All of which need to be done in order to create a repeatable and verifiable submission.

Their Open class is a way for any industry participant to submit whatever model they would like, to whatever accuracy level they want, and it’s typically used to benchmark new hardware, software or AI models.

As mentioned above, MLCommons training benchmarks use an accuracy specification that must be achieved to have a valid submission. Benchmarks also have to be run 3 times. All submissions list hardware (CPUs and accelerators) and software (AI framework). And these can range from 0 accelerators (e.g., CPU only, with no GPUs) to 1000’s of GPUs.

The new GPT-3 model is a very complex AI model that, until recently, seemed unlikely to ever be benchmarked. But apparently the developers at MLCommons (and their industry partners) have been working on this for some time now. In this round of results there were 3 cloud submissions and 4 on-prem submissions for GPT-3 training.

GPT-3, -3.5 & -4 are all OpenAI models which power their ChatGPT text transformer Large Language Model (LLM). GPT-3 has 175B parameters and was trained on TBs of data covering web crawls, book crawls, official documentation, code, etc. OpenAI said, at GPT-3’s announcement, that it took over $10M and months to train.

MLCommons’ GPT-3 benchmark is not a full training run of GPT-3 but uses a training checkpoint, trained on a subset of the data used for the original GPT-3 training. Checkpoints are used for long-running jobs (training sessions, weather simulations, fusion energy simulations, etc.) and copy all internal state of a job/system while it’s running (ok, quiesced) at some interval (say every 8 hrs, 24 hrs, 48 hrs, etc.), so that in case of a failure one can restart the activity from the last checkpoint rather than from the beginning.

The MLCommons GPT-3 checkpoint has been trained on a 10B-token data set. The benchmark starts by loading this checkpoint and then trains on an even smaller subset of the GPT-3 data until it achieves the accuracy baseline.

Accuracy for text transformers is not as simple as for other models (correct image classification, object identification, etc.) and instead uses “perplexity”. Hugging Face defines perplexity as “the exponentiated average negative log-likelihood of a sequence.”
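
Written out (our rendering of that definition, not an MLCommons formula), for a tokenized sequence x_1, …, x_N under a model p:

```latex
\mathrm{PPL}(x) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\left(x_i \mid x_{<i}\right) \right)
```

Lower is better: a model that assigns higher likelihood to the evaluation text earns a lower perplexity.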

The 4 on-premises submissions for GPT-3 ranged from 45 minutes (768 NVIDIA H100 GPUs) to 442 minutes (64 Habana Gaudi2 accelerators). The 3 cloud submissions all used NVIDIA H100 GPUs and ranged from 768 GPUs (@47 minutes to train) to 3584 GPUs (@11 min. to train).

Aside from data center training, MLCommons also released a new round of Tiny (IoT) inferencing benchmarks. These generally use smaller ARM processors and no GPUs, with much smaller AI models for tasks such as keyword spotting (“Hey Siri”), visual wake words (door opening), image classification, etc.

We ended our discussion with me asking David why there was no storage-oriented MLCommons benchmark. David said creating a storage benchmark for AI is much different than creating inferencing or training benchmarks. But MLCommons has taken this on and now has a series of MLCommons storage benchmarks that use emulated accelerators.

At the moment, anyone with a storage system can submit an MLCommons storage benchmark. After some time, MLCommons will only allow submissions from member companies, but early on it’s open to all.

For their storage benchmarks, rather than using accuracy as the benchmark criterion, they use keeping the (emulated) accelerators X% busy. This way, storage’s support of MLops activity can be isolated from the training and inferencing itself.
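
Our illustration of that criterion, with made-up numbers and a hypothetical 90% utilization floor:

```python
# An accelerator is "busy" except when stalled waiting on storage, so the
# benchmark passes when compute time dominates I/O stall time.
def accelerator_utilization(compute_s: float, io_stall_s: float) -> float:
    return compute_s / (compute_s + io_stall_s)

# e.g., 90 s of emulated compute per epoch, 8 s stalled on sample loads:
util = accelerator_utilization(90.0, 8.0)
print(f"{util:.1%} busy")                    # ~91.8%
print("PASS" if util >= 0.90 else "FAIL")    # against a hypothetical 90% floor
```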

The GreyBeards eagerly anticipate the first round of MLCommons storage benchmark results, hopefully coming out later this year.

143: GreyBeards talk Chia crypto with Jonmichael Hands, VP Storage at Chia Project

Today we interview Jonmichael Hands (@LebanonJon, LinkedIn), VP Storage at Chia Project, who has been in and around the storage business forever, mostly with Intel and their SSD team, before it was sold. He did technical marketing for NVMe. He also ran the security and crypto track at FMS2022. He recently worked on sustainability, helping to create a circular economy for disk and SSD storage. Moreover, he assisted IEEE with their new (media) sanitization standard to make reusing/recycling storage easier.

Chia was born to provide a way to take advantage of storage media for blockchains in a government-compliant way, so that it could be spun off as a public company someday. Chia is a crypto currency that depends on proof of space (storage space exists) and proof of time (storage space is reserved for a period of time). There have been many crypto coins based on proof of work (running hard cryptographic algorithms to come up with some specific bit pattern). And ETH was forked last year to support proof of stake (where one stakes some amount of ETH for a defined period). But few, if any, have been based on proof of space and time.

Disk and SSD commands already exist to provide “Secure Erase” (multiple passes of different bit patterns overwriting the same blocks) and cryptographic erasure (for encrypted drives, the encryption key is changed). Both approaches ensure that customer/organization data is no longer retained on media leaving an organization’s control. And yet, many companies use secure erase/cryptographic erasure and still shred disk drives and SSDs, just to be sure that no data is retained. This is a vast waste of energy and resources.

Jonmichael said that both disk and SSD drives typically have another 5 years beyond their guaranteed (5-year) production life where they can function perfectly well as storage devices (ok, performance may not be the same as current drives). And after being used for another 5 years, they are much easier to recycle if left un-shredded and returned to manufacturers, who can dismantle them to reuse expensive components and rare earth materials.

We didn’t spend much time on the technical underpinnings of Chia so if you are interested in that we suggest you check out Jonmichael’s FMS2022 presentation video.

But if you’re interested in a high-level understanding of Chia and what one can do with it, we did cover that. For example, Chia has farmers (not miners). Farmers create (~100GB) Chia plot files and store these on media.

Plot files take some amount of CPU power and memory to create, but once created can stay on storage forever. What makes Chia work is that the network periodically checks whether you have a certain plot file, and if you do, you get rewarded for that. Jonmichael said that with a typical Chia crypto setup, one could make $0.50/TB/month farming Chia.
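
Some quick arithmetic on those quoted figures (a sketch using only the episode’s numbers):

```python
# ~100 GB per plot file and ~$0.50/TB/month, both as quoted in the episode.
PLOT_SIZE_TB = 0.1
REVENUE_PER_TB_MONTH = 0.50   # USD

def farm_estimate(capacity_tb: float) -> tuple[int, float]:
    plots = int(capacity_tb / PLOT_SIZE_TB)
    return plots, capacity_tb * REVENUE_PER_TB_MONTH

plots, usd = farm_estimate(10.0)              # a repurposed 10 TB drive
print(f"~{plots} plots, ~${usd:.2f}/month")   # ~100 plots, ~$5.00/month
```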

The Chia project currently has about 24EB of plots online and at their peak had over 300EB. They also have 130K farmers in their current network. Bitcoin, at its peak, had about 60K miners. Jonmichael thinks the Chia crypto coin may be the most distributed crypto coin in existence today.

A couple of years back Chia accounted for a significant amount of new disk drive purchases but that has died down considerably since then. As discussed earlier, Jonmichael is working to create a circular economy for storage that could lead to media reuse for Chia farming.

Jonmichael mentioned that Chia has matured significantly since peak use. It used to be that creating Chia plot files required high end CPUs and lots of technical skills, but today Jonmichael said you can be a farmer with an RPi. He did say that they have moved to making better use of available memory in the plotting process and have reduced the write load on the storage media.

Another aspect of Chia’s maturation is that they now support Chia smart coins or smart contracts. They created ChiaLisp, a Turing-complete language, to implement Chia smart coins. It turns out that Lisp and other functional languages provide a natural way to implement secure code. Jonmichael mentioned that other crypto coins are starting to move towards using ChiaLisp.

Some recent innovations in Chia smart coins include:

  • Chia Offer Management – anything you wish to trade can be digitally tracked and traded using Chia Offer Management smart coins.
  • Chia NFT (non-fungible token) Management – NFTs have been used by other blockchains to sell digital rights to assets; Chia’s support for NFTs opens Chia up to this as well. The reference implementation for Chia’s NFT management is Chia Friends, where all proceeds are being donated to the Marmot Recovery Foundation.
  • Chia Data Layer Management, a federated database – here the Chia blockchain is used to support a K-V store, where the blockchain stores the key and a hash of the value. Users can use the Chia Data Layer to store any key-hash(value) database they wish. It’s important to realize that the actual data, or value, is stored external to the Chia blockchain (see the sketch after this list).
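
A minimal sketch of that key-plus-hash pattern, with a plain dict standing in for the blockchain; everything here is our illustration, not Chia’s actual Data Layer API:

```python
# Only the key and a digest go "on chain"; the real value lives off chain.
import hashlib

on_chain: dict[str, str] = {}     # stand-in for the Chia Data Layer
off_chain: dict[str, bytes] = {}  # external store holding the real values

def put(key: str, value: bytes) -> None:
    off_chain[key] = value
    on_chain[key] = hashlib.sha256(value).hexdigest()

def verify(key: str) -> bool:
    """Anyone can check the external value against the on-chain digest."""
    return hashlib.sha256(off_chain[key]).hexdigest() == on_chain[key]

put("carbon-credit-0001", b'{"tonnes": 12, "project": "reforestation"}')
print(verify("carbon-credit-0001"))  # True unless the value was tampered with
```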

The Data Layer solution is currently being used by the World Bank to develop a way to track carbon credits (see: the Climate Action Data Trust).

Chia has come a long way. In its heyday it was a significant consumer of new disk media, but what Jonmichael and others have planned for it is to take advantage of the longer-term life of storage media and to use it for the benefit of all humanity.

Jonmichael Hands, VP Storage at Chia Project

Jonmichael Hands partners with the storage vendors for Chia optimized product development, market modeling, and Chia blockchain integration.

Jonmichael spent the last ten years at Intel in the Non-Volatile Memory Solutions group working on product line management, strategic planning, and technical marketing for the Intel data center SSDs.

In addition, he served as the chair for NVM Express (NVMe), SNIA (Storage Networking Industry Association) SSD special interest group, and Open Compute Project for open storage hardware innovation.

Jonmichael started his storage career at Sun Microsystems designing storage arrays (JBODs) and holds an electrical engineering degree from the Colorado School of Mines.

139: GreyBeards talk HPC file systems with Marc-André Vef and Alberto Miranda of GekkoFS

In honor of the SC22 conference this month in Dallas, we thought it time to check in with our HPC brethren to find out what’s new in storage for their world. We happened to see that IO500 had some recent (ISC22) results from a relative newcomer, GekkoFS (@GekkoFS). So we reached out to the team to find out how they managed to crack into the top 10. We contacted Marc-André Vef (@MarcVef), a Ph.D. student at Johannes Gutenberg University Mainz, and Alberto Miranda (@amiranda_hpc), Ph.D., of the Barcelona Supercomputing Center, two of the authors of the GekkoFS paper.

GekkoFS is a new burst file system tailor-made to create, process and tear down scratch data sets for HPC workloads. It turns out that HPC does lots of work using scratch files as working data sets. Burst file systems typically use another parallel file system to (stage) read (permanent) data into the scratch files and write (permanent) result data out. But during processing, the burst file system handles all scratch data access. Listen to the podcast to learn more.

We had never heard of a burst file system before but it’s been around for a while now in HPC. For example, BeeGFS provides one (check out our GreyBeards podcast on BeeGFS). BeeGFS supports both a PFS and a burst file system. GekkoFS only offers a burst file system.

GekkoFS is a distributed burst file system which operates across nodes to stitch together a single global file system. GekkoFS is strictly open source at the moment and can be downloaded (see: GekkoFS GitLab) and used by anyone.

They are considering supplying professional support in the future, but at the moment if you have an issue, Marc and Alberto suggest you use the GekkoFS GitLab issue tracking system to tell them about it.

It turns out Lustre, IBM Spectrum Scale, DAOS and other HPC file systems incur gobs of overhead to create scratch files. And even though it takes a lot of IO to load scratch file data and write out results, there’s a whole lot more IO that gets done to scratch files during HPC jobs.

This sort of IO also occurs for AI/ML/DL, where training data is staged into a sort of scratch area (typically in memory, depending on size) and then repeatedly (re-)processed there. GekkoFS can offer significant advantages to AI/ML/DL work when training data is very large. Normally, without a burst file system, one would need to shard this data across nodes and then deal with the partial training that results. But with GekkoFS, all you need do is stage it into the burst file system and read it from there.

GekkoFS is partially POSIX compliant. They install a client-side interposer library that intercepts POSIX requests destined for GekkoFS files.

GekkoFS has no central metadata server; instead, all nodes in the GekkoFS cluster provide metadata services. Filenames are hashed to tell GekkoFS which node holds a file’s (metadata &) data.
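
A minimal sketch of the idea, assuming a simple hash-to-node mapping (GekkoFS’s actual hash function and placement details may differ):

```python
# Every client computes the same owner node from the path alone, so no
# central metadata server is ever consulted.
import hashlib

NODES = ["node0", "node1", "node2", "node3"]

def owner(path: str) -> str:
    digest = hashlib.md5(path.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

print(owner("/scratch/job42/input.dat"))   # deterministic on every node
print(owner("/scratch/job42/output.dat"))
```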

GekkoFS stores its data and metadata on local disks, SSDs or in-memory (tmpfs) storage. All local node storage in the cluster is stitched together into a single global file system.

GekkoFS supports strict consistency for IO and file creation/deletion within a node. They use an internal transaction database to enforce this strict consistency.

Across nodes, they support eventual consistency, which means files created on one node may not be immediately viewable/accessible by other nodes in the cluster for a short period of time while (meta)data updates are propagated across the cluster.

As part of their consistency paradigm, GekkoFS doesn’t support directory locking. Jason mentioned that HPC “ls” (directory listing) commands can sometimes take forever due to directory locking. No directory locking makes ls commands faster but may show inconsistent results (due to eventual consistency).

We had some discussion on this lack of directory locking and eventual consistency in file systems, but we agreed to disagree. They did say that for HPC (and probably AI/ML/DL) workloads, their approach seems appropriate, as those workloads are way more read intensive than write intensive.

In any case, they must be doing something right as they have a screaming scratch file system for HPC work.

Marc will be attending SC22 in Dallas this month, so if you’re attending, please look him up and say hello from us.

Marc-André Vef, Ph.D. student

Marc-André Vef is a Ph.D. candidate at the Johannes Gutenberg University Mainz. He started his Ph.D. in 2016 after receiving his B.Sc. and M.Sc. degrees in computer science from the Johannes Gutenberg University Mainz. His master’s thesis was in cooperation with IBM Research about analyzing file create performance in the IBM Spectrum Scale parallel file system (formerly GPFS).

During his Ph.D., he has worked on several projects focusing on file system tracing (in collaboration with IBM Research) and distributed file systems, among others. Most notably, he designed two ad-hoc distributed file systems: DelveFS (in collaboration with OpenIO), which won the Best Paper in its category, and GekkoFS (in collaboration with the Barcelona Supercomputing Center). GekkoFS placed fourth in its first entry in the 10-node challenge of the IO500 benchmark. The file system is actively developed in the scope of the EuroHPC ADMIRE project.

His research interests focus on file systems and system analytics.

Alberto Miranda, Ph.D., Senior Researcher, Barcelona Supercomputing Center

Dr. Eng. Alberto Miranda is a Senior Researcher in advanced storage systems in the Computer Science Department of the Barcelona Supercomputing Center (BSC) and co-leader of the Storage Systems Research Group since 2019. Dr. Eng. Miranda received a diploma in Computer Engineering (2004), an M.Sc. degree in Computer Science (2006) and an M.Sc. degree in Computer Architectures, Networks and Systems (2008) from the Technical University of Catalonia (UPC-BarcelonaTech). He later received a Ph.D. degree Cum Laude in Computer Science from the Technical University of Catalonia in 2014 with his thesis “Scalability in Extensible and Heterogeneous Storage Systems”.

His current research interests include efficient file and storage systems, operating systems, distributed system architectures, as well as information retrieval systems. Since he started his work at BSC in 2007, he has published 14 papers in international conferences and journals, as well as 5 white papers and technical reports and 1 book chapter. Dr. Eng. Miranda is currently involved in several European and national research projects and has participated in competitively funded EU projects XtreemOS, IOLanes, Prace2IP, IOStack, Mont-Blanc 2, EUDAT2020, Mont-Blanc 3, and NEXTGenIO.