38: GreyBeards talk with Rob Peglar, Senior VP and CTO, Symbolic IO

In this episode, we talk with Rob Peglar (@PeglarR), Senior VP and CTO of Symbolic IO, a computationally defined storage vendor. Rob has been around almost as long as the GreyBeards (~40 years) and most recently was with Micron and prior to that, EMC Isilon. Rob is also on the board of SNIA.

Symbolic IO emerged from stealth earlier this year and intends to ship products by late this year or early next. Rob joined Symbolic IO in July of 2016.

What’s computational storage?

It’s all about symbolic representation of bits. Symbolic IO has come up with a way to encode bit streams into unique symbols that offer significant savings in memory space, beyond standard data compression techniques.

All that would be just fine if it sat at the end of a storage interface, and we would probably just call it a new form of data reduction. But Symbolic IO also incorporates persistent memory (NV-DIMMs, and in the future 3D XPoint, ReRAM and others) and provides this symbolic data inside a server, directly through its processor data cache, in (decoded) raw data form.

Symbolic IO provides a translation layer between persistent memory and processor cache. On data reads, it decodes the symbolic representation held in persistent memory on the way into data cache; on data writes, it encodes the raw data into its symbolic representation on the way out of cache to persistent memory.

Rob says that the mathematics are there to show that Symbolic IO’s data reduction is significant and that the decode/encode functionality can be done in a matter of a few clock cycles per cache (line) access on modern (Intel) processors.

The system continually monitors the data it sees to determine what the optimum encoding should be and can change its symbolic table to provide more memory savings for new data written to persistent memory.

All this reminds the GreyBeards of Huffman encoding algorithms for data compression (which one of us helped deploy on a previous [unnamed] storage product). Huffman encoding transformed ASCII (8-bit) characters into variable length bit streams.
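For readers who want a refresher on the comparison, here is a minimal sketch of textbook Huffman coding in Python. It is not Symbolic IO's encoding (the episode does not go into that level of detail); the function names `huffman_codes`, `encode` and `decode` are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Map each character in text to a variable-length bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreak, {char: code_so_far}); the unique
    # tiebreak means two dicts are never compared with each other.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # codes are prefix-free, so a match is final
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

Frequent characters end up with short codes and rare ones with long codes, which is where the space savings over fixed 8-bit characters come from.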

Symbolic IO will offer 3 products:

  • IRIS™ Compute, which provides persistent memory storage, accessed using something like the Linux pmem library, and includes Symbolic StoreModules™ (persistent memory hardware);
  • IRIS Vault, which is an appliance with its own IRIS-infused Linux OS (Symbolic’s SymCE™) plus Symbolic IO StoreModules, that can run any Linux application, without change, against the persistent memory and offers full data security, next-generation snapshot-/clone-like capabilities with BLINK™ full storage backups, and enhanced physical security with the removable IRIS Advanced EYE ASIC; and
  • IRIS Store, which extends the IRIS Vault and IRIS Compute above with more tiers of storage, using Symbolic IO StoreModules as Tier1, PCIe (flash) storage as Tier 2 and external SSD storage as Tier 3 storage.

For more information on Symbolic IO’s three products, we would encourage you to read their website (linked above).

The podcast runs long, over 47 minutes, and is wide ranging, discussing some of the history of processor/memory/information technologies. It was very easy to talk with Rob, and both Howard and I have known him for years, across multiple vendors & organizations. Listen to the podcast to learn more.

Rob Peglar, Senior VP and CTO, Symbolic IO

Rob Peglar is the Senior Vice President and Chief Technology Officer of Symbolic IO. Rob is a seasoned technology executive with 39 years of data storage, network and compute-related experience, is a published author and is active on many industry boards, providing insight and guidance. He brings a vast knowledge of strategy and industry trends to Symbolic IO. Rob is also on the Board of Directors for the Storage Networking Industry Association (SNIA) and an advisor for the Flash Memory Summit. His role at Symbolic IO will include working with the management team to help drive the future product portfolio, executive-level forecasting and customer/partner interaction from early-stage negotiations through implementation and deployment.

Prior to joining Symbolic IO, Rob was the Vice President, Advanced Storage at Micron Technology, where he led next-generation technology and architecture enablement efforts of Micron’s Storage Business Unit, driving storage solution development with strategic customers and partners. Previously he was the CTO, Americas for EMC, where he led all CTO functions for the Americas. He has also held senior level positions at Xiotech Corporation, StorageTek and ETA Systems.

Rob’s extensive experience in data management, analytics, high-performance computing, non-volatile memory, distributed cluster architectures, filesystems, I/O performance optimization, cloud storage, replication and archiving, networking and virtualization makes him a sought-after industry expert and board member. He was named an EMC Elect in 2014, 2015 and 2016. He was one of 25 senior executives worldwide selected for the CRN ‘Storage Superstars’ Award in 2010.

37: GreyBeards discuss blockchains with Donna Dillenberger, IBM Fellow

In this episode, we talk with Donna Dillenberger (@DonnaExplorer), IBM Fellow, about IBM’s work with blockchain technology. Ray was at the IBM Edge Conference last month, where Donna and others presented on what blockchain technology could do for financial services and asset provenance. Ray wrote a post on Blockchains at IBM after the conference.

Blockchain is the technology behind Bitcoin, the crypto-currency, but the technology has the potential to revolutionize a lot of other activities.

What does blockchain have to do with storage? Probably not that much, but as it’s an up and coming technology with great prospects, the GreyBeards thought it worthwhile to find out more.

Blockchain explained

Blockchain is essentially a software protocol to establish trust where there is none. At another level, it is a programmatic way to maintain a shared ledger of information, without compromise.

The funny thing about ledgers, and record keeping in general, is that they are everywhere. From the first records of written language, to double-entry accounting, to today’s tracking of financial transactions, ledgers do it all.

Blockchain is just an updated, software protocol version of good ledger keeping.
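To make the ledger idea concrete, here is a minimal sketch of a hash-chained ledger in Python. It only shows tamper evidence (each block carries the hash of the block before it), not consensus among parties. The block layout, the use of SHA-256 over JSON, and the function names are assumptions for illustration, not the format Bitcoin or Hyperledger actually uses.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Add a block that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block's hash and back-link still check out."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Editing any historical transaction changes that block's hash, which breaks the link from every later block, so the change is detectable by anyone who re-verifies the chain.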

What’s so special about blockchain ledgers is that they can be maintained correctly and consistently even with entities/persons/servers that are trying to cheat the system.

Donna called this the Byzantine Generals’ Problem.

Byzantine generals are tricky

There’s a group of Byzantine armies surrounding a castle. Some want to attack while others want to retreat, and they would all like to coordinate their actions. But some Byzantine generals are traitors and will selectively tell some generals to attack while telling others to retreat, in an attempt to disrupt any coordinated action.

Generalizing the problem: when there are a number of independent entities, how does one determine consensus such that no one entity can cheat the system? CS calls a solution to this a Byzantine Fault Tolerance (BFT) algorithm.

Algorithmic consensus in blockchain

With the Bitcoin blockchain (Donna calls this blockchain V1.0), consensus is achieved by “proof of work”, a computational puzzle whose solution is difficult to produce but easy to verify.
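The “hard to produce, easy to verify” asymmetry can be sketched in a few lines of Python. This is a simplified toy, assuming a single SHA-256 hash and a difficulty expressed as a count of leading zero hex digits; Bitcoin’s actual scheme differs in its details. The names `meets_target` and `mine` are made up for illustration.

```python
import hashlib


def meets_target(data, nonce, difficulty):
    """Verify: a single hash tells you whether the nonce is a valid proof."""
    digest = hashlib.sha256(f"{data}:{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


def mine(data, difficulty):
    """Produce: brute-force nonces until one satisfies the target."""
    nonce = 0
    while not meets_target(data, nonce, difficulty):
        nonce += 1
    return nonce
```

Each extra hex digit of difficulty multiplies the expected mining work by 16, while verification stays at one hash computation.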

But proof of work is not the only way to achieve algorithmic consensus for blockchains. Hyperledger, an open source blockchain project, has a pluggable form of consensus. So, different Hyperledger blockchains can support different forms of consensus.

Currently, Hyperledger supports a BFT algorithm, which says that 2/3rds + 1 of the nodes must agree on a hash value (digitally signed current transaction data and historical info) to reach consensus.
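As a rough illustration of that threshold, the sketch below reads “2/3rds + 1” as floor(2n/3) + 1 votes out of n nodes, which is an interpretation on our part. It only counts votes; a real BFT protocol also involves signed messages and multiple rounds. The function names are hypothetical.

```python
from collections import Counter


def quorum_size(n_nodes):
    """Smallest vote count that exceeds two thirds of the nodes."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return (2 * n_nodes) // 3 + 1


def agreed_hash(votes, n_nodes):
    """votes maps node id -> reported hash; return the hash with a quorum, else None."""
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count >= quorum_size(n_nodes) else None
```

With 4 nodes the quorum is 3, so one faulty or dishonest node cannot block or forge agreement; with 7 nodes the quorum is 5.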

It turns out that Hyperledger blockchains use a key-value store, RocksDB, to record transaction history and other metadata.

Other current blockchains

At IBM Edge, Donna discussed an IBM supply chain blockchain where suppliers and consumers record sending, receipt and other movement of parts around IBM’s world wide supply chain. It uses a Hyperledger blockchain.

The Everledger blockchain is being used to supply diamond provenance/pedigree validation. Each diamond is encoded with a digital barcode as it’s mined, and as the diamond is processed, cut and sent to wholesalers/retailers, each of those transactions is maintained in the blockchain. One can easily validate the origin, clarity, color, carat and cut of a diamond by examining its transaction history on the blockchain.

IBM Blockchain activities

IBM wrote the Hyperledger code from scratch to run on z/Linux, but their financial services customers wanted it open sourced. So, IBM donated it to the Linux Foundation and sponsored the Hyperledger project. It’s currently the fastest growing Linux Foundation open source project. You can run Hyperledger apps on any Linux system.

IBM z/Linux has some unique security characteristics useful for financial services and other critical organizations/industries. For instance, it offers secure application signing/verification in order to run, data at rest/in-flight encryption with secured keys and crypto code, and a secure cloud where the hardware is run.

Together, this software, hardware and data center combination has a FIPS 140-2 Level 4 certification.

IBM also offers professional services to help customers create and host their own Hyperledger apps. Moreover, IBM is sponsoring Hyperledger hackathons to add features and is sponsoring other Hyperledger community events.

The podcast runs long, over 50 minutes, and introduces blockchain technology, where it can be used, and what IBM is doing with it. Howard and I could have talked with Donna for hours on the topic but we had to stop sometime. Listen to the podcast to learn more.

Donna Dillenberger, IBM Fellow

Donna Dillenberger is an IBM Fellow at IBM’s Watson Research Center. She has redesigned many enterprise applications for greater scalability and availability. She has worked on analytic models for financial, insurance, retail and healthcare industries.

In 2005, she became IBM’s Chief Technology Officer of IT Optimization. In 2006, she became an Adjunct Professor at Columbia University’s Graduate School of Engineering. She is a Master Inventor and is currently working on cognitive analytics and blockchain.

34: GreyBeards talk Copy Data Management with Ash Ashutosh, CEO Actifio

In this episode, we talk with Ash Ashutosh (@ashashutosh), CEO of Actifio, a copy data virtualization company. Howard met up with Ash at TechFieldDay11 (TFD11) a couple of weeks back and wanted another chance to talk with him. Ash seems to have been around forever. The first time we met, I was at a former employer and he was with AppIQ (later purchased by HP). Actifio is populated by a number of industry veterans and, since being founded in 2009, has been doing really well, with over 1000 customers.

So what’s copy data virtualization (management) anyway? At my former employer, we did an industry study that determined that IT shops (back in the 90’s) were making 9-13 copies of their data. These days, IT is making even more copies of the exact same data.

Data copies proliferate like weeds

Engineers use snapshots for development, QA and validation. Analysts use data copies to better understand what’s going on in their customer-partner interactions, manufacturing activities, industry trends, etc. Finance, marketing, legal, etc. all have similar needs, which just makes the number of data copies grow out of sight. And we haven’t even started to discuss backup.

Ash says things reached a tipping point when server virtualization became the dominant approach to running applications, which led to an ever-increasing need for data copies as apps started being developed and run all over the place. Then along came data deduplication, which displaced tape in IT’s backup process, so that backup data (copies) could now reside on disk. Finally, with the advent of disk deduplication, backups no longer had to be in TAR (backup) formats but could be left in app-native formats. In native formats, any app/developer/analyst could access the backup data copy.

Actifio Copy Data Virtualization

So what is Actifio? It’s essentially a massively distributed object store with a global namespace file system on top of it. Application hosts/servers run agents in their environments (VMware, SQL Server, Oracle, etc.) to provide change block tracking and other metadata about what’s going on with the primary data to be backed up. So when a backup is requested, only changed blocks have to be transferred to Actifio and deduped. From that deduplicated change block backup, a full copy can be synthesized, in native format, for any and all purposes.

With change block tracking, backups become very efficient, and deduplication only has to work on changed data, so that also becomes more effective. Data copying can also be done more effectively since they’re only tracking deduplicated data. If necessary, changed blocks can also be applied to data copies to bring them up to date.
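The general pattern described above (ship only changed blocks, deduplicate them by content, then synthesize a full copy on demand) can be sketched as follows. This is a toy illustration of the idea, not Actifio’s implementation; the `CopyStore` class, its methods, and the choice of SHA-256 as the content fingerprint are all assumptions.

```python
import hashlib


class CopyStore:
    """Toy deduplicating store fed by changed-block backups."""

    def __init__(self):
        self.blocks = {}   # content hash -> block bytes, each stored once
        self.backups = []  # per backup: {block index: content hash}

    def backup(self, changed_blocks):
        """Ingest only the blocks that changed since the previous backup."""
        # Start from the previous backup's block map, then overlay the changes.
        block_map = dict(self.backups[-1]) if self.backups else {}
        for index, data in changed_blocks.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(digest, data)  # dedupe identical content
            block_map[index] = digest
        self.backups.append(block_map)
        return len(self.backups) - 1  # backup id

    def synthesize(self, backup_id):
        """Rebuild a full, point-in-time copy from the deduplicated blocks."""
        block_map = self.backups[backup_id]
        return b"".join(self.blocks[block_map[i]] for i in sorted(block_map))
```

For example, after an initial backup of three blocks and a second backup that sends only one changed block, `synthesize` can still return the complete image as of either backup.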

With Actifio, one can apply SLAs to copy data. These SLAs can take the form of data governance, such that some copies can’t be viewed outside the country, or by certain users. And they can also provide analytics on data copies. Both of these capabilities take copy data to a whole new level.

We didn’t get into all of Actifio’s offerings on the podcast, but Actifio CDS is a high availability appliance which runs their object/file system and contains data storage. Actifio also comes as a virtual appliance, Actifio SKY, which runs as a VM under VMware, using anyone’s storage. Actifio supports NFS, SMB/CIFS, FC, and iSCSI access to data copies, depending on the solution chosen. There’s a lot more information on their website.

It sounds a little bit like PrimaryData but focused on data copies rather than data migration and mostly tier 2 data access.

The podcast runs ~46 minutes and covers a lot of ground. I spent most of the time asking Ash to explain Actifio (for Howard, TFD11 filled this in). Howard had some technical difficulties during the call which caused him to go offline, but he then came back on the call. Ash and I never missed him :). Listen to the podcast to learn more.

Ash Ashutosh, CEO Actifio

Ash Ashutosh brings more than 25 years of storage industry and entrepreneurship experience to his role of CEO at Actifio. Ashutosh is a recognized leader and architect in the storage industry where he has spearheaded several major industry initiatives, including iSCSI and storage virtualization, and led the authoring of numerous storage industry standards. Ashutosh was most recently a Partner with Greylock Partners where he focused on making investments in enterprise IT companies. Prior to Greylock, he was Vice President and Chief Technologist for HP Storage.

Ashutosh founded and led AppIQ, a market leader of Storage Resource Management (SRM) solutions, which was acquired by HP in 2005. He was also the founder of Serano Systems, a Fibre Channel controller solutions provider, acquired by Vitesse Semiconductor in 1999. Prior to Serano, Ashutosh was Senior Vice President at StorageNetworks, the industry’s first Storage Service Provider. He previously worked as an architect and engineer at LSI and Intergraph.

33: GreyBeards talk HPC storage with Frederic Van Haren, founder HighFens & former Sr. Director of HPC at Nuance

In episode 33 we talk with Frederic Van Haren (@fvha), founder of HighFens, Inc. (@HighFens), a new HPC consultancy, and former Senior Director of HPC at Nuance Communications. Howard and I got a chance to talk with Frederic at a recent HPE storage deep dive event. I met up with him again during SFD10, where he was talking on behalf of Kaminario, and he was also at the HPE Discover conference last week.

Nuance is the backend speech recognition engine for a number of popular service offerings. Nuance looks very similar to a lot of other hyper-scale customers and ultimately, we feel, may be the way of the future for all IT over the coming decades. Nuance’s data storage journey during Frederic’s tenure with the company holds many lessons for all of us in the storage industry.

Nuance currently has ~6PB usable (~16PB raw) of speech wave files as well as uncountable text and other files, all inside IBM Spectrum Scale (GPFS). They have both lots of big files and lots of small files. These days, Spectrum Scale is processing 2-3M files/second. They have doubled capacity in each of the last 9 years, and today handle a billion new files a month. GPFS stripes data across storage and provides data protection, migration, snapshotting and storage tiering across a diverse mix of storage. At the end of the podcast we discussed some open source alternatives to Spectrum Scale, but at the time Nuance started down this path, GPFS was found to be the only thing that could do the job. This proved to be a great solution, as they have completely swapped out the underlying storage at least 3 times and all their users were none the wiser.

The first storage that Frederic talked about was Coraid (no longer in business) and their ATA over Ethernet storage solution. This used SuperMicro hardware with 24 SATA drives/shelf, and they bought 40 shelves. Over time this grew to 1000s of SATA drives and was easily scalable but hard to manage, as it was pretty dumb storage. In fact, they had to deploy video cameras, focused on drive shelves, to detect when drives failed!

Over time, Nuance came to the realization that they had to do something more manageable and brought in HPE MSA storage to replace their Coraid storage. The MSA, which had 96 SAS drives, was a great solution for them: it was able to support both faster “SCRATCH” storage using fast SAS 300GB/15KRPM drives and slower “STATIC” storage with slower SATA 760GB/7.2KRPM drives, and it was much more manageable than the Coraid solution.

Although MSA storage worked great, after a while Nuance’s sprawling FC environment, which was doubling yearly, caused them to rethink their storage once again. This led them to swap out all their HPE MSA storage for HPE 3PAR, to consolidate their FC network and storage footprint.

For metadata, Nuance uses a 76 node Hadoop cluster for sophisticated search queries, as doing an ls on the GPFS file system would take days. Their file metadata is essentially a textual, row by row database, and they use queries over the Hadoop cluster to determine things like which files have American English, spoken by females, with 8kHz recording. Not sure when, but eventually Nuance deployed HPE Vertica SQL over Hadoop for their metadata engine and dropped average query time from 12 minutes to 73 seconds (!!).

Nuance, because of their extreme growth and greater openness to storage innovation, had become a favorite for storage startups and major vendors to do Proofs of Concept (PoC) on new storage offerings. One PoC Nuance did was for Kaminario storage. There is a standard metric that says a CPU core requires so many IOPS, so that when CPU cores increase, you need to supply more IOPS. They went with Kaminario for their test-dev environment and more performance intensive storage. Nuance appreciates Kaminario’s reliability, high availability and highly predictable performance. (See the SFD10 video feed for Frederic’s session.)

We talked a bit about how speech recognition’s Hidden Markov Chain statistical model was heavily dependent on CPU cores. Previously, if you wanted to do a recognition task, you assigned it to one core and waited until it was done, a serial process dependent on the # of CPU cores you had available. This turned out to be quite a problem, as you had to scale CPU cores if you wanted to do more concurrent speech recognition activities. Then came GPUs, and you could do speech recognition work on a GPU core. With the new GPU cards, instead of a server having ~16 CPU cores, you could have a server with multiple graphics cards having 3000 GPU cores. This scaled a lot more easily. Machine learning and deep neural nets have the potential to parallelize this, so that it will scale even better.

In the end, HPC trials, tribulations and ways of doing business are starting to become mainstream. I was recently talking to one vendor who said that most HPC groups start out in isolation to support one application, but over time they either subsume corporate IT, get absorbed into corp. IT, or continue to be a standalone group (while waiting until one of the other two happens).

The podcast runs ~41 minutes and covers a lot of ground about one HPC organization’s evolution of their storage environment over time, what was driving some of that evolution and the tools they chose to master it. Listen to the podcast to learn more.

Frederic Van Haren, founder HighFens, Inc.

Frederic Van Haren is the Chief Technology Officer @Highfens and known for his insights in the HPC and storage industry. He has over 20 years of experience in High Tech providing technical leadership and strategic direction in Telecom and Speech markets. Frederic spent the last decade at Nuance Communications building large HPC environments from the ground up. He is frequently invited to speak at events to provide his insights on the HPC and storage markets. He has played leading roles as President of a variety of technology user groups promoting the use of innovative technology. As an Engineer he enjoys working with the engineering teams from technology vendors providing feedback on new and upcoming products.

Frederic lives in Massachusetts, USA but grew up in the northern part of Belgium where he received his Masters in Electrical Engineering, Electronics and Automation.

GreyBeards talk with Lee Caswell and Dave Wright of NetApp

In our 30th episode, we talk with Dave Wright (@JungleDave), SolidFire founder and VP & GM of SolidFire at NetApp, and Lee Caswell (@LeeCaswell), VP Products, Solutions & Services Marketing at NetApp. Dave’s been on before as CEO of SolidFire back in May of 2014, but this is the first time for Lee. Dave’s also been a prominent guest at Storage Field Day, most recently at SFD9 with Dave Hitz from NetApp. Unclear how Lee managed to avoid TFD/SFD duty but it’s only a matter of time.

SolidFire was recently acquired by NetApp in their largest acquisition ever, signaling a new direction for them (acquisition closed 2 Feb. 2016). Since we had spent a prior podcast on another recent storage acquisition, we thought it only appropriate to talk with these two as well. We started the discussion with Dave and how it feels to be under the NetApp umbrella.

Another topic that came up was how flash gets used in the cloud. Old school had it that flash was just high IO performance but nowadays, next gen application development has a range of IO requirements which all need consistent performance to data. Flash with scale out and QoS can handle this wide range of requirements across cloud applications. Lee mentioned how flash adoption is changing from application specific to more general purpose storage which is removing the “IO bottleneck”.

Google had written a study saying that there will not be a flash-disk crossover for the next decade, but the differences are small enough that you almost have to be a hyper-scale customer to see significant economic advantages.

We discussed the lack of AFAs doing well on throughput-intensive benchmarks. Dave mentioned that throughput was one of disk’s better performing modes and that, in the past, 3Gbps-6Gbps storage interfaces hid a lot of flash performance. But benchmarks of synthesized pure workloads aren’t real world; workloads in real data centers are much messier.

IO density (IOPS/GB) came up as another discussion topic. At low IO density, disk may still make sense but as IO density increases, all flash makes much more sense.
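A quick way to see why IO density matters is to count how many drives a workload needs to satisfy both its capacity and its IOPS. The sketch below does that in Python; the drive figures in the usage note are hypothetical round numbers chosen for illustration, not vendor specifications, and the function names are made up.

```python
import math


def io_density(iops, capacity_gb):
    """IO density in IOPS per GB."""
    return iops / capacity_gb


def drives_needed(workload_iops, workload_gb, drive_iops, drive_gb):
    """Drives required to meet BOTH the capacity and the IOPS requirement."""
    for_capacity = math.ceil(workload_gb / drive_gb)
    for_iops = math.ceil(workload_iops / drive_iops)
    return max(for_capacity, for_iops)
```

Assuming a hypothetical 4,000 GB disk that delivers 150 IOPS, a 40,000 GB workload at 1,000 IOPS (0.025 IOPS/GB) needs 10 drives, all of them bought for capacity. The same 40,000 GB at 100,000 IOPS (2.5 IOPS/GB) needs 667 of those disks, nearly all bought just for IOPS, which is where flash starts to make much more sense.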

Google also mentioned the importance of tail-end IO latency (IO latency at the 99.9th percentile). Poor tail IO latency has been an ongoing problem holding back the adoption of hybrid storage. All flash has some advantages here, but not all AFAs are immune to the problems of tail-end latency.

The podcast runs just over 39 minutes and covers a lot of ground about their products, flash technology advantages, and market dynamics. Listen to the podcast to learn more.
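To show how a tail percentile differs from an average, here is a small Python sketch using the nearest-rank percentile definition (one of several definitions in use). The latency samples in the usage note are invented for illustration, and the `percentile` helper is hypothetical.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Fraction(str(pct)) keeps values like 99.9 exact, avoiding float rounding.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

With 998 IOs completing in 1 ms and 2 IOs taking 50 ms, the mean is about 1.1 ms and the 99th percentile is still 1 ms, but the 99.9th percentile is 50 ms. An application that issues many IOs per request will hit that tail regularly, which is why the average alone can be misleading.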

Dave Wright, SolidFire Founder, Vice President, and GM

Dave Wright left Stanford in 1998 to help start GameSpy Industries, a leader in online video game media, technology, and software. While at GameSpy, Dave led the team that created a backend infrastructure powering thousands of games and millions of gamers. GameSpy merged with IGN Entertainment in 2004 to create one of the largest Internet gaming & entertainment media companies. Dave served as Chief Architect for IGN and led technology integration with FIM / MySpace after IGN was acquired by NewsCorp in 2005.

In 2007 Dave founded Jungle Disk, a pioneer and early leader in cloud-based storage and backup solutions for consumers and businesses. Jungle Disk was acquired by leading cloud provider Rackspace in 2008 and Dave worked closely with the Rackspace Cloud division to build a cloud platform supporting tens of thousands of customers. In December 2009 Dave left Rackspace to start SolidFire.

Lee Caswell, Vice President Product, Solutions, and Services Marketing

Lee Caswell is vice president of Product, Solutions and Services Marketing at NetApp, where he leads a team that speeds the customer adoption of new products, partnerships, and integrations. Lee joined NetApp in 2014 and has extensive experience in executive leadership within the storage, flash and virtualization markets.

Lee was previously vice president of Marketing at Fusion-IO (now SanDisk). Prior to Fusion-IO Lee was a founding member of Pivot3, a company considered to be an early innovator in hyper-converged systems, where he served as the CEO and CMO. Earlier in his career, Lee held marketing leadership positions at VMware, Adaptec, and SEEQ Technology (now LSI Logic). He started his career at General Electric in Corporate Consulting.

Lee holds a bachelor of arts degree in economics from Carleton College and a master of business administration degree from Dartmouth College. Lee is a New York native and has lived in northern California for many years. He and his wife live in Palo Alto and have two children. In his spare time Lee enjoys cycling, playing guitar, and hiking the local hills.

Disclaimer: NetApp and SolidFire have been clients of DeepStorageNet and NetApp is a current client of Silverton Consulting.