76: GreyBeards talk backup content, GDPR and cyber security with Jim McGann, VP Mkt & Bus. Dev., Index Engines

In this episode we talk indexing old backups, GDPR and CyberSense, a new approach to cyber security, with Jim McGann, VP Marketing and Business Development, Index Engines.

Jim’s an old industry hand who’s been around backups, e-discovery and security almost since the beginning. Index Engines’ solution to cyber security, CyberSense, is also offered by Dell EMC, and Jim presented at a TFDx event this past October hosted by Dell EMC (See Dell EMC-Index Engines TFDx session on CyberSense).

It seems Howard’s been using Index Engines for a long time but keeping them a trade secret. In one of his prior consulting engagements he used Index Engines technology to locate a multi-million dollar email for one customer.

Universal backup data scan and indexing tool

Index Engines has a long history as a tool to index and understand old backup tapes and files. Index Engines did all the work to understand the format and content of NetBackup, Dell EMC Networker, IBM TSM (now Spectrum Protect), Microsoft Exchange backups, database vendor backups and other backup files. Using this knowledge they are able to read just about anyone’s backup tapes or files and tell customers what’s on them.

But it’s not just a backup catalog tool. Index Engines can also crack open backup files and index the content of the data. In this way customers can search backup data with Google-like search terms. This is used day in and day out for e-discovery and the occasional consulting engagement.

Index Engines technology is also useful for companies complying with GDPR and similar legislation. When any user can request that information about them be purged from corporate data, being able to scan, index and search backups is a great feature.

In addition to backup file scanning, Index Engines has a multi-PB indexing solution which can be used to perform the same Google-like searching on a data center’s file storage. Once again, Index Engines has done the development work to implement their own highly parallelized metadata and content search engine, demonstrably faster than any open source (Lucene) search solution available today.

CyberSense

All that’s old news. What Jim presented at a TFDx event was their new CyberSense solution. CyberSense was designed to help organizations detect and head off ransomware, cyber assaults and other data corruption attacks.

CyberSense computes a data entropy (randomness) score as well as ~39 other characteristics for every file in backups or online in a customer’s data center. It then uses that information to detect when a cyber attack is taking place and determine the extent of the corruption. With current and previous entropy and other characteristics on every data file, CyberSense can flag files that look like they have been corrupted and warn customers that a cyber attack is in progress before it corrupts all of the customer’s data files.
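To make the entropy idea concrete, here is a minimal sketch of a Shannon entropy score over a file's bytes, plus a comparison between a previous and a current score. This is not CyberSense's implementation, which the podcast does not describe in detail; the function names (`shannon_entropy`, `entropy_jump`) and the threshold value are assumptions chosen purely for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_jump(previous: float, current: float, threshold: float = 2.0) -> bool:
    """Flag a file whose entropy rose by at least `threshold` bits per byte.

    The threshold is a hypothetical value, not one taken from the product.
    """
    return current - previous >= threshold


# Repetitive text scores low; random bytes (a stand-in for encrypted
# content) score close to the 8 bits-per-byte maximum.
plain = b"quarterly report: revenue up, costs flat. " * 200
scrambled = os.urandom(len(plain))

before = shannon_entropy(plain)
after = shannon_entropy(scrambled)
print(f"before={before:.2f} after={after:.2f} flagged={entropy_jump(before, after)}")
```

A single score says little on its own, since compressed formats are already close to random. Comparing the current score with the one recorded at an earlier scan, as the paragraph above describes, is what makes a sudden jump stand out.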

One typical corruption is to change file extensions. CyberSense cracks open file contents and can determine if it’s an office or other standard document type and then check to see if its extension matches its content. Another common corruption is to encrypt files. Such files necessarily have an increased entropy and can be automatically detected by CyberSense.
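The extension check can be sketched by comparing a file's leading "magic" bytes against its name. The sketch below is an illustration only, not how CyberSense inspects content: the signature table is a small hand-picked subset of well-known file signatures, and `extension_matches_content` is a hypothetical helper name.

```python
from pathlib import Path

# A few well-known file signatures and the extensions that normally carry them.
# Modern Office documents (.docx, .xlsx, .pptx) are ZIP containers, so they
# share the ZIP signature.
MAGIC_SIGNATURES = {
    b"%PDF-": {".pdf"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
}


def extension_matches_content(name: str, header: bytes):
    """Compare a file name's extension with the type implied by its first bytes.

    Returns True or False when the header is recognised, and None when the
    content type is unknown to this (deliberately tiny) table.
    """
    ext = Path(name).suffix.lower()
    for magic, extensions in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return ext in extensions
    return None


# A PDF that has been renamed no longer matches its own content.
print(extension_matches_content("report.pdf", b"%PDF-1.7\n"))
print(extension_matches_content("report.pdf.locked", b"%PDF-1.7\n"))
```

Note that this only catches renamed files whose content is still intact. A file that has been encrypted loses its recognisable header, which is where the entropy score takes over.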

When CyberSense has detected some anomaly, it can determine who last accessed the file and what executable was used to modify it. In this way CyberSense can be used to provide forensics on the who, what, when and where of a corrupted file, so that IT can shut the corruption activity down before it’s gone too far.

CyberSense can be configured to periodically scan files online as well as just examine backup data (offline) during or after it’s backed up. Their partnership with Dell EMC is to do just that with Data Domain and Dell EMC backup software.

Index Engines’ proprietary indexing functionality has been optimized for parallel execution and for reduced index size. Jim mentioned that their content indexes average about 5% of the full storage capacity and that they can index content at a TB/hour.
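As a rough back-of-the-envelope check on those two figures, the sketch below estimates index size and indexing time for a given amount of data. It assumes the 5% ratio and the TB/hour rate apply uniformly, and it treats the rate as a single aggregate number; the podcast notes do not say whether it is per node or per system, and `index_estimate` is a hypothetical helper.

```python
def index_estimate(capacity_tb: float,
                   index_ratio: float = 0.05,
                   rate_tb_per_hour: float = 1.0):
    """Return (index size in TB, indexing time in hours) for capacity_tb of data.

    Defaults mirror the figures quoted in the text: an index of about 5% of
    capacity and a throughput of about 1 TB/hour.
    """
    index_tb = capacity_tb * index_ratio
    hours = capacity_tb / rate_tb_per_hour
    return index_tb, hours


# Example: 500 TB of file data.
index_tb, hours = index_estimate(500)
print(f"index ~{index_tb:.0f} TB, ~{hours:.0f} hours (~{hours / 24:.0f} days)")
```

Under these assumptions, 500 TB of content would need roughly 25 TB of index and about three weeks of indexing time at 1 TB/hour.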

Index Engines is a software-only offering, but they also offer services for customers that want a turnkey solution. They are also available through a number of partners, Dell EMC being one.

The podcast runs ~44 minutes. Jim’s been around backups, storage and indexing forever, and seems to have good knowledge of data compliance regimes and the current security threats impacting customers across the world today. Listen to our podcast to learn more.

Jim McGann, VP Marketing and Business Development, Index Engines

Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector. Before joining Index Engines in 2004, he worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes.

In recent years he has worked for technology-based start-ups that provided financial services and information management solutions. Prior to Index Engines, Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.

Jim is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery and records management.