Reinventing HPC

STREAMLINING THE DATA PIPELINE THAT FUELS AI/ML

Nages Sieslack: What challenges are you helping customers to solve?

Liran Zvibel: Today, organizations are increasingly looking to harness the power of artificial intelligence (AI) and machine learning (ML) to support their research and discovery initiatives, drive business innovation, and increase operational efficiency. But running AI/ML projects at enterprise scale requires massive amounts of data and high performance to fuel them, and as a result many organizations struggle to move their mission-critical business initiatives and research projects forward. A key reason is that their legacy data infrastructure is not equipped to support the insatiable performance and scalability demands of these next-generation workloads, and it's holding them back.

WEKA is on a mission to address this challenge with a software-based data platform that is purpose-built to streamline and accelerate the data pipelines that fuel AI/ML and other modern performance-intensive workloads.

The WEKA Data Platform delivers radical simplicity, epic performance, infinite scale, and seamless data portability to support enterprise AI workloads in virtually any location. Whether on-premises, in the cloud, at the edge, or bursting between platforms, WEKA accelerates every step of the enterprise AI data pipeline, from data ingestion, cleansing, and modeling to training, validation, and inference.

Sieslack: Can you provide an example of how you helped a customer with a significant challenge?

Zvibel: In the U.S., the Oklahoma Medical Research Foundation (OMRF) computing team was looking to architect a system that could deliver more compute power and faster storage, and handle larger volumes of data at higher velocity, to support its growing informatics needs for scientific research.

A common workflow is next-generation sequencing (NGS) analysis using the GATK pipeline for sequence alignment and variant calling. However, the cluster supports numerous research jobs running simultaneously with unique toolsets that need to be carefully orchestrated so as not to negatively impact other jobs or workloads.

By implementing the WEKA Data Platform, the OMRF team was able to achieve better throughput and run more research jobs concurrently. As a result, their research is no longer limited by how much data can be stored locally, and their research workflows have been greatly simplified. Further, the complexity of staging data in and out of a compute node's local SSD was eliminated, and turnaround times improved because jobs finished faster, getting results to their scientists sooner and accelerating the next stage of their research.

Ultimately, OMRF's research job run times were cut by up to 10x: one job was reduced from 70 days to seven, and another common analysis workflow was reduced from 12 hours to two. OMRF's researchers no longer need to think about their data infrastructure environment. Instead, they're free to focus on saving lives.

To read the full interview, please visit https://insidehpc.com/2022/05/wekas-zvibel-on-streamlining-the-data-pipeline-that-fuels-ai-ml/
