By Jason Waxman
Today is a milestone in advancing genomics research, and Intel is thrilled to be involved in these three important developments:
- The Broad Institute of MIT and Harvard is open-sourcing the world’s most popular and now much-improved genome analysis software, GATK4.
- Intel and Broad have developed a breakthrough architecture, called the Broad-Intel Genomics Stack (BIGstack), which currently delivers a 5x improvement to Broad’s genomics analytics pipeline using Intel’s CPUs, Omni-Path Fabric and SSDs. The stack also includes optimizations for the forthcoming release of Intel’s integrated CPU + FPGA products.
- China’s industry leader in genomics, BGI, is announcing adoption of the most current GATK tools, including Broad and Intel optimizations — a groundbreaking step toward global alignment in the rapidly growing genomics community.
I want to do justice to these tremendous achievements, so let me expand on them.
First, Intel and Broad share a vision of harnessing the power of genomic data and making it widely accessible for research around the world to yield important discoveries. Genomics offers insights into the inner workings of DNA within organisms. Advances in genomics are fueling discovery-based research to better understand the complexities of biological systems.
Nearly everyone has experienced cancer and its devastating effects in their family, some more than others. With today’s announcements, we can take new steps toward understanding the molecular drivers of cancer and other diseases and accelerating the promise of precision medicine.
That’s why Intel and Broad are making the BIGstack available to run the new GATK4 Best Practices pipeline up to five times faster than previous versions, supporting data volumes at truly unprecedented scale and simplifying deployment with production-ready scripts. The performance gain comes from the combination of Intel CPUs, Omni-Path Fabric and SSDs. The BIGstack also includes optimizations for Intel FPGAs, with early results showing the potential for more than a 35x improvement in the PairHMM algorithm.
Version 1.0 of the Broad-Intel Genomics Stack is the kind of breakthrough in affordability for the genomics analytics community that we sought to create as part of the Intel-Broad Center for Genomic Data Engineering, a five-year, $25 million collaboration announced in November. The stack is now available to the 45,000 registered academic, nonprofit and commercial users of the GATK, the Broad’s popular genomics analysis toolkit.
There’s more on the Intel website about this new reference architecture announced today at Bio-IT World Conference & Expo. Additionally, we wanted to share that:
- Broad announced it will open-source Version 4 of GATK (GATK4), which is great news for the genomics research, biotech and pharmaceutical communities.
- BGI announced it will provide access to GATK4, Broad’s workflow management system Cromwell and the Workflow Description Language (WDL) on the BGI Online platform with its cloud partner Alibaba Cloud in China.
I’m tremendously excited by BGI’s announcement – it means the leading genomics institutions in China and the United States will be using the same set of open source software tools. And this expanded access will facilitate the standardization and sharing of data for bigger and better research in the future.
It’s rewarding that GATK4 includes key optimizations made possible through collaboration at the Intel-Broad Center for Genomic Data Engineering in Cambridge, which I had the pleasure of visiting last month. I hope that the BIGstack will be a common platform for advanced analytics workloads that the world’s leading genomics institutions can utilize to facilitate collaboration and scientific breakthroughs.
Finally, this turnkey solution will be available as a reference architecture and through original equipment manufacturers (OEMs) and system integrators (SIs), including Lenovo, HPE, Inspur and Colfax, with more to follow.
I’m proud of the accomplishments of the Intel team in enabling Intel technology as a facilitator of scientific breakthroughs. Moments such as these make me believe that in our lifetime we will see the cure for cancer, and I am so honored to be partnering with great institutions such as Broad and BGI to help them make this happen.
Looking ahead, it’s clear that the complex interplay of genetic variants and how treatments can affect molecular pathways is an area of study ripe for machine learning because of the need to learn by example, over and over. Working with some of the world’s most brilliant minds, Intel engineers are eager to apply artificial intelligence to this grand challenge.
Learn more by visiting the Intel-Broad Center for Genomic Data Engineering website.
Jason Waxman is corporate vice president and general manager of Data Center Solutions Group at Intel Corporation.
Photo courtesy of Broad Institute of MIT and Harvard/Kelly Davidson Photography