Aug 14, 2018
Uduak Grace Thomas
NEW YORK (GenomeWeb) – OmniTier, a developer of application-specific, high-performance data products, has shared some results from a joint study done with researchers at the Mayo Clinic's Center for Individualized Medicine demonstrating the efficacy of the company's solution for generating variant caller-ready de novo genome assemblies quickly and at low cost.
OmniTier is currently beta testing the so-called CompStor compute cluster solution ahead of a full product launch planned for the fourth quarter of this year. According to the company, CompStor offers comparable performance to standard alignment-based approaches and uses standard low-cost servers, which should make it an affordable option for customers with limited budgets. When the product launches, OmniTier intends to offer CompStor as an appliance that can be installed locally at customer sites. However, the company will also offer its software on the cloud for customers who prefer that option.
"CompStor is bringing the de novo analytics era to genomics," Hemant Thapar, OmniTier's founder and CEO, said in a statement. "Our current results in the joint study with the Mayo Clinic highlight the potential to integrate de novo assembly methodology in genomic medicine and achieve higher accuracy and shorter assembly times on affordable cloud or on-premise infrastructure" rather than large supercomputing resources.
For the Mayo study, which used the NA12878 genome dataset, the partners used CompStor to complete a variant caller-ready assembly at 50x coverage in less than two hours, according to results shared this week. The company claims that the solution can scale up to 800x coverage, which will make it possible to reliably identify new and infrequent variants from de novo assemblies. It performs comparably to standard alignment-based approaches but sidesteps the reference bias that plagues these approaches, according to OmniTier. Furthermore, the assemblies generated by CompStor can be used to call all types of variants, but where it really shines is in identifying complex variants such as structural variants, according to OmniTier Chief Technology Officer Jonathan Coker.
The possibility of de novo assembling genomes quickly and efficiently, even at high coverage, was what drew Alexej Abyzov to test OmniTier's solution in his laboratory. Abyzov is a computational genomicist and biologist, senior associate consultant and assistant professor of biomedical informatics at the Mayo Clinic.
"De novo sequence assembly for better variant discovery and characterization has remained elusive due to the exceedingly long assembly times and resources requirement of existing assemblers. CompStor holds the promise to change that paradigm," he said in a statement. "In benchmarking, CompStor has enabled us with fast, robust, accurate, and unbiased analysis of individualized high-coverage, whole-genome sequencing data. We expect to apply CompStor's unique capabilities to analyzing point nucleotide substitutions as well as larger structural variants and indels in several future studies."
The CompStor system features a tiered-memory architecture that utilizes DRAM and flash memory resources in a cluster configuration to overcome the limitations of existing de novo assembly implementations. It uses standard, low-cost x86 Intel servers with software designed around expansive computational memory to achieve its results.
CompStor clusters also have optimized methods for handling data. For example, the system can ingest raw genome sequence datasets at up to 10 GB/s. The solution can be easily integrated with existing genomics workflows and provides command line, API, and web-based job control mechanisms. Once the assembly is complete, users can choose their own variant callers or choose from those built into the solution.
OmniTier first demonstrated the efficacy of CompStor's assembly technology in 2017, when it unveiled a prototype of the solution. It claimed at the time that CompStor reduced sequence assembly times and improved assembly quality compared to available de novo assemblers. In one comparison that used eight CompStor server nodes, the company claimed to have assembled a human genome de novo in about eight minutes at 50x coverage. The CompStor cluster used for the task featured standard x86 Intel processors and required 128 GB of DRAM and 330 GB of NVMe SSD on each server. OmniTier noted at the time that these results were comparable to those reported in a separate study that used a Cray XC30 supercomputer with 15,360 processor cores.
"In terms of speed, it is extremely impressive," Abyzov said in an interview this week. There is still some work to be done to improve the solution so that it is "completely competitive" with existing standards, "but in terms of speed, it is a big breakthrough compared to other assembly-based approaches right now," he said.
At the time of the prototype's release, OmniTier also reported the results of using the solution to perform de novo assembly on several smaller genomes, including those of organisms such as Staphylococcus aureus and Aspergillus nidulans. Compared to some commonly used, open-source assemblers, CompStor Assembly offered about 17x acceleration in assembly time using a single node, the company said. An eight-node CompStor Assembly compute cluster offered up to 100x acceleration in the assemblies.
Those early results gave OmniTier the impetus it needed to pursue the partnership with Mayo. It was also an opportunity to further improve CompStor and move it into the research context with clinical applications as the ultimate goal. "Its fundamental function is de novo assembly, but there is quite a bit of processing and computing that needs to occur in addition to what is typically referred to as de novo assembly to make it compatible with variant calling. That's what we've been working on in the last year," Coker said in an interview.
OmniTier already has one product on the market called MemStac, which is a caching solution for in-memory applications that offers reduced memory cost per server, lower power consumption, and faster application speeds, and is available on Amazon Web Services cloud. Coker said that OmniTier is currently working with a number of large companies to integrate MemStac into their systems, but he declined to name them due to existing non-disclosure agreements.
OmniTier is not disclosing how much it will charge for CompStor appliances or the cloud option at the present time, but it will make those numbers available when the product is released. Coker did say that the company's price point is one of the key components of its business strategy and that costs will be comparable to those offered by the competition.
In addition to customers who want cheaper solutions, the on-site appliance should also be a boon to customers who don't want to deal with the hassles of uploading large quantities of data to the cloud or who have private datasets that they are unwilling to place in a public cloud infrastructure.
Other companies targeting this group of customers with similar arguments include BioTeam, which launched an appliance for microbial genome analysis in partnership with Qiagen in 2016. "What I believe is that many customers will enjoy not having to deal with the exact hardware configurations that we require with our software," Coker said. "We've been getting feedback that an appliance will be good, and because of that we have [that] option available but we can also provide [it] on the cloud."
OmniTier's chief competition in the market, according to Coker, is Edico Genome, which is now owned by Illumina. Edico markets the DRAGEN platform, which uses field programmable gate array (FPGA) technology in combination with proprietary software algorithms to reduce genomic data footprint and enable faster speeds. As with CompStor, users have the option to run DRAGEN on site, in the Amazon cloud, or in a hybrid mode. Like OmniTier, Edico is focusing on the personalized genomics space and has forged partnerships with groups such as Rady Children's Institute for Genomic Medicine to bring DRAGEN to bear on whole-genome sequencing protocols for things like neonatal diagnostics.
OmniTier's solutions offer a viable alternative to Edico's DRAGEN, one that competes not on hardware but on optimized memory use, according to Coker. Much of OmniTier's current technical team worked in the hardware space prior to joining the company, so "we fully understand the temptation to use FPGAs and GPUs for these kind of problems. We've done that many times," he said.
However, OmniTier believes that managing compute memory is more critical. "We are using hardware solutions, but [we] are really coming from the memory point of view and how to manage that. … What we are bringing is our knowledge of those different memory technologies like solid-state drives or hard-disk drives or other alternative technologies to bear to solve these particular problems."
Furthermore, because OmniTier's solutions use commercially available, off-the-shelf servers, the company has more flexibility in terms of pricing compared to the competition, he added. "The main thing that we want to show is that we can bring this new and complex technology into standard hardware outside of the supercomputer realm to anybody who wants to be able to do this."
OmniTier is seeking additional beta testers for CompStor ahead of its Q4 launch. It hopes to partner with other whole-genome sequencing providers willing to put the platform through its paces. The company also intends to add functionality that enables CompStor to work with data from other sequencing platforms besides Illumina's, Coker said.