Laboratories around the world are generating raw sequencing data at an unprecedented rate. A single run can now produce hundreds of millions of relatively short sequences in a short period of time and at a low per-base cost. These technological developments have enabled widespread adoption of sequencing. At the same time, the challenges of aligning, analyzing, annotating and understanding the terabytes of data generated represent a major bottleneck in biological discovery and clinical adoption. Indeed, a plot of Moore’s Law against sequencing throughput shows that the cost of DNA sequencing has declined much faster than the cost of disk storage and processing power.
The 2017 Market for NGS Informatics: Probing the Commercial Landscape is designed to help companies understand these challenges from the perspective of NGS users. The scale of the data generated is not only an obstacle for individual researchers trying to interpret it; it also presents significant informatics issues for reproducibility and even collaboration. NGS users are well aware that simply generating data does not lead to a proportionate increase in knowledge.
Sequencing an individual human genome involves generating short reads and mapping them to a known human reference genome. Read alignment maps hundreds of millions of short strings onto a reference string roughly three billion bases long. To keep pace, scientists and bioinformaticians must continually develop advanced data structures and highly parallel algorithms.
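To make the mapping problem concrete, here is a minimal Python sketch of seed-and-verify read mapping. It is an illustration only, not the method of any particular aligner: the function names, the toy reference string, the seed length k and the mismatch threshold are all assumptions chosen for the example, and production tools rely on far more compact indexes and on parallel execution to cope with a three-billion-base reference.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict:
    """Record every start position of each length-k substring of the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read: str, reference: str, index: dict, k: int,
             max_mismatches: int = 1) -> list:
    """Use the read's first k bases as a seed, then verify each candidate position."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

# Toy data (hypothetical): a 22-base "reference" and an 8-base "read".
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
hits = map_read("TAGCCGAT", reference, index, k=4)
print(hits)  # [(8, 0)]
```

Even in this toy form, the index holds one entry per reference position, which hints at why memory-efficient data structures matter at genome scale.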
The direction of genomic research is changing. In the past, a single genome or just a handful of genomes were analyzed. Many translational research projects and the promise of personalized medicine now involve the analysis of hundreds of genomes in a single run. Storing and transmitting genomic data is another major challenge. According to sequencing expert Shawn Baker, Ph.D., the BAM file (a semi-compressed alignment file) for a single 30X human whole-genome sample is about 90 GB. A relatively modest project of 100 samples would therefore generate nine terabytes of BAM files. The problem is further exacerbated by the creation of metadata files to ensure that data associated with a genome is not lost, and by integration with other types of information such as transcriptomic, methylomic and metabolomic data.
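The storage arithmetic above can be reproduced with a short Python sketch. It assumes the 90 GB per-sample figure quoted above and decimal units (1 TB = 1,000 GB); the function name is hypothetical, and metadata files and other data types are deliberately left out.

```python
BAM_GB_PER_30X_GENOME = 90  # approximate per-sample figure quoted in the text

def bam_storage_tb(samples: int, gb_per_sample: float = BAM_GB_PER_30X_GENOME) -> float:
    """Estimate total BAM storage in decimal terabytes (1 TB = 1,000 GB)."""
    return samples * gb_per_sample / 1000

print(bam_storage_tb(100))  # 9.0
```

Because the estimate scales linearly, a 1,000-sample project under the same assumptions would need about 90 TB for BAM files alone.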
NGS data analysis, transmission and storage must also take into account data privacy and security, concerns that have inhibited the widespread adoption of cloud-based solutions and are particularly sensitive in clinical applications.
The scope and scale of sequencing projects will only continue to grow as speed and read lengths increase. But continued advancements in sequencing technology are offset by the limited ability of scientists to extract biologically or clinically relevant information from the data. The 2017 Market for NGS Informatics: Probing the Commercial Landscape provides insights into the needs of end users so that solutions to these challenges can be developed.