Known as the blueprint of life, the genomic sequence contains instructions for controlling a species’ growth, development, survival, and reproduction. Next-generation sequencing (NGS) technologies, which produce vast amounts of sequencing data for various life forms, have provided tremendous information for tackling grand challenges, from finding more effective treatments for human diseases to improving biofuel energy production. In particular, when combined with modern molecular biology techniques, NGS allows scientists to sequence both culturable and unculturable microbes in the human microbiota and in natural environments at unprecedented depth and resolution (i.e. metagenomic sequencing). However, despite the rapid accumulation of microbial community data, computational methods and tools that can take full advantage of this sequencing power lag seriously behind. There is thus a pressing need to convert big NGS data into knowledge.
In this talk, I will present our recent work on the composition analysis of RNA viruses in viral metagenomic data. Many clinically important RNA viruses, such as HIV, HCV, SARS-CoV, and influenza, have high mutation rates during replication and thus form a population of related but distinct viral strains, referred to as quasispecies. Metagenomic sequencing has now made it possible to characterize viral quasispecies in their natural environments. We have therefore developed a suite of tools for viral quasispecies analysis.
The first part of my talk will focus on classifying reads from user-specified RNA viruses of interest. Because RNA viruses often make up only a small percentage of a metagenomic data set, it is important to identify their reads before any downstream analysis. State-of-the-art read classification relies on mapping reads against known viral genomes. However, the limited number of sequenced viral genomes can lead to low sensitivity for mapping-based tools. We have implemented a hybrid method that can classify reads even when only partial or remotely related genomes are available. In the second part of my talk, I will present our work on reconstructing viral haplotypes. Reconstructing the sequence of each strain is highly important for developing clinical prevention and treatment strategies. I will present our effective method for de novo reconstruction of all haplotypes in a quasispecies from NGS data.
Yanni Sun is an Associate Professor in the Department of Electronic Engineering at City University of Hong Kong. Before moving to Hong Kong, she was an Associate Professor in the Department of Computer Science and Engineering at Michigan State University, USA. She received her B.S. and M.S. degrees, both in Computer Science, from Xi'an Jiaotong University (China), and her Ph.D. in Computer Science from Washington University in St. Louis, USA, in 2008. Her research interests are in bioinformatics and computational biology. Her recent research projects include sequence analysis, next-generation sequencing data analysis, metagenomics, protein domain annotation, plant genomics, and noncoding RNA annotation. She received the NSF CAREER Award in 2010.