Presentation
A Reference-Based Genomic Compression Algorithm
Stefano Rini
TU Munich
Tuesday, October 9, 2012, 2:00 pm
Uni West, Room 43.2.227
Abstract:
The advent of fast and affordable sequencing technologies has resulted in a deluge of genomic data.
Efficiently storing and accessing such data has become a major challenge in genomic research.
A key observation in developing effective storage systems is the high degree of similarity among the sequences of different individuals within one species. By expressing a genomic sequence as a series of transformations from a reference sequence, it is possible to drastically reduce the storage space. This series of transformations can be compressed further by exploiting the statistical model according to which differences between DNA sequences occur. We propose a novel compression tool for storing and accessing genome sequencing data using a reference genome sequence. In all the examples we have considered, our algorithm achieves a better compression rate than other reference-based compression programs such as GReEn and GRS.
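The following minimal Python sketch illustrates what it means to store a sequence as a series of transformations from a reference. It is not the algorithm presented in this talk. It uses the standard library's general-purpose `difflib.SequenceMatcher` as a stand-in matcher. The function names `encode`/`decode` and the `(tag, ref_start, ref_end, replacement)` operation format are hypothetical. Only the regions where the target differs from the reference are stored, and matching stretches are implied.

```python
from difflib import SequenceMatcher


def encode(reference: str, target: str) -> list[tuple]:
    """Express `target` as edit operations relative to `reference`.

    Hypothetical format: each op is (tag, ref_start, ref_end, replacement),
    where tag is 'replace', 'delete' or 'insert'. Matching regions are not
    stored. difflib is only a stand-in for a real sequence matcher.
    """
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep only the reference span and the new bases that replace it.
            ops.append((tag, i1, i2, target[j1:j2]))
    return ops


def decode(reference: str, ops: list[tuple]) -> str:
    """Rebuild the target by applying the edit operations to the reference."""
    out = []
    pos = 0
    for _tag, i1, i2, replacement in ops:
        out.append(reference[pos:i1])  # copy the unchanged stretch
        out.append(replacement)        # empty string for deletions
        pos = i2                       # skip the replaced/deleted reference span
    out.append(reference[pos:])        # copy the tail after the last edit
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTACGTTAGCCATG"
    tgt = "ACGAACGTTAGCCATGGA"  # one substitution plus two appended bases
    ops = encode(ref, tgt)
    print(ops)
    assert decode(ref, ops) == tgt
```

Because a target identical to the reference encodes to an empty list, storage cost grows only with the number of differences. `difflib` would be far too slow for chromosome-scale sequences. A practical tool needs a much faster matching strategy, and the resulting list of operations is what a statistical model of DNA differences would then be used to compress further.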