About

This site provides the test data that were used for efficient dynamic construction of a compressed de Bruijn subgraph for pan-genome analysis.

E. coli

The following E. coli sequences (or, where appropriate, their reverse complements) were downloaded from www.ncbi.nlm.nih.gov/nuccore/ using the accession numbers listed below.

FM180568 01.FM180568.sequence.fasta (md5sum: 3939f45d6d97afa76a2fbb603307a237)  
FN554766 02.FN554766.sequence.fasta (md5sum: b5726c3bae898831d5240f8897736c12)  
CP000247 03.CP000247.sequence.fasta (md5sum: 08643a4078ec97b36ea3da402bef95f6)  
CU928145 04.CU928145.sequence.fasta (md5sum: d6e80db065ddf221b5925888ff8edd67)  
CP001671 05.CP001671.sequence.fasta (md5sum: 97f209b1693b222e97a828d3c5a9c449)  
CP000468 06.CP000468.sequence.fasta (md5sum: 80d233b0ffab129579f55789b136ad2e)  
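The checksums above can be used to confirm that a download is intact. The following is a minimal sketch, assuming the six files sit in one local directory under the names listed above; the helper names `md5_of_file` and `verify_downloads` are illustrative and not part of the original tooling.

```python
import hashlib
from pathlib import Path

# File names and MD5 checksums copied from the list above.
EXPECTED_MD5 = {
    "01.FM180568.sequence.fasta": "3939f45d6d97afa76a2fbb603307a237",
    "02.FN554766.sequence.fasta": "b5726c3bae898831d5240f8897736c12",
    "03.CP000247.sequence.fasta": "08643a4078ec97b36ea3da402bef95f6",
    "04.CU928145.sequence.fasta": "d6e80db065ddf221b5925888ff8edd67",
    "05.CP001671.sequence.fasta": "97f209b1693b222e97a828d3c5a9c449",
    "06.CP000468.sequence.fasta": "80d233b0ffab129579f55789b136ad2e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == checksum) if path.is_file() else None
    return results
```

Calling `verify_downloads(".")` in the download directory returns one entry per file, so a mismatch or a missing file is easy to spot.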

Human Genome

An example plot drawn by the program can be viewed here.

The test file was created by concatenating the following 10 files in the order listed:

hg16 (NCBI34) from July 2003

Download (md5sum: 9c4567258b47b6dd466225c58da65eb4)

Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg16/chromosomes/

Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.

hg17 (NCBI35) from May 2004

Download (md5sum: 57f5af6e6004497f82b284b75a712486)

Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg17/chromosomes/

Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.

hg18 (NCBI36) from Mar. 2006

Download (md5sum: f37590f3007ac483488891113f222dc8)

Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/

Comment: Modified file - converted lowercase to uppercase and removed 3 characters (RR and M) from chromosome 3.

hg19 (GRCh37) from Feb. 2009

Download (md5sum: 55c0eb9b019d9f727b0d0ae42b5ca237)

Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/

Comment: Modified file - converted lowercase to uppercase.

hg38 (GRCh38) from Dec. 2013

Download (md5sum: ea47ff706942f5e58b327aac61e528d6)

Src: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/

Comment: Modified file - converted lowercase to uppercase.
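The lowercase-to-uppercase conversion mentioned in the comments can be sketched as follows. This is an illustration only, not the script that produced the files: it assumes that FASTA header lines (starting with `>`) are left unchanged, which the comments do not state, and the function name `uppercase_fasta` is hypothetical.

```python
def uppercase_fasta(src_path, dst_path):
    """Copy a FASTA file, upper-casing sequence lines.

    Assumption: header lines (starting with '>') are copied unchanged.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(line if line.startswith(">") else line.upper())
```

Processing the file line by line keeps memory use small, which matters for chromosome-sized inputs.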

 

maternal haplotype of NA12878

The Gerstein Lab at Yale University has created a version of the NA12878 genome based on NCBI build 36 and incorporating SNPs, indels and SVs identified by the 1000 Genomes Project. This genome sequence is available at http://sv.gersteinlab.org/NA12878_diploid.

Download (md5sum: 4a5e7ffec07364de66e56022d5864107)

Src: http://sv.gersteinlab.org/NA12878_diploid/NA12878_diploid_genome_2012/NA12878_diploid_genome_2012_dec16.zip

Comment: Users of this assembly are requested to cite: Rozowsky J et al. (2011). AlleleSeq: Analysis of allele-specific expression and binding in a network framework. Molecular Systems Biology, 7, 522.

paternal haplotype of NA12878

The Gerstein Lab at Yale University has created a version of the NA12878 genome based on NCBI build 36 and incorporating SNPs, indels and SVs identified by the 1000 Genomes Project. This genome sequence is available at http://sv.gersteinlab.org/NA12878_diploid.

Download (md5sum: 75e170b383de42aeb14732cabeab9a00)

Src: http://sv.gersteinlab.org/NA12878_diploid/NA12878_diploid_genome_2012/NA12878_diploid_genome_2012_dec16.zip

Comment: Users of this assembly are requested to cite: Rozowsky J et al. (2011). AlleleSeq: Analysis of allele-specific expression and binding in a network framework. Molecular Systems Biology, 7, 522.

GRCh38.p12

Download (md5sum: d4f40c80dd774652f18367f62f3421eb)

Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.27_GRCh38.p12/

Comment: Modified file - joined chromosomes into one fasta sequence
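Joining the chromosomes of a multi-record FASTA file into one sequence might look like the sketch below. Several details are assumptions, since the comment does not specify them: the per-chromosome headers are dropped, no separator is placed between chromosomes, and the output header text and line width are arbitrary. The function name `join_fasta_records` is hypothetical.

```python
def join_fasta_records(src_path, dst_path, header="joined", width=60):
    """Write all sequences of a multi-record FASTA file as a single record.

    Assumptions: per-record headers are dropped, records are joined without
    a separator, and output lines are wrapped at `width` characters.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.write(f">{header}\n")
        buffer = ""
        for line in src:
            if line.startswith(">"):
                continue  # skip the header of each chromosome
            buffer += line.strip()
            # Flush complete output lines as soon as enough bases are buffered.
            while len(buffer) >= width:
                dst.write(buffer[:width] + "\n")
                buffer = buffer[width:]
        if buffer:
            dst.write(buffer + "\n")
```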

HuRef

Download (md5sum: 4c0bf63c64fcd205d59683cb1554c4c8)

Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/

Comment: Modified file - joined chromosomes into one fasta sequence

CHM1_1.1

Download (md5sum: 8eca87e0b52f9b60a059cd09a53ccc29)

Src: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/

Comment: Modified file - joined chromosomes into one fasta sequence
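The concatenation of the ten files into one test file can be reproduced along the following lines. This is a minimal sketch assuming a plain byte-wise concatenation; the local file names in `INPUT_FILES` are hypothetical placeholders for the downloaded files, given in the order listed above.

```python
import shutil

# Hypothetical local names for the ten downloads, in the order listed above.
INPUT_FILES = [
    "hg16.fa", "hg17.fa", "hg18.fa", "hg19.fa", "hg38.fa",
    "NA12878_maternal.fa", "NA12878_paternal.fa",
    "GRCh38.p12.fa", "HuRef.fa", "CHM1_1.1.fa",
]


def concatenate_files(paths, dst_path):
    """Append the bytes of each input file to dst_path, in the given order."""
    with open(dst_path, "wb") as dst:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

For example, `concatenate_files(INPUT_FILES, "human_test.fa")` would write the combined file, where `human_test.fa` is again just an illustrative name.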