Sequence Analysis

BWT Tunnel Planning is hard but manageable

This software implements the algorithms presented in "BWT Tunnel Planning is hard but manageable", hopefully to appear at DCC 2019.

Installation:

1. Download this .tar.bz2  archive

2. Unpack the .tar.bz2 file with the command "tar xfj tbwt_heuristic.tar.bz2".

3. Follow the install instructions in the README.md file of the archive.

Trickier XBWT Tricks

This software implements the algorithms presented in "Trickier XBWT Tricks", hopefully to appear at SPIRE 2018.

Installation:

1. Download this .tar.bz2  archive

2. Unpack the .tar.bz2 file with the command "tar xfj trickyxbwt.tar.bz2".

3. Follow the install instructions in the README.md file of the archive.

A Lempel-Ziv-style compression method for repetitive texts

This software implements the algorithm presented in "A Lempel-Ziv-style compression method for repetitive texts" at the Prague Stringology Conference 2017.

Installation:

1. Download this .tar.xz  archive and unpack it with the command "tar xf lcpcompress.tar.xz".

2. Follow the install instructions in the README file of the archive.

Space-efficient Parallel Construction of Succinct Representations of Suffix Tree Topologies

This software contains algorithms for a sequential and parallel construction of succinct representations of tree topologies.

Installation:

1. Download this .tar.bz2  file

2. Unpack the .tar.bz2 file with the command "tar xfj bps.tar.bz2".

3. Follow the install instructions in the README file of the archive.

compressed deBruijn Graph - Tools for computing the compressed deBruijn Graph

The following steps describe how to use the algorithms presented in "A representation of a compressed de Bruijn graph for pan-genome analysis that enables search" at Algorithms for Molecular Biology:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_search.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


The following steps describe how to use the algorithms presented in "Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform" at Bioinformatics:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_bioinformatics.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


The following steps describe how to use the algorithms presented in "Efficient Construction of a Compressed de Bruijn Graph for Pan-Genome Analysis" at CPM2015:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_cpm2015.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


 The test data can be found here.

context-sensitive-repeats - Tools for computing context-diverse and near supermaximal repeats

Installation:

1. Download this tar.bz2-file.

2. Unpack the .tar.bz2 file with the command "tar xfj context-sensitive-repeats.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

construct_bwt - A tool for computing the BWT space-efficiently

Installation:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj construct_bwt.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

bwt_reverse - A tool for computing the BWT of the reverse string

Installation:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj bwt_reverse.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

bwt based LACA - A tool for computing the LCP array directly from the BWT

Installation:

1. Install libdivsufsort of Yuta Mori

2. Download this bz2-file.

3. Unpack the bz2 file with the command "tar xfvj bwt_based_laca.bz2".

4. Change directory to sdsl2/build and execute "cmake .." and "make".

5. Execute "make" in the root directory of the archive.

The last "make" will produce the executables bwt_based_laca, bwt_based_laca2 and others. You'll find more information in the readme.txt file (at the root directory of the archive).

Note: You can use the tool dbwt of Kunihiko Sadakane to compute the BWT directly from the input.

backwardMEM - A tool for computing maximal exact matches

Installation:

1. Download this tar.gz-file.

2. Unpack the tar.gz file with "tar -xzvf calcMEM.tar.gz".

3. Change directory to backwardMEM and unpack sdsl-0.7.3.tar.gz with "tar -xzvf sdsl-0.7.3.tar.gz"  and execute "./configure" and "make" in the sdsl-0.7.3/ directory.

4. Change directory to sparseMEM/ directory and execute 'make'

4. Change directory to backwardMEM and execute 'make'

The last 'make' will produce the executable backwardMEM1, backwardMEM2, backwardMEM4, backwardMEM8, and backwardMEM16.

Execute ./backwardMEM[1|2|4|8|16] -h to get information about the usage.

You can run the small example from the paper with "./backwardMEM1 l=2 example1.fasta example2.fasta"

Note:  We use the algorithm and original implementation of Larsson and Sadakane (article: "Faster suffix sorting", 2007) for the construction of the index. The algorithm works only for sequences of length < 2GB. We will replace the algorithm in the next version of backwardMEM to handle input of more than 2GB.

backwardSK - A tool to calculate string kernels

Installation:

1. Download this tar.gz-file.

2. Unpack the tar.gz file with "tar -xzvf backwardSK.tgz".

3. Change directory to sdsl2/skernel and read the README file to get further instructions

Bidirectional search in a string

 

C++ source code for the bidirectional search in a string with wavelet trees:

bidirectionalsearch.tar.gz

The documentation is available here.

Linear Time Algorithms for Generalizations of the Longest Common Substring Problem

Here, one can find pseudo-code and implementations of the algorithms described in the article "Linear Time Algorithms for Generalizations of the Longest Common Substring Problem" by Michael Arnold and Enno Ohlebusch, Algorithmica, 60(4):806--818, 2011.

All Pairs Suffix-Prefix Problem

In this tar.gz-file, one can find the implementation of an efficient algorithm for the all pairs suffix-prefix problem based on a generalized enhanced suffix array, as described in the article "Efficient Algorithms for the All Pairs Suffix-Prefix Problem and the All Pairs Substring-Prefix Problem" by Enno Ohlebusch and Simon Gog, Information Processing Letters 110(3):123-128, 2010. It also contains an alternative implementation based on a generalized suffix tree and the EST database of C. elegans for test purposes.