Sequence Analysis

A Lempel-Ziv-style compression method for repetitive texts

This software contains the algorithm presented in "A Lempel-Ziv-style compression method for repetitive texts" submitted to the Prague Stringology Conference 2017.

Installation:

1. Download this .tar.xz  archive and unpack it with the command "tar xf lcpcompress.tar.xz".

2. Follow the install instructions in the README file of the archive.

Space-efficient Parallel Construction of Succinct Representations of Suffix Tree Topologies

This Software contains Algorithms for sequential and parallel construction of succinct representations of tree topologies.

Installation:

1. Download this .tar.bz2  file

2. Unpack the .tar.bz2 file with the command "tar xfj bps.tar.bz2".

3. Follow the install instructions in the README file of the archive.

compressed deBruijn Graph - Tools for computing the compressed deBruijn Graph

The following steps describe how to use the algorithms presented in "A representation of a compressed de Bruijn graph for pan-genome analysis that enables search" at Algorithms for Molecular Biology:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_search.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


The following steps describe how to use the algorithms presented in "Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform" at Bioinformatics:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_bioinformatics.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


The following steps describe how to use the algorithms presented in "Efficient Construction of a Compressed de Bruijn Graph for Pan-Genome Analysis" at CPM2015:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj cdbg_cpm2015.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.


 The test data can be found here.

context-sensitive-repeats - Tools for computing context-diverse and near supermaximal repeats

Installation:

1. Download this tar.bz2-file.

2. Unpack the .tar.bz2 file with the command "tar xfj context-sensitive-repeats.tar.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

construct_bwt - A tool for computing the BWT space-efficiently

Installation:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj construct_bwt.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

bwt_reverse - A tool for computing the BWT of the reverse string

Installation:

1. Download this bz2-file.

2. Unpack the bz2 file with the command "tar xfvj bwt_reverse.bz2".

3. Follow the install instructions in the readme.txt file of the archive.

bwt based LACA - A tool for computing the LCP array directly from the BWT

Installation:

1. Install libdivsufsort of Yuta Mori

2. Download this bz2-file.

3. Unpack the bz2 file with the command "tar xfvj bwt_based_laca.bz2".

4. Change directory to sdsl2/build and execute "cmake .." and "make".

5. Execute "make" in the root directory of the archive.

The last "make" will produce the executables bwt_based_laca, bwt_based_laca2 and others. You'll find more information in the readme.txt file (at the root directory of the archive).

Note: You can use the tool dbwt of Kunihiko Sadakane to compute the BWT directly from the input.

backwardMEM - A tool for computing maximal exact matches

Installation:

1. Download this tar.gz-file.

2. Unpack the tar.gz file with "tar -xzvf calcMEM.tar.gz".

3. Change directory to backwardMEM and unpack sdsl-0.7.3.tar.gz with "tar -xzvf sdsl-0.7.3.tar.gz"  and execute "./configure" and "make" in the sdsl-0.7.3/ directory.

4. Change directory to sparseMEM/ directory and execute 'make'

4. Change directory to backwardMEM and execute 'make'

The last 'make' will produce the executable backwardMEM1, backwardMEM2, backwardMEM4, backwardMEM8, and backwardMEM16.

Execute ./backwardMEM[1|2|4|8|16] -h to get information about the usage.

You can run the small example from the paper with "./backwardMEM1 l=2 example1.fasta example2.fasta"

Note:  We use the algorithm and original implementation of Larsson and Sadakane (article: "Faster suffix sorting", 2007) for the construction of the index. The algorithm works only for sequences of length < 2GB. We will replace the algorithm in the next version of backwardMEM to handle input of more than 2GB.

backwardSK - A tool to calculate string kernels

Installation:

1. Download this tar.gz-file.

2. Unpack the tar.gz file with "tar -xzvf backwardSK.tgz".

3. Change directory to sdsl2/skernel and read the README file to get further instructions

Bidirectional search in a string

 

C++ source code for the bidirectional search in a string with wavelet trees:

bidirectionalsearch.tar.gz

The documentation is available here.

Linear Time Algorithms for Generalizations of the Longest Common Substring Problem

Here, one can find pseudo-code and implementations of the algorithms described in the article "Linear Time Algorithms for Generalizations of the Longest Common Substring Problem" by Michael Arnold and Enno Ohlebusch, Algorithmica, 60(4):806--818, 2011.

All Pairs Suffix-Prefix Problem

In this tar.gz-file, one can find the implementation of an efficient algorithm for the all pairs suffix-prefix problem based on a generalized enhanced suffix array, as described in the article "Efficient Algorithms for the All Pairs Suffix-Prefix Problem and the All Pairs Substring-Prefix Problem" by Enno Ohlebusch and Simon Gog, Information Processing Letters 110(3):123-128, 2010. It also contains an alternative implementation based on a generalized suffix tree and the EST database of C. elegans for test purposes.