Example of FlexTyper Indexing
How to generate the index for a read set.
Command Line:
To display the input options, run:
$ ./flextyper index -h
Usage: ./flextyper [options] readFileName
Description: flextyper enables the user to quickly search for kmers inside the FmIndex
Options:
-h, --help Displays this help.
-r, --readFile <readFileName> Please provide the name of the read file
-o, --outputDir <directory> output index directory
-x, --indexFileName <file> index filename (!without .fm9 extension)
-p, --readPairfile <file> name of the paired read file
-n, --numOfIndexes <value> split the reads into n indexes
-l, --readLength <value> read length
--fq, --fastq input file is in fq format
--fa, --fasta input file is in fasta format
--gz, --fq.gz input file is in fq.gz format
-c, --revComp include the rev comp in the index
--dfq, --delFQ delete the fq files once the index is built
--dfa, --delFasta delete the fa fastas once the index is built
-v, --verbose prints debugging messages
Arguments:
readFileName contains the name of the read file
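The options listed above can be assembled programmatically when indexing is scripted. The following is a minimal sketch, not part of FlexTyper: the function name build_index_command and its parameters are hypothetical, and only flags shown in the help text are used. It builds the argument list without running anything, and it assumes the flags can be combined freely, which the help text does not confirm.

```python
def build_index_command(read_file, pair_file=None, output_dir=None,
                        index_name=None, num_indexes=None,
                        input_format=None, rev_comp=False,
                        executable="./flextyper"):
    """Build the argument list for a 'flextyper index' call (hypothetical helper)."""
    # Map a short format name to the matching flag from the help text.
    format_flags = {"fq": "--fq", "fa": "--fa", "gz": "--gz"}

    cmd = [executable, "index", "-r", read_file]
    if pair_file is not None:
        cmd += ["-p", pair_file]          # paired read file
    if output_dir is not None:
        cmd += ["-o", output_dir]         # output index directory
    if index_name is not None:
        cmd += ["-x", index_name]         # index filename, without .fm9
    if num_indexes is not None:
        cmd += ["-n", str(num_indexes)]   # split the reads into n indexes
    if input_format is not None:
        cmd.append(format_flags[input_format])
    if rev_comp:
        cmd.append("-c")                  # include the reverse complement
    return cmd
```

The returned list can be passed to subprocess.run. Keeping it as a list rather than a single string avoids shell quoting problems with file names.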
Test Example
To generate the index for the Mixed Virus sample in the Test_Example folder:
cd Test_Example
../build/flextyper index -r MixedVirus_100_1.fq.gz -p MixedVirus_100_2.fq.gz --gz
This prints output similar to the following to standard output (file paths will differ):
"./Test_Example"
build directory "../build"
preprocessing with:
-r readFile MixedVirus_100_1.fq.gz
-f readSetName MixedVirus_100
-o outputDir ./tmp_ppf
-z zippedReads 1
-n numberOfIndexes 1
-c reverseComplement 0
-u pathToUtils OpenFlexTyper/build/bin/
using paired reads 1
-p readPairFile MixedVirus_100_2.fq.gz
createFasta: ./tmp_ppf/MixedVirus_100.fasta created
processReadFile: output saved to ./tmp_ppf/MixedVirus_100.fasta
createFasta: ./tmp_ppf/MixedVirus_100_pair.fasta created
processReadFile: output saved to ./tmp_ppf/MixedVirus_100_pair.fasta
main: preprocessing complete ./tmp_ppf/MixedVirus_100.fasta
Running FM Index
Output Files
The command generates the following files:
tmp_ppf/MixedVirus_100.fasta
This contains the preprocessed reads, which have been stripped of everything but the sequences.
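FlexTyper does this stripping itself through preprocess.sh, and the exact layout of the resulting file is not shown here. As a rough illustration of what "everything but the sequences" means, the sketch below assumes standard four-line FASTQ records and keeps only the second line of each record. The function names are hypothetical, and this is not the tool's own implementation.

```python
import gzip


def extract_sequences(fastq_lines):
    """Yield only the sequence line (the 2nd of every 4) from FASTQ lines."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def read_sequences(path):
    """Read a plain or gzip-compressed FASTQ file and return its sequences."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(extract_sequences(handle))
```

Headers, the "+" separator lines, and quality strings are discarded, so the output carries no read identifiers.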
Index.log
This contains the log of the indexing run:
======== Wed Oct 21 14:57:00 2020
Running ../build/flextyper index -r MixedVirus_100_1.fq.gz -p MixedVirus_100_2.fq.gz --gz
Build directory ../build
R1 MixedVirus_100_1.fq.gz
R2 MixedVirus_100_2.fq.gz
read set Name MixedVirus_100
Output Folder not set
Setting Output Folder to current path
Output Folder .
PPF Folder ./tmp_ppf
Index File Name not set
Default Index Name set: Index
bash args bash ../build/preprocess.sh -r MixedVirus_100_1.fq.gz -o ./tmp_ppf -f MixedVirus_100 -u ../build/bin/ -z 1 -p MixedVirus_100_2.fq.gz
Preprocess path ../build/preprocess.sh
running preprocess.sh with bash ../build/preprocess.sh -r MixedVirus_100_1.fq.gz -o ./tmp_ppf -f MixedVirus_100 -u ../build/bin/ -z 1 -p MixedVirus_100_2.fq.gz
indexing "tmp_ppf/MixedVirus_100.fasta"
creating index for MixedVirus_100 at "Index_MixedVirus_100.fm9"
index created "Index_MixedVirus_100.fm9" with offset 0
Index_MixedVirus_100.fm9
This is the index binary itself.
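In this example, the output file names appear to combine the index file name (default Index) with the read set name. The sketch below reproduces that pattern. It is inferred from this single run only, the function names are hypothetical, and the naming used when the reads are split into several indexes is not shown here.

```python
def index_file_name(read_set_name, index_name="Index"):
    """File name of the index binary, following the pattern seen in this example."""
    return f"{index_name}_{read_set_name}.fm9"


def index_properties_name(read_set_name, index_name="Index"):
    """File name of the accompanying .ini file, following the same pattern."""
    return f"{index_name}_{read_set_name}_index.ini"
```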
Index_MixedVirus_100_index.ini
This contains the auto-generated properties of the index file and should be kept in the same directory as the index file:
[General]
R1=MixedVirus_100_1.fq.gz
R2=MixedVirus_100_2.fq.gz
buildDirectory=../build
delFQ=false
delFasta=false
indexDirectory=.
indexFileName=Index
numOfIndexes=1
numOfReads=16500
pairedReads=true
readLength=150
readSetName=MixedVirus_100
revComp=false
[IndexFiles]
1\fileName=Index_MixedVirus_100.fm9
1\offset=0
size=1
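The properties file can be inspected from a script. Below is a minimal sketch using Python's standard configparser. It assumes the [IndexFiles] section always follows the numbered layout shown above (entries 1..size, each with a fileName and an offset key), which resembles the way Qt's QSettings writes arrays. The function name load_index_properties is hypothetical, and SAMPLE is a shortened copy of the listing above.

```python
import configparser

# Raw string so that the backslashes in keys such as 1\fileName are kept literally.
SAMPLE = r"""
[General]
indexFileName=Index
numOfIndexes=1
numOfReads=16500
pairedReads=true
readLength=150
readSetName=MixedVirus_100
[IndexFiles]
1\fileName=Index_MixedVirus_100.fm9
1\offset=0
size=1
"""


def load_index_properties(text):
    """Return (general settings dict, list of (index file name, offset))."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. readSetName
    parser.read_string(text)

    general = dict(parser["General"])
    files = parser["IndexFiles"]
    count = int(files["size"])
    index_files = [
        (files[f"{i}\\fileName"], int(files[f"{i}\\offset"]))
        for i in range(1, count + 1)
    ]
    return general, index_files
```

All values in the general dictionary are returned as strings, so fields such as numOfReads or pairedReads need converting before use.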