DaisySuite configuration

DaisySuite is configured through a YAML file. You can generate a template that contains all options with explanations by running

DaisySuite_template .

which creates the template in the current working directory. The options are also explained on this page.

General configuration

Three parameters determine which steps of the pipeline are run: sim (read simulation), daisygps (DaisyGPS) and daisy (Daisy). Each accepts true or false. For example, to run DaisyGPS and Daisy on an existing read dataset, use:

sim: false
daisygps: true
daisy: true
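
The three switches can be read as a list of enabled steps once the YAML file has been parsed into a dictionary. The following is a minimal sketch under that assumption; the helper name steps_to_run is illustrative and not part of DaisySuite. It also rejects non-boolean values, because a quoted value such as 'false' is parsed by YAML as a non-empty string rather than as a boolean.

```python
# Hypothetical helper, not part of DaisySuite: list the enabled pipeline steps.
STEP_KEYS = ("sim", "daisygps", "daisy")  # key names taken from the config above


def steps_to_run(config):
    """Return the enabled steps in pipeline order; reject non-boolean values."""
    enabled = []
    for key in STEP_KEYS:
        value = config.get(key)
        if not isinstance(value, bool):
            raise ValueError(f"{key} must be true or false, got {value!r}")
        if value:
            enabled.append(key)
    return enabled


# Parsed form of the example above (sim: false, daisygps: true, daisy: true).
print(steps_to_run({"sim": False, "daisygps": True, "daisy": True}))
```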

Also, an output directory must be defined via the outputdir variable:

outputdir: 'path/to/output/directory'

If sim is set to true, this directory can be empty. Otherwise, the reads are expected in the read directory inside the output directory and must be named <sample>.1.fq and <sample>.2.fq (see also Using DaisySuite).
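
Before starting a run on existing reads, it can help to check that both files of each sample are in place. The sketch below is one possible way to do this. The subdirectory name "reads" and the sample name are assumptions made for illustration; adjust them to your setup.

```python
from pathlib import Path


def missing_read_files(outputdir, samples, read_subdir="reads"):
    """Return the expected paired read files that do not exist.

    read_subdir is an assumption; set it to the actual read directory name.
    """
    read_dir = Path(outputdir) / read_subdir
    expected = [
        read_dir / f"{sample}.{mate}.fq"
        for sample in samples
        for mate in (1, 2)
    ]
    return [path for path in expected if not path.is_file()]


# Hypothetical sample name; prints every file that still has to be provided.
for path in missing_read_files("path/to/output/directory", ["sample1"]):
    print("missing:", path)
```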

Simulation configuration

The simulation needs the NCBI folder created by the setup (see Using DaisySuite), e.g.:

ncbidir: 'path/to/DaisySuite/data/ncbi'

You can further specify how many simulations are run, the coverage and the read length:

simulates: 1
coverage: 100
readlength: 150

This creates one simulated read dataset in which the reference is covered about 100 times by reads with an average length of 150 bases.
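
Coverage is commonly defined as the total number of sequenced bases divided by the reference length, so the settings above imply a rough read count. The sketch below works through that arithmetic. The reference length of 5,000,000 bases is a made-up example value, and how the simulation derives its actual read count is not specified here.

```python
def estimated_read_count(reference_length, coverage, read_length):
    """Reads needed so that reads * read_length / reference_length >= coverage."""
    total_bases = reference_length * coverage
    # Ceiling division without floating point.
    return -(-total_bases // read_length)


# Hypothetical 5 Mbp reference with coverage: 100 and readlength: 150.
print(estimated_read_count(5_000_000, 100, 150))  # 3333334
```

The result counts individual reads; a paired dataset would contain half as many pairs.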

You can also choose to not incorporate any HGT event, e.g. to test for specificity:

negative: true

Furthermore, various parameters configure the read simulation done by Mason, e.g. the SNP rate:

# Acceptor: Mason variator SNP rate
a_snp: 0.01
# Acceptor: Mason variator small indel rate
a_sir: 0.001
# Acceptor: Mason variator sv indel rate
a_svir: 0.00001
# Donor: Mason variator SNP rate
d_snp: 0.001
# Donor: Mason variator small indel rate
d_sir: 0.001
# Donor: Mason variator max small indel size
d_msis: 4

DaisyGPS configuration

You can select a mapper (bwa or yara) via the bwa option and set the number of threads used by the mapper:

bwa: false
threads: 20

These settings are shared with Daisy.

Depending on your choice of mapper, you need to specify the location of the corresponding index. The indices are generated during the setup (see Using DaisySuite):

yara_index: 'path/to/yara_index'
bwa_index: 'path/to/bwa_index'

Note that you can specify both indices; the pipeline picks the index that matches the chosen mapper.
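
The selection described above can be pictured as a simple lookup on the parsed configuration. This is only a sketch: the helper select_index is hypothetical, and it assumes that bwa: false means yara is used.

```python
def select_index(config):
    """Return the index path that matches the chosen mapper.

    Assumption: bwa: true selects bwa, bwa: false selects yara.
    """
    key = "bwa_index" if config["bwa"] else "yara_index"
    if not config.get(key):
        raise KeyError(f"{key} must be set for the chosen mapper")
    return config[key]


config = {
    "bwa": False,
    "yara_index": "path/to/yara_index",
    "bwa_index": "path/to/bwa_index",
}
print(select_index(config))  # path/to/yara_index
```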

You also need to specify the NCBI directory as described in the Simulation section.

You can select taxa per sample that should be excluded from the candidate selection. You can blacklist single taxa, a whole species or all children of a taxon:

taxon_blacklist:
  - sample1:
    - 672612
  - sample2:
    - 726312

species_blacklist:
  - sample2:
    - 1270

parent_blacklist:
  - sample2:
    - 3173
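
Each blacklist is a YAML list of single-key mappings from a sample name to a list of taxonomic IDs. The sketch below shows the parsed form of taxon_blacklist as Python data and one way to collect the IDs for a sample. The helper blacklist_for_sample is illustrative and not part of DaisySuite.

```python
def blacklist_for_sample(entries, sample):
    """Collect all taxonomic IDs listed for one sample."""
    taxids = set()
    for entry in entries or []:  # each entry is a one-key mapping
        taxids.update(entry.get(sample) or [])
    return taxids


# Parsed form of the taxon_blacklist example above.
taxon_blacklist = [{"sample1": [672612]}, {"sample2": [726312]}]
print(blacklist_for_sample(taxon_blacklist, "sample2"))  # {726312}
```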

You can also opt to report only candidates that are classified more specifically than species level, i.e. a taxon whose taxonomic ID equals the taxonomic ID of its species is not reported. If this filter would leave the candidate list empty, the filter is ignored.

filter_species: true
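
The filter and its fallback can be sketched as follows. The tuple layout (taxid, species_taxid) and the IDs in the example are made up for illustration; the actual candidate representation in DaisyGPS is not described here.

```python
def filter_species_level(candidates):
    """Drop candidates whose taxid equals their species taxid.

    candidates: list of (taxid, species_taxid) tuples (hypothetical layout).
    Returns the unfiltered list if the filter would remove everything.
    """
    below_species = [c for c in candidates if c[0] != c[1]]
    return below_species if below_species else list(candidates)


# Made-up IDs: the first candidate is a strain, the second a species.
print(filter_species_level([(1001, 1000), (1000, 1000)]))  # [(1001, 1000)]
# Only species-level candidates: the filter is ignored.
print(filter_species_level([(1000, 1000)]))  # [(1000, 1000)]
```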

Furthermore, you can set how many acceptors, donors and acceptor-like donors are reported:

number_acceptors: 2
number_donors: 3
number_accdons: 2

Daisy configuration

The mapper and threads choices are shared with DaisyGPS.

You can use either a sensitive mode (using Stellar and Gustaf) or a less sensitive mode (using laser):

sensitivemode: true

If you use the sensitive mode, the reads are mapped and filtered for pairs that are not properly aligned. If this filtering yields more than 750000 entries, you can optionally keep only reads where at least one mate is unmapped. This decreases the runtime at the cost of decreased sensitivity.

samflag_filter: true
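
In terms of SAM flags, the two filters correspond to the "properly aligned" bit (0x2) and the "unmapped" bits (0x4 for the read, 0x8 for its mate). The sketch below illustrates the described decision on plain flag values. It is not Daisy's implementation, and the function name and the max_entries parameter are assumptions.

```python
PROPER_PAIR = 0x2    # SAM flag bit: each segment properly aligned
READ_UNMAPPED = 0x4  # SAM flag bit: segment unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: next segment (mate) unmapped


def select_flags(flags, samflag_filter=True, max_entries=750000):
    """Return the flags of reads kept for the sensitive mode.

    Keeps reads that are not in a proper pair. If that yields more than
    max_entries and samflag_filter is on, keeps only reads where at least
    one mate is unmapped.
    """
    improper = [f for f in flags if not f & PROPER_PAIR]
    if samflag_filter and len(improper) > max_entries:
        return [f for f in improper if f & (READ_UNMAPPED | MATE_UNMAPPED)]
    return improper


# 99: proper pair, 65: improper but both mapped, 73: mate unmapped, 133: read unmapped
print(select_flags([99, 65, 73, 133], max_entries=2))  # [73, 133]
```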

Another option for the sensitive mode is the number of reads required to support a breakpoint in Gustaf. The default is 2; increasing it makes the breakpoint search stricter. This option is set per sample.

gustaf_st:
  - sample1:
    - 4
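
Because the value is set per sample, samples without an entry fall back to the default of 2. A minimal sketch of such a lookup on the parsed configuration might look like this; the helper gustaf_support is hypothetical.

```python
def gustaf_support(config, sample, default=2):
    """Return the breakpoint support threshold for a sample.

    Falls back to the default when the sample has no gustaf_st entry.
    """
    for entry in config.get("gustaf_st") or []:
        values = entry.get(sample)
        if values:
            return values[0]
    return default


config = {"gustaf_st": [{"sample1": [4]}]}  # parsed form of the example above
print(gustaf_support(config, "sample1"))  # 4
print(gustaf_support(config, "sample2"))  # 2
```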

Daisy can also search against a phage database. An empty entry disables the search. To search against a database, provide the full path to the FASTA file.

phage: ''

Furthermore, you can set parameters for the HGT detection, i.e. the minimum and maximum HGT size and the required support by sampling.

hgt_min: 100
hgt_max: 55000
hgt_sens: 90
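
The size limits can be read as a range check on each candidate region. The sketch below covers only hgt_min and hgt_max and assumes that both bounds are inclusive and given in bases; neither assumption is confirmed here, and hgt_sens is left out because its exact semantics are not described.

```python
def hgt_size_ok(size, hgt_min=100, hgt_max=55000):
    """Check whether a candidate size lies within [hgt_min, hgt_max].

    Inclusive bounds are an assumption made for this illustration.
    """
    return hgt_min <= size <= hgt_max


print(hgt_size_ok(50))     # False, smaller than hgt_min
print(hgt_size_ok(20000))  # True
print(hgt_size_ok(60000))  # False, larger than hgt_max
```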

Command line help

DaisySuite -h provides a short command overview:

DaisySuite Pipeline (powered by Snakemake)

Usage: DaisySuite --configfile FILE [Snakemake options]

 Useful Snakemake parameters:
   -j, --cores            number of cores
   -k, --keep-going       go on with independent jobs if a job fails
   -n, --dryrun           do not execute anything
   -p, --printshellcmds   print out the shell commands that will be executed

 Full list of parameters:
   --help                 show Snakemake help (or snakemake -h)
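
If you start DaisySuite from a script, the command line can be assembled from the options listed above. The following sketch only builds and prints the command and does not run it. The helper daisysuite_command and the file name config.yaml are placeholders.

```python
import shlex


def daisysuite_command(configfile, cores=None, dryrun=False):
    """Build a DaisySuite call using options from the help text above."""
    cmd = ["DaisySuite", "--configfile", configfile]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    if dryrun:
        cmd.append("--dryrun")
    return cmd


# config.yaml is a placeholder for your own configuration file.
print(shlex.join(daisysuite_command("config.yaml", cores=4, dryrun=True)))
# DaisySuite --configfile config.yaml --cores 4 --dryrun
```

A dry run (--dryrun) is a convenient way to check the configuration before executing anything.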