DaisySuite example
To run DaisySuite on an example dataset, first copy the example into a directory of your choice. For instance, running
DaisySuite_example .
copies the folder example into the current working directory.
Next, edit the following parameters in example/example.yaml:
- outputdir (full path to example/output/)
- ncbidir
- bwa (in case you are using bwa)
- yara_index or bwa_index
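If you prefer to script this edit, the following is a minimal sketch. It assumes that example.yaml stores these parameters as flat top-level `key: value` lines, which this page does not confirm. The helper name `set_yaml_values`, the sample excerpt, and all paths are placeholders chosen for illustration.

```python
import re


def set_yaml_values(text, updates):
    """Replace the value of every top-level `key: value` line named in `updates`.

    Assumption: keys are unindented and each sits on one line. Anything after
    the colon on a replaced line (including inline comments) is overwritten.
    """
    lines = []
    for line in text.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*)\s*:", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            line = f"{key}: {updates[key]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical excerpt; the real example.yaml contains more keys.
sample = "outputdir: /change/me\nncbidir: /change/me\nyara_index: /change/me\n"

# Placeholder paths; replace them with locations on your system.
updated = set_yaml_values(sample, {
    "outputdir": "/home/user/example/output/",
    "ncbidir": "/home/user/ncbi/",
})
print(updated)
```

To apply it to the real file, read example/example.yaml into a string, pass it through the helper, and write the result back. Keys that are not listed in `updates` are left untouched.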
Finally, you can run DaisySuite:
DaisySuite --configfile example/example.yaml
You can also use multiple threads by adding --cores <thread_number>, e.g. --cores 10, to the command.
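When the run is started from a script, the command can be assembled as an argument list. This sketch uses only the two options named above (--configfile and --cores). The function names `daisysuite_command` and `run` are illustrative and not part of DaisySuite.

```python
import shutil
import subprocess


def daisysuite_command(configfile, cores=None):
    """Build the argument list for a DaisySuite run."""
    cmd = ["DaisySuite", "--configfile", configfile]
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


def run(cmd):
    """Start the run, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


cmd = daisysuite_command("example/example.yaml", cores=10)
print(" ".join(cmd))
# run(cmd)  # uncomment to actually start DaisySuite
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.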
DaisyGPS results
You will find the acceptor Escherichia coli str. K-12 substr. DH10B [NC_010473.1] and the donor Helicobacter pylori [NZ_AP014710.1] in the example/output/candidates/sim1HP_candidates.tsv file.
Type | Name | Accession.Version | TaxID | Parent TaxID | Species TaxID | Abundance | Num. Reads | Unique Reads | Coverage | Validity | Homogeneity | Mapping Error | Property Score | Property |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acceptor | Escherichia coli str. K-12 substr. DH10B | NC_010473.1 | 316385 | 83333 | 562 | 0.946692320209823 | 197800 | 136 | 36.8658152545 | 0.25427675964 | 0.08146934189377475 | 0.021496966633 | 0.0030079101341754125 | 0.17280741774622527 |
Acceptor | Escherichia coli K-12 | NZ_CP010445.1 | 83333 | 562 | 562 | 0.8952416506332022 | 187050 | 0 | 35.7282033189 | 0.237248626581 | 0.07496815501320375 | 0.0214397576406 | 0.0026711615990942022 | 0.16228047156779624 |
Donor | [Haemophilus] ducreyi | NZ_CP015434.1 | 730 | 724 | 730 | 0.0015411270328997118 | 322 | 0 | 24.7946611905 | 0.00109715387698 | 0.9254876430438622 | 0.0265838509317 | -2.619313788987036e-05 | -0.9243904891668824 |
Donor | Salmonella enterica subsp. enterica serovar Anatum str. USDA-ARS-USMARC-1676 | NZ_CP014620.1 | 1454587 | 58712 | 28901 | 0.0006030497085259743 | 126 | 0 | 0.0645783951687 | 0.00108528243123 | 0.9193443687114572 | 0.013492063492100002 | -1.0181504759172115e-05 | -0.9182590862802272 |
Donor | Klebsiella oxytoca KONIH1 | NZ_CP008788.1 | 1333852 | 571 | 571 | 0.008571920856904919 | 1791 | 0 | 41.2926529358 | 0.00105750960227 | 0.7946631781949077 | 0.0263539921831 | -0.00012507673507004732 | -0.7936056685926377 |
Donor | Helicobacter pylori | NZ_AP014710.1 | 210 | 209 | 210 | 0.043812039935291806 | 9154 | 9091 | 47.3515414856 | 0.0177922034096 | 0.7999996890279752 | 0.00920544752749 | -0.0006300993983310352 | -0.7822074856183753 |
Acceptor-like Donor | Escherichia coli | NZ_CP016182.1 | 562 | 561 | 562 | 0.35694799414180284 | 74580 | 0 | 6.99537943225 | 0.0939148463399 | 0.0879420602326042 | 0.0211231786896 | 3.919904897022354e-05 | 0.00597278610729579 |
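The candidate file can also be inspected programmatically. The sketch below assumes the .tsv file is tab-separated with a header row that uses the column names shown in the table. It embeds four of the fifteen columns so that it runs on its own. In this example, the two references named above are the only rows with a non-zero Unique Reads count. The filter only illustrates that observation and is not a description of how DaisyGPS selects candidates.

```python
import csv
import io

# Four columns copied from the table above, written tab-separated
# (the delimiter is an assumption about the real file).
ROWS = [
    ("Type", "Name", "Accession.Version", "Unique Reads"),
    ("Acceptor", "Escherichia coli str. K-12 substr. DH10B", "NC_010473.1", "136"),
    ("Acceptor", "Escherichia coli K-12", "NZ_CP010445.1", "0"),
    ("Donor", "[Haemophilus] ducreyi", "NZ_CP015434.1", "0"),
    ("Donor",
     "Salmonella enterica subsp. enterica serovar Anatum str. USDA-ARS-USMARC-1676",
     "NZ_CP014620.1", "0"),
    ("Donor", "Klebsiella oxytoca KONIH1", "NZ_CP008788.1", "0"),
    ("Donor", "Helicobacter pylori", "NZ_AP014710.1", "9091"),
    ("Acceptor-like Donor", "Escherichia coli", "NZ_CP016182.1", "0"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n"


def candidates_with_unique_reads(handle):
    """Return (type, accession, unique reads) for rows with unique reads > 0."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["Type"], row["Accession.Version"], int(row["Unique Reads"]))
        for row in reader
        if int(row["Unique Reads"]) > 0
    ]


hits = candidates_with_unique_reads(io.StringIO(SAMPLE))
for kind, accession, unique in hits:
    print(kind, accession, unique)
```

To read the real output, replace the `io.StringIO(SAMPLE)` handle with `open("example/output/candidates/sim1HP_candidates.tsv", newline="")`.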
Daisy results
Furthermore, you will find the base pair positions of the transfer in example/output/hgt_eval/sim1HP.vcf.
Bases 1322000 to 1350000 of the donor have been inserted at base 1120262 of the acceptor. This is indicated by two breakpoints in the VCF, one representing the beginning of the insert (acceptor 1120261; the donor position is reported as 1322002, two bases after the true start) and one representing the end of the insert (acceptor 1120263, donor 1350000).
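The breakpoints are written as VCF breakend (BND) records, where the ALT field names the mate contig and position between square brackets. As a minimal sketch, the two records for NC_010473.1 from the VCF listing on this page can be decoded as follows. The function name `parse_bnd_alt` is illustrative, and the regular expression does not validate bracket orientation.

```python
import re

# Breakend ALT strings look like  t[contig:pos[  or  ]contig:pos]t  (VCF 4.2).
# This pattern accepts bases on either side and either bracket direction.
BND_ALT = re.compile(r"^[ACGTN]*[\[\]]([^:\[\]]+):(\d+)[\[\]][ACGTN]*$")


def parse_bnd_alt(alt):
    """Return (mate contig, mate position) from a breakend ALT string."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return match.group(1), int(match.group(2))


# (CHROM, POS, ALT) of the two records reported for the acceptor NC_010473.1.
records = [
    ("NC_010473.1", 1120261, "A[NZ_AP014710.1:1322002["),
    ("NC_010473.1", 1120263, "]NZ_AP014710.1:1350000]G"),
]

(acc, acc_left, alt_start), (_, acc_right, alt_end) = records
donor, donor_start = parse_bnd_alt(alt_start)
_, donor_end = parse_bnd_alt(alt_end)
donor_length = donor_end - donor_start + 1

print(f"donor {donor}:{donor_start}-{donor_end} ({donor_length} bp)")
print(f"inserted between {acc}:{acc_left} and {acc}:{acc_right}")
```

With the reported coordinates, the detected donor segment spans 27999 bases.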
The example/output/hgt_eval/sim1HP.tsv file also provides a more intuitive representation of putative transferred regions, but please note that those candidates have not been filtered by the sampling values.
#AN: Acceptor name
#DN: Donor name
#AS: Acceptor start position
#AE: Acceptor end position
#DS: Donor start position
#DE: Donor end position
#MC: Mean coverage in region
#Split: Total number split-reads per region (including duplicates!)
#PS-S: Pairs spanning HGT boundaries
#PS-W: Pairs within HGT boundaries
#Phage: PS-S and PS-W reads mapping to phage database
#BS:MC/PS-S/PS-W: Percent of bootstrapped random regions with MC/PS-S/PS-W smaller than candidate
AN | DN | AS | AE | MC | BS:MC | DS | DE | MC | Split | PS-S | PS-W | Phage | BS:MC | BS:PS-S | BS:PS-W |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NZ_CP010445.1 | NZ_AP014710.1 | 1880235 | 1880237 | 44.00 | 7 | 1322002 | 1350000 | 94.62 | 152 | 182 | 8712 | 0.0000 | 100 | 100 | 100 |
NZ_CP010445.1 | NZ_CP015434.1 | 3904873 | 3904886 | 40.54 | 3 | 114928 | 126957 | 30.41 | 871 | 156 | 884 | 0.0000 | 100 | 100 | 100 |
NZ_CP010445.1 | NZ_CP015434.1 | 3904873 | 3916007 | 97.63 | 20 | 125626 | 126957 | 129.68 | 1571 | 108 | 258 | 0.0000 | 100 | 100 | 100 |
NZ_CP010445.1 | NZ_CP015434.1 | 3904885 | 3916007 | 97.69 | 18 | 114927 | 125626 | 18.06 | 253 | 43 | 279 | 0.0000 | 100 | 99 | 100 |
NC_010473.1 | NZ_AP014710.1 | 1120261 | 1120263 | 43.00 | 3 | 1322002 | 1350000 | 94.62 | 154 | 182 | 8712 | 0.0000 | 100 | 100 | 100 |
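The header of this table repeats two column names (MC and BS:MC), so a reader keyed on header names would silently overwrite one of each pair. The sketch below therefore assigns names by position. It assumes the file is tab-separated, that legend lines start with `#`, and that the first MC / BS:MC pair belongs to the acceptor region and the second to the donor region, which is inferred from the column order only. The `A_` and `D_` prefixes are hypothetical names introduced here.

```python
import io

# Column order as in the table above; duplicated names get A_/D_ prefixes.
COLUMNS = [
    "AN", "DN", "AS", "AE", "A_MC", "A_BS:MC",
    "DS", "DE", "D_MC", "Split", "PS-S", "PS-W", "Phage",
    "D_BS:MC", "BS:PS-S", "BS:PS-W",
]

# Two rows copied from the table above, plus one legend line and the header.
SAMPLE_ROWS = [
    ["NZ_CP010445.1", "NZ_CP015434.1", "3904873", "3904886", "40.54", "3",
     "114928", "126957", "30.41", "871", "156", "884", "0.0000",
     "100", "100", "100"],
    ["NC_010473.1", "NZ_AP014710.1", "1120261", "1120263", "43.00", "3",
     "1322002", "1350000", "94.62", "154", "182", "8712", "0.0000",
     "100", "100", "100"],
]
HEADER = ["AN", "DN", "AS", "AE", "MC", "BS:MC", "DS", "DE", "MC", "Split",
          "PS-S", "PS-W", "Phage", "BS:MC", "BS:PS-S", "BS:PS-W"]
SAMPLE = "#AN: Acceptor name\n" + "\n".join(
    "\t".join(fields) for fields in [HEADER] + SAMPLE_ROWS
) + "\n"


def read_hgt_candidates(handle):
    """Parse candidate rows, skipping legend lines and the header row."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] == "AN":  # header row, if present
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


candidates = read_hgt_candidates(io.StringIO(SAMPLE))
for row in candidates:
    length = int(row["DE"]) - int(row["DS"]) + 1
    print(row["AN"], row["DN"], f"donor region {length} bp", f"split reads {row['Split']}")
```

Because these candidates are unfiltered, treat such a listing as a starting point for inspection rather than a final call.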
##fileformat=VCFv4.2
##source=DAISY
##INFO=<ID=EVENT,Number=1,Type=String,Description="Event identifier for breakends.">
##contig=<ID=NC_010473.1>
##contig=<ID=NZ_CP010445.1>
##contig=<ID=NZ_CP015434.1>
##contig=<ID=NZ_CP014620.1>
##contig=<ID=NZ_CP008788.1>
##contig=<ID=NZ_AP014710.1>
##contig=<ID=NZ_CP016182.1>
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT |
---|---|---|---|---|---|---|---|---|
NZ_CP010445.1 | 1880235 | BND_1_1 | A | A[NZ_AP014710.1:1322002[ | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |
NZ_CP010445.1 | 1880237 | BND_1_2 | G | ]NZ_AP014710.1:1350000]G | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |
NZ_CP010445.1 | 3904873 | BND_1_1 | T | T[NZ_CP015434.1:114928[ | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |
NZ_CP010445.1 | 3904886 | BND_1_2 | C | ]NZ_CP015434.1:126957]C | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |
NC_010473.1 | 1120261 | BND_1_1 | A | A[NZ_AP014710.1:1322002[ | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |
NC_010473.1 | 1120263 | BND_1_2 | G | ]NZ_AP014710.1:1350000]G | PASS | SVTYPE=BND;EVENT=HGT1 | . | 1 |