Export reference database for DECIPHER (IDTAXA) — refdb_export

Write a reference database to files in the formats required to train the IDTAXA classifier implemented in DECIPHER.

refdb_export_idtaxa(x, file, taxid = FALSE)

Arguments

x: a reference database.
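file: a file path without extension. It is used as the base name for the output files: a .fasta file and one or two .txt files (the second .txt file is written only when taxid = TRUE).
taxid: logical; should the taxid file be generated? This can be very slow with large databases.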

Value

No return value, called for side effects.

Details

The function generates up to three files.

- A FASTA file containing the sequences with their IDs. This file must be imported as a DNAStringSet to be used with DECIPHER, e.g. using:
Biostrings::readDNAStringSet("ex_seqs.fasta")

- A text file containing the taxonomic assignment of each sequence. This file must be imported as a character vector to be used with DECIPHER, e.g. using:
readr::read_lines("ex_taxo.txt")

- A text file ("taxid") containing the taxonomic ranks associated with each taxon. This is an asterisk-delimited file which must be imported as a data frame (see LearnTaxa), e.g. using:
readr::read_delim("ex_ranks.txt", col_names = c('Index', 'Name', 'Parent', 'Level', 'Rank'), delim = "*", quote = "")

Writing the taxid file can be very slow for large datasets, so it is not generated by default.
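
The import steps above can be combined into a single training call. The following is a minimal sketch, not part of refdb: the helper name train_idtaxa_from_export and its arguments are hypothetical, and it assumes that the exported files can be passed directly to the train, taxonomy and rank arguments of DECIPHER::LearnTaxa.

```r
# Sketch only: train_idtaxa_from_export() is a hypothetical helper,
# it is not provided by refdb or DECIPHER.
train_idtaxa_from_export <- function(seq_path, taxo_path, ranks_path = NULL) {
  # Sequences, imported as a DNAStringSet
  seqs <- Biostrings::readDNAStringSet(seq_path)

  # One taxonomic assignment per sequence, imported as a character vector
  taxo <- readr::read_lines(taxo_path)

  # Both files are expected to describe the same sequences
  stopifnot(length(seqs) == length(taxo))

  # Optional taxid file, imported as a plain data frame
  ranks <- NULL
  if (!is.null(ranks_path)) {
    ranks <- readr::read_delim(
      ranks_path,
      col_names = c("Index", "Name", "Parent", "Level", "Rank"),
      delim = "*",
      quote = ""
    )
    ranks <- as.data.frame(ranks)
  }

  # Train the IDTAXA classifier (rank = NULL is the LearnTaxa default)
  DECIPHER::LearnTaxa(train = seqs, taxonomy = taxo, rank = ranks)
}
```

With the example file names used above, a call might look like train_idtaxa_from_export("ex_seqs.fasta", "ex_taxo.txt", "ex_ranks.txt"). The third argument can be omitted when the taxid file was not exported.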

Examples

lib <- read.csv(system.file("extdata", "baetidae_bold.csv", package = "refdb"))
lib <- refdb_set_fields_BOLD(lib)
refdb_export_idtaxa(lib, tempfile())
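
Because the function is called for its side effects, it can help to keep the output prefix in a variable and check which files were written. This is a small sketch that assumes the names of the created files start with the value passed to file; the variables out and created are introduced here for illustration only.

```r
library(refdb)

lib <- read.csv(system.file("extdata", "baetidae_bold.csv", package = "refdb"))
lib <- refdb_set_fields_BOLD(lib)

# Keep the prefix so the exported files can be located afterwards
out <- tempfile()
refdb_export_idtaxa(lib, out)

# Assumption: every exported file name begins with the prefix given in `file`
created <- list.files(
  dirname(out),
  pattern = paste0("^", basename(out)),
  full.names = TRUE
)
print(created)
```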