Write a reference database to files in formats that can be used to train the IDTAXA classifier implemented in DECIPHER.

refdb_export_idtaxa(x, file, taxid = FALSE)

Arguments

x

a reference database.

file

a file path without extension. This path is used to create a .fasta file and up to two .txt files (the second .txt file is written only if taxid = TRUE).

taxid

logical; should the taxid file be generated? (This can be very slow with large databases.)

Value

No return value, called for side effects.

Details

The function generates up to three files (the third one only when taxid = TRUE).

- A FASTA file containing the sequences with their IDs. This file must be imported as a DNAStringSet to be used with DECIPHER, e.g. using:
Biostrings::readDNAStringSet("ex_seqs.fasta")

- A text file containing the taxonomic assignment of each sequence. This file must be imported as a character vector to be used with DECIPHER, e.g. using:
readr::read_lines("ex_taxo.txt")

- A text file ("taxid") containing the taxonomic ranks associated with each taxon. This is an asterisk-delimited file which must be imported as a data frame (see LearnTaxa), e.g. using:
readr::read_delim("ex_ranks.txt", col_names = c('Index', 'Name', 'Parent', 'Level', 'Rank'), delim = "*", quote = "")
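
The three imports above can be combined into a single training step. The following is only a minimal sketch, not part of refdb: the helper names read_taxid_file and train_idtaxa are hypothetical, the file paths are supplied by the caller, and it assumes that the taxonomy file holds one line per sequence in the same order as the FASTA file. It uses base R readers instead of readr, and requires Biostrings and DECIPHER only when train_idtaxa is actually called.

```r
# Hypothetical helper: read an asterisk-delimited taxid file with base R.
# The column names follow the read_delim() call shown above.
read_taxid_file <- function(path) {
  read.delim(path, sep = "*", quote = "", header = FALSE,
             col.names = c("Index", "Name", "Parent", "Level", "Rank"),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: import the exported files and train IDTAXA.
# ranks_path is optional because the taxid file is not written by default.
train_idtaxa <- function(fasta_path, taxo_path, ranks_path = NULL) {
  seqs <- Biostrings::readDNAStringSet(fasta_path)
  taxonomy <- readLines(taxo_path)

  # Assumption: one taxonomy line per sequence, in the same order.
  stopifnot(length(seqs) == length(taxonomy))

  ranks <- if (is.null(ranks_path)) NULL else read_taxid_file(ranks_path)

  DECIPHER::LearnTaxa(train = seqs, taxonomy = taxonomy, rank = ranks)
}
```

A call would then look like train_idtaxa("ex_seqs.fasta", "ex_taxo.txt"), with the paths replaced by those of the files actually written.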

Writing the taxid file can be very slow for large datasets, so it is not generated by default.

Examples

lib <- read.csv(system.file("extdata", "baetidae_bold.csv", package = "refdb"))
lib <- refdb_set_fields_BOLD(lib)
refdb_export_idtaxa(lib, tempfile())
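
Because the example above writes to a temporary path, it may help to keep that path in a variable and list what was created. This is a small sketch that assumes only that the written file names start with the path given in file; the variable names out and written are illustrative.

```r
library(refdb)

lib <- read.csv(system.file("extdata", "baetidae_bold.csv", package = "refdb"))
lib <- refdb_set_fields_BOLD(lib)

# Keep the base path so the output files can be located afterwards.
out <- tempfile()
refdb_export_idtaxa(lib, out)

# List the files in the temporary directory whose names start with the base path.
written <- list.files(dirname(out), full.names = TRUE)
written <- written[startsWith(basename(written), basename(out))]
print(written)
```

Passing taxid = TRUE in the call would additionally write the taxid file, at the cost of a longer run time on large databases.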