Write a reference database in file formats which can be used
to train the IDTAXA classifier implemented in DECIPHER.
refdb_export_idtaxa(x, file, taxid = FALSE)
No return value, called for side effects.
The function generates up to three files.
- A FASTA file containing the sequences with their IDs.
This file must be imported as a DNAStringSet
to be used with DECIPHER, using e.g.:
Biostrings::readDNAStringSet("ex_seqs.fasta")
- A text file containing the taxonomic assignment of each sequence.
This file must be imported as a character vector
to be used with DECIPHER, using e.g.:
readr::read_lines("ex_taxo.txt")
- A text file ("taxid") containing the taxonomic ranks
associated with each taxon. This is an asterisk-delimited file
which must be imported as a data frame (see LearnTaxa), using e.g.:
readr::read_delim("ex_ranks.txt",
                  col_names = c('Index', 'Name', 'Parent', 'Level', 'Rank'),
                  delim = "*", quote = "")
Writing the taxid file can be very slow for large datasets, so it is not generated by default (taxid = FALSE).
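For readers who prefer not to depend on readr, the asterisk-delimited rank file described above can probably be read with base R as well. The sketch below is only an illustration: it writes a small toy file whose rows are made up for the example (they are not output of refdb_export_idtaxa) and reads it back with read.table, using the same five column names.

```r
# Toy rank table in an asterisk-delimited layout.
# These rows are illustrative assumptions, not real refdb output.
ranks_file <- tempfile(fileext = ".txt")
writeLines(c(
  "0*Root*-1*0*rootrank",
  "1*Baetidae*0*1*family",
  "2*Baetis*1*2*genus"
), ranks_file)

# Base R alternative to the readr::read_delim() call:
# split on "*", disable quote and comment handling, name the columns.
ranks <- read.table(
  ranks_file,
  sep = "*",
  quote = "",
  comment.char = "",
  header = FALSE,
  col.names = c("Index", "Name", "Parent", "Level", "Rank"),
  stringsAsFactors = FALSE
)

str(ranks)
```

Disabling quote and comment handling mirrors the quote = "" argument of the readr call, so taxon names containing apostrophes or "#" are not misparsed.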
lib <- read.csv(system.file("extdata", "baetidae_bold.csv", package = "refdb"))
lib <- refdb_set_fields_BOLD(lib)
refdb_export_idtaxa(lib, tempfile())