This function searches and downloads data from the NCBI Nucleotide database. It also uses the NCBI Taxonomy database to retrieve the taxonomic classification of each sequence.

refdb_import_NCBI(
  query,
  full = FALSE,
  max_seq_length = 10000,
  seq_bin = 200,
  verbose = TRUE,
  start = 0L
)

Arguments

query

a character string with the query.

full

a logical. If FALSE (the default), only a subset of the most important fields is included in the result.

max_seq_length

a numeric giving the maximum length of sequences to retrieve. Useful to exclude complete genomes.

seq_bin

number of sequences to download at once.

verbose

a logical. If TRUE (the default), print information in the console.

start

an integer giving the index at which to start downloading. Mainly for debugging purposes.

Value

A tibble.

Details

This function uses several functions from the rentrez package to interface with NCBI's EUtils API.
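
To give an idea of how a batched download through rentrez can look, here is a minimal sketch. It is not the code used by refdb_import_NCBI: the helper names batch_starts and fetch_in_batches are hypothetical, the choice of the "nuccore" database and the "fasta" return type is an assumption, and the way seq_bin and start are mapped to the EUtils retmax and retstart parameters is only a plausible reading of the arguments described above.

```r
# Hypothetical helper: 0-based offsets of each batch, given the total
# number of hits, the batch size and the index at which to start.
batch_starts <- function(count, seq_bin = 200, start = 0L) {
  if (count <= start) {
    return(integer(0))
  }
  seq(from = start, to = count - 1L, by = seq_bin)
}

# Hypothetical helper: search the Nucleotide database once, then fetch
# the records batch by batch using the search history kept on the server.
# Requires the rentrez package and a network connection when called.
fetch_in_batches <- function(query, seq_bin = 200, start = 0L) {
  search <- rentrez::entrez_search(
    db = "nuccore",
    term = query,
    use_history = TRUE
  )
  starts <- batch_starts(search$count, seq_bin, start)
  lapply(starts, function(offset) {
    rentrez::entrez_fetch(
      db = "nuccore",
      web_history = search$web_history,
      rettype = "fasta",
      retstart = offset, # assumed mapping of `start` plus batch offset
      retmax = seq_bin   # assumed mapping of `seq_bin`
    )
  })
}
```

Keeping the offset computation in a separate function makes it possible to check the batching logic without contacting NCBI.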

Errors

Error in curl::curl_fetch_memory(url, handle = handle) : transfer closed with outstanding read data remaining
This error seems to appear with long sequences. You can try to decrease max_seq_length to exclude them.
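
One way to apply this advice is to retry the import with progressively smaller values of max_seq_length. The sketch below is only an illustration: import_with_fallback is a hypothetical helper that is not part of refdb, and the candidate lengths are arbitrary values chosen for the example.

```r
# Hypothetical helper: try the import with each candidate value of
# max_seq_length in turn and return the first result that succeeds.
# `importer` defaults to refdb_import_NCBI but can be replaced,
# for example by a stub when testing.
import_with_fallback <- function(query,
                                 lengths = c(10000, 5000, 2000),
                                 importer = refdb::refdb_import_NCBI) {
  for (len in lengths) {
    attempt <- tryCatch(
      # Wrap the result in a list so that a successful call can be
      # told apart from a failed one, whatever the importer returns.
      list(value = importer(query, max_seq_length = len)),
      error = function(e) {
        message(
          "Import failed with max_seq_length = ", len, ": ",
          conditionMessage(e)
        )
        NULL
      }
    )
    if (!is.null(attempt)) {
      return(attempt$value)
    }
  }
  stop("All attempts failed; consider using a narrower query.")
}
```

A call such as import_with_fallback("Silo COI") would first try the default limit of 10000 and fall back to shorter limits only if an error occurs. Note that sequences longer than the limit that finally succeeds are excluded from the result.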

Examples

if (FALSE) { # \dontrun{
silo_ncbi <- refdb_import_NCBI("Silo COI")
} # }