Cluster sequences by similarity
seq_cluster(x, threshold = 0.05, method = "complete")
a DNA, RNA or AA vector of sequences to be clustered.
Threshold value (range in [0, 1]).
the clustering method (see details).
An integer vector with group memberships.
The function uses the ape functions dist.dna and dist.aa to compute pairwise distances among sequences, and hclust for clustering.
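The two steps described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper p_distance() is hypothetical and stands in for dist.dna and dist.aa, the sequences are made up, and applying the threshold as a cut height with cutree() is an assumption about how group memberships are derived from the tree.

```r
# Toy aligned sequences of equal length (made up for illustration).
seqs <- c(a = "ACGTACGTAC",
          b = "ACGTACGTAT",
          c = "ACGTTGGTAT",
          d = "TTGTACCTGC")

# Hypothetical helper: proportion of sites that differ between two sequences.
p_distance <- function(s1, s2) {
  mean(strsplit(s1, "")[[1]] != strsplit(s2, "")[[1]])
}

# Full pairwise distance matrix, converted to a "dist" object for hclust().
n <- length(seqs)
d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_distance(seqs[i], seqs[j])
  }
}
d <- as.dist(d)

# Hierarchical clustering, then cut the tree at the threshold height
# (assumed here to be how the threshold is applied).
tree <- hclust(d, method = "complete")
groups <- cutree(tree, h = 0.15)
groups
```

The result is an integer vector with one group label per input sequence, matching the return value described above.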
Computing a full pairwise distance matrix can be computationally expensive. This function is recommended for datasets of moderate size.
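To give a sense of scale, a full pairwise matrix over n sequences requires n * (n - 1) / 2 distance computations, so the cost grows quadratically with the number of sequences. The dataset sizes below are arbitrary examples.

```r
# Number of pairwise distances needed for n sequences: n * (n - 1) / 2.
n_seqs <- c(100, 1000, 10000)
n_pairs <- n_seqs * (n_seqs - 1) / 2
data.frame(n_seqs, n_pairs)
```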
Supported methods are:
"single" (= Nearest Neighbour Clustering)
"complete" (= Farthest Neighbour Clustering)
"average" (= UPGMA)
"mcquitty" (= WPGMA)
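The choice of method can change the groups obtained at a given threshold. The sketch below compares the four methods on a small, made-up distance matrix using hclust directly. Cutting the tree at the threshold height with cutree() is an assumption made for illustration.

```r
# Toy distance matrix for five items (values made up for illustration).
m <- matrix(c(0.00, 0.03, 0.07, 0.30, 0.34,
              0.03, 0.00, 0.04, 0.28, 0.32,
              0.07, 0.04, 0.00, 0.26, 0.30,
              0.30, 0.28, 0.26, 0.00, 0.02,
              0.34, 0.32, 0.30, 0.02, 0.00),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], letters[1:5]))
d <- as.dist(m)

# One column of group memberships per clustering method.
methods <- c("single", "complete", "average", "mcquitty")
res <- sapply(methods, function(meth) {
  cutree(hclust(d, method = meth), h = 0.05)
})
res
```

In this toy case, "single" places a, b and c in one group because each item is within the threshold of at least one neighbour, whereas "complete" keeps c separate because its distance to a exceeds the threshold.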
See the function seq_consensus to compute consensus and representative sequences for clusters.
Other aggregation operations:
seq_consensus()
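A minimal usage sketch follows. It assumes that the bioseq package (which provides seq_cluster) and ape are installed, and that the input vector is built with the dna() constructor. The sequences are made up for illustration.

```r
library(bioseq)

# Toy DNA sequences of equal length (made up for illustration).
x <- dna(c("CGTCATACGCAGTAAAAACTACTGATG",
           "CTTCATACGCAGTAAAAACTACTGATG",
           "CTTCATATGCAGTAAAAACTACTGATG",
           "CGTCATACGCAGTAAAAGCTACTGATG"))

# Cluster with the default threshold and method shown in the usage above.
groups <- seq_cluster(x, threshold = 0.05, method = "complete")
groups

# Same data with a different clustering method, for comparison.
seq_cluster(x, threshold = 0.05, method = "single")
```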