
This function calculates three indices (Davies-Bouldin, Calinski-Harabasz and average Silhouette score) for each k. Calculations are made for a single sample, over a default range of k from 3 to 10.


  samples_col = "Sample",
  abundance_col = "Abundance",
  range = 3:10,
  with_plot = FALSE,



A data.frame with, at least, the classification, abundance and sample information for each phylogenetic unit.


String with the name of the sample to apply this function to.


String with the name of the column with sample names. Default is "Sample".


String with the name of the column with abundance values. Default is "Abundance".


The range of values of k to test, default is from 3 to 10.


If FALSE (default), returns a data.frame with the scores; if TRUE, returns a plot with the scores.


Extra arguments.


A data.frame (or plot) with several indices for each number of clusters.


Note: To get the indices for all samples, use evaluate_k() instead.

Data input

This function takes a data.frame with, at minimum, a column for samples and a column for abundance; any number of other columns is allowed. It then filters for the specific sample that you want to analyze. You can also pre-filter for your sample, but you still need to provide the sample ID (sample_id), and the table always needs one column for samples and another for abundance (indicate their names with the arguments samples_col and abundance_col).
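As a minimal sketch of the input shape described above, the snippet below builds a toy table whose sample and abundance columns do not use the default names, then pre-filters one sample in base R. The table, its column names (sample_name, counts, taxon) and the sample IDs are hypothetical and only serve as illustration; the call to evaluate_sample_k() is left as a comment because it needs the package to be loaded.

```r
# Hypothetical toy table: column names differ from the defaults on purpose.
abundance_table <- data.frame(
  sample_name = rep(c("S1", "S2"), each = 12),
  counts = c(500, 320, 150, 90, 40, 22, 15, 9, 6, 4, 2, 1,
             800, 100, 60, 30, 12, 8, 5, 3, 2, 1, 1, 1),
  taxon = rep(paste0("OTU_", 1:12), times = 2)
)

# Optional pre-filtering: keep only the rows of the sample of interest.
one_sample <- abundance_table[abundance_table$sample_name == "S1", ]

# The sample ID and both column names are still passed explicitly, e.g.:
# evaluate_sample_k(one_sample,
#                   sample_id = "S1",
#                   samples_col = "sample_name",
#                   abundance_col = "counts")
```

Passing the unfiltered abundance_table with the same arguments should be equivalent, since the function filters by sample_id itself.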

Output options

The default option returns a data.frame with Davies-Bouldin, Calinski-Harabasz and average Silhouette scores for each k. This simple output can then be used for further analysis. Alternatively, the function can return a plot of the scores (set with_plot = TRUE).
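One possible use of the data.frame output is to look up which k each index favours. The sketch below is not part of the function itself: it copies the scores from the first example on this page into a data.frame and applies the usual reading of the three indices (lower Davies-Bouldin is better, higher Calinski-Harabasz and higher average Silhouette are better). The object names scores and best_k are made up for this illustration.

```r
# Scores copied from the first example on this page (range = 3:10).
scores <- data.frame(
  DB = c(0.3721866, 0.5271704, 0.4131651, 0.4292696,
         0.3350836, 0.3892966, 0.3948026, 0.3294451),
  CH = c(1821.426, 2054.887, 4933.956, 5465.134,
         17589.032, 17179.809, 18083.313, 30332.345),
  average_Silhouette = c(0.9521452, 0.8820316, 0.8561774, 0.8398216,
                         0.8479872, 0.7843358, 0.7740169, 0.7701163),
  k = 3:10
)

# k preferred by each index: minimum for DB, maximum for the other two.
best_k <- c(
  DB = scores$k[which.min(scores$DB)],
  CH = scores$k[which.max(scores$CH)],
  average_Silhouette = scores$k[which.max(scores$average_Silhouette)]
)
best_k
#>                 DB                 CH average_Silhouette
#>                 10                 10                  3
```

The indices do not have to agree, as this sample shows, so the table is better read as evidence to weigh than as a single answer.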

Three indices are calculated by this function: Davies-Bouldin, Calinski-Harabasz and average Silhouette score.
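To give a rough idea of what one of these indices measures, the sketch below computes the Calinski-Harabasz index by hand for a few values of k, as the ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom. It rests on assumptions that do not come from this page: base R kmeans() is used as a stand-in clustering method (the algorithm used internally by evaluate_sample_k() may differ), the abundance values are invented, and ch_index() is a hypothetical helper.

```r
set.seed(1)  # k-means starts from random centers

# Hypothetical abundance values for one sample (not real data).
abundance <- c(1, 1, 2, 2, 3, 5, 8, 40, 45, 50, 900, 950, 1000)

# Calinski-Harabasz index for one k:
# (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))
ch_index <- function(x, k) {
  fit <- kmeans(x, centers = k, nstart = 10)  # stand-in clustering step
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (length(x) - k))
}

# One score per candidate k; higher values indicate better-separated clusters.
ch_scores <- sapply(2:4, function(k) ch_index(abundance, k))
names(ch_scores) <- paste0("k=", 2:4)
ch_scores
```

The values will not match the CH column returned by evaluate_sample_k(), since both the data and the clustering step here are illustrative.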


evaluate_sample_k(nice_tidy, sample_id = "ERR2044662")
#>          DB        CH average_Silhouette  k
#> 1 0.3721866  1821.426          0.9521452  3
#> 2 0.5271704  2054.887          0.8820316  4
#> 3 0.4131651  4933.956          0.8561774  5
#> 4 0.4292696  5465.134          0.8398216  6
#> 5 0.3350836 17589.032          0.8479872  7
#> 6 0.3892966 17179.809          0.7843358  8
#> 7 0.3948026 18083.313          0.7740169  9
#> 8 0.3294451 30332.345          0.7701163 10

# To change range
evaluate_sample_k(nice_tidy, sample_id = "ERR2044662", range = 4:11)
#>          DB        CH average_Silhouette  k
#> 1 0.5271704  2054.887          0.8820316  4
#> 2 0.4131651  4933.956          0.8561774  5
#> 3 0.4292696  5465.134          0.8398216  6
#> 4 0.3350836 17589.032          0.8479872  7
#> 5 0.3892966 17179.809          0.7843358  8
#> 6 0.3948026 18083.313          0.7740169  9
#> 7 0.3294451 30332.345          0.7701163 10
#> 8 0.2782100 55354.240          0.7629755 11

# To make simple plot
evaluate_sample_k(nice_tidy, sample_id = "ERR2044662", range = 4:11, with_plot = TRUE)