
Identify the Best Parameters For Your Dataset

Usage

find_best_params(
  x,
  genelist,
  bins_count_range = c(5, 10, 20, 40),
  gene_count_range = c(10, 20, 40, 80),
  bootstrap_iterations = 200,
  BPPARAM = BiocParallel::SerialParam(),
  ...
)

Arguments

x

The object to create `BlaseData` from.

genelist

Character vector. The genes to use, ordered from best to worst.

bins_count_range

Integer vector. The n_bins values to try.

gene_count_range

Integer vector. The n_genes values to try.

bootstrap_iterations

Integer. Iterations for bootstrapping when calculating strong mappings.

BPPARAM

A BiocParallel::BiocParallelParam object. Defaults to BiocParallel::SerialParam().

...

Parameters passed on to child functions; see as.BlaseData().
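
As a rough guide to cost, the sketch below counts the parameter combinations implied by the default ranges. It assumes every value in bins_count_range is paired with every value in gene_count_range, which is what the example output further down suggests (four rows for two bin counts and two gene counts); it is not taken from the package source. The name param_grid is hypothetical.

```r
# Default ranges copied from the Usage section.
bins_count_range <- c(5, 10, 20, 40)
gene_count_range <- c(10, 20, 40, 80)

# Assumption: each bin count is tried with each gene count.
# gene_count is listed first so that it varies fastest within each bin count.
param_grid <- expand.grid(
    gene_count = gene_count_range,
    bin_count = bins_count_range
)

# Number of parameter combinations that would be evaluated.
nrow(param_grid)
#> [1] 16
```

Under that assumption, narrowing either range reduces the number of combinations proportionally, which may matter when bootstrap_iterations is large.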

Value

A data frame of the results.

  • bin_count: Integer. The bin count for this attempt

  • gene_count: Integer. The top n genes to use for this attempt

  • min_convexity: Decimal. The worst convexity for these parameters

  • mean_convexity: Decimal. The mean convexity for these parameters

  • strong_mapping_pct: Decimal. The percent of bins which were strongly mapped to themselves for these parameters. If this value is low, then it is likely that in real use, few or no results will be strongly mapped.
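
One way to read the returned data frame is to rank the rows by the columns described above. The following is a minimal sketch in base R, assuming that higher strong_mapping_pct and higher convexity are preferable. The results object and its numbers are hypothetical placeholders, not real output, and the ranking rule is an illustrative heuristic rather than a rule defined by the package.

```r
# Hypothetical results with the documented columns.
# The values are placeholders for illustration only.
results <- data.frame(
    bin_count = c(5, 5, 10, 10),
    gene_count = c(20, 40, 20, 40),
    min_convexity = c(0.10, 0.30, 0.05, 0.20),
    mean_convexity = c(0.40, 0.55, 0.30, 0.50),
    strong_mapping_pct = c(60, 90, 40, 85)
)

# Illustrative heuristic (an assumption): sort by strong_mapping_pct,
# highest first, and break ties on mean_convexity, highest first.
ranked <- results[
    order(-results$strong_mapping_pct, -results$mean_convexity),
]

# The top-ranked parameter combination under this heuristic.
best <- ranked[1, ]
best[, c("bin_count", "gene_count")]
```

In practice, plotting the results with plot_find_best_params_results() is likely a better basis for the choice than any single ranking.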

See also

plot_find_best_params_results() for plotting the results of this function.

Examples

ncells <- 70
ngenes <- 100
counts_matrix <- matrix(
    c(seq_len(3500) / 10, seq_len(3500) / 5),
    ncol = ncells,
    nrow = ngenes
)
sce <- SingleCellExperiment::SingleCellExperiment(assays = list(
    normcounts = counts_matrix, logcounts = log(counts_matrix)
))
colnames(sce) <- paste0("cell", seq_len(ncells))
rownames(sce) <- paste0("gene", seq_len(ngenes))
sce$cell_type <- c(
    rep("celltype_1", ncells / 2),
    rep("celltype_2", ncells / 2)
)

sce$pseudotime <- seq_len(ncells) - 1
genelist <- rownames(sce)

# Finding the best params for the BlaseData
best_params <- find_best_params(
    sce, genelist,
    bins_count_range = c(2, 3),
    gene_count_range = c(20, 50),
    pseudotime_slot = "pseudotime",
    split_by = "pseudotime_range"
)
best_params
#>   column_label bin_count gene_count min_convexity mean_convexity
#> 1            1         2         20             0              0
#> 2            2         2         50             0              0
#> 3            1         3         20             0              0
#> 4            2         3         50             0              0
#>   strong_mapping_pct
#> 1                  0
#> 2                  0
#> 3                  0
#> 4                  0
plot_find_best_params_results(best_params)