omicverse.single.autoResolution

omicverse.single.autoResolution#

omicverse.single.autoResolution(adata, resolutions=None, *, method='bootstrap-ari', n_subsamples=5, subsample_frac=0.8, use_null_correction=True, n_null_subsamples=3, null_seed=42, null_layer=None, gamma_min=0.05, gamma_max=3.0, n_partitions=30, min_clusters=3, key_added='leiden', random_state=0, verbose=True)[source]#

Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).

Algorithm#

For each candidate resolution \(r\):

Run Leiden on the full AnnData → reference labels at r.
Take :paramref:`n_subsamples` independent without-replacement subsamples of size subsample_frac × n_obs.
For each subsample run Leiden on the induced subgraph and compute Adjusted Rand Index against the reference restricted to the subsample.
\(s_\mathrm{real}(r) = \mathrm{mean\,ARI}\) across the :paramref:`n_subsamples` bootstraps — the observed reproducibility.

Bootstrap stability alone is biased toward fine resolutions: small tight clusters are mechanically reproducible under any subsampling procedure regardless of whether they reflect biological structure. The Lange–Buhmann fix is to subtract a procedurally-matched null:

Build a null AnnData by independently permuting each gene’s expression across cells. This preserves per-gene marginal distributions but destroys all cell-cell co-expression — there is no cluster structure left.
Run the same PCA → kNN-graph → bootstrap-stability pipeline on the null with :paramref:`n_null_subsamples` subsamples. \(s_\mathrm{null}(r)\) is the chance-level reproducibility of leiden at this resolution given the data’s marginal-only statistical structure.
The excess stability is

\[\Delta(r) = s_\mathrm{real}(r) - s_\mathrm{null}(r)\]

and the chosen resolution is \(\arg\max_r \Delta(r)\) subject to producing at least :paramref:`min_clusters` clusters on the full data.

Setting :paramref:`use_null_correction` =False falls back to plain bootstrap-ARI; it’s exposed mostly for diagnostics. The default is the null-adjusted variant.

type adata:: AnnData
param adata:: AnnData with a precomputed neighbor graph (adata.obsp['connectivities']).
type resolutions:: Optional[Sequence[float]] (default: None)
param resolutions:: Candidate resolutions to test. Defaults to np.round(np.arange(0.2, 1.6, 0.1), 2).
type n_subsamples:: int (default: 5)
param n_subsamples:: Bootstrap subsamples per resolution on the real data.
type subsample_frac:: float (default: 0.8)
param subsample_frac:: Fraction of cells in each subsample. Default 0.8.
type use_null_correction:: bool (default: True)
param use_null_correction:: If True (default), subtract null-pipeline stability per Lange et al. 2004. False returns plain bootstrap stability.
type n_null_subsamples:: int (default: 3)
param n_null_subsamples:: Bootstrap subsamples per resolution on the null data — can be smaller than :paramref:`n_subsamples` because the null is low-variance.
type null_seed:: int (default: 42)
param null_seed:: Seed for the per-gene permutation generating the null. Held separate from :paramref:`random_state` so the null is reproducible independently of the real-data search.
type null_layer:: Optional[str] (default: None)
param null_layer:: adata.layers key to permute. Defaults to adata.X.
type min_clusters:: int (default: 3)
param min_clusters:: Lower bound on the number of clusters the chosen resolution must produce on the full data; degenerate resolutions are excluded from the argmax.
type key_added:: str (default: 'leiden')
param key_added:: adata.obs column to write the chosen resolution’s labels to.
type random_state:: int (default: 0)
param random_state:: Seed for subsample selection and Leiden on the real data.
type verbose:: bool (default: True)
param verbose:: Stream per-resolution scores during the search.
returns:: (adata, best_resolution, scores_df). scores_df is indexed by resolution with columns stability_real, stability_null, excess_stability, std_real, n_clusters. Also writes adata.obs[key_added] and adata.uns['autoResolution'].
rtype:: Tuple[anndata.AnnData, float, pandas.DataFrame]

References

Lange, Roth, Braun, Buhmann. “Stability-based validation of clustering solutions.” Neural Computation 16(6):1299–1323, 2004.

Weir, Emmons, Wakefield, Hopkins, Mucha. “Post-processing partitions to identify domains of modularity optimization.” Algorithms 10(3):93, 2017 (used when method='champ').

omicverse.single.autoResolution

Contents

omicverse.single.autoResolution#

Algorithm#