omicverse.single.autoResolution

Contents

omicverse.single.autoResolution#

omicverse.single.autoResolution(adata, resolutions=None, *, method='bootstrap-ari', n_subsamples=5, subsample_frac=0.8, use_null_correction=True, n_null_subsamples=3, null_seed=42, null_layer=None, gamma_min=0.05, gamma_max=3.0, n_partitions=30, min_clusters=3, key_added='leiden', random_state=0, verbose=True)[source]#

Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).

Algorithm#

For each candidate resolution \(r\):

  1. Run Leiden on the full AnnData → reference labels at r.

  2. Take :paramref:`n_subsamples` independent without-replacement subsamples of size subsample_frac × n_obs.

  3. For each subsample run Leiden on the induced subgraph and compute Adjusted Rand Index against the reference restricted to the subsample.

  4. \(s_\mathrm{real}(r) = \mathrm{mean\,ARI}\) across the :paramref:`n_subsamples` bootstraps — the observed reproducibility.

Bootstrap stability alone is biased toward fine resolutions: small tight clusters are mechanically reproducible under any subsampling procedure regardless of whether they reflect biological structure. The Lange–Buhmann fix is to subtract a procedurally-matched null:

  1. Build a null AnnData by independently permuting each gene’s expression across cells. This preserves per-gene marginal distributions but destroys all cell-cell co-expression — there is no cluster structure left.

  2. Run the same PCA → kNN-graph → bootstrap-stability pipeline on the null with :paramref:`n_null_subsamples` subsamples. \(s_\mathrm{null}(r)\) is the chance-level reproducibility of leiden at this resolution given the data’s marginal-only statistical structure.

  3. The excess stability is

    \[\Delta(r) = s_\mathrm{real}(r) - s_\mathrm{null}(r)\]

    and the chosen resolution is \(\arg\max_r \Delta(r)\) subject to producing at least :paramref:`min_clusters` clusters on the full data.

Setting :paramref:`use_null_correction` =False falls back to plain bootstrap-ARI; it’s exposed mostly for diagnostics. The default is the null-adjusted variant.

type adata:

AnnData

param adata:

AnnData with a precomputed neighbor graph (adata.obsp['connectivities']).

type resolutions:

Optional[Sequence[float]] (default: None)

param resolutions:

Candidate resolutions to test. Defaults to np.round(np.arange(0.2, 1.6, 0.1), 2).

type n_subsamples:

int (default: 5)

param n_subsamples:

Bootstrap subsamples per resolution on the real data.

type subsample_frac:

float (default: 0.8)

param subsample_frac:

Fraction of cells in each subsample. Default 0.8.

type use_null_correction:

bool (default: True)

param use_null_correction:

If True (default), subtract null-pipeline stability per Lange et al. 2004. False returns plain bootstrap stability.

type n_null_subsamples:

int (default: 3)

param n_null_subsamples:

Bootstrap subsamples per resolution on the null data — can be smaller than :paramref:`n_subsamples` because the null is low-variance.

type null_seed:

int (default: 42)

param null_seed:

Seed for the per-gene permutation generating the null. Held separate from :paramref:`random_state` so the null is reproducible independently of the real-data search.

type null_layer:

Optional[str] (default: None)

param null_layer:

adata.layers key to permute. Defaults to adata.X.

type min_clusters:

int (default: 3)

param min_clusters:

Lower bound on the number of clusters the chosen resolution must produce on the full data; degenerate resolutions are excluded from the argmax.

type key_added:

str (default: 'leiden')

param key_added:

adata.obs column to write the chosen resolution’s labels to.

type random_state:

int (default: 0)

param random_state:

Seed for subsample selection and Leiden on the real data.

type verbose:

bool (default: True)

param verbose:

Stream per-resolution scores during the search.

returns:

(adata, best_resolution, scores_df). scores_df is indexed by resolution with columns stability_real, stability_null, excess_stability, std_real, n_clusters. Also writes adata.obs[key_added] and adata.uns['autoResolution'].

rtype:

Tuple[anndata.AnnData, float, pandas.DataFrame]

References

Lange, Roth, Braun, Buhmann. “Stability-based validation of clustering solutions.” Neural Computation 16(6):1299–1323, 2004.

Weir, Emmons, Wakefield, Hopkins, Mucha. “Post-processing partitions to identify domains of modularity optimization.” Algorithms 10(3):93, 2017 (used when method='champ').