omicverse.metabol.biomarker_panel

omicverse.metabol.biomarker_panel(adata, *, group_col, features=10, classifier='rf', cv_outer=5, cv_inner=3, pos_group=None, neg_group=None, n_permutations=0, layer=None, seed=0)

Nested-CV evaluation of a multi-metabolite biomarker panel.

Parameters:

group_col (str) – Column in adata.obs with the class labels.
features (Union[list, int] (default: 10)) – Either a list of metabolite names to use as-is, or an integer N to pre-screen the top-N metabolites by univariate AUC on the full dataset. The integer form leaks test-fold information (see “Caveats” below).
classifier (str (default: 'rf')) – "rf" (RandomForest, default), "lr" (L2-logistic regression), or "svm" (RBF SVM).
cv_outer (int (default: 5)) – Number of stratified K-fold splits in the outer loop. Default 5.
cv_inner (int (default: 3)) – Number of stratified K-fold splits in the inner loop. Default 3.
pos_group (Optional[str] (default: None)) – Label of the positive class in the two-class contrast. None → the contrast uses the first two unique values.
neg_group (Optional[str] (default: None)) – Label of the negative class in the two-class contrast. None → the contrast uses the first two unique values.
n_permutations (int (default: 0)) – If > 0, compute a permutation null by shuffling labels and re-running nested CV that many times. Reported as permutation_pvalue. Expensive (each permutation costs cv_outer × cv_inner fits).
layer (Optional[str] (default: None)) – AnnData layer name (default None → adata.X).
seed (int (default: 0)) – RNG / fold-assignment seed.
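
The permutation null described for n_permutations can be illustrated with a small standard-library sketch. It is not the library's implementation: `score_fn` is a hypothetical stand-in for a full nested-CV run (which is what makes each real permutation cost cv_outer × cv_inner fits), and the `(hits + 1) / (n + 1)` estimate is a common convention assumed here, not something this page confirms.

```python
import random


def auc(scores, labels):
    """AUC by pair counting: P(pos > neg) + 0.5 * P(tie), labels are 0/1."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_pvalue(score_fn, labels, n_permutations, seed=0):
    """Compare the observed score with scores obtained on shuffled labels.

    score_fn(labels) -> float is a hypothetical callable; in a real
    analysis it would re-run the whole nested CV on the given labels.
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between samples and labels
        if score_fn(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Toy data: one perfectly separating score per sample.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
observed, pvalue = permutation_pvalue(
    lambda y: auc(toy_scores, y), toy_labels, n_permutations=199
)
```

With 199 permutations the smallest reportable p-value under this convention is 1/200, which is one reason to budget the permutation count against the per-permutation cost.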

Return type:

BiomarkerPanelResult

Returns:

BiomarkerPanelResult
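
A minimal call sketch, using only the keyword arguments listed in the signature above. The wrapper name `run_panel`, the obs column "group", and the labels "case" and "control" are hypothetical placeholders; the sketch assumes omicverse is installed and that `adata` is an AnnData object with that column.

```python
def run_panel(adata, metabolites=None, n_permutations=0):
    """Call biomarker_panel with an explicit feature list or a top-10 pre-screen.

    `adata`, the "group" column, and the "case"/"control" labels are
    illustrative assumptions, not values taken from the documentation.
    """
    import omicverse as ov  # imported lazily; assumes omicverse is installed

    return ov.metabol.biomarker_panel(
        adata,
        group_col="group",  # hypothetical adata.obs column
        # An explicit list is used as-is; an integer triggers the pre-screen.
        features=metabolites if metabolites is not None else 10,
        classifier="lr",  # "rf", "lr", or "svm"
        cv_outer=5,
        cv_inner=3,
        pos_group="case",  # hypothetical class labels
        neg_group="control",
        n_permutations=n_permutations,
        layer=None,  # None -> adata.X
        seed=0,
    )
```

Passing an explicit `metabolites` list avoids the full-dataset pre-screen discussed under Caveats.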

Caveats
-------
When features is an integer, the pre-screen uses the full
dataset, including the samples later held out for testing, so the
AUC estimated on those folds is optimistically biased. For
publication-grade estimates, pass an explicit feature list chosen
from an independent screening cohort, or pre-screen inside a
wrapper that repeats the full pipeline per fold.
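
The per-fold wrapper suggested above can be sketched with the standard library alone. This is an illustration under stated assumptions, not omicverse code: `univariate_auc`, `stratified_folds`, and `screen_per_fold` are hypothetical helpers, the data are a plain dict of feature name to values, and the classifier fit and held-out scoring that would follow each screen are omitted.

```python
import random


def univariate_auc(values, labels):
    """Direction-agnostic AUC of one feature for binary 0/1 labels."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)


def stratified_folds(labels, k, seed=0):
    """Assign each sample to one of k folds, spreading each class evenly."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            fold_of[i] = rank % k
    return fold_of


def screen_per_fold(X, labels, n_top, k=5, seed=0):
    """For each outer fold, rank features on the training rows only."""
    fold_of = stratified_folds(labels, k, seed)
    selected = []
    for fold in range(k):
        train = [i for i, f in enumerate(fold_of) if f != fold]
        y_train = [labels[i] for i in train]
        ranked = sorted(
            X,
            key=lambda name: univariate_auc([X[name][i] for i in train], y_train),
            reverse=True,
        )
        # The held-out rows of this fold never influence the ranking.
        selected.append(ranked[:n_top])
    return selected


# Toy data: "signal" separates the classes, "noise" does not.
X = {
    "signal": [0, 1, 2, 3, 4, 10, 11, 12, 13, 14],
    "noise": [5, 1, 4, 2, 3, 3, 2, 4, 1, 5],
}
labels = [0] * 5 + [1] * 5
per_fold = screen_per_fold(X, labels, n_top=1, k=5)
```

Each entry of `per_fold` is the feature list for one outer fold; a model would then be fitted on that fold's training rows and scored on its held-out rows, so the screen and the evaluation never share samples.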
