omicverse.metabol.biomarker_panel
- omicverse.metabol.biomarker_panel(adata, *, group_col, features=10, classifier='rf', cv_outer=5, cv_inner=3, pos_group=None, neg_group=None, n_permutations=0, layer=None, seed=0)
Nested-CV evaluation of a multi-metabolite biomarker panel.
- Parameters:
  - group_col (str) – Column in adata.obs with the class labels.
  - features (Union[list, int] (default: 10)) – Either a list of metabolite names to use as-is, or an integer N to pre-screen the top-N metabolites by univariate AUC on the full dataset. Note that the integer form leaks test-fold information; see "Caveats" below.
  - classifier (str (default: 'rf')) – "rf" (RandomForest), "lr" (L2-logistic regression), or "svm" (RBF SVM).
  - cv_outer (int (default: 5)) – Number of stratified K-fold splits in the outer loop.
  - cv_inner (int (default: 3)) – Number of stratified K-fold splits in the inner loop.
  - pos_group (Optional[str] (default: None)) – Positive-class label for the contrast. If pos_group and neg_group are left as None, the first two unique values in group_col are used.
  - neg_group (Optional[str] (default: None)) – Negative-class label for the contrast. If pos_group and neg_group are left as None, the first two unique values in group_col are used.
  - n_permutations (int (default: 0)) – If > 0, compute a permutation null by shuffling labels and re-running nested CV that many times. Reported as permutation_pvalue. Expensive (each permutation costs cv_outer × cv_inner fits).
  - layer (Optional[str] (default: None)) – AnnData layer name. None → adata.X.
  - seed (int (default: 0)) – RNG / fold-assignment seed.
- Return type:
  BiomarkerPanelResult
- Returns:
  BiomarkerPanelResult
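The n_permutations option above describes a label-shuffling null. The following is a minimal, standard-library sketch of that general idea, not the omicverse implementation: permutation_null, permutation_pvalue, and mean_gap are hypothetical names, mean_gap is a cheap stand-in for the nested-CV score, and the "+1" correction in the p-value is an assumption about convention rather than something this page states.

```python
import random


def permutation_pvalue(observed, null_scores):
    """Share of null scores at least as large as the observed score.

    The +1 in numerator and denominator (an assumed convention) keeps the
    p-value strictly above zero for a finite number of permutations.
    """
    n_at_least = sum(1 for s in null_scores if s >= observed)
    return (1 + n_at_least) / (1 + len(null_scores))


def permutation_null(score_fn, values, labels, n_permutations, seed=0):
    """Re-score the data n_permutations times with shuffled labels."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # breaks the link between values and labels
        null_scores.append(score_fn(values, shuffled))
    return null_scores


def mean_gap(values, labels):
    """Stand-in score: absolute difference between the two group means."""
    pos = [v for v, g in zip(values, labels) if g == "case"]
    neg = [v for v, g in zip(values, labels) if g == "control"]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


# Toy data: one metabolite, clearly higher in cases than in controls.
values = [5.1, 4.8, 5.5, 5.0, 1.2, 0.9, 1.4, 1.1]
labels = ["case"] * 4 + ["control"] * 4

observed = mean_gap(values, labels)
null = permutation_null(mean_gap, values, labels, n_permutations=200, seed=0)
p = permutation_pvalue(observed, null)
print(f"observed={observed:.2f}  p={p:.4f}")
```

In the documented function the score being permuted is the nested-CV result, so every permutation repeats all cv_outer × cv_inner fits; that is why the cost grows linearly with n_permutations.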
Caveats
-------
When features is an integer, the pre-screen uses the full dataset, so the
selected metabolites have already seen the test-fold labels and the
cross-validated AUC is overestimated. For publication-grade estimates,
pass an explicit feature list chosen from an independent screening
cohort, or pre-screen inside a wrapper that repeats the full pipeline
per fold.
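The per-fold wrapper mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not omicverse code: auc, top_n_features, and stratified_folds are hypothetical helpers, the data are a toy dictionary of metabolite columns rather than an AnnData object, and the classifier fit is left out so that only the leak-free screening step is shown.

```python
import random


def auc(scores, labels, pos):
    """Rank-based AUC: P(positive score > negative score), ties count 0.5."""
    pos_s = [s for s, g in zip(scores, labels) if g == pos]
    neg_s = [s for s, g in zip(scores, labels) if g != pos]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_s for n in neg_s)
    return wins / (len(pos_s) * len(neg_s))


def top_n_features(X, labels, rows, pos, n):
    """Rank features by univariate AUC computed on the given rows only."""
    sub_labels = [labels[i] for i in rows]
    ranked = []
    for name, column in X.items():
        a = auc([column[i] for i in rows], sub_labels, pos)
        ranked.append((max(a, 1 - a), name))  # direction-agnostic ranking
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:n]]


def stratified_folds(labels, k, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for group in sorted(set(labels)):
        idx = [i for i, g in enumerate(labels) if g == group]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


labels = ["case"] * 6 + ["control"] * 6
X = {
    "met_A": [9, 8, 9, 7, 8, 9, 1, 2, 1, 3, 2, 1],  # informative
    "met_B": [5, 1, 4, 2, 6, 3, 4, 2, 5, 1, 3, 6],  # noise
    "met_C": [2, 7, 3, 6, 1, 5, 5, 1, 7, 2, 6, 3],  # noise
}

folds = stratified_folds(labels, k=3, seed=0)
selected_per_fold = []
for test_rows in folds:
    # Screening sees the training rows only, so test labels cannot leak.
    train_rows = [i for i in range(len(labels)) if i not in test_rows]
    selected_per_fold.append(top_n_features(X, labels, train_rows, "case", n=1))
    # A real wrapper would now fit on train_rows and score on test_rows.

print(selected_per_fold)
```

The features chosen in each fold could then be passed as an explicit list to biomarker_panel on that fold's training samples, with the held-out fold scored separately, so that screening and evaluation never share samples.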