omicverse.space.nmf_tissue_zones
- omicverse.space.nmf_tissue_zones(adata, obsm_key='q05_cell_abundance_w_sf', n_factors=10, *, cell_type_names=None, top_k=5, obsm_added='X_tissue_zones', factor_prefix='zone', normalize=None, init='nndsvd', max_iter=500, tol=0.0001, seed=0)
Discover tissue zones via NMF on a per-spot cell-abundance matrix.
- Parameters:
adata (AnnData) – Spatial AnnData with a cell-abundance matrix in adata.obsm.
obsm_key (str (default: 'q05_cell_abundance_w_sf')) – Where to read the cell-abundance matrix from. Cell2location writes q05_cell_abundance_w_sf (q05 posterior estimate of absolute abundance × size factor), which is the recommended input for downstream interpretation. Other deconvolvers may use different keys; pass whichever 2-D float array in adata.obsm you want to factorise.
n_factors (int (default: 10)) – Number of tissue zones to extract. Cell2location's tutorial recommends trying several values (n_fact = [5, 10, 15, 20]) and picking the one that separates biology best. Start with around the number of cell types you expect to cluster together.
cell_type_names (Optional[Sequence[str]] (default: None)) – Explicit names for the cell-type axis. When omitted, names are read from adata.uns[f'{obsm_key}_names'] (Cell2location writes this), falling back to cell_type_0 ... cell_type_{C-1}.
top_k (int (default: 5)) – For each factor, report the top_k cell types with the highest loadings in result.factor_top_cell_types.
obsm_added (str (default: 'X_tissue_zones')) – Key under which the per-spot factor activations are written to adata.obsm, so downstream plotting (Scanpy sc.pl.spatial, ov.pl.embedding) can colour spots by zone directly.
factor_prefix (str (default: 'zone')) – Prefix for factor names; the factor at zero-based index i is named f'{factor_prefix}_{i+1}'.
normalize (Optional[str] (default: None)) – None (default) to feed the matrix straight to NMF, or 'rows' to normalise each spot's row to sum to 1 first. 'rows' is recommended when obsm_key carries mapping probabilities or any non-abundance quantity where the per-spot total signal is not biologically meaningful (for example Tangram's tangram_ct_pred). Skip it for Cell2location's q05_cell_abundance_w_sf, which is on an absolute-abundance scale and should be fed as-is so inter-spot abundance differences are preserved.
init (str (default: 'nndsvd')) – Forwarded to sklearn.decomposition.NMF. init='nndsvd' gives deterministic initialisation; 'random' + seed gives multiple-restart behaviour if you loop manually.
max_iter (int (default: 500)) – Forwarded to sklearn.decomposition.NMF.
tol (float (default: 0.0001)) – Forwarded to sklearn.decomposition.NMF.
seed (int (default: 0)) – Forwarded to sklearn.decomposition.NMF. Combined with init='random', varying seed gives multiple-restart behaviour if you loop manually.
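To make the roles of cell_type_names, top_k and factor_prefix concrete, here is a minimal NumPy sketch of how top cell types per factor could be derived from a loadings matrix. It is not the omicverse implementation: the helper name top_cell_types, the dict return shape, and the toy loadings are all assumptions made for illustration.

```python
import numpy as np


def top_cell_types(loadings, cell_type_names=None, top_k=5, factor_prefix="zone"):
    """Hypothetical helper: name each factor and list its highest-loading cell types.

    loadings is assumed to have shape (n_factors, n_cell_types).
    """
    loadings = np.asarray(loadings, dtype=float)
    n_factors, n_types = loadings.shape

    # Fallback naming mirrors the documented cell_type_0 ... cell_type_{C-1} scheme.
    if cell_type_names is None:
        cell_type_names = [f"cell_type_{j}" for j in range(n_types)]

    k = min(top_k, n_types)
    result = {}
    for i in range(n_factors):
        # Indices of the k largest loadings, in descending order.
        order = np.argsort(loadings[i])[::-1][:k]
        # Factor names are 1-based: zone_1, zone_2, ...
        result[f"{factor_prefix}_{i + 1}"] = [cell_type_names[j] for j in order]
    return result


# Toy loadings: 2 factors x 3 cell types (made-up numbers).
loadings = np.array([[0.9, 0.1, 0.5],
                     [0.0, 0.8, 0.2]])
tops = top_cell_types(loadings, ["B", "T", "Myeloid"], top_k=2)
print(tops)  # {'zone_1': ['B', 'Myeloid'], 'zone_2': ['T', 'Myeloid']}
```

Capping k at the number of cell types keeps the sketch safe when top_k exceeds the width of the matrix; whether the real function does the same is not stated here.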
- Returns:
See the dataclass for attribute details.
- Return type:
TissueZones
Notes
The abundance matrix must be non-negative. Most deconvolvers produce non-negative output; if your matrix has small negative values (numerical noise) they are clipped to 0 here.
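The clipping described above, together with the optional normalize='rows' step from the parameter list, can be sketched in plain NumPy. This is an illustrative approximation under stated assumptions, not the library's code: the function name prepare_matrix and the handling of all-zero rows are hypothetical.

```python
import numpy as np


def prepare_matrix(X, normalize=None):
    """Hypothetical preprocessing sketch: clip negatives, optionally row-normalise."""
    X = np.asarray(X, dtype=float)

    # Small negative values (numerical noise) are clipped to 0 so NMF sees
    # a non-negative matrix.
    X = np.clip(X, 0.0, None)

    if normalize == "rows":
        row_sums = X.sum(axis=1, keepdims=True)
        # Assumption: leave all-zero spots as zeros instead of dividing by 0.
        row_sums[row_sums == 0] = 1.0
        X = X / row_sums
    elif normalize is not None:
        raise ValueError(f"unsupported normalize value: {normalize!r}")
    return X


# Two spots x three cell types, with one tiny negative entry (made-up numbers).
X = np.array([[2.0, 2.0, -1e-9],
              [1.0, 3.0, 0.0]])

as_is = prepare_matrix(X)             # absolute abundances preserved
per_spot = prepare_matrix(X, "rows")  # each spot sums to 1
print(per_spot)
```

With normalize=None the two spots keep their different totals (4 and 4 here, but in general unequal), which is the property the Cell2location abundance scale relies on; 'rows' discards that information by design.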
This is a lightweight sibling of Cell2location's run_colocation (pyro-backed, Bayesian). The output is qualitatively equivalent, with factor loadings and spot activations that carry the same interpretation, but the sklearn NMF runs in seconds and has no pyro/torch dependency.
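For orientation, the sklearn factorisation mentioned above can be sketched directly with sklearn.decomposition.NMF on a synthetic spots × cell-types matrix. This calls scikit-learn only, not omicverse, and the mapping of n_factors to n_components and seed to random_state is an assumption about how the arguments are forwarded. The data are randomly generated for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_spots, n_cell_types, n_factors = 60, 6, 3

# Synthetic non-negative "abundance" matrix built from hidden zones
# (illustrative data only, not real deconvolution output).
true_activations = rng.gamma(shape=2.0, scale=1.0, size=(n_spots, n_factors))
true_loadings = rng.gamma(shape=2.0, scale=1.0, size=(n_factors, n_cell_types))
abundance = true_activations @ true_loadings


def factorise(X, n_factors, seed=0):
    """Sketch: factorise X (spots x cell types) into activations and loadings."""
    model = NMF(
        n_components=n_factors,  # assumed counterpart of n_factors
        init="nndsvd",
        max_iter=500,
        tol=1e-4,
        random_state=seed,       # assumed counterpart of seed
    )
    activations = model.fit_transform(X)  # shape (n_spots, n_factors)
    loadings = model.components_          # shape (n_factors, n_cell_types)
    return activations, loadings, model.reconstruction_err_


activations, loadings, err = factorise(abundance, n_factors)
print(activations.shape, loadings.shape)  # (60, 3) (3, 6)
```

The activations matrix is the kind of per-spot array that would be stored under obsm_added, and each row of loadings describes one zone's cell-type composition.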
References