MetaCell

Metacells are small, transcriptionally homogeneous groups of single cells that serve as the unit of analysis in place of individual cells. They denoise sparse single-cell counts while preserving cell-state granularity, sitting between Leiden clusters (too coarse for state-level analysis) and single cells (too sparse for many downstream tools).

This section has three layers:

Layer	Tutorial	When
1. Recommended workflow	t_metacell_recommended	Day-one user. Run the recommended backend (SEACells, soft membership) end-to-end and drive a downstream pipeline (DEG, pseudobulk, marker dotplot).
2. Multi-sample workflow	t_metacell_multisample	You have ≥2 batches / donors / conditions. Build sample-aware metacells on a Harmony-corrected embedding.
3. Backend zoo	zoo/index	You want to compare all 7 backends side-by-side on your own data (`ov.single.compare_metacell_backends`), or read why each method exists.

Metacell vs pseudobulk: what’s the difference?

Both metacells and pseudobulk produce aggregated count profiles, but they answer different questions and have different statistical properties.

	Pseudobulk	Metacell
Granularity	One profile per sample × celltype (or sample × cluster), usually 5–50 profiles in total.	One profile per metacell, typically `N // 50` profiles for N cells, i.e. hundreds to thousands.
Aggregation key	Pre-existing labels (sample, celltype).	Learned partition based on transcriptional similarity (graph / archetype / VQ-VAE / …).
Within-group purity	Whatever the labels imply, which is often heterogeneous (“Beta cells from sample 3” still contain substate variation).	Optimized for transcriptional homogeneity: each metacell ≈ one cell state.
Sample / batch awareness	Native: the sample is the aggregation key.	Optional: most backends are sample-agnostic by default, so multi-sample workflows need a batch-corrected embedding (see t_metacell_multisample).
What it’s good for	Cohort-level DE (DESeq2 / edgeR / limma): “does gene X change between healthy and IBD donors, averaged over T cells?”	State-level analysis with denoised counts: cell-cell communication, GRN inference, RNA velocity, marker discovery, trajectory smoothing.
What it’s NOT good for	Within-celltype state granularity, trajectory inference, cell-cell communication.	Cohort-level DE: metacells from the same donor are not independent replicates, so testing hundreds to thousands of them as “samples” inflates apparent power and breaks the variance model.
Typical number of profiles	Tens	Hundreds to thousands
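To make the granularity row concrete, here is a minimal sketch on toy data using only NumPy and pandas (it is not an omicverse API). It aggregates the same count matrix twice: once by the pre-existing `sample` × `celltype` labels (pseudobulk) and once by a metacell label. The `metacell_id` column is a random, hypothetical stand-in for a backend's output, used only to show the resulting shapes, so its profiles are not transcriptionally homogeneous.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cells, n_genes = 5000, 200
counts = rng.poisson(0.3, size=(n_cells, n_genes))  # toy sparse count matrix
obs = pd.DataFrame({
    "sample": rng.choice(["s1", "s2", "s3", "s4"], size=n_cells),
    "celltype": rng.choice(["T", "B", "Mono"], size=n_cells),
})
# Hypothetical stand-in for a backend's output: N // 50 equal-sized metacells.
n_metacells = n_cells // 50
obs["metacell_id"] = rng.permutation(np.arange(n_cells) % n_metacells)

X = pd.DataFrame(counts, index=obs.index)
# Pseudobulk: sum counts over pre-existing labels (sample x celltype).
pseudobulk = X.groupby([obs["sample"], obs["celltype"]]).sum()
# Metacell: sum counts over the (here simulated) learned partition.
metacells = X.groupby(obs["metacell_id"]).sum()

print(pseudobulk.shape)  # (12, 200): one profile per sample x celltype pair
print(metacells.shape)   # (100, 200): one profile per metacell
```

Both tables hold the same total counts; only the grouping key, and therefore the number of profiles, differs.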

Rule of thumb: if your statistical model is `expression ~ condition` and treats samples as the unit of replication, you want pseudobulk. If your model is “give me per-cell-state expression, but with less noise”, you want metacells.
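The power-inflation warning in the table can be illustrated with a toy null simulation, assuming donor-to-donor variation and no true condition effect; the function name and every parameter value are hypothetical choices for illustration. Testing donor means with donors as replicates keeps the false-positive rate near the nominal 5%, whereas treating each metacell as a replicate rejects the null far more often, because metacells from the same donor share that donor's offset.

```python
import numpy as np
from scipy import stats


def false_positive_rates(n_sims=500, donors_per_group=3, mcs_per_donor=20,
                         donor_sd=1.0, within_sd=0.5, alpha=0.05, seed=0):
    """Null simulation: one gene, two conditions, no real condition effect."""
    rng = np.random.default_rng(seed)
    n_donors = 2 * donors_per_group
    # Each donor gets a random offset; its metacells scatter around that offset.
    donor_effects = rng.normal(0.0, donor_sd, size=(n_sims, n_donors))
    noise = rng.normal(0.0, within_sd, size=(n_sims, n_donors, mcs_per_donor))
    values = donor_effects[:, :, None] + noise  # shape (sims, donors, metacells)
    a, b = values[:, :donors_per_group], values[:, donors_per_group:]

    # Pseudobulk: one value per donor, so n = number of donors per condition.
    p_pb = stats.ttest_ind(a.mean(axis=2), b.mean(axis=2), axis=1).pvalue
    # Metacells as "samples": pool all metacells per condition, ignoring donors.
    p_mc = stats.ttest_ind(a.reshape(n_sims, -1), b.reshape(n_sims, -1), axis=1).pvalue
    return float((p_pb < alpha).mean()), float((p_mc < alpha).mean())


fpr_pb, fpr_mc = false_positive_rates()
print(f"pseudobulk (donors as replicates): {fpr_pb:.2f}")
print(f"metacells as replicates:           {fpr_mc:.2f}")
```

The gap grows with the number of metacells per donor and with donor-level variance, which is why per-sample pseudobulk remains the right input for `expression ~ condition` models.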

The two are also composable: you can pseudobulk metacells per sample (e.g. for cross-sample DE within a metacell type), or compute metacells within each sample independently and then concatenate. omicverse’s t_metacell_multisample shows the latter.
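Both compositions come down to choosing the right grouping key. The sketch below assumes metacells were computed independently per sample (so each run numbers its metacells from 0) and that each cell carries a hypothetical `mc_type` annotation for its metacell; the helper names are illustrative, not omicverse functions. Prefixing IDs with the sample keeps metacells distinct after concatenation, and grouping by `sample` × `mc_type` yields the per-sample pseudobulk used for cross-sample DE.

```python
import numpy as np
import pandas as pd


def namespace_metacells(obs, sample_key="sample", mc_key="metacell_id"):
    """Make metacell IDs globally unique after independent per-sample runs.

    Each run numbers its metacells from 0, so "0" in sample A and "0" in
    sample B are different groups; prefixing with the sample separates them.
    """
    return (obs[sample_key].astype(str) + ":" + obs[mc_key].astype(str)).astype("category")


def pseudobulk(counts, obs, keys):
    """Sum raw counts over every observed combination of the obs columns in `keys`."""
    df = pd.DataFrame(counts, index=obs.index)
    return df.groupby([obs[k] for k in keys], observed=True).sum()


# Toy data: two samples, each with its own per-sample metacell run.
obs = pd.DataFrame({
    "sample": ["A"] * 4 + ["B"] * 4,
    "metacell_id": [0, 0, 1, 1, 0, 0, 1, 1],
    "mc_type": ["T", "T", "B", "B", "T", "T", "T", "T"],  # hypothetical annotation
})
counts = np.arange(8 * 3).reshape(8, 3)

obs["metacell_uid"] = namespace_metacells(obs)
mc_profiles = pseudobulk(counts, obs, ["metacell_uid"])  # 4 metacells, not 2
# Summing cells by (sample, mc_type) equals summing the metacell sum-profiles,
# because every cell belongs to exactly one metacell.
per_sample = pseudobulk(counts, obs, ["sample", "mc_type"])
print(mc_profiles)
print(per_sample)
```

Without the prefix, grouping by the raw `metacell_id` would silently merge unrelated metacells from different samples.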

Architecture

ov.single.MetaCell(adata, method=...) dispatches to seven backends, each writing a unified AnnData schema:

adata.obs['metacell_id']      categorical — universal
adata.obs['metacell_conf']    float       — universal
adata.obsm['X_metacell']      latent      — when backend has 'latent' capability
adata.obsm['metacell_soft']   sparse      — when backend has 'soft' capability
adata.uns['metacell']         metadata    — method, n_metacells, runtime, ...

Downstream tools (CellPhoneDB / LIANA / SCENIC / DESeq2) consume any backend’s output via this schema — you never need an if method == ... branch.
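Because every backend writes the same keys, downstream code can compute per-metacell expression without knowing which backend ran. A minimal sketch, assuming the soft membership (when present) holds non-negative weights with one row per cell and one column per metacell; the helper names are hypothetical. Hard labels are converted into a one-hot membership matrix so both cases share one code path: each metacell profile is the membership-weighted mean of its cells' expression.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def membership_matrix(metacell_id, soft=None):
    """Return an (n_cells x n_metacells) non-negative membership matrix.

    Uses the soft assignment if given; otherwise one-hot encodes the hard
    labels (columns follow the sorted category order of metacell_id).
    """
    if soft is not None:
        return sp.csr_matrix(soft)
    cat = pd.Categorical(metacell_id)
    n = len(cat)
    return sp.csr_matrix(
        (np.ones(n), (np.arange(n), cat.codes)), shape=(n, len(cat.categories))
    )


def metacell_profiles(X, metacell_id, soft=None):
    """Membership-weighted mean expression per metacell (rows = metacells).

    Assumes every metacell has nonzero total membership.
    """
    S = membership_matrix(metacell_id, soft)
    weights = np.asarray(S.sum(axis=0)).ravel()  # total membership per metacell
    S_norm = S @ sp.diags(1.0 / weights)         # each column now sums to 1
    out = S_norm.T @ X                           # works for dense or sparse X
    return out.toarray() if sp.issparse(out) else np.asarray(out)


X = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
print(metacell_profiles(X, ["a", "a", "b"]))  # hard labels: per-metacell means
soft = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(metacell_profiles(X, ["a", "a", "b"], soft=soft))  # soft weights
```

With an AnnData object the call would be `metacell_profiles(adata.X, adata.obs['metacell_id'], adata.obsm['metacell_soft'] if 'metacell_soft' in adata.obsm else None)`. No transposition of the soft matrix is needed, because `obsm` is always row-aligned to cells.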

The seven backends, each with its distinguishing capability:

seacells: soft kernel archetypal analysis (Persad 2023, Nat Biotech). Default recommendation.
metaq: VQ-VAE codebook with closed-form out-of-sample projection (Li 2025, Nat Comms). Use it when new samples will arrive after the metacell map is built.
supercell: kNN graph + walktrap clustering with a cached graining hierarchy (Bilous 2022, BMC Bioinf).
kmeans: scikit-learn baseline (fast, codebook-based, supports out-of-sample projection).
random: random assignment of cells to metacells; an honest lower bound for benchmarking.
geosketch: density-aware sketching (Hie 2019, Cell Systems).
mc2: divide-and-conquer metacell construction (Ben-Kiki 2022, Genome Biology). Requires `pip install metacells`.
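The `kmeans` and `random` entries are simple enough to sketch in plain NumPy. This is an illustration under stated assumptions, not omicverse's implementation, and every function name is hypothetical: a random equal-sized partition as the lower bound, a centroid codebook, nearest-centroid assignment for out-of-sample cells, and a within-metacell sum of squares showing that a structured partition beats the random baseline on homogeneity. A VQ-VAE backend such as `metaq` performs an analogous nearest-code lookup in a learned latent space after encoding, which this sketch does not model.

```python
import numpy as np


def random_partition(n_cells, n_metacells, seed=0):
    """'random' baseline: equal-sized metacells assigned without looking at expression."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.arange(n_cells) % n_metacells)


def codebook(emb, labels, n_metacells):
    """One centroid per metacell in the embedding (a kmeans-style codebook)."""
    return np.vstack([emb[labels == k].mean(axis=0) for k in range(n_metacells)])


def project(new_emb, centroids):
    """Out-of-sample assignment: each new cell goes to its nearest centroid."""
    d2 = ((new_emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def within_ss(emb, labels):
    """Mean squared distance from cells to their own metacell centroid (lower = more homogeneous)."""
    ks = np.unique(labels)
    cents = np.vstack([emb[labels == k].mean(axis=0) for k in ks])
    idx = np.searchsorted(ks, labels)
    return float(((emb - cents[idx]) ** 2).sum(axis=1).mean())


# Toy embedding: three well-separated cell states, 100 cells each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true = np.repeat(np.arange(3), 100)
emb = centers[true] + rng.normal(scale=0.5, size=(300, 2))
rand = random_partition(300, 3)

print(within_ss(emb, true) < within_ss(emb, rand))  # True: structure beats random
new_cells = centers + 0.1                           # cells arriving later
print(project(new_cells, codebook(emb, true, 3)))  # [0 1 2]
```

Because the codebook is just a matrix of centroids, projecting new cells needs no retraining, which is the property the `kmeans` and `metaq` entries highlight.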

See zoo/index for the full per-backend tour.
