Consensus annotation with CellVote — PBMC3k#

This tutorial runs ov.single.CellVote to combine five independent cell-type annotation methods into a single consensus label per cluster. It addresses issue #694 (updated CellVote to pick the best celltype).

The input adata already has five annotation columns produced by the two upstream tutorials:

| Source notebook | obs columns added |
| --- | --- |
| t_anno_noref.ipynb (reference-free) | celltypist_prediction, scsa_prediction, gpt4celltype_prediction |
| t_anno_ref.ipynb (reference-based) | harmony_prediction, scVI_prediction |

CellVote then takes a per-cluster majority within each of these five columns and asks an LLM to arbitrate the final label using the cluster’s top marker genes as context.

Setup: API key for the arbitration step#

CellVote calls an LLM in the last step to pick between candidate labels. Any OpenAI-compatible chat endpoint works. Sign up at one of the providers below, export your key as AGI_API_KEY before launching Jupyter, and configure provider / base_url / model on the vote() call.

| Provider | Sign up / get a key | Recommended provider / model |
| --- | --- | --- |
| DeepSeek (cheap, used here) | https://platform.deepseek.com/api_keys | provider="openai", base_url="https://api.deepseek.com/v1", model="deepseek-chat" |
| OpenAI | https://platform.openai.com/api-keys | provider="openai", model="gpt-4o-mini" |
| Qwen / Aliyun DashScope | https://help.aliyun.com/zh/dashscope/developer-reference/get-api-key | provider="qwen", model="qwen-plus" |
| Kimi / Moonshot | https://platform.moonshot.cn/console/api-keys | provider="kimi", model="moonshot-v1-8k" |

# in your shell, before launching jupyter:
export AGI_API_KEY=sk-your-actual-key

import os

# `AGI_API_KEY` is the env var omicverse reads for the LLM arbitration step.
# Export it in your shell before launching Jupyter (see the section above).
assert os.environ.get('AGI_API_KEY'), (
    'AGI_API_KEY is not set. See the previous cell for provider links.'
)

import anndata as ad
import scanpy as sc
import omicverse as ov

ov.plot_set(font_path='Arial')
🔬 Starting plot initialization...
Using already downloaded Arial font from: /tmp/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
    • [CUDA 0] NVIDIA H100 80GB HBM3
      Memory: 79.1 GB | Compute: 9.0

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.3rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

1) Load the pre-annotated pbmc3k#

data/pbmc3k_5anno.h5ad was produced by running the noref + ref tutorials end-to-end (celltypist + scsa + gpt4celltype + harmony + scVI) and merging their outputs on shared cell barcodes.
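
The merge on shared cell barcodes can be sketched with plain pandas. This is a minimal illustration, not the script that produced the file: the barcodes, the two toy tables, and the variable names are made up, and only two of the five columns are shown.

```python
import pandas as pd

# Toy stand-ins for the obs tables of the two upstream runs (barcodes are made up).
noref_obs = pd.DataFrame(
    {"celltypist_prediction": ["B cells", "DC", "B cells"]},
    index=["AAAC-1", "AAAG-1", "AACT-1"],
)
ref_obs = pd.DataFrame(
    {"harmony_prediction": ["B cell", "B cell"]},
    index=["AAAC-1", "AACT-1"],
)

# Keep only the barcodes present in both tables, then join the columns side by side.
shared = noref_obs.index.intersection(ref_obs.index)
merged = noref_obs.loc[shared].join(ref_obs.loc[shared])
print(merged)
```

Cells that appear in only one run (here the hypothetical barcode AAAG-1) are dropped, so every remaining cell has a label from every method.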

adata = ad.read_h5ad('data/pbmc3k_5anno.h5ad')
print(adata)
print()
anno_cols = ['celltypist_prediction', 'scsa_prediction', 'gpt4celltype_prediction',
             'harmony_prediction', 'scVI_prediction']
for c in anno_cols:
    print(f'  {c}: {adata.obs[c].nunique()} unique values')
AnnData object with n_obs × n_vars = 2562 × 2000
    obs: 'nUMIs', 'mito_perc', 'ribo_perc', 'hb_perc', 'detected_genes', 'cell_complexity', 'n_counts', 'total_counts', 'n_genes', 'n_genes_by_counts', 'pct_counts_mt', 'pct_counts_ribo', 'pct_counts_hb', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'predicted_doublet', 'doublet_score', 'scdblfinder_doublet', 'scdblfinder_score', 'leiden', 'celltypist_prediction', 'scsa_prediction', 'gpt4celltype_prediction', 'harmony_prediction', 'scVI_prediction'
    var: 'gene_ids', 'mt', 'ribo', 'hb', 'n_cells', 'percent_cells', 'robust', 'highly_variable_features', 'means', 'variances', 'residual_variances', 'highly_variable_rank', 'highly_variable'
    uns: 'REFERENCE_MANU', '_ov_provenance', 'history_log', 'hvg', 'leiden', 'log1p', 'neighbors', 'over_clustering', 'pca', 'rank_genes_groups', 'scaled|original|cum_sum_eigenvalues', 'scaled|original|pca_var_ratios', 'status', 'status_args', 'umap'
    obsm: 'X_pca', 'X_umap', 'celltypist_decision_matrix', 'celltypist_probability_matrix', 'scaled|original|X_pca'
    varm: 'PCs', 'scaled|original|pca_loadings'
    layers: 'counts', 'scaled'
    obsp: 'connectivities', 'distances'

  celltypist_prediction: 11 unique values
  scsa_prediction: 7 unique values
  gpt4celltype_prediction: 8 unique values
  harmony_prediction: 18 unique values
  scVI_prediction: 17 unique values

2) Compare the five annotations on UMAP#

ov.pl.umap(
    adata,
    color=['leiden'] + anno_cols,
    frameon='small', palette=ov.pl.sc_color, ncols=4,
)
X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False
[Figure: UMAP panels colored by leiden and the five annotation columns]

3) Compute marker genes per leiden cluster#

CellVote uses the top markers as context when arbitrating between candidate labels. ov.single.get_celltype_marker returns a dict[cluster_id -> list[str]].

marker_dict = ov.single.get_celltype_marker(
    adata, clustertype='leiden', topgenenumber=10,
    foldchange=1.5, pval_cutoff=0.05,
)
for cid, genes in list(marker_dict.items())[:3]:
    print(f'  leiden={cid}: {genes[:6]}')
...get cell type marker
  leiden=0: ['CCR7', 'EPHX2', 'PRKCQ-AS1', 'FHIT', 'LEF1', 'NELL2']
  leiden=1: ['CRIP2', 'CD2', 'AQP3', 'CD40LG', 'LTB', 'TRAT1']
  leiden=10: ['PF4', 'PPBP', 'SDPR', 'CALM3', 'RGS10', 'GPX1']

4) Run CellVote with all five annotation columns#

CellVote.vote() does two things:

  1. Per cluster, per method — picks the most-frequent label from that method’s annotation column. This yields one candidate per method (so up to 5 candidates per cluster here).

  2. LLM arbitration — sends {candidates, top markers, species, tissue} to the configured chat model and asks it to choose the most plausible identity. The result is written back to adata.obs[result_key] (default CellVote_celltype).
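
Step 1 can be illustrated without any LLM call. The sketch below is not CellVote's internal code; `majority_per_cluster` is a hypothetical helper and the six cells are toy data. It only shows what "most-frequent label per cluster" means for a single annotation column.

```python
from collections import Counter, defaultdict

def majority_per_cluster(clusters, labels):
    """Return {cluster_id: most frequent label} for one annotation column."""
    counts = defaultdict(Counter)
    for cid, label in zip(clusters, labels):
        counts[cid][label] += 1
    # most_common(1) yields the single (label, count) pair with the highest count.
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}

# Toy data: six cells in two clusters, with one method's per-cell labels.
clusters = ["0", "0", "0", "1", "1", "1"]
celltypist = ["B cells", "B cells", "DC", "DC", "DC", "B cells"]

candidates = majority_per_cluster(clusters, celltypist)
print(candidates)
```

Repeating this for each of the five columns gives the candidate list per cluster that the arbitration step chooses from.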

cv = ov.single.CellVote(adata)

result_dict = cv.vote(
    clusters_key='leiden',
    cluster_markers=marker_dict,
    celltype_keys=anno_cols,         # ← all 5 annotation sources
    provider='openai',                # use the OpenAI-compatible client
    base_url='https://api.deepseek.com/v1',
    model='deepseek-chat',
    species='human',
    organization='PBMC',
)
print()
print('CellVote consensus per cluster:')
for cid, label in sorted(result_dict.items(), key=lambda x: int(x[0]) if str(x[0]).isdigit() else x[0]):
    print(f'  leiden={cid}: {label}')
CellVote consensus per cluster:
  leiden=0: tcm/naive helper t cell
  leiden=1: cd4-positive, alpha-beta memory t cell
  leiden=2: cd14-positive monocyte
  leiden=3: b cell
  leiden=4: cd8-positive, alpha-beta memory t cell
  leiden=5: non-classical monocyte
  leiden=6: cd16+ nk cell
  leiden=7: cd8-positive, alpha-beta memory t cell
  leiden=8: cd14-positive monocyte
  leiden=9: conventional dendritic cell
  leiden=10: megakaryocyte/platelet

5) Final consensus on UMAP#

# layout='flow' guarantees no panel-to-panel legend overlap
ov.pl.umap(
    adata,
    color=['leiden', 'CellVote_celltype'] + anno_cols,
    frameon='small', palette=ov.pl.sc_color, ncols=4,
    layout='flow',
)
X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False
[Figure: UMAP panels colored by leiden, CellVote_celltype, and the five annotation columns]

6) CellVote confidence (per-cluster consensus score)#

CellVote.vote() now writes a per-cell confidence column adata.obs['CellVote_celltype_confidence'] and a per-cluster score table to adata.uns['CellVote_celltype_score_table'].

The score is built from three metrics on the 5 input methods:

  • n_unique — distinct labels after normalisation (1..5)

  • plurality — fraction of methods agreeing with the most common label

  • vote_agreement — fraction of methods whose label is semantically consistent with the final CellVote pick (token-set Jaccard ≥ 0.34, with synonym + plural normalisation)

  • confidence = (plurality + vote_agreement) / 2 ∈ [0, 1]

Clusters with confidence < 0.6 are the ones most worth a manual look.
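
The arithmetic behind these metrics can be sketched in pure Python. This is an approximation under stated assumptions, not the library's implementation: `tokens`, `jaccard`, and `consensus_score` are hypothetical names, the plural handling is a crude trailing-"s" strip, and no synonym table is applied. The example labels are the five per-method labels for leiden cluster 5 from this tutorial.

```python
from collections import Counter

def tokens(label):
    """Lowercase, split on whitespace/commas/slashes, crudely strip a plural 's'."""
    words = label.lower().replace(",", " ").replace("/", " ").split()
    return frozenset(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def consensus_score(method_labels, final_label, threshold=0.34):
    """Compute n_unique, plurality, vote_agreement and confidence for one cluster."""
    sets = [tokens(label) for label in method_labels]
    counts = Counter(sets)
    # Fraction of methods that share the most common normalised label.
    plurality = counts.most_common(1)[0][1] / len(sets)
    final = tokens(final_label)
    # Fraction of methods whose token set overlaps enough with the final pick.
    agree = sum(jaccard(s, final) >= threshold for s in sets) / len(sets)
    return {
        "n_unique": len(counts),
        "plurality": round(plurality, 2),
        "vote_agreement": round(agree, 2),
        "confidence": round((plurality + agree) / 2, 2),
    }

# Per-method labels for leiden cluster 5 and the CellVote pick.
row = consensus_score(
    ["Non-classical monocytes", "Monocyte", "NK cell", "monocyte", "monocyte"],
    "non-classical monocyte",
)
print(row)
```

Under these assumptions, "monocyte" shares one of two tokens with "non-classical monocyte" (Jaccard 0.5), so it counts as agreeing, while "NK cell" shares none. Other clusters may score differently from the library's table because synonym normalisation is left out here.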

import pandas as pd

score_df = pd.DataFrame(adata.uns['CellVote_celltype_score_table'])
score_df = score_df.sort_values('confidence', ascending=False)
score_df

|  | cluster | n_cells | cellvote_label | n_unique | plurality | vote_agreement | confidence | methods_supporting |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 3 | 328 | b cell | 1 | 1.0 | 1.0 | 1.0 | 5/5 |
| 10 | 10 | 16 | megakaryocyte/platelet | 2 | 0.8 | 1.0 | 0.9 | 5/5 |
| 2 | 2 | 376 | cd14-positive monocyte | 2 | 0.6 | 1.0 | 0.8 | 5/5 |
| 8 | 8 | 80 | cd14-positive monocyte | 2 | 0.6 | 1.0 | 0.8 | 5/5 |
| 9 | 9 | 38 | conventional dendritic cell | 2 | 0.6 | 1.0 | 0.8 | 5/5 |
| 5 | 5 | 165 | non-classical monocyte | 3 | 0.6 | 0.8 | 0.7 | 4/5 |
| 0 | 0 | 545 | tcm/naive helper t cell | 4 | 0.4 | 0.8 | 0.6 | 4/5 |
| 1 | 1 | 513 | cd4-positive, alpha-beta memory t cell | 4 | 0.4 | 0.6 | 0.5 | 3/5 |
| 4 | 4 | 283 | cd8-positive, alpha-beta memory t cell | 4 | 0.4 | 0.6 | 0.5 | 3/5 |
| 6 | 6 | 128 | cd16+ nk cell | 3 | 0.6 | 0.4 | 0.5 | 2/5 |
| 7 | 7 | 90 | cd8-positive, alpha-beta memory t cell | 4 | 0.4 | 0.6 | 0.5 | 3/5 |

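
To turn the 0.6 guideline into a review list, the confidence values from the score table above can be filtered directly. The sketch copies those values into a plain dict so it runs on its own; the variable names are illustrative.

```python
# Confidence per leiden cluster, copied from the score table above.
confidence = {
    "3": 1.0, "10": 0.9, "2": 0.8, "8": 0.8, "9": 0.8, "5": 0.7,
    "0": 0.6, "1": 0.5, "4": 0.5, "6": 0.5, "7": 0.5,
}

threshold = 0.6
# Strictly below the threshold, sorted numerically by cluster id.
to_review = sorted((c for c, v in confidence.items() if v < threshold), key=int)
print(to_review)
```

With the score_df built earlier, the equivalent pandas filter would be score_df[score_df['confidence'] < 0.6].
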
# Visualise where consensus is strong vs weak
ov.pl.umap(
    adata,
    color=['CellVote_celltype', 'CellVote_celltype_confidence'],
    frameon='small', palette=ov.pl.sc_color, ncols=4,
    cmap='Reds',
    layout='flow',
)
X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False
[Figure: UMAP colored by CellVote_celltype and CellVote_celltype_confidence]

7) Per-cluster consensus table (5 methods + final + confidence)#

cols_for_table = anno_cols + ['CellVote_celltype']
summary = (
    adata.obs.groupby('leiden', observed=True)[cols_for_table]
    .agg(lambda s: s.value_counts().index[0] if len(s) else '')
)
summary['n_cells'] = adata.obs.groupby('leiden', observed=True).size()
summary['confidence'] = pd.DataFrame(
    adata.uns['CellVote_celltype_score_table']
).set_index('cluster')['confidence']
summary = summary[['n_cells'] + cols_for_table + ['confidence']]
summary

| leiden | n_cells | celltypist_prediction | scsa_prediction | gpt4celltype_prediction | harmony_prediction | scVI_prediction | CellVote_celltype | confidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 545 | Tcm/Naive helper T cells | T cell | naive CD4+ T cell | naive T cell | naive T cell | Tcm/naive helper t cell | 0.6 |
| 1 | 513 | Tcm/Naive helper T cells | T cell | CD4+ T cell | CD4-positive, alpha-beta memory T cell | CD4-positive, alpha-beta memory T cell | Cd4-positive, alpha-beta memory t cell | 0.5 |
| 2 | 376 | Classical monocytes | Monocyte | monocyte | CD14-positive monocyte | CD14-positive monocyte | Cd14-positive monocyte | 0.8 |
| 3 | 328 | B cells | B cell | B cell | B cell | B cell | B cell | 1.0 |
| 4 | 283 | Tem/Trm cytotoxic T cells | Natural killer cell | CD8+ T cell | CD8-positive, alpha-beta memory T cell | CD8-positive, alpha-beta memory T cell | Cd8-positive, alpha-beta memory t cell | 0.5 |
| 5 | 165 | Non-classical monocytes | Monocyte | NK cell | monocyte | monocyte | Non-classical monocyte | 0.7 |
| 6 | 128 | CD16+ NK cells | Natural killer cell | NK cell | natural killer cell | natural killer cell | Cd16+ nk cell | 0.5 |
| 7 | 90 | Tem/Trm cytotoxic T cells | Activated CD8+ T cell | CD8+ T cell | CD8-positive, alpha-beta memory T cell | CD8-positive, alpha-beta memory T cell | Cd8-positive, alpha-beta memory t cell | 0.5 |
| 8 | 80 | Classical monocytes | Monocyte | monocyte | CD14-positive monocyte | CD14-positive monocyte | Cd14-positive monocyte | 0.8 |
| 9 | 38 | DC | Dendritic cell | dendritic cell | conventional dendritic cell | conventional dendritic cell | Conventional dendritic cell | 0.8 |
| 10 | 16 | Megakaryocytes/platelets | Megakaryocyte | platelet | platelet | platelet | Megakaryocyte/platelet | 0.9 |

8) Notes#

  • Replace provider/base_url/model with the settings for any OpenAI-compatible chat endpoint (openai, kimi, qwen, doubao, or a self-hosted model).

  • result_key overrides the output column name if you want to keep multiple CellVote runs side-by-side. The confidence column is named f"{result_key}_confidence" accordingly.

  • cluster_markers is required — the LLM uses gene context to break ties between candidate labels.

  • The confidence score is purely token-level (no extra LLM call); if you want stricter ontology-aware scoring, swap the threshold or run the cellvote_consensus_score() helper with a custom jaccard_threshold.