omicverse.single.CellOntologyMapper
- omicverse.single.CellOntologyMapper(cl_obo_file=None, embeddings_path=None, model_name='all-mpnet-base-v2', local_model_dir=None, auto_download=True)
Map free-text cell-type annotations to the Cell Ontology (CL) via NLP.
Uses a sentence-transformer encoder over the CL terms and cosine-similarity matching of every annotated cell name to the closest ontology term. Optional add-ons:
- Abbreviation expansion (LLM-driven, optional): turns "TIL-1" into "tissue-resident memory CD8+ T cell" before matching, which can substantially improve recall on author-shorthand labels.
- Cell Taxonomy resource (Jin et al. 2023): a second ontology keyed by species + tissue, with marker genes; provides the map_*_with_taxonomy variants.
- Marker-gene search: find ontology terms whose marker set overlaps your cluster's top-N markers (search_by_marker).
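The matching step described above (embed names, then pick the ontology term with the highest cosine similarity) can be pictured with a small NumPy sketch. This is only an illustration of the technique, not the class's internal code: the helper name best_matches and the 3-dimensional toy vectors are made up for the example, and a real sentence-transformer would supply much longer embeddings.

```python
import numpy as np

def best_matches(query_vecs, term_vecs, term_names):
    """Return (term_name, cosine_score) for each query embedding."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    t = term_vecs / np.linalg.norm(term_vecs, axis=1, keepdims=True)
    sims = q @ t.T                 # shape: (n_queries, n_terms)
    idx = sims.argmax(axis=1)      # index of the closest term per query
    return [(term_names[j], float(sims[i, j])) for i, j in enumerate(idx)]

# Toy stand-ins for encoder output (hypothetical values).
term_names = ["T cell", "B cell", "macrophage"]
term_vecs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
query_vecs = np.array([[0.9, 0.1, 0.0],   # stands in for an annotation like "CD8 T"
                       [0.0, 0.2, 0.8]])  # stands in for an annotation like "Macro"

matches = best_matches(query_vecs, term_vecs, term_names)
for name, score in matches:
    print(f"{name}\t{score:.3f}")
```

Normalising once and using a single matrix product keeps the lookup cheap even when the ontology has thousands of terms.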
Lifecycle: construct with a CL OBO file (or download one via download_cl()); the encoder is loaded lazily on the first map_* call. Pre-computed sentence embeddings can be cached via embeddings_path to skip the ~30 s encode step on repeated calls.
Typical workflow
>>> import omicverse as ov
>>> ov.single.download_cl(output_dir='cl_dir', filename='cl.json')
>>> mapper = ov.single.CellOntologyMapper(
...     cl_obo_file='cl_dir/cl.json',
...     embeddings_path='cl_dir/ontology_embeddings.pkl',
...     local_model_dir='./my_models',
... )
>>> results = mapper.map_adata(adata, cell_name_col='cell_label')
>>> mapper.print_mapping_summary(results, top_n=15)
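The embeddings_path argument in the workflow above exists so that the encode step runs only once. A generic load-or-compute pattern of that kind is sketched below; it assumes a pickle file (suggested by the .pkl suffix, but the actual on-disk format used by CellOntologyMapper is not stated here), and load_or_compute and fake_encode are hypothetical names used only for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

def load_or_compute(path, compute):
    """Return cached embeddings from `path`, or compute and store them."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)      # cache hit: skip the expensive step
    embeddings = compute()              # cache miss: run the encoder once
    with open(path, "wb") as fh:
        pickle.dump(embeddings, fh)
    return embeddings

calls = []

def fake_encode():
    """Stand-in for the slow sentence-embedding step (hypothetical)."""
    calls.append(1)
    return np.ones((3, 4))

cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "ontology_embeddings.pkl")

first = load_or_compute(cache_path, fake_encode)   # computes and writes the file
second = load_or_compute(cache_path, fake_encode)  # reads the file back
print(len(calls))
```

A cache like this should be regenerated whenever the ontology file or the encoder model changes, since stale vectors would otherwise be matched against new terms.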
Adds (after map_adata)
- adata.obs['cell_ontology']: best-match CL term name.
- adata.obs['cell_ontology_cl_id']: CL ID (e.g. CL:0000084).
- adata.obs['cell_ontology_score']: cosine similarity.
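Because the score column is a cosine similarity, it can be used to flag mappings that deserve a manual check. The sketch below works on a plain pandas DataFrame standing in for adata.obs; the rows and scores are made-up toy values, and the 0.5 cutoff (MIN_SCORE) is an arbitrary assumption rather than a recommended threshold.

```python
import pandas as pd

# Toy stand-in for adata.obs after map_adata (values are illustrative only).
obs = pd.DataFrame({
    "cell_label": ["CD8 T", "Macro", "Unk-3"],
    "cell_ontology": ["CD8-positive, alpha-beta T cell", "macrophage", "cell"],
    "cell_ontology_cl_id": ["CL:0000625", "CL:0000235", "CL:0000000"],
    "cell_ontology_score": [0.81, 0.77, 0.32],
})

MIN_SCORE = 0.5  # hypothetical cutoff; tune it for your own data

# Rows whose best match is weak and may need manual curation.
low_conf = obs[obs["cell_ontology_score"] < MIN_SCORE]
print(low_conf[["cell_label", "cell_ontology", "cell_ontology_score"]])

# One row per distinct label, weakest matches first, for a quick review.
review = (obs.drop_duplicates("cell_label")
             .sort_values("cell_ontology_score")
             .reset_index(drop=True))
```

With a real AnnData object the same expressions apply to adata.obs directly.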
Use map_adata_with_expansion(...) when annotations contain abbreviations and setup_llm_expansion has been configured. Use map_adata_with_taxonomy(...) when species + tissue context is needed (Cell Taxonomy adds species-aware mappings).