User API
Import OmicVerse as:
import omicverse as ov
This page is auto-generated from @register_function entries in the OmicVerse registry.
Public registry entries listed here: 414
Top-Level API
Generate a standardized reference table from
Settings
Initialize CPU-GPU mixed mode for accelerated single-cell analysis.
Initialize GPU mode with RAPIDS for accelerated single-cell analysis.
Data IO
Load a serialized Python object from disk.
Read common omics file formats into AnnData or pandas DataFrame.
Read a 10x Genomics HDF5 matrix file.
Read a 10x Genomics Matrix Market directory.
Read a CSV / TSV file via
Read an
Read a NanoString-formatted dataset.
Read 10x Visium HD outputs with a single entry point.
Read Visium HD bin-level output and attach spatial metadata.
Read Visium HD cell-segmentation output and attach geometries and spatial metadata.
Read a 10x Xenium
Save a Python object to file using a pickle fallback strategy.
Read 10x-Genomics-formatted Visium dataset. |
Alignment
Run the full 16S amplicon pipeline.
Compose
Build a phylogenetic tree end-to-end.
Run a complete bulk RNA-seq pipeline from SRA accessions or local FASTQs.
Quantify expression matrices from FASTQ files via
Run cutadapt to remove amplicon PCR primers.
Run pydada2 end-to-end and return an AnnData.
Run fastp QC.
Run FastTree to infer a phylogenetic tree.
Run featureCounts on BAM files.
Alias for
Alias for
Download a SINTAX-formatted 16S reference FASTA.
Convert SRA accessions to FASTQ.
Run MAFFT multiple sequence alignment.
Download SRA data in parallel using parallel-fastq-dump.
Prefetch SRA accessions with validation.
Build a kallisto index and transcript-to-gene mapping files via
Run STAR alignment. |
Preprocessing (pp)
Migrate AnnData objects from GPU back to CPU memory after analysis.
Migrate AnnData objects to GPU memory for accelerated processing.
Infer a size factor from one normalized expression vector.
Pick the most modularity-stable Leiden partition via CHAMP (Weir et al. 2017).
Filter cell outliers based on counts and numbers of genes expressed.
Filter genes based on the number of cells or counts.
Select highly variable features (HVF/HVG) for downstream modeling.
Annotate highly variable genes (Satija 2015 / Zheng 2017 / Stuart 2019).
Identify robust genes for downstream HVG selection.
Run Leiden clustering.
Log-transform expression values with
Run Louvain clustering on the precomputed kNN graph.
Run MDE (Minimum Distortion Embedding) from a latent representation.
Compute a neighborhood graph of observations [McInnes18].
Normalize a count matrix using analytic Pearson residuals (Lause 2021).
Perform Principal Component Analysis (PCA) on the data stored in a scanpy AnnData object.
Preprocess an AnnData object with either a scanpy or a Pearson-residuals workflow for size normalization and highly variable gene (HVG) selection, and calculate signature scores if needed.
Perform quality control on a dictionary of AnnData objects.
Given log-normalized gene expression data, recover the raw read/UMI counts by inferring the unknown size factors.
Regress out technical covariates (
Scale the regressed layer and store it as a new analysis layer.
Remove cell-cycle-correlated genes from
Scale the input AnnData object.
Score cell-cycle phases using predefined or custom gene sets.
Predict cell doublets using Scrublet with optional GPU acceleration.
Simulate synthetic doublets from random cell pairs.
Select highly variable features with the Pegasus strategy.
SUDE (Scalable Unsupervised Dimensionality reduction via Embedding) dimensionality reduction.
Compute t-SNE coordinates for cells, dispatching by
Compute UMAP embedding, dispatching to the best backend for |
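The analytic Pearson residual normalization listed above (Lause 2021) can be sketched in plain Python to show what the residuals measure. This is not OmicVerse's implementation: it assumes a negative-binomial model with one shared overdispersion `theta=100` and clipping at ±sqrt(n_cells), and `pearson_residuals` is a hypothetical helper name.

```python
import math

def pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes count matrix (list of lists).

    Hypothetical helper, not the OmicVerse API. Assumes a negative-binomial
    model with a single overdispersion `theta` shared by all genes.
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    gene_totals = [sum(row[g] for row in counts) for g in range(n_genes)]
    total = sum(cell_totals)
    if clip is None:
        clip = math.sqrt(n_cells)  # assumed default clipping threshold

    residuals = []
    for c in range(n_cells):
        out_row = []
        for g in range(n_genes):
            # Expected count under "cell depth x gene abundance" independence.
            mu = cell_totals[c] * gene_totals[g] / total
            if mu == 0:
                out_row.append(0.0)  # all-zero cell or gene carries no signal
                continue
            r = (counts[c][g] - mu) / math.sqrt(mu + mu * mu / theta)
            out_row.append(max(-clip, min(clip, r)))
        residuals.append(out_row)
    return residuals
```

A positive residual means the cell has more counts for that gene than expected from its sequencing depth and the gene's overall abundance; clipping keeps a few extreme cells from dominating downstream PCA.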
Single-cell (single)
Unified single-cell annotation manager for cell-type labeling.
Reference-based label transfer helper for single-cell annotation.
Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).
Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).
Run batch-effect correction for single-cell data integration.
Map free-text cell-type annotations to the Cell Ontology (CL) via NLP.
Ensemble cell-type annotation manager with multiple backends.
Consensus NMF workflow wrapper for robust gene-program discovery.
Convert a human-symbol interaction network to mouse symbols.
Identify cluster-specific marker genes with COSG.
Predict developmental potency with CytoTRACE2.
Differential cell-type abundance testing wrapper.
Differential gene-expression testing wrapper for single-cell datasets.
Download the CellPhoneDB database with fallback URLs.
Download the Cell Ontology file from multiple sources with automatic fallback.
Predict drug sensitivity from single-cell transcriptomes using CaDRReS models.
Fit GAM-based pseudotime trends for one or more datasets or groups.
Score MOFA-factor enrichment across annotated groups.
Add MOFA latent factors from a model file into
Adaptive ridge-regression framework for pseudotime-associated gene discovery.
Find marker genes for each cluster / group in single-cell data.
Format LIANA results into the communication AnnData expected by
Fit and visualize smooth feature trends along pseudotime.
Generate a MultiQC-style HTML report for single-cell RNA-seq analysis.
Calculate the AUCell score for a given gene set.
Get marker genes for each cluster/cell type.
Resolve one final cell type for each cluster with LLM calls.
Extract top marker genes from
Transfer per-cell annotations/statistics to metacells.
Extract feature loadings for one factor from a MOFA model.
Pair RNA and ATAC cells using GLUE latent embeddings and neighbor matching.
Annotate cluster cell types with a remote LLM service.
Annotate cell types with a local instruction-tuned LLM.
Run a one-click single-cell analysis pipeline with resumable steps.
Load one of the packaged human prior interaction networks.
Unified metacell wrapper with dispatchable backends.
MetaTiME wrapper for tumor microenvironment cell-state annotation.
Monocle2-style single-cell trajectory analysis.
Load Nestorowa16 mouse HSC reference data used by CEFCON.
Calculate the area under the curve (AUC) for a set of pathways in an AnnData object.
Enrich cell annotations with pathway activity scores using the AUCell method.
Perform pathway enrichment analysis on gene expression data.
Visualize the pathway enrichment analysis results as a heatmap.
Plot metacell centroids on a given embedding axis.
CEFCON workflow wrapper for driver-regulator discovery.
Train MOFA models for latent factor discovery across multiple omics layers.
Load pretrained MOFA models for downstream factor interpretation.
Automated cell-type annotation using SCSA marker-enrichment scoring.
SIMBA wrapper for single-cell batch integration and graph-embedding construction.
TOSICA wrapper for pathway-informed transformer-based cell-type annotation.
Run CellPhoneDB statistical analysis with automatic database download.
Run LIANA ligand-receptor inference on an AnnData object.
Add cell-type annotations from a dict to an AnnData object.
Trajectory inference class for single-cell data analysis.
RNA velocity analysis wrapper for directional cell-state transition inference. |
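Two entries above choose a Leiden resolution by bootstrap-ARI. The agreement score at the core of that idea is the adjusted Rand index (ARI), which compares two clusterings of the same cells while correcting for chance agreement. The pure-Python sketch below computes only the ARI from two label lists; it does not reproduce the null adjustment or bootstrap loop the OmicVerse functions describe, and `adjusted_rand_index` is a hypothetical name.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells.

    Hypothetical helper for illustration, not the OmicVerse API.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Contingency counts n_ij and the marginal cluster sizes.
    pair_counts = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    index = sum(comb(n_ij, 2) for n_ij in pair_counts.values())
    sum_a = sum(comb(a, 2) for a in sizes_a.values())
    sum_b = sum(comb(b, 2) for b in sizes_b.values())
    expected = sum_a * sum_b / comb(n, 2)  # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put every cell in one group.
        return 1.0
    return (index - expected) / (max_index - expected)
```

Because only co-membership matters, relabeled but identical partitions score 1.0, and label values can be any hashable type, so cluster names from different resolutions compare directly.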
Bulk RNA-seq (bulk)
Perform batch-effect correction using the ComBat algorithm.
Bulk RNA-seq deconvolution class for inferring cell-type fractions from single-cell references.
Perform pathway enrichment analysis using Enrichr-compatible gene-set libraries.
Plot enrichment results as a bubble plot.
Plot multiple enrichment result tables in a unified dot-clustermap panel.
Map gene IDs in the input data to gene symbols using a reference table.
Differential-expression analysis helper for bulk RNA-seq count tables.
Gene Set Enrichment Analysis (GSEA) wrapper for ranked gene lists.
Protein-protein interaction (PPI) analysis wrapper based on STRING.
TCGA (The Cancer Genome Atlas) data analysis module.
Weighted Gene Co-expression Network Analysis (WGCNA).
Load a previously saved WGCNA object from disk.
Analyze protein-protein interaction network using STRING database. |
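The GSEA wrapper listed above scores a gene set against a ranked gene list with a running-sum statistic. A minimal sketch of the weighted enrichment score (weight exponent p = 1, the classic GSEA choice) follows. It omits the permutation testing and normalization a full GSEA run performs, and `enrichment_score` is a hypothetical helper, not an OmicVerse function.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum GSEA enrichment score (hypothetical helper, not the OmicVerse API).

    ranked_genes: gene names sorted by decreasing ranking metric.
    scores: the matching ranking metric (e.g. signed fold change).
    gene_set: set of genes whose enrichment is tested.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking but not cover it")
    # Hits step up in proportion to |score|^p; misses step down uniformly.
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("gene-set members all have zero score")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best
```

A set concentrated at the top of the ranking gives a score near +1, one concentrated at the bottom gives a score near -1.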
Metabolomics (metabol)
Collapse the matrix to class-level totals.
Parse each
Map a list of m/z peaks to candidate KEGG compounds via adduct search.
Per-metabolite test across 3+ groups.
ASCA: ANOVA-Simultaneous Component Analysis (Smilde 2005).
Horizontal bars of per-effect variance-explained fractions.
Nested-CV evaluation of a multi-metabolite biomarker panel.
Drop features whose sample-mean intensity isn't at least `ratio` times the blank-mean intensity.
Pairwise metabolite correlation network within a single condition.
Draw an edge DataFrame as a NetworkX spring-layout plot.
Drop features with coefficient-of-variation above
Differential correlation between two groups.
Bar chart of DC-class counts (+/+, +/0, +/-, -/+, ...).
Run a univariate two-group test across all metabolites.
Correct systematic signal drift using LOESS regression on QC samples.
Build a compound master table from ChEBI's flat-file TSVs.
Resolve a metabolite name → HMDB / KEGG / ChEBI / PubChem CID.
Fetch the full KEGG compound→pathway map via KEGG REST.
Fetch the full LION lipid↔ontology associations.
Impute missing values (NaN / 0) in
LION-style over-representation for lipid classes / properties.
Return
Resolve metabolite names to external database IDs.
Per-feature Hotelling T-squared for two-group time-course comparison.
Per-feature
GSEA-style ranked enrichment via
Over-representation analysis via Fisher's exact test.
Pure-Python mummichog: pathway enrichment from m/z peaks.
Normalize each sample (row) of
Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA).
Parse a LIPID MAPS-shorthand lipid name.
Horizontal bar chart of pathway enrichment p-values.
Dot plot of pathway enrichment, the de facto standard figure.
Partial Least Squares Discriminant Analysis (wraps sklearn PLS).
Lifecycle class for a metabolomics analysis.
Load an LC-MS peak table with
Load a MetaboAnalyst-format CSV into AnnData.
Load a generic wide (samples × metabolites) table into AnnData.
Per-feature AUC for a binary class.
Train MOFA+ on sample-aligned metabolomics + other-omics views.
OPLS-DA S-plot: p(cov) vs p(corr), i.e. covariance vs correlation between each feature and the predictive component.
Hotelling T-squared + DModX sample-level outlier detection.
Scatter of Hotelling T² vs DModX with critical-value lines.
SERRF: QC-based Random Forest drift correction (Fan 2019).
Apply a feature-level transformation to
Horizontal bar chart of top-
Metabolomics volcano plot — log2FC vs -log10(padj) (or pvalue). |
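Over-representation analysis, as listed above, asks whether a pathway contains more of the significant metabolites than chance predicts. A one-sided Fisher's exact test on the 2 × 2 table (in pathway vs not, significant vs not) equals the upper tail of a hypergeometric distribution, which the sketch below computes exactly with the standard library. It assumes a fixed background set and applies no multiple-testing correction; `ora_pvalue` is a hypothetical name, not the OmicVerse API.

```python
from math import comb

def ora_pvalue(hits_in_pathway, pathway_size, n_significant, background_size):
    """One-sided enrichment p-value P(X >= hits_in_pathway).

    Hypothetical helper, not the OmicVerse API. X is hypergeometric: the number
    of pathway members among n_significant metabolites drawn without
    replacement from a background of background_size metabolites.
    """
    total = comb(background_size, n_significant)
    upper = min(pathway_size, n_significant)
    # Exact integer sum over every outcome at least as extreme as observed.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, n_significant - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total
```

Summing integer binomial coefficients before the single division keeps the tail exact even for very small p-values.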
Microbiome (micro)
Compute and store per-sample alpha-diversity metrics on
Attach a phylogenetic tree to
Compute sample × sample distance matrices.
CLR transform:
Collapse ASVs to a taxonomic rank.
Stitch a list of per-study AnnDatas into a single cross-cohort table.
Per-feature differential abundance across sample groups.
Download and parse the Franzosa et al. 2019 paired IBD dataset.
Filter rare features by prevalence.
ILR transform: orthonormal coordinate system after closure removal.
Per-study DA + inverse-variance meta-analysis.
MMvec (Morton et al. 2019) in ~80 lines of PyTorch.
Reduce a sample × sample distance matrix to 2-D / 3-D coordinates.
Run sklearn CCA on the paired tables.
Rank correlation between every (microbe, metabolite) pair.
Biplot of microbe + metabolite embeddings in the MMvec latent space.
Training (and validation) loss curve for a fitted
Rarefy counts to a common depth.
Build a paired microbe + metabolite cohort with planted producer pairs. |
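The CLR transform listed above replaces each feature's count with its log-ratio to the sample's geometric mean, clr(x)_i = log(x_i) - (1/D) Σ_j log(x_j) over the D features. A minimal per-sample sketch follows. The pseudocount of 1 used to handle zero counts is an assumption, not necessarily OmicVerse's default, and `clr` is a hypothetical helper.

```python
import math

def clr(sample_counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature counts.

    Hypothetical helper, not the OmicVerse API. The pseudocount (assumed 1.0)
    avoids log(0); with pseudocount=0 all counts must be strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in sample_counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]
```

CLR values always sum to zero within a sample, and with the pseudocount set to zero they are unchanged when every count is multiplied by the same constant, which is why sequencing depth drops out.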
Spatial transcriptomics (space)
Aggregate binned Visium signals into cell-level profiles.
Construct spatial neighbor networks for spatial integration.
Build a marker-gene signature table for each cell type in a reference scRNA-seq dataset.
CAST (Cell Annotation for Spatial Transcriptomics) embedding for multiple spatial samples.
Run a minimal CellCharter workflow on a spatial AnnData object.
SpatRio CellLoc class for probabilistic cell localization.
SpatRio CellMap class for mapping single cells to spatial coordinates.
Perform clustering analysis on spatial transcriptomics data using multiple methods.
Build a CellChat-style communication AnnData from commot outputs.
Crop Visium spatial data to a specific region of interest.
Spatial deconvolution pipeline that aligns scRNA-seq references with spatial transcriptomics.
GASTON spatial depth estimation and clustering.
Automatically map and align spatial transcriptomics data.
Manually adjust spatial transcriptomics data alignment.
Merge clusters based on hierarchical clustering of their representation.
Compute Moran's I spatial autocorrelation for gene expression.
Discover tissue zones via NMF on a per-spot cell-abundance matrix.
SpaceFlow spatial flow analysis class.
A class representing the PyTorch implementation of STAGATE (Spatial Transcriptomics Analysis using Graph Attention autoEncoder).
STAligner for spatial transcriptomics data integration.
Read and standardize 10x Visium data with a bin2cell-compatible loader.
Rotate Visium spatial data image and coordinates by a specified angle.
Merge primary and secondary segmentation labels.
Compute spatial autocorrelation statistics for gene expression.
Build a spatial neighborhood graph from coordinates stored in
Spatial Transition Tensor (STT) analysis class.
Identify spatially variable genes using multiple methods.
Synchronize
Tangram spatial deconvolution class for cell type mapping.
Update communication interaction annotations from commot database metadata.
Expand segmentation labels from nuclei to nearby bins.
Run expression-image segmentation and map labels back to spatial bins.
Convert Visium 10x data to cell-level data. |
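Moran's I, listed above, measures whether neighboring spots have similar expression. For a value x_i at each of N locations and spatial weights w_ij, I = (N / W) Σ_ij w_ij (x_i - x̄)(x_j - x̄) / Σ_i (x_i - x̄)², where W = Σ_ij w_ij. The sketch below evaluates that formula for one gene with a dense weight matrix. Practical implementations typically use sparse neighbor graphs and add permutation or analytic p-values, which this sketch does not; `morans_i` is a hypothetical name.

```python
def morans_i(values, weights):
    """Moran's I for one feature (hypothetical helper, not the OmicVerse API).

    values: expression at each of N locations.
    weights: N x N spatial weight matrix (list of lists); w[i][j] > 0 for neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    if denominator == 0:
        raise ValueError("Moran's I is undefined for a constant feature")
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of locations.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_total) * numerator / denominator
```

On a chain of four spots, an alternating pattern [1, 0, 1, 0] gives I = -1 (neighbors disagree), while a block pattern [1, 1, 0, 0] gives a positive I of 1/3 (neighbors mostly agree).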
Bulk-to-Single (bulk2single)
VAE-based bulk-to-single framework for reconstructing pseudo single cells from bulk RNA-seq.
Plot cell-type proportions in generated single-cell data.
Plot the correlation matrix between reference and generated single-cell data.
Integrate bulk and single-cell information to infer transitional cell-state trajectories.
Deep-learning mapper that projects single-cell profiles onto spatial coordinates. |
Plotting (pl)
Add KDE-based density contours to an existing matplotlib plot.
Add a p-value annotation with a connecting line to a matplotlib plot.
Overlay per-spot pie charts of cell-type composition on a spatial map.
Overlay velocity streamlines on a low-dimensional embedding.
Render a branch-aware pseudotime stream plot.
Create a combined bar-and-dot summary plot by groups.
Create a boxplot with jittered points to visualize data distribution across categories.
Render a branch-aware pseudotime stream plot.
Calculate weighted kernel density estimates for gene expression on 2D embeddings.
Plot communication matrices as heatmaps, dot plots, or bubble maps.
Plot cell-cell communication networks with multiple graph styles.
Plot communication summaries, distributions, and pathway statistics.
Compute pairwise correlation/similarity between cell groups and plot it as a heatmap.
Visualization helper for CellPhoneDB cell-cell communication outputs.
Plot the proportion of each cell type in each visual cluster.
Generate a complex annotated heatmap using PyComplexHeatmap.
Overlay a KDE contour for selected clusters on embedding axes.
Plot the convex hull of a cluster in an embedding.
Build a transparent-to-opaque LinearSegmentedColormap of a single colour.
Make a dot plot of the expression values of var_names.
Plot dynamic feature trends along pseudotime, optionally by lineage.
Plot GAM-fitted pseudotime trends for one or more genes.
Scatter plot for a user-specified embedding basis (e.g. UMAP, PCA).
Get cluster median locations and adjust text labels accordingly.
Render large-scale embeddings with Datashader.
Plot an embedding with OmicVerse cell-type coloring.
Plot cluster-specific density on an existing embedding.
Plot cell-level feature expression ordered by groups or metadata.
Forbidden City traditional-color palette utility.
Add cluster labels at median positions in embedding plots with automatic text positioning.
Build cluster-wise gene-set word clouds along pseudotime.
Plot grouped mean expression as a Marsilea heatmap.
Create a dot plot heatmap showing marker gene expression using PyComplexHeatmap.
Dot plot of marker genes: a clean drop-in for
Return the default OmicVerse color palette.
Circular UMAP with metadata tracks.
Plot a stacked bar chart showing cell-type proportions across groups.
Create a combined embedding plot with a cell-type legend and counts.
Create a flowsig network visualization showing GEM modules and gene flows.
Plot grouped cell-fraction summaries as stacked bars.
Plot the PCA variance ratio to choose the number of principal components.
Configure plotting settings for OmicVerse.
Create a spatial plot from Visium data with color gradient and interpolation.
Format text for plotting by adding line breaks.
Create a dot plot from rank_genes_groups results.
adata (AnnData object): The data object containing the information for plotting.
Plot t-SNE embedding.
Plot UMAP embedding.
Create a Venn diagram to visualize set overlaps.
Enhanced violin plot compatible with omicverse's interface.
Create a volcano plot for differential expression analysis. |
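The volcano plots listed above place each gene (or metabolite) by log2 fold change against -log10 of the adjusted p-value and color it by significance. The labeling step behind such a plot can be sketched as below. The thresholds |log2FC| ≥ 1 and padj < 0.05 are common conventions assumed here, not OmicVerse defaults, and `volcano_labels` is a hypothetical helper that does no drawing.

```python
import math

def volcano_labels(log2fc, padj, fc_threshold=1.0, p_threshold=0.05):
    """Return (x, y, label) triples for a volcano plot.

    Hypothetical helper, not the OmicVerse API. x is log2 fold change,
    y is -log10(adjusted p-value), and label is "up", "down", or "ns".
    """
    points = []
    for fc, p in zip(log2fc, padj):
        y = -math.log10(max(p, 1e-300))  # guard against log10(0)
        if p < p_threshold and fc >= fc_threshold:
            label = "up"
        elif p < p_threshold and fc <= -fc_threshold:
            label = "down"
        else:
            label = "ns"  # fails either the significance or the effect-size cut
        points.append((fc, y, label))
    return points
```

Requiring both cuts is what separates the colored "wings" of a volcano plot from the gray center: a tiny p-value with a small fold change, or a large fold change with a weak p-value, both stay "ns".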
Datasets
Processed single-cell data from the PFC of adult mice under cocaine self-administration.
Gaussian Blobs dataset.
The BM dataset used in http://pklab.med.harvard.edu/velocyto/notebooks/R/SCG71.nb.html
The bone marrow dataset used in
Bulk data with conditions ulcerative colitis (UC) and Crohn's disease (CD).
The chromaffin dataset used in http://pklab.med.harvard.edu/velocyto/notebooks/R/chromaffin2.nb.html
Placeholder for a CITE-seq dataset loader.
Create a mock single-cell dataset for testing statistical functions.
COVID-19 PBMC bulk data from Decov et al. 2020.
COVID-19 PBMC single-cell data from Decov et al. 2020.
The Dentate Gyrus dataset used in velocyto-team/velocyto-notebooks.
The Dentate Gyrus dataset used in theislab/scvelo_notebooks.
Download a dataset file to local storage.
Download data with custom headers to reduce HTTP 403 failures.
Download example data to a local folder.
TODO: add data here
The Haber dataset used in velocyto-team/velocyto-notebooks.
Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.
Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.
The hgForebrainGlutamatergic dataset used in velocyto-team/velocyto-notebooks.
TODO: add data here
Download human transcription factors.
Simulated myeloid progenitors (Krumsiek et al. 2011).
Hematopoiesis in early mouse embryos (Moignard et al. 2015).
Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.
TODO: add data here
The pancreas cellrank dataset used in theislab/scvelo_notebooks.
Pancreatic endocrinogenesis.
Development of Myeloid Progenitors (Paul et al. 2015).
Load the PBMC 3k dataset from a URL.
PBMC 8k dataset from 10x Genomics.
Single-cell reference data for lymph node.
Download the organoid dataset from Battich et al. (2020) via a figshare link.
Download the rpe1 dataset from Battich et al. (2020) via a figshare link.
TODO: add data here
The neuron splicing data is from Qiu et al. (2020).
The neuron splicing data is from Qiu et al. (2020).
TODO: add data here
SeqFish dataset from 10x Genomics.
Simulated toggleswitch data.
The zebrafish dataset is from Saunders et al. (2019).
External Integrations (external)
Utilities (utils)
Call any BioContext MCP tool by name.
Convert a gene symbol to an Ensembl ID.
Get full text from Europe PMC.
Get the UniProt accession ID for a protein symbol.
List all available BioContext tools.
Query AlphaFold DB for predicted protein structures.
Query the Cell Ontology for cell-type terms.
Query ChEBI for chemical entities.
Query EFO for disease ontology terms.
Query Gene Ontology terms for a gene.
Query the Human Protein Atlas for tissue-level expression.
Query InterPro for protein domains.
Query Open Targets via GraphQL.
Query PanglaoDB for cell-type marker genes.
Query the Reactome pathway database.
Query STRING for protein-protein interactions.
Query UniProt for protein information.
Search ClinicalTrials.gov by condition.
Search the FDA drug database.
Search InterPro entries.
Search Europe PMC for biomedical literature.
Search bioRxiv or medRxiv preprints.
Search the PRIDE proteomics repository.
Compute a PAGA graph with optional velocity/time priors.
Run a selected clustering backend on single-cell data.
Convert official gene symbols to Ensembl gene IDs using pyensembl.
Convert Ensembl gene IDs to official gene symbols using pyensembl.
Convert Ensembl IDs in
Rewrite an AnnData object as an
Convert Rust-backed dataframe-like objects to
Download pretrained CaDRReS model parameter/output files.
Download GDSC expression and drug mask tables.
Download gene ID annotation mapping files for various organisms.
Download pathway and gene set databases for enrichment analysis.
Download curated GMT files used by TOSICA workflows.
Load and prepare gene sets from GMT/TXT files for enrichment analysis.
Annotate
Convert a GTF file to a TSV of gene ID mapping pairs.
Latent Dirichlet Allocation (LDA) topic modeling for single-cell data using MIRA.
Load a MetaboLights study into a samples × metabolites AnnData.
Utility to run
Plot a PAGA graph and optional embedding-level annotations.
Refine labels with neighborhood majority voting.
Retrieve a previously stored X matrix from adata.uns and restore it to adata.X.
Compute the Ro/e (observed/expected) cell-type enrichment matrix.
Store the X matrix of AnnData in adata.uns for later retrieval.
Convert gene symbols in
Train a weighted KNN classifier on
Annotate
Wrap a Rust-backed dataframe-like object with a pandas-style interface. |
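The Ro/e entry above compares observed cell counts with those expected if cell types were distributed independently of groups such as tissues. A common definition, assumed here, is Ro/e = observed / expected with expected = (row total × column total) / grand total, as in a chi-square contingency table. The sketch works on a nested dict of counts; `ro_e` and its input layout are hypothetical, not the OmicVerse API.

```python
def ro_e(counts):
    """Observed/expected ratios for a {cell_type: {group: count}} table.

    Hypothetical helper, not the OmicVerse API. Missing entries count as 0.
    """
    cell_types = list(counts)
    groups = sorted({g for row in counts.values() for g in row})
    row_totals = {ct: sum(counts[ct].values()) for ct in cell_types}
    col_totals = {g: sum(counts[ct].get(g, 0) for ct in cell_types) for g in groups}
    grand_total = sum(row_totals.values())

    result = {}
    for ct in cell_types:
        result[ct] = {}
        for g in groups:
            # Expected count if cell type and group were independent.
            expected = row_totals[ct] * col_totals[g] / grand_total
            observed = counts[ct].get(g, 0)
            # Ro/e > 1: enriched in this group; < 1: depleted.
            result[ct][g] = observed / expected if expected else float("nan")
    return result
```

For example, if T cells appear only in tumor and B cells only in normal tissue in equal numbers, each cell type gets Ro/e = 2 in its own group and 0 in the other.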