User API#

Import OmicVerse as:

import omicverse as ov

This page is auto-generated from @register_function entries in the OmicVerse registry.

Public registry entries listed here: 414

Top-Level API#

generate_reference_table

Generate a standardized reference table from adata.uns['REFERENCE_MANU'].

Settings#

settings.cpu_gpu_mixed_init

Initialize CPU-GPU mixed mode for accelerated single-cell analysis.

settings.gpu_init

Initialize GPU mode with RAPIDS for accelerated single-cell analysis.

Data IO#

io.load

Load serialized Python object from disk.

io.read

Read common omics file formats into AnnData or pandas DataFrame.

io.read_10x_h5

Read a 10x Genomics HDF5 matrix file.

io.read_10x_mtx

Read a 10x Genomics Matrix Market directory.

io.read_csv

Read a CSV / TSV file via pandas.read_csv with a mandatory duplicate-column scan on the raw header.

io.read_h5ad

Read an .h5ad file.

io.read_nanostring

Read Nanostring formatted dataset.

io.read_visium_hd

Read 10x Visium HD outputs with a single entry point.

io.read_visium_hd_bin

Read Visium HD bin-level output and attach spatial metadata.

io.read_visium_hd_seg

Read Visium HD cell-segmentation output and attach geometries + spatial metadata.

io.read_xenium

Read a 10x Xenium outs directory into an AnnData object.

io.save

Save Python object to file using pickle fallback strategy.

io.spatial.read_visium

Read 10x-Genomics-formatted Visium dataset.
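
The duplicate-column scan mentioned under io.read_csv can be pictured with the standard library alone. This is a minimal sketch, not OmicVerse's implementation: the helper name find_duplicate_columns and the sample text are hypothetical, and it assumes the first line of the file is the header.

```python
import csv
import io
from collections import Counter


def find_duplicate_columns(text, delimiter=","):
    """Return header names that occur more than once, in first-seen order.

    Hypothetical helper for illustration; not part of OmicVerse.
    """
    # Look only at the raw header row, before any parser renames duplicates.
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    counts = Counter(header)
    duplicates = []
    for name in header:
        if counts[name] > 1 and name not in duplicates:
            duplicates.append(name)
    return duplicates


sample = "gene,s1,s2,s1\nA,1,2,3\n"
print(find_duplicate_columns(sample))
```

Scanning the raw header matters because pandas.read_csv by default renames a repeated column (for example to s1.1), which can hide the duplication from later checks.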

Alignment#

alignment.amplicon_16s_pipeline

Run the full 16S amplicon pipeline.

alignment.build_amplicon_anndata

Compose anndata.AnnData from vsearch stepwise outputs.

alignment.build_phylogeny

Build a phylogenetic tree end-to-end.

alignment.bulk_rnaseq_pipeline

Run a complete bulk RNA-seq pipeline from SRA accessions or local FASTQs.

alignment.count

Quantify expression matrices from FASTQ files via kb count.

alignment.cutadapt

Run cutadapt to remove amplicon PCR primers.

alignment.dada2_pipeline

Run pydada2 end-to-end and return an AnnData.

alignment.fastp

Run fastp QC.

alignment.fasttree

Run FastTree to infer a phylogenetic tree.

alignment.featureCount

Run featureCounts on BAM files.

alignment.fetch_rdp

Alias for fetch_sintax_ref('rdp_16s_v18', db_dir=...).

alignment.fetch_silva

Alias for fetch_sintax_ref('silva_16s_v123', db_dir=...).

alignment.fetch_sintax_ref

Download a SINTAX-formatted 16S reference FASTA.

alignment.fqdump

Convert SRA accessions to FASTQ.

alignment.mafft

Run MAFFT multiple sequence alignment.

alignment.parallel_fastq_dump

Download SRA data in parallel using parallel-fastq-dump.

alignment.prefetch

Prefetch SRA accessions with validation.

alignment.ref

Build kallisto index and transcript-to-gene mapping files via kb ref.

alignment.STAR

Run STAR alignment.

Preprocessing (pp)#

pp.anndata_to_CPU

Migrate AnnData objects from GPU back to CPU memory after analysis.

pp.anndata_to_GPU

Migrate AnnData objects to GPU memory for accelerated processing.

pp.binary_search

Infer a size factor from one normalized expression vector.

pp.champ

Pick the modularity-stablest Leiden partition via CHAMP (Weir et al. 2017).

pp.filter_cells

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_genes

Filter genes based on number of cells or counts.

pp.highly_variable_features

Select highly variable features (HVF/HVG) for downstream modeling.

pp.highly_variable_genes

Annotate highly variable genes (Satija 2015 / Zheng 2017 / Stuart 2019).

pp.identify_robust_genes

Identify robust genes for downstream HVG selection.

pp.leiden

Run Leiden clustering.

pp.log1p

Log-transform expression values with log(1 + x).

pp.louvain

Run Louvain clustering on the precomputed kNN graph.

pp.mde

Run MDE (Minimum Distortion Embedding) from a latent representation.

pp.neighbors

Compute a neighborhood graph of observations [McInnes18].

pp.normalize_pearson_residuals

Normalize a count matrix using analytic Pearson residuals (Lause 2021).

pp.pca

Perform principal component analysis (PCA) on the data stored in an AnnData object.

pp.preprocess

Preprocess the AnnData object adata using either a scanpy or a Pearson-residuals workflow for size normalization and highly variable gene (HVG) selection, and calculate signature scores if necessary.

pp.qc

Perform quality control on a dictionary of AnnData objects.

pp.recover_counts

Given log-normalized gene expression data, recover the raw read/UMI counts by inferring the unknown size factors.

pp.regress

Regress out technical covariates (mito_perc, nUMIs) from each gene.

pp.regress_and_scale

Scale the regressed layer and store it as a new analysis layer.

pp.remove_cc_genes

Remove cell-cycle-correlated genes from highly_variable_features.

pp.scale

Scale the input AnnData object.

pp.score_genes_cell_cycle

Score cell cycle phases using predefined or custom gene sets.

pp.scrublet

Predict cell doublets using Scrublet with optional GPU acceleration.

pp.scrublet_simulate_doublets

Simulate synthetic doublets from random cell pairs.

pp.select_hvf_pegasus

Select highly variable features with the Pegasus strategy.

pp.sude

SUDE (Scalable Unsupervised Dimensionality reduction via Embedding) dimensionality reduction.

pp.tsne

Compute t-SNE coordinates for cells, dispatching by ov.settings.mode.

pp.umap

Compute UMAP embedding, dispatching to the best backend for ov.settings.mode.
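
The idea behind pp.scrublet_simulate_doublets, building synthetic doublets from random cell pairs, can be sketched in plain Python. This is an illustration under stated assumptions, not the OmicVerse or Scrublet code: the function name simulate_doublets, its parameters, and the toy count matrix are hypothetical, and the sketch assumes each pair uses two distinct cells whose raw counts are summed.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Sum the count vectors of randomly chosen cell pairs.

    counts: list of per-cell count vectors (cells x genes).
    Returns the synthetic doublets and the parent index pairs.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    doublets, pairs = [], []
    for _ in range(n_doublets):
        # Assumption: parents are two different cells.
        i, j = rng.sample(range(len(counts)), 2)
        pairs.append((i, j))
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets, pairs


toy_counts = [[1, 0, 2], [0, 3, 1], [4, 4, 0], [2, 2, 2]]
toy_doublets, toy_pairs = simulate_doublets(toy_counts, n_doublets=5, seed=1)
print(toy_pairs)
```

Keeping the parent pairs alongside the simulated profiles makes it easy to check which observed cells contributed to each synthetic doublet.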

Single-cell (single)#

single.Annotation

Unified single-cell annotation manager for cell-type labeling.

single.AnnotationRef

Reference-based label transfer helper for single-cell annotation.

single.auto_resolution

Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).

single.autoResolution

Pick the most reproducible Leiden resolution via null-adjusted bootstrap-ARI (Lange, Roth, Braun & Buhmann, Neural Computation 2004).

single.batch_correction

Run batch-effect correction for single-cell data integration.

single.CellOntologyMapper

Map free-text cell-type annotations to the Cell Ontology (CL) via NLP.

single.CellVote

Ensemble cell-type annotation manager with multiple backends.

single.cNMF

Consensus NMF workflow wrapper for robust gene-program discovery.

single.convert_human_to_mouse_network

Convert a human-symbol interaction network to mouse symbols.

single.cosg

Identify cluster-specific marker genes with COSG.

single.cytotrace2

Predict developmental potency with CytoTRACE2.

single.DCT

Differential cell-type abundance testing wrapper.

single.DEG

Differential gene-expression testing wrapper for single-cell datasets.

single.download_cellphonedb_database

Download CellPhoneDB database with fallback URLs.

single.download_cl

Download the Cell Ontology file from multiple sources with automatic fallback.

single.Drug_Response

Predict drug sensitivity from single-cell transcriptomes using CaDRReS models.

single.dynamic_features

Fit GAM-based pseudotime trends for one or more datasets or groups.

single.factor_correlation

Score MOFA-factor enrichment across annotated groups.

single.factor_exact

Add MOFA latent factors from model file into adata.obs.

single.Fate

Adaptive ridge-regression framework for pseudotime-associated gene discovery.

single.find_markers

Find marker genes for each cluster / group in single-cell data.

single.format_liana_results

Format LIANA results into the communication AnnData expected by ov.pl.ccc_*.

single.gene_trends

Fit and visualize smooth feature trends along pseudotime.

single.generate_scRNA_report

Generate a MultiQC-style HTML report for single-cell RNA-seq analysis.

single.geneset_aucell

Calculate the AUCell score for a given gene set.

single.get_celltype_marker

Get marker genes for each cluster/cell type.

single.get_cluster_celltype

Resolve one final cell type for each cluster with LLM calls.

single.get_markers

Extract top marker genes from rank_genes_groups results.

single.get_obs_value

Transfer per-cell annotations/statistics to metacells.

single.get_weights

Extract feature loadings for one factor from a MOFA model.

single.GLUE_pair

Pair RNA and ATAC cells using GLUE latent embeddings and neighbor matching.

single.gptcelltype

Annotate cluster cell types with a remote LLM service.

single.gptcelltype_local

Annotate cell types with a local instruction-tuned LLM.

single.lazy

Run a one-click single-cell analysis pipeline with resumable steps.

single.load_human_prior_interaction_network

Load one of the packaged human prior interaction networks.

single.MetaCell

Unified metacell wrapper with dispatchable backends.

single.MetaTiME

MetaTiME wrapper for tumor microenvironment cell-state annotation.

single.Monocle

Monocle2-style single-cell trajectory analysis.

single.mouse_hsc_nestorowa16

Load Nestorowa16 mouse HSC reference data used by CEFCON.

single.pathway_aucell

Calculate the area under the curve (AUC) for a set of pathways in an AnnData object.

single.pathway_aucell_enrichment

Enrich cell annotations with pathway activity scores using the AUCell method.

single.pathway_enrichment

Perform pathway enrichment analysis on gene expression data.

single.pathway_enrichment_plot

Visualize the pathway enrichment analysis results as a heatmap.

single.plot_metacells

Plot metacell centroids on a given embedding axis.

single.pyCEFCON

CEFCON workflow wrapper for driver-regulator discovery.

single.pyMOFA

Train MOFA models for latent factor discovery across multiple omics layers.

single.pyMOFAART

Load pretrained MOFA models for downstream factor interpretation.

single.pySCSA

Automated cell-type annotation using SCSA marker-enrichment scoring.

single.pySIMBA

SIMBA wrapper for single-cell batch integration and graph-embedding construction.

single.pyTOSICA

TOSICA wrapper for pathway-informed transformer-based cell-type annotation.

single.run_cellphonedb_v5

Run CellPhoneDB statistical analysis with automatic database download.

single.run_liana

Run LIANA ligand-receptor inference on an AnnData object.

single.scanpy_cellanno_from_dict

Add cell-type annotations from a dict to an AnnData object.

single.SCENIC

single.TrajInfer

Trajectory inference class for single-cell data analysis.

single.Velo

RNA velocity analysis wrapper for directional cell-state transition inference.

Bulk RNA-seq (bulk)#

bulk.batch_correction

Perform batch effect correction using ComBat algorithm.

bulk.Deconvolution

Bulk RNA-seq deconvolution class for inferring cell-type fractions from single-cell references.

bulk.geneset_enrichment

Perform pathway enrichment analysis using Enrichr-compatible gene-set libraries.

bulk.geneset_plot

Plot enrichment results as a bubble plot.

bulk.geneset_plot_multi

Plot multiple enrichment result tables in a unified dot-clustermap panel.

bulk.Matrix_ID_mapping

Map gene IDs in the input data to gene symbols using a reference table.

bulk.pyDEG

Differential-expression analysis helper for bulk RNA-seq count tables.

bulk.pyGSEA

Gene Set Enrichment Analysis (GSEA) wrapper for ranked gene lists.

bulk.pyPPI

Protein-protein interaction (PPI) analysis wrapper based on STRING.

bulk.pyTCGA

TCGA (The Cancer Genome Atlas) data analysis module.

bulk.pyWGCNA

Weighted Gene Co-expression Network Analysis.

bulk.readWGCNA

Load a previously saved WGCNA object from disk.

bulk.string_interaction

Analyze protein-protein interaction network using STRING database.

Metabolomics (metabol)#

metabol.aggregate_by_class

Collapse the matrix to class-level totals.

metabol.annotate_lipids

Parse each var_name as a lipid and add lipid_class / total_carbons / total_db columns to adata.var.

metabol.annotate_peaks

Map a list of m/z peaks to candidate KEGG compounds via adduct search.

metabol.anova

Per-metabolite test across 3+ groups.

metabol.asca

ASCA — ANOVA-Simultaneous Component Analysis (Smilde 2005).

metabol.asca_variance_bar

Horizontal bars of per-effect variance-explained fractions.

metabol.biomarker_panel

Nested-CV evaluation of a multi-metabolite biomarker panel.

metabol.blank_filter

Drop features whose sample-mean intensity is not at least ratio × the blank-mean intensity.

metabol.corr_network

Pairwise metabolite correlation network within a single condition.

metabol.corr_network_plot

Draw an edge DataFrame as a NetworkX spring-layout plot.

metabol.cv_filter

Drop features with coefficient-of-variation above cv_threshold.

metabol.dgca

Differential correlation between two groups.

metabol.dgca_class_bar

Bar chart of DC-class counts (+/+, +/0, +/-, -/+, ...).

metabol.differential

Run a univariate two-group test across all metabolites.

metabol.drift_correct

Correct systematic signal drift using LOESS regression on QC samples.

metabol.fetch_chebi_compounds

Build a compound master table from ChEBI's flat-file TSVs.

metabol.fetch_hmdb_from_name

Resolve a metabolite name → HMDB / KEGG / ChEBI / PubChem CID.

metabol.fetch_kegg_pathways

Fetch the full KEGG compound→pathway map via KEGG REST.

metabol.fetch_lion_associations

Fetch the full LION lipid↔ontology associations.

metabol.impute

Impute missing values (NaN / 0) in adata.X.

metabol.lion_enrichment

LION-style over-representation for lipid classes / properties.

metabol.load_pathways

Return {pathway_name: [kegg_id, ...]} — the pathway database used by msea_ora() / msea_gsea() / mummichog_basic().

metabol.map_ids

Resolve metabolite names to external database IDs.

metabol.meba

Per-feature Hotelling T-squared for two-group time-course comparison.

metabol.mixed_model

Per-feature statsmodels.MixedLM fit.

metabol.msea_gsea

GSEA-style ranked enrichment via gseapy.prerank.

metabol.msea_ora

Over-representation analysis via Fisher's exact test.

metabol.mummichog_basic

Pure-Python mummichog — pathway enrichment from m/z peaks.

metabol.normalize

Normalize each sample (row) of adata.X to correct for dilution.

metabol.opls_da

Orthogonal Projection to Latent Structures — Discriminant Analysis.

metabol.parse_lipid

Parse a LIPID MAPS-shorthand lipid name.

metabol.pathway_bar

Horizontal bar chart of pathway enrichment p-values.

metabol.pathway_dot

Dot plot of pathway enrichment — the de-facto standard figure.

metabol.plsda

Partial Least Squares Discriminant Analysis (wraps sklearn PLS).

metabol.pyMetabo

Lifecycle class for a metabolomics analysis.

metabol.read_lcms

Load an LC-MS peak table with m/z/RT feature IDs into AnnData.

metabol.read_metaboanalyst

Load a MetaboAnalyst-format CSV into AnnData.

metabol.read_wide

Load a generic wide (samples × metabolites) table into AnnData.

metabol.roc_feature

Per-feature AUC for a binary class.

metabol.run_mofa

Train MOFA+ on sample-aligned metabolomics + other-omics views.

metabol.s_plot

OPLS-DA S-plot: p(cov) vs p(corr), i.e. covariance vs correlation between each feature and the predictive component.

metabol.sample_qc

Hotelling T-squared + DModX sample-level outlier detection.

metabol.sample_qc_plot

Scatter of Hotelling T² vs DModX with critical-value lines.

metabol.serrf

SERRF — QC-based Random Forest drift correction (Fan 2019).

metabol.transform

Apply a feature-level transformation to adata.X.

metabol.vip_bar

Horizontal bar chart of the top_n highest-VIP metabolites.

metabol.volcano

Metabolomics volcano plot — log2FC vs -log10(padj) (or pvalue).
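
For metabol.msea_ora, the over-representation p-value from Fisher's exact test can be written out with the standard library. The sketch below is not the OmicVerse implementation: the function name ora_pvalue and its argument names are hypothetical, and it assumes the one-sided (enrichment) alternative, which is equivalent to the upper tail of a hypergeometric distribution.

```python
from math import comb


def ora_pvalue(n_hits, n_selected, n_in_pathway, n_background):
    """One-sided Fisher's exact test p-value, P(X >= n_hits).

    n_hits:       selected metabolites that belong to the pathway
    n_selected:   size of the selected (e.g. significant) list
    n_in_pathway: pathway members present in the background
    n_background: total metabolites in the background
    Hypothetical helper for illustration only.
    """
    max_hits = min(n_selected, n_in_pathway)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_hits, max_hits + 1)
    )
    return tail / total


# 3 of 4 selected metabolites fall in a 5-member pathway (background of 20).
p_value = ora_pvalue(n_hits=3, n_selected=4, n_in_pathway=5, n_background=20)
print(p_value)
```

In practice the p-values from all tested pathways would still need multiple-testing correction; that step is left out here.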

Microbiome (micro)#

micro.Alpha

Compute and store per-sample alpha-diversity metrics on adata.obs.

micro.attach_tree

Attach a phylogenetic tree to adata.uns[store_key].

micro.Beta

Compute sample × sample distance matrices.

micro.clr

CLR transform: log(x_i) - mean(log(x)) per sample (post pseudo-count).

micro.collapse_taxa

Collapse ASVs to a taxonomic rank.

micro.combine_studies

Stitch a list of per-study AnnDatas into a single cross-cohort table.

micro.DA

Per-feature differential abundance across sample groups.

micro.fetch_franzosa_ibd_2019

Download + parse the Franzosa et al. 2019 paired IBD dataset.

micro.filter_by_prevalence

Filter rare features by prevalence.

micro.ilr

ILR transform — orthonormal coordinate system after closure removal.

micro.meta_da

Per-study DA + inverse-variance meta-analysis.

micro.MMvec

MMvec (Morton et al. 2019) in ~80 lines of PyTorch.

micro.Ordinate

Reduce a sample × sample distance matrix to 2-D / 3-D coords.

micro.paired_cca

Run sklearn CCA on the paired tables.

micro.paired_spearman

Rank correlation between every (microbe, metabolite) pair.

micro.plot_embedding_biplot

Biplot of microbe + metabolite embeddings in the MMvec latent space.

micro.plot_mmvec_training

Training (and validation) loss curve for a fitted MMvec.

micro.rarefy

Rarefy counts to a common depth.

micro.simulate_paired

Build a paired microbe + metabolite cohort with planted producer pairs.
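
The formula given for micro.clr, log(x_i) - mean(log(x)) per sample after a pseudo-count, translates directly into code. This is a minimal sketch for a single sample and not the OmicVerse function: the name clr_sample and the default pseudo-count of 1.0 are assumptions made for the example.

```python
import math


def clr_sample(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's feature counts.

    The pseudo-count (an assumed default) avoids log(0) for absent taxa.
    Hypothetical helper for illustration only.
    """
    logs = [math.log(x + pseudocount) for x in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


transformed = clr_sample([0, 1, 3])
print(transformed)
```

Because the mean of the logs is subtracted, the transformed values of each sample sum to zero (up to floating-point error).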

Spatial transcriptomics (space)#

space.bin2cell

Aggregate binned Visium signals into cell-level profiles.

space.Cal_Spatial_Net

Construct spatial neighbor networks for spatial integration.

space.calculate_gene_signature

Build a marker-gene signature table for each cell type in a reference scRNA-seq dataset.

space.CAST

CAST (Cell Annotation for Spatial Transcriptomics) embedding for multiple spatial samples.

space.cellcharter

Run a minimal CellCharter workflow on a spatial AnnData object.

space.CellLoc

SpatRio CellLoc class for probabilistic cell localization.

space.CellMap

SpatRio CellMap class for mapping single cells to spatial coordinates.

space.clusters

Perform clustering analysis on spatial transcriptomics data using multiple methods.

space.create_communication_anndata

Build a CellChat-style communication AnnData from commot outputs.

space.crop_space_visium

Crop Visium spatial data to a specific region of interest.

space.Deconvolution

Spatial deconvolution pipeline that aligns scRNA-seq references with spatial transcriptomics.

space.GASTON

GASTON spatial depth estimation and clustering.

space.map_spatial_auto

Automatically map and align spatial transcriptomics data.

space.map_spatial_manual

Manually adjust spatial transcriptomics data alignment.

space.merge_cluster

Merge clusters based on hierarchical clustering of their representation.

space.moranI

Compute Moran's I spatial autocorrelation for gene expression.

space.nmf_tissue_zones

Discover tissue zones via NMF on a per-spot cell-abundance matrix.

space.pySpaceFlow

SpaceFlow spatial flow analysis class.

space.pySTAGATE

A class representing the PyTorch implementation of STAGATE (Spatial Transcriptomics Analysis using Graph Attention autoEncoder).

space.pySTAligner

STAligner for spatial transcriptomics data integration.

space.read_visium_10x

Read and standardize 10x Visium data with bin2cell-compatible loader.

space.rotate_space_visium

Rotate Visium spatial data image and coordinates by a specified angle.

space.salvage_secondary_labels

Merge primary and secondary segmentation labels.

space.spatial_autocorr

Compute spatial autocorrelation statistics for gene expression.

space.spatial_neighbors

Build a spatial neighborhood graph from coordinates stored in adata.obsm.

space.STT

Spatial Transition Tensor (STT) analysis class.

space.svg

Identify spatially variable genes using multiple methods.

space.sync_visium_hd_seg_geometries

Synchronize adata.uns["spatial"][sample]["geometries"] with current adata.obs_names.

space.Tangram

Tangram spatial deconvolution class for cell type mapping.

space.update_classification_from_database

Update communication interaction annotations from commot database metadata.

space.visium_10x_hd_cellpose_expand

Expand segmentation labels from nuclei to nearby bins.

space.visium_10x_hd_cellpose_gex

Run expression-image segmentation and map labels back to spatial bins.

space.visium_10x_hd_cellpose_he

Convert Visium 10x data to cell-level data.
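
The statistic behind space.moranI is Moran's I, I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean expression and W the sum of all spatial weights. The sketch below computes it for one gene in plain Python. It is not the OmicVerse implementation: the function name morans_i, the dense weight matrix, and the four-spot chain are assumptions chosen to keep the example small.

```python
def morans_i(values, weights):
    """Moran's I for one feature given a dense spatial weight matrix.

    values:  expression of one gene at each spot
    weights: n x n matrix, weights[i][j] > 0 when spots i and j are neighbors
    Hypothetical helper for illustration only.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variance_term)


# Four spots in a line; each spot neighbors the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1, 2, 3, 4], chain)
alternating = morans_i([1, -1, 1, -1], chain)
print(smooth, alternating)
```

A smooth gradient gives a positive value and an alternating pattern a negative one, which matches the usual reading of Moran's I as spatial clustering versus dispersion.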

Bulk-to-Single (bulk2single)#

bulk2single.Bulk2Single

VAE-based bulk-to-single framework for reconstructing pseudo single cells from bulk RNA-seq.

bulk2single.bulk2single_plot_cellprop

Plot cell-type proportions in generated single-cell data.

bulk2single.bulk2single_plot_correlation

Plot correlation matrix between reference and generated single-cell data.

bulk2single.BulkTrajBlend

Integrate bulk and single-cell information to infer transitional cell-state trajectories.

bulk2single.Single2Spatial

Deep-learning mapper that projects single-cell profiles onto spatial coordinates.

Plotting (pl)#

pl.add_density_contour

Add KDE-based density contours to an existing matplotlib plot.

pl.add_palue

Add p-value annotation with connecting line to a matplotlib plot.

pl.add_pie2spatial

Overlay per-spot pie charts of cell-type composition on a spatial map.

pl.add_streamplot

Overlay velocity streamlines on a low-dimensional embedding.

pl.bardotplot

Create a combined bar-and-dot summary plot by groups.

pl.boxplot

Create a boxplot with jittered points to visualize data distribution across categories.

pl.branch_streamplot

Render a branch-aware pseudotime stream plot.

pl.calculate_gene_density

Calculate weighted kernel density estimates for gene expression on 2D embeddings.

pl.ccc_heatmap

Plot communication matrices as heatmaps, dot plots, or bubble maps.

pl.ccc_network_plot

Plot cell-cell communication networks with multiple graph styles.

pl.ccc_stat_plot

Plot communication summaries, distributions, and pathway statistics.

pl.cell_cor_heatmap

Compute pairwise correlation/similarity between cell groups and plot as heatmap.

pl.CellChatViz

Visualization helper for CellPhoneDB cell-cell communication outputs.

pl.cellproportion

Plot cell proportion of each cell type in each visual cluster.

pl.complexheatmap

Generate a complex annotated heatmap using PyComplexHeatmap.

pl.contour

Overlay a KDE contour for selected clusters on embedding axes.

pl.ConvexHull

Plot the convex hull of a cluster on an embedding.

pl.create_custom_colormap

Build a transparent-to-opaque LinearSegmentedColormap of a single colour.

pl.dotplot

Make a dot plot of the expression values of var_names.

pl.dynamic_heatmap

Plot dynamic feature trends along pseudotime, optionally by lineage.

pl.dynamic_trends

Plot GAM-fitted pseudotime trends for one or more genes.

pl.embedding

Scatter plot for a user-specified embedding basis (e.g. umap, pca).

pl.embedding_adjust

Get locations of cluster median and adjust text labels accordingly.

pl.embedding_atlas

Render large-scale embeddings with Datashader.

pl.embedding_celltype

Plot an embedding colored by cell type, using OmicVerse styling.

pl.embedding_density

Plot cluster-specific density on an existing embedding.

pl.feature_heatmap

Plot cell-level feature expression ordered by groups or metadata.

pl.ForbiddenCity

Forbidden City traditional-color palette utility.

pl.gen_mpl_labels

Add cluster labels at median positions in embedding plots with automatic text positioning.

pl.geneset_wordcloud

Build cluster-wise gene-set word clouds along pseudotime.

pl.group_heatmap

Plot grouped mean expression as a Marsilea heatmap.

pl.marker_heatmap

Create a dot plot heatmap showing marker gene expression using PyComplexHeatmap.

pl.markers_dotplot

Dot plot of marker genes — clean drop-in for rank_genes_groups_dotplot().

pl.palette

Return the default OmicVerse color palette.

pl.plot1cell

Circular UMAP with metadata tracks.

pl.plot_cellproportion

Plot stacked bar chart showing cell type proportions across groups.

pl.plot_embedding_celltype

Create combined embedding plot with cell type legend and counts.

pl.plot_flowsig_network

Create a flowsig network visualization showing GEM modules and gene flows.

pl.plot_grouped_fractions

Plot grouped cell-fraction summaries as stacked bars.

pl.plot_pca_variance_ratio

Plot PCA variance ratio to determine optimal number of principal components.

pl.plot_set

Configure plotting settings for OmicVerse.

pl.plot_spatial

Create spatial plot from Visium data with color gradient and interpolation.

pl.plot_text_set

Format text for plotting by adding line breaks.

pl.rank_genes_groups_dotplot

Create a dot plot from rank_genes_groups results.

pl.single_group_boxplot

Draw a boxplot for a single group from adata (AnnData object), the data object containing the information for plotting.

pl.tsne

Plot t-SNE embedding.

pl.umap

Plot UMAP embedding.

pl.venn

Create a Venn diagram to visualize set overlaps.

pl.violin

Enhanced violin plot compatible with omicverse's interface.

pl.volcano

Create a volcano plot for differential expression analysis.

Datasets#

datasets.bhattacherjee

Processed single-cell data from the prefrontal cortex (PFC) of adult mice under cocaine self-administration.

datasets.blobs

Gaussian Blobs dataset.

datasets.bm

The BM dataset used in http://pklab.med.harvard.edu/velocyto/notebooks/R/SCG71.nb.html

datasets.bone_marrow

The bone marrow dataset used in

datasets.burczynski06

Bulk data with conditions ulcerative colitis (UC) and Crohn's disease (CD).

datasets.chromaffin

The chromaffin dataset used in http://pklab.med.harvard.edu/velocyto/notebooks/R/chromaffin2.nb.html

datasets.cite_seq

Placeholder for CITE-seq dataset loader.

datasets.create_mock_dataset

Create a mock single-cell dataset for testing statistical functions.

datasets.decov_bulk_covid_bulk

COVID-19 PBMC bulk data from Decov et al. 2020.

datasets.decov_bulk_covid_single

COVID-19 PBMC single-cell data from Decov et al. 2020.

datasets.dentate_gyrus

The Dentate Gyrus dataset used in velocyto-team/velocyto-notebooks.

datasets.dentate_gyrus_scvelo

The Dentate Gyrus dataset used in theislab/scvelo_notebooks.

datasets.download_data

Download a dataset file to local storage.

datasets.download_data_requests

Download data with custom headers to reduce HTTP 403 failures.

datasets.get_adata

Download example data to local folder.

datasets.gillespie

TODO: add data here

datasets.haber

The Haber dataset used in velocyto-team/velocyto-notebooks.

datasets.hematopoiesis

Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.

datasets.hematopoiesis_raw

Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.

datasets.hg_forebrain_glutamatergic

The hgForebrainGlutamatergic dataset used in velocyto-team/velocyto-notebooks.

datasets.hl60

TODO: add data here

datasets.human_tfs

Download human transcription factors.

datasets.krumsiek11

Simulated myeloid progenitors (Krumsiek et al. 2011).

datasets.moignard15

Hematopoiesis in early mouse embryos (Moignard et al. 2015).

datasets.multi_brain_5k

Processed dataset originally from https://pitt.box.com/v/hematopoiesis-processed.

datasets.nascseq

TODO: add data here

datasets.pancreas_cellrank

The pancreas cellrank dataset used in theislab/scvelo_notebooks.

datasets.pancreatic_endocrinogenesis

Pancreatic endocrinogenesis.

datasets.paul15

Development of Myeloid Progenitors (Paul et al. 2015).

datasets.pbmc3k

Load PBMC 3k dataset from URL.

datasets.pbmc8k

PBMC 8k dataset from 10x Genomics.

datasets.sc_ref_Lymph_Node

SC reference data for Lymph Node.

datasets.sceu_seq_organoid

Download the organoid dataset from Battich et al. (2020) via a figshare link.

datasets.sceu_seq_rpe1

Download the rpe1 dataset from Battich et al. (2020) via a figshare link.

datasets.scifate

TODO: add data here

datasets.scnt_seq_neuron_labeling

The neuron splicing data is from Qiu et al. (2020).

datasets.scnt_seq_neuron_splicing

The neuron splicing data is from Qiu et al. (2020).

datasets.scslamseq

TODO: add data here

datasets.seqfish

SeqFish dataset from 10x Genomics.

datasets.toggleswitch

Simulated toggleswitch data.

datasets.zebrafish

The zebrafish dataset is from Saunders et al. (2019).

External Integrations (external)#

external.GraphST

GraphST module for spatial transcriptomics (author: Yahui Long).

Utilities (utils)#

utils.biocontext.call_tool

Call any BioContext MCP tool by name.

utils.biocontext.get_ensembl_id

Convert gene symbol to Ensembl ID.

utils.biocontext.get_fulltext

Get full text from Europe PMC.

utils.biocontext.get_uniprot_id

Get UniProt accession ID from protein symbol.

utils.biocontext.list_tools

List all available BioContext tools.

utils.biocontext.query_alphafold

Query AlphaFold DB for predicted protein structure.

utils.biocontext.query_cell_ontology

Query Cell Ontology for cell type terms.

utils.biocontext.query_chebi

Query ChEBI for chemical entities.

utils.biocontext.query_efo

Query EFO for disease ontology terms.

utils.biocontext.query_go

Query Gene Ontology terms for a gene.

utils.biocontext.query_hpa

Query Human Protein Atlas for tissue-level expression.

utils.biocontext.query_interpro

Query InterPro for protein domains.

utils.biocontext.query_opentargets

Query Open Targets via GraphQL.

utils.biocontext.query_panglaodb

Query PanglaoDB for cell type marker genes.

utils.biocontext.query_reactome

Query Reactome pathway database.

utils.biocontext.query_string

Query STRING for protein-protein interactions.

utils.biocontext.query_uniprot

Query UniProt protein information.

utils.biocontext.search_clinical_trials

Search ClinicalTrials.gov by condition.

utils.biocontext.search_drugs

Search FDA drug database.

utils.biocontext.search_interpro

Search InterPro entries.

utils.biocontext.search_literature

Search Europe PMC for biomedical literature.

utils.biocontext.search_preprints

Search bioRxiv or medRxiv preprints.

utils.biocontext.search_pride

Search PRIDE proteomics repository.

utils.cal_paga

Compute a PAGA graph with optional velocity/time priors.

utils.cluster

Run a selected clustering backend on single-cell data.

utils.convert2gene_id

Convert official gene symbols to Ensembl gene IDs using pyensembl.

utils.convert2gene_symbol

Convert Ensembl gene IDs to official gene symbols using pyensembl.

utils.convert2symbol

Convert Ensembl IDs in adata.var_names to official gene symbols.

utils.convert_adata_for_rust

Rewrite an AnnData object as an .h5ad file that ov.read(..., backend='rust') can open.

utils.convert_to_pandas

Convert Rust-backed dataframe-like objects to pandas.DataFrame.

utils.download_CaDRReS_model

Download pretrained CaDRReS model parameter/output files.

utils.download_GDSC_data

Download GDSC expression and drug mask tables.

utils.download_geneid_annotation_pair

Download gene ID annotation mapping files for various organisms.

utils.download_pathway_database

Download pathway and gene set databases for enrichment analysis.

utils.download_tosica_gmt

Download curated GMT files used by TOSICA workflows.

utils.geneset_prepare

Load and prepare gene sets from GMT/TXT files for enrichment analysis.

utils.get_gene_annotation

Annotate adata.var by merging with gene-level GTF attributes.

utils.gtf_to_pair_tsv

Convert GTF file to gene ID mapping pairs TSV format.

utils.LDA_topic

Latent Dirichlet Allocation (LDA) topic modeling for single-cell data using MIRA.

utils.load_metabolights

Load a Metabolights study into a samples × metabolites AnnData.

utils.mde

Utility to run pymde.preserve_neighbors() for visualization of scvi-tools embeddings.

utils.plot_paga

Plot PAGA graph and optional embedding-level annotations.

utils.refine_label

Refine labels with neighborhood majority voting.

utils.retrieve_layers

Retrieve previously stored X matrix from adata.uns and restore to adata.X.

utils.roe

Compute the Ro/e (observed/expected) cell-type enrichment matrix.

utils.store_layers

Store the X matrix of AnnData in adata.uns for later retrieval.

utils.symbol2id

Convert gene symbols in adata.var_names to Ensembl gene IDs.

utils.weighted_knn_trainer

Train a weighted KNN classifier on train_adata.

utils.weighted_knn_transfer

Annotate query_adata cells with a trained weighted KNN classifier.

utils.wrap_dataframe

Wrap a Rust-backed dataframe-like object with a pandas-style interface.
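
The Ro/e matrix that utils.roe computes divides each observed cell count by the count expected if cell types were distributed independently of groups. A minimal sketch follows. It is not the OmicVerse function: the name roe_matrix and the toy table are hypothetical, and it assumes the expected counts come from the usual contingency-table formula (row total × column total / grand total) with no empty rows or columns.

```python
def roe_matrix(observed):
    """Ratio of observed to expected counts for a cell type x group table.

    observed: list of rows, one per cell type, with counts per group.
    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [
            observed[i][j] * grand_total / (row_totals[i] * col_totals[j])
            for j in range(len(col_totals))
        ]
        for i in range(len(row_totals))
    ]


# Two cell types (rows) counted in two tissues (columns).
toy_roe = roe_matrix([[30, 10], [10, 30]])
print(toy_roe)
```

Values above 1 indicate that a cell type is enriched in a group relative to expectation, and values below 1 indicate depletion.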