Release Notes#
v 1.0.0#
First public release.
v 1.1.7#
bulk module:#
Added Deseq2, including pyDEseq functions: deseq2_normalize, estimateSizeFactors, estimateDispersions, Matrix_ID_mapping.
Included TCGA with TCGA.
Introduced Enrichment with functions geneset_enrichment, geneset_plot.
single module:#
Integrated scdrug with functions autoResolution, writeGEP, Drug_Response.
Added cpdb with functions cpdb_network_cal, cpdb_plot_network, cpdb_plot_interaction, cpdb_interaction_filtered.
Included scgsea with functions geneset_aucell, pathway_aucell, pathway_aucell_enrichment, pathway_enrichment, pathway_enrichment_plot.
v 1.1.8#
single module:#
Addressed errors in cpdb, including import errors and color issues in cpdb_plot_network.
Introduced cpdb_submeans_exacted in cpdb for easy sub-network extraction.
v 1.1.9#
bulk2single module:#
Added the bulk2single module.
Fixed model load error from bulk2space.
Resolved early stop issues from bulk2space.
Included more user-friendly input methods and visualizations.
Added loss history visualization.
utils module:#
Introduced pyomic_palette in the plot module.
v 1.1.10#
Updated all code references.
single module:#
Fixed invalid parameters in the single.mofa.mofa_run function.
Added raw-count layer addition in the single.scanpy_lazy function.
Introduced utils.plot_boxplot for plotting box plots with jittered points.
Added bulk.pyDEseq.plot_boxplot for plotting box plots with jittered points for specific genes.
v 1.2.0#
bulk module:#
Fixed invalid cutoff parameter in bulk.geneset_enrichment.
Added modules: pyPPI, pyGSEA, pyWGCNA, pyTCGA, pyDEG.
bulk2single module:#
Introduced bulk2single.save for manual model saving.
v 1.2.1-4#
single module:#
Added pySCSA module with functions: cell_anno, cell_anno_print, cell_auto_anno, get_model_tissue.
Implemented doublet cell filtering in single.scanpy_lazy.
Added single.scanpy_cellanno_from_dict for easier annotation.
Updated SCSA database from CellMarker2.0.
Fixed errors in SCSA database keys: Ensembl_HGNC and Ensembl_Mouse.
v 1.2.5#
single module:#
Added pyVIA module with functions: run, plot_piechart_graph, plot_stream, plot_trajectory_gams, plot_lineage_probability, plot_gene_trend, plot_gene_trend_heatmap, plot_clustergraph.
Fixed warning error in utils.pyomic_plot_set.
Updated requirements, including pybind11, hnswlib, termcolor, pygam, pillow, gdown.
v 1.2.6#
single module:#
Added pyVIA.get_piechart_dict and pyVIA.get_pseudotime.
v 1.2.7#
bulk2single module:#
Added Single2Spatial module with functions: load, save, train, spot_assess.
Fixed installation errors for packages in pip.
v 1.2.8#
Fixed pip installation errors.
bulk2single module:#
Replaced deep-forest in Single2Spatial with a neural network for classification tasks.
Accelerated the entire Single2Spatial inference process using GPU and batch-level estimation by modifying the predicted_size setting.
v 1.2.9#
bulk module:#
Fixed duplicates_index mapping in Matrix_ID_mapping.
Resolved hub genes plot issues in pyWGCNA.plot_sub_network.
Fixed backupgene in pyGSEA.geneset_enrichment to support rare species.
Added matrix plot module in pyWGCNA.plot_matrix.
single module:#
Added rank_genes_groups check in pySCSA.
bulk2single module:#
Fixed import error of deepforest.
v 1.2.10#
Renamed the package to omicverse.
single module:#
Fixed argument error in pySCSA.
bulk2single module:#
Updated plot arguments in bulk2single.
v 1.2.11#
bulk module:#
Fixed wilcoxon method in pyDEG.deg_analysis.
Added parameter setting for treatment and control group names in pyDEG.plot_boxplot.
Fixed figure display issues in pyWGCNA.plot_matrix.
Fixed category correlation failed by one-hot in pyWGCNA.analysis_meta_correlation.
Fixed network display issues in pyWGCNA.plot_sub_network and updated utils.plot_network to avoid errors.
v 1.3.0#
bulk module:#
Added DEseq2 method to pyDEG.deg_analysis.
Introduced pyGSEA module in bulk.
Renamed the original pyGSEA to pyGSE in bulk.
Added get_gene_annotation in utils for gene name transformation.
v 1.3.1#
single module:#
Added get_celltype_marker method.
single module:#
Added GLUE_pair, pyMOFA, pyMOFAART module.
Added tutorials for Multi omics analysis by MOFA and GLUE.
Updated tutorial for Multi omics analysis by MOFA.
v 1.4.0#
bulk2single module:#
Added BulkTrajBlend method.
single module:#
Fixed errors in scnocd model.
Added save, load, and get_pair_dict in scnocd model.
utils module:#
Added mde method.
Added gz format support for utils.read.
v 1.4.1#
preprocess module:#
v 1.4.3#
preprocess module:#
Fixed sparse preprocess error in pp.
Fixed trajectory import error in via.
Added gene correlation analysis of trajectory.
v 1.4.4#
single module:#
Added panglaodb database to pySCSA module.
Fixed errors in pySCSA.cell_auto_anno when some cell types are not found in clusters.
Fixed errors in pySCSA.cell_anno when rank_genes_groups are not consistent with clusters.
Added pySIMBA module in single for batch correction.
preprocess module:#
Added store_layers and retrieve_layers in ov.utils.
Added plot_embedding_celltype and plot_cellproportion in ov.utils.
v 1.4.5#
single module:#
Added MetaTiME module to perform cell type annotation automatically in TME.
v 1.4.12#
Updated conda install omicverse -c conda-forge.
single module:#
Added pyTOSICA module to perform cell type migration from reference scRNA-seq in Transformer model.
Added atac_concat_get_index, atac_concat_inner, atac_concat_outer functions to merge/concatenate scATAC data.
Fixed MetaTime.predicted when Unknown cell type appears.
preprocess module:#
Added plot_embedding in ov.utils to plot UMAP in a special color dictionary.
v 1.4.13#
bulk module:#
Added mad_filtered to filter robust genes when calculating the network in ov.bulk.pyWGCNA module.
Fixed string_interaction in ov.bulk.pyPPI for string-db updates.
preprocess module:#
Changed mode argument of pp.preprocess to control preprocessing steps.
Added ov.utils.embedding, ov.utils.neighbors, and ov.utils.stacking_vol.
v 1.4.14#
preprocess module:#
Added batch_key in pp.preprocess and pp.qc.
utils module:#
Added plot_ConvexHull to visualize the boundary of clusters.
Added weighted_knn_trainer and weighted_knn_transfer for multi-adata integration.
single module:#
Fixed import errors in mofa.
v 1.4.17#
bulk module:#
Fixed compatibility issues with pydeseq2 version 0.4.0.
Added bulk.batch_correction for multi-bulk RNA-seq/microarray samples.
single module:#
Added single.batch_correction for multi-single cell datasets.
preprocess module:#
Added parameter layers_add in pp.scale.
v 1.5.0#
single module:#
Added cellfategenie to calculate timing-associated genes/genesets.
Fixed the name error in atac_concat_outer.
Added more kwargs for batch_correction.
utils module:#
Added plot_heatmap to visualize the heatmap of pseudotime.
Fixed embedding when the version of mpl is larger than 3.7.0.
Added geneset_wordcloud to visualize geneset heatmaps of pseudotime.
v 1.5.1#
single module:#
Added scLTNN to infer cell trajectory.
bulk2single module:#
Updated cell fraction prediction with TAPE in bulk2single.
Fixed group and normalization issues in bulk2single.
utils module:#
Added Ro/e calculation (by: Haihao Zhang).
Added cal_paga and plot_paga to visualize the state transfer matrix.
Fixed the read function.
v 1.5.2#
bulk2single Module:#
Resolved a matrix error occurring when gene symbols are not unique.
Addressed the interpolation issue in BulkTrajBlend when target cells do not exist.
Corrected the generate function in BulkTrajBlend.
Rectified the argument for vae_configure in BulkTrajBlend when cell_target_num is set to None.
Introduced the parameter max_single_cells for input in BulkTrajBlend.
Defaulted to using scaden for deconvolution in Bulk RNA-seq.
single Module:#
Fixed an error in pyVIA when the root is set to None.
Added the TrajInfer module for inferring cell trajectories.
Integrated Palantir and Diffusion_map into the TrajInfer module.
Corrected the parameter error in batch_correction.
utils Module:#
Introduced plot_pca_variance_ratio for visualizing the ratio of PCA variance.
Added the cluster and filtered module for clustering the cells.
Integrated MiRA to calculate the LDA topic.
v 1.5.3#
single Module:#
Added scVI and MIRA to remove batch effect
space Module:#
Added STAGATE to cluster and denoise the spatial RNA-seq
pp Module:#
Added doublets argument of ov.pp.qc to control doublets (default: True)
v 1.5.4#
bulk Module:#
Fixed an error in pyDEG.deg_analysis when n_cpus can not be set in pyDeseq2 (v0.4.3)
single Module:#
Fixed an argument error in single.batch_correction of combat
utils Module:#
Added venn4plot to visualize
Fixed the label visualization of plot_network
Added ondisk argument of LDA_topic
space Module:#
Added Tangram to map the scRNA-seq to stRNA-seq
v 1.5.5#
pp Module:#
Added max_cells_ratio and max_genes_ratio to control the max threshold in qc of scRNA-seq
single Module:#
Added SEACells model to calculate the metacells from scRNA-seq
space Module:#
Added STAligner to integrate multi stRNA-seq
v 1.5.6#
pp Module#
Added mt_startswith argument to control the qc in mouse or other species.
utils Module#
Added schist method to cluster the single cell RNA-seq
single Module#
Fixed the import error of palantir in SEACells
Added CEFCON model to identify the driver regulators of cell fate decisions
bulk2single Module#
Added use_rep and neighbor_rep argument to configure the nocd
space Module#
Added SpaceFlow to identify the pseudo-spatial map
v 1.5.8#
pp Module#
Added score_genes_cell_cycle function to calculate the cell cycle
bulk Module#
Fixed dds.plot_volcano text plot error when the version of adjustText is larger than 0.9
single Module#
Optimised MetaCell.load model loading logic
Fixed an error when loading the model using MetaCell.load
Added tutorials of Metacells
pl Module#
Add pl as a unified drawing prefix for the next release, to replace the drawing functionality in the original utils, while retaining the drawing in the original utils.
Added embedding to plot the embedding of scRNA-seq using ov.pl.embedding
Added optim_palette to provide a spatially constrained approach that generates discriminate color assignments for visualizing single-cell spatial data in various scenarios
Added cellproportion to plot the proportion of stack bar of scRNA-seq
Added embedding_celltype to plot the figures both celltype proportion and embedding
Added ConvexHull to plot the ConvexHull around the target cells
Added embedding_adjust to adjust the text of celltype legend in embedding
Added embedding_density to plot the category density in the cells
Added bardotplot to plot the bardotplot between different groups.
Added add_palue to plot the p-threshold between different groups.
Added embedding_multi to support the mudata object
Added purple_color to visualize the purple palette.
Added venn to plot the venn from set 2 to set 4
Added boxplot to visualize the boxdotplot
Added volcano to visualize the result of differentially expressed genes
v 1.5.9#
single Module#
Added slingshot in single.TrajInfer
Fixed some errors of scLTNN
Added GPU mode to preprocess the data
Added cNMF to calculate the nmf
space Module#
Added Spatrio to map the scRNA-seq to stRNA-seq
v 1.6.0#
Move CEFCON,GNTD,mofapy2,spaceflow,spatrio,STAligner,tosica from root to external module.
space Module#
Added STT in omicverse.space to calculate the spatial transition tensor.
Added scSLAT in omicverse.external to align different spatial slices.
Added PROST in omicverse.external and svg in omicverse.space to identify the spatially variable genes and domain.
single Module#
Added get_results_rfc in omicverse.single.cNMF to predict the precise cluster in complex scRNA-seq/stRNA-seq
Added get_results_rfc in omicverse.utils.LDA_topic to predict the precise cluster in complex scRNA-seq/stRNA-seq
Added gptcelltype in omicverse.single to annotate celltype using large language model #82.
pl Module#
Added plot_spatial in omicverse.pl to visualize the spot proportion of cells when deconvolution
v 1.6.2#
Support Raw Windows platform
Added mde in omicverse.pp to accelerate the umap calculation.
v 1.6.3#
Added ov.setting.cpu_init to change the environment to CPU.
Moved modules tape, SEACells and palantir to external
Single Module#
Added CytoTrace2 to predict cellular potency categories and absolute developmental potential from single-cell RNA-sequencing data.
Added cpdb_exact_target and cpdb_exact_source to extract the means of special ligand/receptor
Added gptcelltype_local to identify the celltype using local LLM #96 #99
Bulk Module#
Added MaxBaseMean columns in dds.result to help people ignore the empty samples.
Space Module#
Added **kwargs in STT.compute_pathway
Added GraphST to identify the spatial domain
pl Module#
Added cpdb_network, cpdb_chord, cpdb_heatmap, cpdb_interacting_network, cpdb_interacting_heatmap and cpdb_group_heatmap to visualize the result of CellPhoneDB
utils Module#
Added mclust_py to identify the Gaussian Mixture cluster
Added mclust method in cluster function
v 1.6.4#
Bulk Module#
Optimised pyGSEA’s geneset_plot visualisation of coordinate effects
Fixed an error of pyTCGA.survival_analysis when the matrix is sparse. #62, #68, #95
Added tqdm to visualize the process of pyTCGA.survial_analysis_all
Fixed an error of data_drop_duplicates_index when removing duplicate indexes to retain only the highest expressed genes #45
Added geneset_plot_multi in ov.bulk to visualize the multi results of enrichment. #103
Single Module#
Added mellon_density to calculate the cell density. #103
PP Module#
Fixed an error of ov.pp.pca when pcs is smaller than 13. #102
Added COMPOSITE in ov.pp.qc’s method to predict doublet cells. #103
Added species argument in score_genes_cell_cycle to calculate the cell phase without manual gene input
v 1.6.6#
Pl Module#
Fixed the ‘celltyep_key’ error of ov.pl.cpdb_group_heatmap #109
Fixed an error in ov.utils.roe when some expected frequencies are less than expected value.
Added cellstackarea to visualize the percent stacked area chart of celltype in samples.
Single Module#
Fixed the bug of ov.single.cytotrace2 when adata.X is not sparse data. #115, #116
Fixed the groupby error in ov.single.get_obs_value of SEACells.
Fixed the error of cNMF #107, #85
Fixed the plot error when Pycomplexheatmap version > 1.7 #136
Bulk Module#
Fixed a key error in ov.bulk.Matrix_ID_mapping
Added enrichment_multi_concat in ov.bulk to concat the result of enrichment.
Fixed the pandas version error in gseapy #137
Bulk2Single Module#
Added adata.var_names_make_unique() to avoid mat shape error if gene not unique. #100
Space Module#
Fixed an error in construct_landscape of ov.space.STT
Fixed an error of get_image_idx_1D in ov.space.svg #117
Added COMMOT to calculate the cell-cell interaction of spatial RNA-seq.
Added starfysh to deconvolute spatial transcriptomic without scRNA-seq (#108)
PP Module#
Updated constraint error of ov.pp.mde #129
Fixed type error of float128 #134
v 1.6.7#
Space Module#
Added n_jobs argument to adjust thread in extenel.STT.pl.plot_tensor_single
Fixed an error in extenel.STT.tl.construct_landscape
Updated the tutorial of COMMOT and Flowsig
Pl Module#
Added legend_awargs to adjust the legend set in pl.cellstackarea and pl.cellproportion
Single Module#
Fixed the error of get_results and get_results_rfc in cNMF module. (#143) (#139)
Added sccaf to obtain the best clusters.
Fixed the .str error in cytotrace2 (#146)
Bulk Module#
Fixed the import error of gseapy in bulk.geneset_enrichment
Optimized code logic for offline enrichment analysis, added background parameter
Added the pyWGCNA package to replace the original calculation of pyWGCNA (#162)
Bulk2Single Module#
Removed _stat_axis in bulk2single_data_prepare and used index instead (#160).
PP Module#
Fixed a return bug in pp.regress_and_scale (#156)
Fixed a scanpy version error when using ov.pp.pca (#154)
v 1.6.8#
Bulk Module#
Fixed the error of log_init in gsea_obj.enrichment (#184)
Added ax argument to visualize the geneset_plot
Space Module#
Added CAST to integrate multi slice
Added crop_space_visium in omicverse.tl to crop the sub area of space data
Pl Module#
Added legend argument to visualize the cpdb_heatmap
Added text_show argument to visualize the cellstackarea
Added ForbiddenCity color system
v 1.6.9#
PP Module#
Added recover_counts to recover counts after ov.pp.preprocess
Removed the lognorm layers added in ov.pp.pca
Single Module#
Added MultiMap module to integrate multi species
Added CellVote to vote the best cells
Added CellANOVA to integrate samples and correct the batch effect
Added StaVia to calculate the pseudotime and infer trajectory.
Space Module#
Added ov.space.cluster to identify the spatial domain
Added Binary for spatial cluster
Added Spateo to calculate the SVG
v 1.7.0#
Added cpu-gpu-mixed to accelerate the analysis of scrna-seq using GPU.
Changed the logo presentation of Omicverse to ov.plot_set
Bulk Module#
Added limma, edgeR in differentially expressed gene analysis. (#238)
Fixed the version error of DEseq2 analysis.
Single Module#
Added lazy function to calculate all functions of scrna-seq (#291)
Added generate_scRNA_report and generate_reference_table to generate the report and reference (#291) (#292)
Fixed geneset_prepare not being able to read gmt not split by \t\t (#235) (#238)
Added geneset_aucell_tmp, pathway_aucell_tmp, pathway_aucell_enrichment_tmp to test the chunk_size (#238)
Added data enhancement of Fate
Added plot_atlas_view_ov in VIA
Fixed an error when the matrix is too large in recover_counts.
Added forceatlas2 to calculate the X_force_directed.
Added milo and scCODA to analyze differential celltype abundance.
Added memento to analyze differential gene expression.
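The geneset_prepare fix above concerns GMT files whose name and gene columns are not separated by a double tab. As a minimal sketch of tolerant GMT parsing, assuming the common layout of set name, optional description, then genes, all tab-separated (parse_gmt_line is a hypothetical helper, not the omicverse implementation):

```python
def parse_gmt_line(line):
    """Parse one GMT line into (set_name, genes).

    Assumed layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    The description may be empty, which yields a double tab after the name.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    # Skip the description column and drop empty fields left by stray tabs.
    genes = [gene for gene in fields[2:] if gene]
    return name, genes


def parse_gmt(lines):
    """Build a {set_name: genes} dictionary from an iterable of GMT lines."""
    return dict(parse_gmt_line(line) for line in lines if line.strip())
```

Both an empty and a filled description column give the same gene list, so callers do not need to know which variant a file uses.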
Space Module#
Added GASTON to learn a topographic map of a tissue slice from spatially resolved transcriptomics (SRT) data (#238)
Added super kwargs in plot_tensor_single of STT.
Updated COMMOT to use GPU acceleration
Plot Module#
Added dotplot_doublegroup to visualize the genes in doublegroup.
Added transpose argument of cpdb_interacting_heatmap to transpose the figure.
Added calculate_gene_density to plot the gene’s density.
v 1.7.1#
Single Module#
Fixed some errors of ov.single.lazy.
Fixed the format of ov.single.generate_scRNA_report
Updated some functions of palantir
Added CellOntologyMapper to map cell name.
v 1.7.2#
Pl Module#
Optimized the plot effect of ov.pl.box_plot
Optimized the plot effect of ov.pl.volcano
Optimized the plot effect of ov.pl.violin
Added a more beautiful dotplot than scanpy's (#318)
Added the similar visualization function of CellChat. (#313)
Space Module#
Added 3D cell-cell interaction analysis in COMMOT (#315)
Single Module#
Fixed the error of pathway_enrichment. (#184)
Added SCENIC module with GPU acceleration. (#331)
utils Module#
Added scICE to calculate the best cluster (#329)
v 1.7.6#
LLM Module#
Added GeneFromer, scGPT, scFoundation, UCE, CellPLM to call directly in OmicVerse.
Pl Module#
Optimized the visualization effect of embedding.
Added ov.pl.umap, ov.pl.pca, ov.pl.mde, and ov.pl.tsne
v 1.7.8#
Implemented lazy loading system that reduces import omicverse time by 40% (from ~7.8s to ~4.7s).
Added GPU-accelerated PCA support for Apple Silicon (MLX) and CUDA (TorchDR) devices.
Introduced Smart Agent System with natural language processing for 50+ AI models from 8 providers.
Added and fixed the anndata-rs to support million size’s datasets (#336)
PP Module#
Added GPU-accelerated PCA in ov.pp.pca() with MLX support for Apple Silicon MPS devices
Added TorchDR-based PCA acceleration in ov.pp.pca() for NVIDIA CUDA devices
Added smart device detection and automatic backend selection in init_pca() and pca() functions
Added graceful fallback to CPU implementation when GPU acceleration fails
Added enhanced verbose output with device selection information and emoji indicators
Added optimal component determination based on variance contribution thresholds in init_pca()
Added GPU-accelerated SUDE dimensionality reduction in ov.pp.sude() with MLX/CUDA support
Optimized ov.pp.qc and added ribosome and hb-genes to provide more information about the data.
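Two of the ideas listed above, falling back to a CPU path when an accelerated backend fails and picking the number of components from a variance threshold, can be sketched generically. This is an illustration only: run_with_fallback and n_components_for_variance are hypothetical names, and the internals of ov.pp.pca() and init_pca() are not shown in these notes.

```python
def run_with_fallback(backends, data):
    """Try each (name, function) backend in order and return the first success."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(data)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")


def n_components_for_variance(variance_ratios, threshold=0.9):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    total = 0.0
    for count, ratio in enumerate(variance_ratios, start=1):
        total += ratio
        if total >= threshold:
            return count
    return len(variance_ratios)
```

Listing the backends from fastest to most portable keeps the CPU path as the last resort, so a GPU failure degrades speed rather than aborting the analysis.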
Datasets Module#
Complete elimination of scanpy dependencies for faster loading
Added dynamo-style dataset framework with comprehensive collection
Added robust download system with progress tracking and caching
Added enhanced mock data generation with realistic structure
Added support for h5ad, loom, xlsx, and compressed formats
Agent Module#
Added multi-provider LLM support (OpenAI, Anthropic, Google, DeepSeek, Qwen, Moonshot, Grok, Zhipu AI)
Added natural language processing for both English and Chinese
Added code generation architecture with local execution
Added function registry system with multi-language aliases
Added smart API key management and provider-specific configuration
Bulk Module#
Added BayesPrime and Scaden to deconvolute the celltype proportion of Bulk RNA-seq.
Added alignment to align the fastq to counts.
Single Module#
Added ov.single.Annotation and ov.single.AnnotationRef to annotate the cell type automatically.
Added ov.alignment.single to align the scRNA-seq to counts directly.
v 1.7.9#
Implemented smart lazy loading system that dramatically reduces import omicverse time by 85.6x (from ~16.57s to ~0.19s).
Enhanced RNA-seq alignment workflow with comprehensive toolkit for FASTQ processing and counting.
Optimized dataset management with nested directory creation for better organization.
Performance Optimization#
Lazy Loading System:
Implemented module-level lazy loading using __getattr__ mechanism for all major modules
Added attribute-level lazy loading for frequently-used functions (read, palette, Agent, etc.)
Introduced intelligent caching system to ensure instant access after first load
Reduced initial import time from 16.57 seconds to 0.19 seconds (85.6x speedup)
Maintained full backward compatibility - all existing code works without modification
Preserved complete IDE support with tab completion via __dir__() implementation
Fixed circular import issues by delaying settings module initialization
MkDocs API documentation generation fully compatible with lazy loading
Benefits for Users:
⚡ Instant startup for Jupyter notebooks and scripts
🎯 Load only what you use - modules imported on first access
💾 Reduced memory footprint for simple tasks
🔄 Second access is cached and instant (< 0.001s)
Alignment Module#
New Comprehensive RNA-seq Alignment Toolkit:
Added complete end-to-end workflow for processing raw sequencing data:
ov.alignment.prefetch: Download SRA datasets from NCBI with built-in retry logic
ov.alignment.fqdump: Convert SRA to FASTQ format with parallel processing support
ov.alignment.parallel_fastq_dump: High-performance parallel FASTQ extraction
ov.alignment.fastp: Quality control and adapter trimming for FASTQ files
ov.alignment.STAR: RNA-seq alignment using STAR aligner with customizable parameters
ov.alignment.featureCount: Gene-level read counting (renamed from count to avoid conflicts)
ov.alignment.single: One-command scRNA-seq alignment with kb-python (kallisto|bustools)
ov.alignment.ref: Build kallisto|bustools reference index for alignment
ov.alignment.count: Quantify gene expression from aligned reads
Key Features:
Unified API for both bulk RNA-seq (STAR + featureCount) and scRNA-seq (kb-python) workflows
Built-in support for RNA velocity analysis with kb-python
Parallel processing capabilities for faster data conversion
Automatic handling of paired-end and single-end reads
Technology-specific filtering for bulk vs single-cell data
Integration with SRA toolkit for seamless data download
Example Workflow:
# Download and process bulk RNA-seq
ov.alignment.prefetch('SRR1234567', output_dir='./data')
ov.alignment.fqdump('SRR1234567', output_dir='./fastq')
ov.alignment.fastp('sample_1.fastq.gz', 'sample_2.fastq.gz', output_prefix='clean')
ov.alignment.STAR(fastq1='clean_1.fastq.gz', fastq2='clean_2.fastq.gz',
genome_dir='./genome', output_prefix='aligned')
ov.alignment.featureCount(bam='aligned.bam', annotation='genes.gtf', output='counts.txt')
# Or use one-command scRNA-seq alignment
ov.alignment.single(
fastq=['read1.fastq.gz', 'read2.fastq.gz'],
index='./kb_index',
output_dir='./kb_output',
technology='10xv3'
)
PP Module#
Fixed an HVG (Highly Variable Genes) selection issue in ov.pp.preprocess
Improved preprocessing pipeline stability and accuracy
Refactored PCA implementation to utilize torch_pca for GPU acceleration (replacing TorchDR)
Enhanced support for sparse matrices in PCA computation
Updated PCA embedding basis from X_pca to PCA for clarity and consistency
Improved error handling with try-except blocks in PCA computation
Fixed PCA GPU mode support with sparse matrices to avoid memory errors
Single Module#
Added CONCORD method to ov.single.batch_correction for single-cell data integration
Enhanced batch correction capabilities with state-of-the-art algorithm
Fixed critical performance issue in pySCENIC: Reverted inefficient correlation calculation optimization that caused memory issues and slowdowns in scRNA-seq data
Removed misleading warnings about dropout genes in SCENIC correlation calculations
Restored memory-efficient pairwise correlation computation (prevents OOM with >20k genes)
SCENIC now uses original approach: calculate correlations only for specific TF-target pairs instead of creating full gene×gene matrices
Added ov.single.find_markers for unified marker gene identification supporting five methods: cosg, t-test, t-test_overestim_var, wilcoxon, and logreg; statistical methods are natively ported from scanpy with no scanpy runtime dependency and numerically consistent results (rtol=1e-4)
Added ov.single.get_markers to extract top marker genes from results as a DataFrame or dict, with support for single/multiple cluster filtering and optional filtering by min_logfoldchange, min_score, and min_pval_adj; output includes pct_group and pct_rest columns showing cell expression proportions within and outside each cluster
Space Module#
Added FlashDeconv for fast, GPU-free deconvolution in Visium spatial transcriptomics
Added Banksy clustering method for spatial domain identification
Updated spatial analysis documentation with new clustering approaches
Web Module#
Launched Omicverse-Notebook for browser-based interactive analysis without local installation
Launched Omicverse-Web for web-based data analysis without coding requirements
Democratized bioinformatics analysis for researchers without programming background
Agent Module#
Enhanced ov.Agent with improved natural language processing for data analysis
Expanded LLM provider support and model selection
Optimized code generation and execution pipeline
Pl Module#
Enhanced categorical legend handling for scatterplot embeddings
Added legend_loc='on data' option for direct annotation on plots
Improved visualization clarity for complex datasets
Added ov.pl.markers_dotplot as a cleaner drop-in for rank_genes_groups_dotplot with improved defaults (standard_scale='var', cmap='Spectral_r', dendrogram=False)
Fixed KeyError in rank_genes_groups_df when cluster names are numeric strings (e.g., leiden '0', '1'); now correctly handles structured arrays, DataFrames, and plain 2D arrays from all marker methods
Datasets Module#
Added comprehensive dataset URLs for easier data access
Expanded data downloading utilities with progress tracking
Fixed dataset download to create nested target directories automatically
Improved dataset utilities with better error handling
Refreshed download behaviors for more reliable data fetching
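The nested-directory fix above amounts to creating every missing parent folder before a download is written. A minimal sketch with the standard library, assuming a hypothetical prepare_target helper (the actual dataset utility is not shown in these notes):

```python
from pathlib import Path


def prepare_target(path):
    """Create all missing parent directories of a download target and return it."""
    target = Path(path)
    # parents=True builds the whole chain; exist_ok=True makes repeat calls safe.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

Calling it again on an existing path is a no-op, so it can run unconditionally before each download.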
Docs#
Strengthened data handling documentation in dotplot and DEG analysis tutorials
Updated the scTour clustering tutorial with latest best practices
Added comprehensive release notes for v1.7.9
Enhanced alignment module documentation with end-to-end workflows
Bug Fixes#
Resolved circular import issues between _settings and utils modules
Fixed compatibility issues with latest package versions (zarr, pandas, etc.)
Improved error handling in parallel processing functions
Single Module#
Enhanced DEG Analysis with Expression Percentages: Added cell expression percentage information to differential expression results
Added pct_ctrl column showing percentage of cells expressing each gene in control group (0-100%)
Added pct_test column showing percentage of cells expressing each gene in test group (0-100%)
Added pct_diff column showing the difference in expression percentage (pct_test - pct_ctrl)
Works with all DEG methods: wilcoxon, t-test, and memento-de
Enables better marker gene identification by filtering genes based on expression prevalence
Similar to dotplot circle size information, helps identify genes with widespread vs. sparse expression patterns
Example Usage:
deg_obj = ov.single.DEG(adata, condition='condition',
ctrl_group='Control', test_group='Treatment')
deg_obj.run(celltype_key='cell_label', celltype_group=['T_cells'])
results = deg_obj.get_results()
# Now includes pct_ctrl, pct_test, pct_diff columns
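To make the three percentage columns concrete, here is a small worked sketch for a single gene. It assumes that a cell "expresses" a gene when its count is above zero; the notes do not state the exact criterion, and expression_percentages is a hypothetical helper rather than part of ov.single.DEG.

```python
def expression_percentages(ctrl_counts, test_counts):
    """Return pct_ctrl, pct_test and pct_diff (0-100 scale) for one gene."""

    def pct_expressing(counts):
        # Assumption: a cell expresses the gene when its count is > 0.
        if not counts:
            return 0.0
        return 100.0 * sum(1 for c in counts if c > 0) / len(counts)

    pct_ctrl = pct_expressing(ctrl_counts)
    pct_test = pct_expressing(test_counts)
    return {"pct_ctrl": pct_ctrl, "pct_test": pct_test, "pct_diff": pct_test - pct_ctrl}


# 2 of 4 control cells and 3 of 4 test cells express the gene.
print(expression_percentages([0, 0, 1, 2], [1, 3, 0, 5]))
# {'pct_ctrl': 50.0, 'pct_test': 75.0, 'pct_diff': 25.0}
```

A positive pct_diff means the gene is detected in a larger share of test cells than control cells, independent of how strongly it is expressed.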
Compatibility#
NumPy 2.0 Compatibility: Fixed all NPY201 compatibility issues to ensure seamless support for both NumPy 1.x and 2.x
Fixed Issues (31 total):
np.in1d → np.isin (9 instances)
omicverse/bulk/_dynamicTree.py: 3 instances (lines 697, 741)
omicverse/single/_cosg.py: 1 instance (line 77)
omicverse/external/GNTD/_preprocessing.py: 2 instances
omicverse/external/scdiffusion/guided_diffusion/cell_datasets_WOT.py: 1 instance
Other external modules: 2 instances
np.row_stack → np.vstack (13 instances)
omicverse/external/CAST/CAST_Projection.py: 2 instances
omicverse/external/CAST/visualize.py: 2 instances
omicverse/external/scSLAT/viz/multi_dataset.py: multiple instances
omicverse/single/_mdic3.py: 1 instance
np.product → np.prod (4 instances)
omicverse/external/umap_pytorch/model.py: 2 instances
omicverse/external/umap_pytorch/modules.py: 2 instances
np.trapz compatibility wrapper (2 instances)
Added compatibility wrapper in:
omicverse/external/VIA/plotting_via.py
omicverse/external/VIA/plotting_via_ov.py
Uses numpy.trapezoid (NumPy 2.0+) with fallback to numpy.trapz (NumPy 1.x)
Backward Compatibility:
✅ All changes maintain full backward compatibility with NumPy 1.x (1.13+)
✅ np.isin available since NumPy 1.13
✅ np.vstack available in all NumPy versions
✅ np.prod available in all NumPy versions
✅ Custom compatibility wrapper handles trapz/trapezoid transition
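The trapz/trapezoid wrapper and the renamed calls listed above can be illustrated as follows. This is a sketch of the general pattern, not the code in plotting_via.py; the names _trapezoid and integrate are made up for the example.

```python
import numpy as np

# numpy.trapezoid is the NumPy 2.0+ name; numpy.trapz is the NumPy 1.x name.
# getattr avoids touching the deprecated alias when the new one exists.
_trapezoid = getattr(np, "trapezoid", None) or np.trapz


def integrate(y, x):
    """Trapezoidal integration that works on both NumPy 1.x and 2.x."""
    return _trapezoid(y, x=x)


# The other replacements are direct renames available on both major versions.
mask = np.isin([1, 2, 3], [2, 3])             # instead of np.in1d
stacked = np.vstack([[1, 2], [3, 4]])         # instead of np.row_stack
n_elements = int(np.prod(stacked.shape))      # instead of np.product
```

Resolving the function once at import time keeps the call sites free of version checks.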
v 1.7.10#
Scope#
This release note summarizes changes from commit cd3d151 (version set to 1.7.10rc1) to current HEAD.
Total code delta in this window: 252 files changed, +46,992 / -9,752.
Agent & Runtime#
Upgraded ov.Agent architecture to modern agentic tool-calling workflows with subagent delegation (v4/v5 evolution).
Improved GPT-5.2 robustness, response parsing, and backend error handling.
Added harness runtime components for execution contracts, tool catalog, runtime state, tracing, and cleanup policies.
Strengthened sandbox behavior with restricted import controls for internal modules.
Added web bridge and session-level execution improvements for agent workflows.
New Modules#
Added omicverse.biocontext for biomedical knowledge queries via BioContext MCP tooling.
Added omicverse.fm (foundation-model adapters, routing, registry, and API).
Added structured omicverse.io namespaces for general/single/bulk/spatial I/O paths.
Added omicverse.jarvis multi-channel bot framework (Feishu/QQ/Telegram) with bridge support.
Core OmicVerse Improvements#
Continued enhancements across pp, pl, single, space, and utils modules.
Fixed circular import between preprocessing utility internals (_utils.py and _scale.py path).
Added/updated function-level metadata and documentation quality in key analysis modules (preprocessing, annotation, trajectory, spatial, datasets, bulk).
Extended dataset utilities with new signature resources and improved loading pathways.
Registry & Help System#
Improved registry behavior and module import exposure in package entrypoints.
Enhanced function/class registration metadata coverage for agent discoverability.
Registry help generation now better aligns with class constructor documentation in class-based tools.
Web & UI#
Single-cell analysis UI received iterative upgrades:
Better code cell management and undo behavior
Improved AnnData slot detail retrieval and display
Better DataFrame rendering and integration
Plot density/point style control refinements
i18n and UX polish for analysis panels
omicverse_web service layer expanded with session-oriented agent service support.
Developer Experience & Testing#
Added FM test suite and multiple harness/ovagent test modules.
Removed obsolete legacy-priority and complexity-classifier test paths.
Added workflow and harness documentation pages for runtime contracts and operational guidance.
Documentation#
Updated and expanded agent architecture and streaming API docs.
Updated t_preprocess_cpu.ipynb to match latest GPU/version detection behavior.
Added bilingual and deployment-oriented guidance for Jarvis and agent-related workflows.
v 2.1.x#
Scope#
Summarises every change between v2.0.0 (tagged 2026-03-18) and the current dev tree (master + the in-flight feat/metabol and feat/alignment-16s-amplicon branches that land in v2.1.x).
Window stats on master: 462 commits, 429 files changed, +98,799 / −17,461 lines. In addition, the two pending feature branches add ov.metabol (12 commits) and ov.micro + ov.alignment (16S amplicon, 5 commits).
Three top-level themes drove the release:
New bio modules: ov.metabol (metabolomics), ov.micro + ov.alignment (16S amplicon → microbiome), ov.single.Monocle, ov.io.read_xenium, ov.utils.cluster(method='pymclustR'), CellSAM, CellCharter, Harmony v0.2, Marsilea heatmaps, CCC plotting, Nanostring spatial, dynamic trajectories, anndataoom OOM Rust backend.
Spatial-platform support: Xenium In Situ end-to-end (read + segmentation + viz), Visium HD SpaceRanger v4 round-trip (cellpose + write_visium_hd_cellseg), Nanostring CosMx FOV-aware plotting.
OVAgent / Jarvis runtime overhaul: ~60 task-* commits across PRs #596–#605 (facade slimming, multi-channel migration, P0/P1 security closures, Codex OAuth).
Single-cell trajectory inference#
ov.single.Monocle: pure-Python Monocle 2 reimplementation, a from-scratch port of the R monocle2 pipeline (DDRTree, MST, branch-state assignment, BEAM differential expression). No rpy2, no R install. Includes:
method='fast' DDRTree update (default): caches X X.T, solves in (K × D) instead of (K × N), truncates the soft-assignment R to the top-K/5 entries per row → ~3× faster per iteration with bounded numerical drift. method='exact' retained for bitwise R-parity.
Delaunay-based Euclidean MST in _project_cells_to_mst: replaces the O(N²) dense pairwise distance matrix (164 GB for a 143 k-cell atlas in R) with an O(N·d) Delaunay + sparse MST that is provably exact.
Robust HSMM tutorial output matching the R Monocle 2 reference direction.
Pseudotime correlation with R Monocle 2 ≥ 0.99 on every benchmark dataset; 30–100× faster.
Also published as a standalone PyPI package (monocle2-py) for users who want trajectory inference without the full omicverse stack.
Dynamic-trajectory utilities: new feature-fitting + lineage-aware trend plotting; better Palantir trend visualisation; cleanup of Slingshot debug plots and sctour input stabilisation.
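The Delaunay-based MST item above rests on a standard result: the Euclidean minimum spanning tree is a subgraph of the Delaunay triangulation, so only Delaunay edges need to be weighted. The following is a minimal sketch of that idea using SciPy, not the code in _project_cells_to_mst; the helper name delaunay_emst and the random test points are assumptions made for illustration.

```python
import itertools

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import Delaunay
from scipy.spatial.distance import pdist, squareform


def delaunay_emst(points):
    """Euclidean MST restricted to Delaunay edges (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    simplices = Delaunay(points).simplices  # (n_simplices, d + 1) vertex ids

    # Every pair of vertices inside a simplex is a Delaunay edge.
    pairs = [
        simplices[:, [i, j]]
        for i, j in itertools.combinations(range(simplices.shape[1]), 2)
    ]
    edges = np.unique(np.sort(np.vstack(pairs), axis=1), axis=0)

    weights = np.linalg.norm(points[edges[:, 0]] - points[edges[:, 1]], axis=1)
    graph = coo_matrix((weights, (edges[:, 0], edges[:, 1])), shape=(n, n))
    return minimum_spanning_tree(graph)  # sparse result with n - 1 edges


rng = np.random.default_rng(0)
pts = rng.random((200, 2))
sparse_total = delaunay_emst(pts).sum()

# Reference: MST over the full dense pairwise distance matrix.
dense_total = minimum_spanning_tree(squareform(pdist(pts))).sum()
```

On small inputs the two totals agree, while the sparse route never materialises the N × N distance matrix.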
Spatial omics#
10x Xenium In Situ — end-to-end support (PR #629)#
ov.io.read_xenium: full reader for the standard Xenium outs/ layout:
cell_feature_matrix.h5 (or .h5ad) → adata.X
cells.csv.gz / cells.parquet → adata.obs, with x_centroid / y_centroid → adata.obsm['spatial']
experiment.xenium JSON → adata.uns['spatial'][library_id]['metadata']
Auto-resolves library_id from experiment.xenium (region_name → run_name).
Exposed as both ov.io.spatial.read_xenium and ov.io.read_xenium.
Verified against the Xenium_FFPE_Human_Breast_Cancer_Rep1 public sample.
load_boundaries=True parameter loads cell_boundaries.parquet / .csv.gz (Xenium’s long-form per-vertex table) into per-cell WKT POLYGON strings and sets uns['omicverse_io']['type'] = 'xenium_seg' so downstream code (matching nanostring_seg) can render cell boundaries directly via ov.pl.spatialseg.
cache_file + smart pyramid-level image loading: picks the right resolution from the multi-resolution morphology TIFF without loading the full pyramid.
New tutorial t_xenium_preprocess.ipynb showing read → preprocess → spatialseg overlay (verified on KRT7 in the Breast Cancer sample).
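To make the load_boundaries step concrete, here is a rough sketch of collapsing a long-form per-vertex table into one WKT POLYGON string per cell. It is not the reader's implementation: the helper name vertices_to_wkt is hypothetical, and the column names cell_id, vertex_x and vertex_y are assumed for the example table.

```python
import pandas as pd


def vertices_to_wkt(vertices, cell_col="cell_id", x_col="vertex_x", y_col="vertex_y"):
    """Collapse one row per vertex into one WKT POLYGON string per cell."""
    polygons = {}
    for cell_id, group in vertices.groupby(cell_col, sort=False):
        xs = group[x_col].to_numpy().tolist()
        ys = group[y_col].to_numpy().tolist()
        ring = list(zip(xs, ys))
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # WKT rings must be explicitly closed
        coords = ", ".join(f"{x} {y}" for x, y in ring)
        polygons[cell_id] = f"POLYGON (({coords}))"
    return pd.Series(polygons, name="geometry")


# Toy table: one triangle for cell "a", one for cell "b".
table = pd.DataFrame(
    {
        "cell_id": ["a", "a", "a", "b", "b", "b"],
        "vertex_x": [0.0, 1.0, 1.0, 5.0, 6.0, 6.0],
        "vertex_y": [0.0, 0.0, 1.0, 5.0, 5.0, 6.0],
    }
)
wkt = vertices_to_wkt(table)
```

Storing plain WKT strings keeps the geometry column serialisable in .h5ad without requiring a geometry library at read time.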
10x Atera (WTA Preview) — end-to-end support (PR #700, omicverse-tutorials#30)#
ov.io.read_atera: reader for the 10x Atera (whole-transcriptome) outs/ bundle. Atera ships a Xenium-format core (cell_feature_matrix.h5, cells.parquet, cell_boundaries.parquet, experiment.xenium) plus four Atera-only additions, all handled by the reader:
nucleus_boundaries.parquet → obs['nucleus_geometry'] (WKT POLYGON).
morphology_focus/ch####_<tag>.ome.tif: content-named multi-stain pyramid TIFFs. Channel selector accepts a semantic tag ('dapi' / 'boundary' / 'rna' / 'stroma'), a filename substring ('cd45', '18s'), or an integer-as-string index. tifffile’s OME multi-file series is bypassed (is_ome=False) so each channel’s pyramid IFDs are walked standalone.
Optional vendor cell_groups.csv merge → obs['cell_group'] and obs['cell_group_color'], NaN-preserving for cells absent from the CSV.
Optional H&E OME-TIFF + 3×3 affine CSV → uns['spatial'][lib]['images']['he'] and scalefactors['he_affine'] / 'he_downsample'.
Verified on the public WTA_Preview_FFPE_Breast_Cancer sample (170,057 cells × 18,028 genes after dropping 9,076 control probes/codewords).
New helpers extracted from the tutorial: ov.pl.to_rgb_grayscale (percentile-clipped RGB stack so morphology overlays bypass matplotlib’s default viridis colormap), ov.pl.sync_categorical_palette (wires a per-cell colour column into uns[<key>_colors]), ov.space.subset_window (rectangular spatial-window subset preserving uns).
New tutorial t_atera_preprocess.ipynb: load the bundle, render the four morphology channels in a 2 × 2 grid, plot the vendor cell-group classifier with ov.pl.spatial, render polygon zooms via ov.pl.spatialseg(img_key=...) against each of the four channels, run a standard preprocessing → HVG → PCA pipeline, and map canonical breast-cancer markers (KRT8 / PTPRC / COL1A1 / PECAM1).
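The percentile-clipped RGB stack mentioned for ov.pl.to_rgb_grayscale can be illustrated roughly as follows. This is a sketch of the general technique only: the function name to_rgb_grayscale_sketch and the 1st/99th percentile defaults are assumptions, not the library's signature.

```python
import numpy as np


def to_rgb_grayscale_sketch(img, lo=1.0, hi=99.0):
    """Clip a single-channel image to [lo, hi] percentiles and stack it as RGB."""
    img = np.asarray(img, dtype=float)
    vmin, vmax = np.percentile(img, [lo, hi])
    # Rescale to [0, 1]; the guard avoids dividing by zero on flat images.
    scaled = np.clip((img - vmin) / max(vmax - vmin, 1e-12), 0.0, 1.0)
    # Three identical channels: imshow renders RGB input without a colormap.
    return np.repeat(scaled[..., None], 3, axis=-1)


demo = np.arange(100, dtype=float).reshape(10, 10)
rgb = to_rgb_grayscale_sketch(demo)
```

Because the result is already an (H, W, 3) array, matplotlib draws it as-is rather than mapping values through viridis.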
10x Visium HD — SpaceRanger v4 compat (PRs #620, #622)#
ov.io.write_visium_hd_cellseg: exports cell-level AnnData back to a SpaceRanger v4 directory structure so downstream tools (Loupe Browser, spaceranger-aware pipelines) can consume the cellpose / CellSAM output as if it came straight out of SpaceRanger v4.
Cell IDs use the SpaceRanger v4 convention, cellid_000000001-1 (zero-padded, suffixed with the slice index), rather than the older spot-id format.
Image / scalefactors handling simplified; uses existing shapely polygon generation instead of redoing convex hulls.
Cellpose tutorial rewritten end-to-end with executed outputs (0 errors, 7–16 baked-in figures across iterations) showing both a Cellpose vs CellSAM segmentation comparison on a 1/16 crop of the HD sample and the SpaceRanger v4-compatible export round-trip.
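The cell ID convention quoted above can be reproduced with a format string. This is only an illustration of the pattern shown in the notes: the helper name is hypothetical, and starting the counter at 1 is an assumption.

```python
def spaceranger_v4_cell_id(index, slice_index=1):
    """Format a cell ID as cellid_<9-digit zero-padded index>-<slice index>."""
    return f"cellid_{index:09d}-{slice_index}"


# Assumed 1-based numbering for three segmented cells on slice 1.
cell_ids = [spaceranger_v4_cell_id(i) for i in range(1, 4)]
```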
Nanostring CosMx — FOV-aware plotting#
New ov.pl.spatial_* family additions for FOV-aware plotting (multi-FOV layout, per-FOV background image overlay, rasterisation options for very dense slides).
FOV image processing pipeline picks up cropping / rasterisation hints from uns.
Cell segmentation backends#
CellSAM backend added (alongside the existing Cellpose backend). Uses cropped images for the standard 10x HD flow.
stardist() → cellseg() rename with backward-compatible alias: a clearer name now that there are multiple backends behind it.
Other spatial features#
CellCharter integration: new method='cellcharter' in ov.utils.cluster plus enhanced spatial_neighbors graph construction. Includes AutoK export, Banksy Jupyter compat, and pickle cross-version loading fixes.
ov.pl.create_custom_colormap: generic palette helper, registered for agent discoverability.
Spatial-segmentation overlay fixes: cmap alpha now honoured in outline_only; FOV image processing supports rasterization options.
Spatial background image scaling: fixed bug where the H&E background was rendered tiny (coords were not scaled). Affected all sc.pl.spatial-style overlays.
Starfysh tutorial compatibility restored against modern numpy/pandas/scipy/pytorch (also pins s3fs ≥ 2023.1.0 to avoid Python 3.12 build failures).
Preprocessing & QC#
ov.pp.qc(adata, doublets='pydoubletfinder'): pure-Python DoubletFinder backend alongside the existing scrublet backend. No R install needed; matches the R DoubletFinder package within published-tutorial AUC range.
Auto-detect mitochondrial gene prefix in pp.qc: no more hand-setting 'mt-' vs 'MT-' per species. Explicit override still respected.
Harmony upgrade to upstream harmonypy v0.2.0 with three backends:
GPU: existing PyTorch path (unchanged).
CPU NumPy: pure-NumPy fallback for environments without torch/MLX.
MLX: native Apple-Silicon (MPS) backend rewritten to use pure MLX operations (no numpy round-trip). Restored emoji/colour/tqdm output, n_init=1 to match upstream, and reproducibility seeds. Multiple review rounds for MLX correctness (lambda double-insert, slice-assignment crash, dead self._W).
torch_pca sparse + covariance_eigh fallback with dynamic memory limits: high-density sparse matrices now convert to dense before CPU PCA (instead of OOM-ing); float64 estimate, OSError handling, and dedup’d memory/threshold constants.
scale() default: to_sparse=False (no surprise sparse → dense round trips).
Removed: ov.pp.scrublet legacy module (use ov.pp.qc(doublets='scrublet') or 'pydoubletfinder').
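One plausible way to auto-detect the mitochondrial prefix is to count how many gene names start with each candidate and keep the most frequent one. The sketch below assumes that approach; the helper name detect_mito_prefix and the candidate list are hypothetical and may differ from what pp.qc actually does.

```python
def detect_mito_prefix(gene_names, candidates=("MT-", "mt-", "Mt-")):
    """Return the candidate prefix matching the most gene names, or None."""
    counts = {
        prefix: sum(name.startswith(prefix) for name in gene_names)
        for prefix in candidates
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


human_genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
mouse_genes = ["mt-Co1", "mt-Nd1", "Actb", "Gapdh"]

human_prefix = detect_mito_prefix(human_genes)
mouse_prefix = detect_mito_prefix(mouse_genes)
no_prefix = detect_mito_prefix(["ACTB", "GAPDH"])
```

Returning None when nothing matches lets the caller fall back to an explicit override instead of silently computing a zero mitochondrial fraction.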
I/O & out-of-memory backend#
anndataoom Rust backend (optional, opt-in): out-of-memory AnnData reads via Rust. New helper omicverse.utils.convert_adata_for_rust and a dedicated t_preprocess_rust tutorial. Multiple review rounds (1st through 7th) for stability, lazy-loading semantics, and clear error paths (no_cc=True is now refused on the OOM backend; unsorted CSR h5ad gets a precise diagnostic; spatial viz works on lazy AnnData).
ov.io.read_xenium added (also covered above under Spatial).
ov.read docstring refreshed for the Rust backend’s behaviour.
Plotting#
Marsilea-based heatmap plotting APIs (new family): clean, declarative heatmaps with regression coverage in the test suite.
Cell-cell communication (CCC) plotting APIs (ov.pl.ccc_*): arrow / sigmoid / flow / scatter / chord-style ligand-receptor plots, with empty-interaction-palette guards and refined flow layouts. Aligned with the CellPhoneDB registry metadata.
ov.pl.create_custom_colormap with white-ramp duplicate de-duplication.
Half-violin boxplot function introduced; old equivalent deprecated.
Subset plotting now drops unused categories so legends are accurate; legendkit show_at accepts 0–1 percentiles (not raw data values).
Iterative refactor of plotting imports (lazy loading, optional torch dependency handling, deprecated old utilities in favour of the new embedding helpers).
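The subset-plotting fix addresses a common pandas behaviour: filtering a categorical column keeps categories that no longer occur, so a legend built from the category list shows stale entries. A small pandas-only illustration follows (the labels are made up; the plotting code itself is not shown).

```python
import pandas as pd

labels = pd.Series(["B", "T", "NK", "B"], dtype="category")

# Subsetting keeps "NK" in the category list even though no row uses it.
subset = labels[labels != "NK"]
stale_categories = list(subset.cat.categories)

# Dropping unused categories leaves only what the subset actually contains.
cleaned = subset.cat.remove_unused_categories()
legend_categories = list(cleaned.cat.categories)
```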
ov.utils.cluster — pymclustR backend (PR #638)#
New method='pymclustR': pure-Python re-implementation of CRAN mclust covering all 14 Banfield-Raftery / Celeux-Govaert covariance parameterisations. Drop-in replacement for the legacy 'mclust_R' rpy2 backend, which is now removed (calling it raises a ValueError pointing at the replacement).
Validation against CRAN mclust 6.1.1 across 222 records: 12 of 14 models bitwise-identical, overall mean z-correlation 0.9935.
Published as pymclustR on PyPI.
Datasets & utilities#
Gene ID conversion functions added to ov.utils (with conflict-column handling on merge), plus a database-validation helper.
Function search uses a global registry fallback for better discoverability across modules.
Removed pymde dependency; tightened scipy pin and FOV image processing.
Removed deprecated data files; cleaned up biocontext module exports.
ov.Agent / Jarvis / OVAgent runtime#
A multi-month decomposition + hardening of the agent runtime stack. ~60 numbered task-* commits across PRs #596 (codex OAuth), #600 (runtime upgrade), #601 (decomposition follow-up), #602 (PR-602 reviewer follow-ups), #603 (security + runtime hardening), #604/#605 (orchestration waves 3 & 4).
Highlights:
Smart-agent facade slimming: smart_agent.py collapsed; bootstrap, auth, runtime setup, codegen/review/reflection pipeline, session/context/history services, and tool facade extracted into focused modules.
OVAgent runtime decomposition: analysis_executor.py, tool_runtime.py, agent_backend.py, turn_controller.py each split into transformer/diagnostic/sandbox/handler helpers.
Tool dispatch facade collapse: the ~45-wrapper CodegenToolDispatchFacadeMixin reduced to 3 concrete delegations.
Subagent infrastructure: configurable subagent profile schema, override plumbing through runtime bootstrap, and end-to-end coverage.
Jarvis multi-channel migration onto a single MessageRuntime: Discord, Telegram, QQ, WeChat, Feishu, iMessage all now use the shared runtime / presenter / command / media abstractions. Channel-core session/request abstractions were also extracted (channel_media).
Tool runtime hardening: bounded sync bridge, fail-closed bash roots, explicit stdout guard, sandbox runtime enforcement, pandas.eval classification, repair-loop retry boundedness, LLM timeout, and dependency-safe parallel tool scheduler.
Security closures (P0/P1 from PR #601 review): SafeOsProxy metadata escape closed via allowlist, strip_local_paths ReDoS-hardened, URL substring assertions replaced with parsed-hostname validation (CodeQL clean), CodeQL sensitive-data alerts cleared in real-provider E2E test, session-facade lock-test portability fixed.
Backend hardening: Gemini and OpenAI adapter exception handling, credential resolution + factory consolidation, context budget model-window registry sync, bounded silent-fallback debug logging.
Web bridge & sessions: prior-history retrieval per session, .h5ad channel handling, AgentBridge ↔ messaging-channel reply-text plumbing, shared kernel + AnnData support.
Codex OAuth support added to ov.Agent (PR #596).
CLI / install: omicclaw entrypoint replaces omicverseweb; install.sh rewritten as a guided installer with optional package menus (web, Jarvis, full bio suite); MCP startup-timeout troubleshooting documented for Windows.
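The security item about replacing URL substring assertions with parsed-hostname validation reflects a general pattern: a substring check passes whenever the trusted host appears anywhere in the URL, whereas comparing the parsed hostname does not. A minimal standard-library sketch follows; the helper name and the example hosts are hypothetical and not taken from the codebase.

```python
from urllib.parse import urlsplit


def host_is_allowed(url, allowed_hosts):
    """Accept a URL only if its parsed hostname is exactly in the allowlist."""
    host = urlsplit(url).hostname  # already lower-cased; None if absent
    return host is not None and host in allowed_hosts


ALLOWED = {"api.example.com"}

good = "https://api.example.com/v1/chat"
in_query = "https://evil.test/?next=api.example.com"
as_prefix = "https://api.example.com.evil.test/"

# A naive substring check accepts all three URLs.
substring_results = ["api.example.com" in u for u in (good, in_query, as_prefix)]
# Hostname comparison accepts only the first.
parsed_results = [host_is_allowed(u, ALLOWED) for u in (good, in_query, as_prefix)]
```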
Documentation & infrastructure#
Sphinx + Furo migration (later switched to sphinx_book_theme): the docs site moved from MkDocs to Sphinx. Bilingual docs/ + docs_zh/ checkouts, .readthedocs.yaml config with repo-root-relative paths, deploy-docs CI workflow migrated, gh-pages history accumulation prevented. Pinned Pygments < 2.20.
Multi-Omics docs reorganisation: Tutorials-bulk2single/ folder moved under Tutorials-Multi-Omics/bulk-single/ with an overview page, in line with how Multi-Omics is now framed as a top-level domain alongside Bulk / Single / Spatial.
Spatial-clustering tutorials: original combined t_cluster_space.ipynb split into 5 self-contained per-method notebooks under docs/Tutorials-space/cluster/ (GraphST, BINARY, STAGATE, CAST, BANKSY), all clustered with method='pymclustR' (no rpy2). New parent cluster/index.md with embedder paper DOIs and a recommendation table.
Cellpose tutorial rewritten with executed CellSAM vs Cellpose comparison and SpaceRanger v4-compatible export.
CCC plotting API docstrings expanded; OmicVerse Agent skill guidance updated for CellPhoneDB.
Dependencies: pinned s3fs ≥ 2023.1.0 (Python 3.12), updated scipy, removed pymde, removed socksio from default deps.
CI: dev PRs now run package + MCP workflows; flake8 excludes virtualenvs; lint regressions cleaned up across namespace exports and bootstrap tests.
ov.metabol — new metabolomics module (PR #636, branch feat/metabol)#
A from-scratch metabolomics analysis namespace mirroring the structure of ov.bulk and ov.single. Twelve commits, several shipped sub-releases (v0.1 → v0.3).
v0.1 — base
ov.metabol namespace registered; LC-MS reader; vendored gseapy for environments without it.
ID mapping between HMDB / KEGG / LIPID MAPS / PubChem / ChEBI identifiers.
Lipidomics class-level summarisation.
v0.2 — pathway analysis
pyMSEA (metabolite set enrichment) with KEGG / LION / HMDB pathway sources.
pyMummichog (peak-list-based pathway inference for unannotated LC-MS features).
pathway_bar, pathway_dot plots; volcano gains use_pvalue + clip_log2fc options.
v0.3 — multi-factor + biomarker
SERRF — QC-RF drift correction for batch / acquisition-order systematic bias.
DGCA — differential gene/metabolite co-correlation analysis.
ASCA + MixedLM — multi-factor analysis (e.g. treatment × time, repeated measures).
ROC / biomarker panel — single + multi-feature ROC, AUC bootstrap CIs, panel selection.
Cross-cutting
Every public API registered with @register_function so ov.Agent can dispatch metabolomics workflows.
Lazy-loaded submodules: top-level import omicverse cost cut 3200× by deferring metabol’s heavy R-style stats imports until first call.
Fetcher-based data migration: KEGG / LION / HMDB pathway tables fetched on demand instead of shipped (drops 3 data files from the wheel).
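Deferring heavy imports until first use is commonly done with a module-level __getattr__ (PEP 562). The sketch below shows that mechanism on a throwaway module object; it is an assumption about the general approach, not a description of how omicverse wires it, and the standard-library statistics module stands in for a heavy dependency.

```python
import importlib
import types

# Attribute name -> module to import on first access (stand-in mapping).
_LAZY_SUBMODULES = {"stats": "statistics"}

pkg = types.ModuleType("demo_pkg")  # hypothetical package object


def _lazy_getattr(name):
    """Import the mapped module the first time the attribute is requested."""
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name])
        setattr(pkg, name, module)  # cache so later lookups skip this hook
        return module
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


# PEP 562: a __getattr__ stored in the module namespace handles missing names.
pkg.__getattr__ = _lazy_getattr

loaded_before = "stats" in vars(pkg)
mean_value = pkg.stats.mean([1, 2, 3])  # triggers the deferred import
loaded_after = "stats" in vars(pkg)
```

In a real package the same hook would live in __init__.py, so importing the package costs nothing until a submodule is actually touched.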
ov.micro + ov.alignment — microbiome / 16S amplicon (PR #637, branch feat/alignment-16s-amplicon)#
End-to-end 16S rRNA amplicon → microbiome AnnData pipeline.
ov.alignment — upstream sequence processing
cutadapt primer trimming + vsearch UNOISE3 ASV calling + SINTAX taxonomic classification.
Phylogeny step: multiple sequence alignment via MAFFT + tree construction via FastTree, attached to the AnnData via ov.micro.attach_tree(adata, tree_path).
ov.micro — downstream microbiome analysis
New downstream namespace covering alpha / beta diversity, differential abundance, ordination, taxonomic-level summaries.
Compatible with scikit-bio 0.6 (PR #637 round-2 review fix bumps from the deprecated 0.5 API for the phylogenetic metrics).
Class names follow the no-py-prefix convention (e.g. MicroBiome, Diversity).
Removed / deprecated#
ov.pp.scrublet legacy module.
ov.utils.cluster(method='mclust_R') (rpy2 bridge), replaced by 'pymclustR'.
Old plotting utilities deprecated in favour of new embedding helpers + half_violin_boxplot.
pymde runtime dependency.
omicverse_web Git submodule (functionality folded into omicclaw).
Legacy ReadTheDocs MkDocs config.
Three shipped metabolomics data files (KEGG / LION / HMDB pathway subsets), now fetched on demand.
v 2.2.0#
Scope#
Summarises every change between v2.1.2 and the v2.2.0 release tip: 208 commits, 212 files changed, +34,878 / −8,888 lines, ~50 merged PRs (#652 → #727).
Three top-level themes drove this release:
New analysis modules: ov.es (vendored decoupler kernels with GPU acceleration), ov.single.NMF (Rust-backed), ov.single.CNV (copykat / infercnv), Geneformer in-silico perturbation, ov.pp.champ, ov.report (one-call HTML provenance report), ov.space.RCTD, ov.space.nmf_tissue_zones.
GPU-first preprocessing: pure-PyTorch Leiden (74×), full-GPU parametric UMAP (21×), GPU KNN by default in pp.neighbors, native torch t-SNE, plus a ~20-method GPU sweep in ov.es (gsva 23–28×, mdt/udt 83–158×, viper 20×, gsea 13–16×, ora 48×).
Maturation of v2.1.x modules: Atera (WTA Preview) reader, Xenium V2 / Prime morphology fix, paired microbe↔metabolite + cross-cohort 16S meta-analysis in ov.micro, MTBLS1 case study + plot helpers in ov.metabol, Seurat-style CCA in single.batch_correction, scMulan / MetaTiME / TOSICA in single.Annotation, CellVote consensus.
ov.es — vendored decoupler kernels with GPU acceleration (PR #722)#
A new top-level enrichment-scoring namespace, vendoring the decoupler 1.x algorithms behind a clean omicverse API and rewriting every method with an optional torch/GPU kernel.
ov.es.decoupler unified dispatcher registered with @register_function: single entry point that picks the right backend by method name.
GPU kernels for 10 methods: aucell, ulm, zscore, mlm, waggr (first wave), then gsea, ora, gsva, viper, mdt/udt. Speedups vs the CPU decoupler reference, on the same matrix:
aucell: 0.2× → 7.5× after kernel redesign (bit-exact vs CPU).
gsea: 13–16× via batched cumsum ES.
ora: 48× via batched hypergeom on torch.
gsva: 23–28× across the three pipeline stages.
mdt/udt: 83× / 158× via pure-torch GBDT, dropping the xgboost CPU path.
viper: 20× on aREA, 4× full pipeline.
GPU-side sparse densify (2–3× across the board), math approximations + loop vectorisation, and GPU memory pattern aligned with ov.pp._pca (drops per-call cleanup overhead).
waggr GPU dispatcher densifies before CPU fallback to avoid hidden round-trips.
Method dispatch lifted into each method (no MethodMeta facade); decoupler’s _docs / _log infrastructure stripped.
Legacy ov.single.aucell helpers still work but print a migration notice to ov.es.aucell.
pytest smoke coverage for all 11 scoring methods.
Tutorial: t_es_compare, a side-by-side comparison of all 11 enrichment methods on the same matrix.
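The "batched cumsum ES" mentioned for gsea can be sketched with the classic GSEA running-sum statistic, computed for all gene sets at once with a single cumulative sum. The version below uses NumPy rather than torch and is an illustration of the idea, not the ov.es kernel; the function name is hypothetical, and it assumes every set has at least one hit and at least one miss.

```python
import numpy as np


def batched_enrichment_score(scores, membership, weight=1.0):
    """Running-sum enrichment score for many gene sets in one pass.

    scores:     (n_genes,) ranking statistic per gene.
    membership: (n_sets, n_genes) boolean matrix, True where gene is in set.
    """
    scores = np.asarray(scores, dtype=float)
    membership = np.asarray(membership, dtype=bool)

    order = np.argsort(-scores)              # rank genes from high to low
    ranked = np.abs(scores[order]) ** weight
    hits = membership[:, order]
    n_genes = scores.size

    # Step up by the normalised weight on hits, down by 1 / n_miss on misses.
    hit_weight = hits * ranked
    hit_step = hit_weight / hit_weight.sum(axis=1, keepdims=True)
    miss_step = (~hits) / (n_genes - hits.sum(axis=1, keepdims=True))

    running = np.cumsum(hit_step - miss_step, axis=1)
    peak = np.abs(running).argmax(axis=1)    # largest deviation from zero
    return running[np.arange(running.shape[0]), peak]


stat = np.array([4.0, 3.0, 2.0, 1.0])
sets = np.array([[1, 1, 0, 0],   # set made of the two top-ranked genes
                 [0, 0, 1, 1]])  # set made of the two bottom-ranked genes
es = batched_enrichment_score(stat, sets)
```

Because the per-set loop collapses into matrix operations, the same structure maps directly onto a GPU tensor library.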
Single-cell analysis#
ov.single.NMF: Rust-backed fast NMF (PR #717) via nmf-rs. Auto-K selection via stability-drop heuristic and Brunet K-selection with a consensus heatmap; consensus switched from cell-level to spectra-level for stability. Adds per-factor obs columns and drops empty categories cleanly. Tutorial: uses the cNMF workflow (t_cnmf) as its reference pipeline.
ov.single.CNV + ov.pl.cnv_* (PR #723): single-cell CNV inference covering both copykat and infercnv backends. Plotting uses a marsilea backend for cnv_heatmap with multi-strip annotations; split between groupby (ordering) and annotations (overlay) for layering. Tutorials: t_copykat, t_infercnv.
ov.single.Annotation, scMulan / MetaTiME / TOSICA backends (PRs #719, #720):
scMulan added to Annotation.annotate (modern transformers compat). Tutorial: t_scmulan.
MetaTiME and TOSICA wired into the same dispatcher. Tutorials: t_metatime, t_tosica.
CellVote consensus score: confidence-aware multi-annotator consensus (PR #719). Tutorial: t_cellvote.
ov.single.batch_correction, Seurat-style CCA (PR #670, #669): drop-in CCA backend alongside the existing methods; 3 review bug-fixes + UX improvements landed in the same PR (PR #670 review feedback). Tutorial: t_single_batch.
ov.single.auto_resolution: null-adjusted (PR #662), per Lange et al. 2004. Renamed from autoResolution, with a selection-curve plot helper and a new method='champ' backend. Tutorial: covered in t_cluster.
ov.single.cal_grn fix (#681): passes gene_names to grnboost2 / genie3 so the resulting GRN keeps the real gene labels. Tutorial: t_scenic.
SCLLMManager is now the canonical foundation-model entry (PR #704): ov.fm namespace dropped entirely; SCLLM is the single FM surface. Defensive HVG warning + actionable RegDiffusion OOM error in single/SCENIC. Tutorials: t_scgpt, t_scfoundation, t_cellplm, t_uce.
Geneformer in-silico perturbation (PRs #725, #726)#
ov.llm Geneformer perturbation: in-silico knockout / knock-in via embedding shifts in the Geneformer foundation model.
TF-perturbation per-gene downstream-shift analysis (PR #726): quantifies how each downstream gene shifts in embedding space after a TF perturbation.
New ov.pl helpers for the perturbation result tables.
Tutorial: t_geneformer, shipped end-to-end with executed outputs.
Preprocessing & GPU performance#
Pure-torch GPU Leiden (PR #657): ov.pp.leiden(method='torch') is 74× faster than the CPU reference and has no torch_sparse/torch_scatter dependency (one of the most fragile installs in the stack). Pure tensor ops only.
Full-GPU parametric UMAP (PR #652): pumap path is 21× faster with bounded VRAM; batch size bumped to 16384 across the board.
GPU KNN by default for method='torch' (PR #654) in ov.pp.neighbors, with the correct fuzzy simplicial graph (matches the UMAP reference, not an approximation).
Native torch t-SNE + louvain auto-redirect (PR #658): mixed-mode polish so the GPU path is the default when torch is available.
ov.pp.tsne routes directly to sklearn (drops sc.tl.tsne): fixes a stale n_components forwarding bug (#683).
ov.pp.qc(doublets=…) default is now scdblfinder for CPU and mixed modes (PR #655).
ov.pp.champ, Convex Hull of Admissible Modularity Partitions (PR #666): new resolution-stability backend with n_seeds, modularity='cpm', adaptive refinement, three width_metric modes (log/linear/relative), and NaN-safe gamma clamping. Wired into auto_resolution as method='champ' with a landscape plot. Tutorial: t_cluster.
GPU preprocessing tutorials: t_preprocess_gpu (Leiden / UMAP / KNN GPU path), t_preprocess_cpu (CPU baseline with the new defaults).
Spatial omics#
ov.space.RCTD deconvolution backend (PR #710, issue #682): RCTD added alongside the existing spatial deconvolution backends. Full-slide tutorial with executed outputs; reference dataset shipped as ov.datasets.visium_lymph_node. Tutorial: t_decov_rctd.
ov.space.nmf_tissue_zones (PR #673): NMF colocation over spot abundance matrices, with normalize='rows' option and DataFrame-column inference. Prefers uns factor names and strips shared prefixes for cleaner zone labels. Tutorial: t_decov Step 6.
ov.io.read_atera (PR #700): full reader for the 10x Atera (WTA Preview) bundle (Xenium-format core + nucleus boundaries + multi-stain morphology + optional H&E + vendor cell_groups.csv). See the dedicated section under “I/O” below. Tutorial: t_atera_preprocess.
ov.io.read_xenium V2 / Prime fix (PR #716, issue #708): Xenium V2 / Prime data ships morphology_focus/morphology_focus_NNNN.ome.tif per stain channel; the OME-XML cross-references between siblings made tifffile’s default _multifile=True silently merge them and break the pyramid walk, producing [Xenium] No morphology image loaded. The fix walks each per-channel pyramid standalone with _multifile=False and prioritises the user-requested channel. Verified on Xenium Prime FFPE Human Prostate (5,006 genes / 193K cells); all four channels now load. Tutorial: t_xenium_preprocess.
New helpers extracted from the Atera tutorial: ov.pl.to_rgb_grayscale, ov.pl.sync_categorical_palette, ov.space.subset_window.
ov.pl.spatial: frameon=False now truly hides the frame (matches scanpy).
Microbiome & Metabolomics#
ov.micro — paired microbe ↔ metabolite + meta-analysis#
Paired microbe↔metabolite integration (PR #664): new simulate_paired, paired_spearman, paired_cca, and MMvec for unpaired co-occurrence inference, with companion plotting helpers. Tutorial: t_micro_metabol_paired.
Cross-cohort 16S meta-analysis (PR #663): combine_studies + meta_da for proper between-study differential abundance, handling per-study confounders. Tutorials: t_16s_meta_analysis, t_16s_da_comparison.
fetch_franzosa_ibd_2019 (PR #664): first built-in real paired multi-omics dataset; the paired tutorial was rewritten end-to-end against it.
ov.metabol — MTBLS1 case study (PRs #665)#
MTBLS1 case-study helpers + 4 plot helpers for the MetaboLights MTBLS1 benchmark.
v0.5 perf/usability fixes discovered during the MTBLS1 smoke-test rollout.
Tutorial: t_metabol_11_real_data_mtbls1.
ov.report — one-call HTML pipeline report from AnnData provenance (PRs #659, #660)#
ov.report: one-call HTML pipeline report generated from AnnData provenance.
@tracked decorator consolidates timing / nesting / record-call concerns into a single annotation.
Extends to ov.single.batch_correction, ov.single.Annotation, ov.single.AnnotationRef (PR #660): class-method tracking, not just function tracking.
Developer guide: see the “Making a dispatcher appear in ov.report.from_anndata” section of Developer_guild for how to wire your own dispatcher into the report.
Plotting#
ov.pl.plot1cell, a circular UMAP with concentric metadata tracks (PRs #674, #675, #676, #679): new circular embedding plot with bending.inside labels, axis ticks, per-track palettes, and horizontal outer-ring labels with wrap + repel. Respects plot_set background (no hardcoded figure background). Registered in Sphinx toctrees with a 4-scale tutorial slimmed to 2 real datasets. Tutorial: t_plot1cell.
ov.pl.embedding flow layout (PR #719): flow layout is now the default for multi-panel embedding plots (replaces the grid layout). Multiple polish fixes:
Flow layout accounts for default colorbar + title in panel footprint.
Colorbar width halved (fraction 0.08 → 0.04); slim inset colorbar default that reclaims unused figure width.
_flow_layout_panels measures axis & colorbar tightbboxes so layout is colorbar-aware.
inset_axes colorbar registered so flow layout measures its tightbbox correctly.
ov.pl.ccc_*, unified communication adapters for LIANA and CellPhoneDB (PR #666 area): single adapter layer so ccc_circle, ccc_heatmap, ccc_network_plot, ccc_stat_plot work against either source. Empty-interaction palettes guarded. Volcano fix (#712) for non-standard significance column values. Tutorials: t_ccc_cellphonedb, t_ccc_liana.
ov.pl.trajectory (PR #707): generic trajectory plotting backend with a robust label assertion in tests.
ov.pl.cnv_*: see ov.single.CNV above for the marsilea backend.
Dynamic heatmap fixes: preserve explicit feature_labels; preserve dynamic annotations through layout.
WGCNA color assignment refactor (PR #684): community contribution from @libmelo.
I/O#
ov.io.read_atera (PR #700): full reader for 10x Atera (WTA Preview) outs/. Atera ships a Xenium-format core (cell_feature_matrix.h5, cells.parquet, cell_boundaries.parquet, experiment.xenium) plus four Atera-only additions, all handled by the reader:
nucleus_boundaries.parquet → obs['nucleus_geometry'] (WKT POLYGON).
morphology_focus/ch####_<tag>.ome.tif: content-named multi-stain pyramid TIFFs. Channel selector accepts a semantic tag ('dapi' / 'boundary' / 'rna' / 'stroma'), a filename substring ('cd45', '18s'), or an integer-as-string index. tifffile’s OME multi-file series is bypassed (is_ome=False) so each channel’s pyramid IFDs are walked standalone.
Optional vendor cell_groups.csv merge → obs['cell_group'] and obs['cell_group_color'], NaN-preserving for cells absent from the CSV.
Optional H&E OME-TIFF + 3×3 affine CSV → uns['spatial'][lib]['images']['he'] and scalefactors['he_affine'] / 'he_downsample'.
Verified on the public WTA_Preview_FFPE_Breast_Cancer sample (170,057 cells × 18,028 genes after dropping 9,076 control probes / codewords).
Tutorial: t_atera_preprocess.
ov.io.read_csv (latest): now flags pandas’s silent duplicate-column rename (col, col.1, col.2, …) instead of letting it pass through unnoticed.
ov.io.read_xenium V2 / Prime fix: see Spatial.
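The duplicate-column rename that ov.io.read_csv now flags is easy to reproduce with pandas alone. The sketch below shows one way to detect it by reading the header row as data before pandas mangles it; the helper name find_duplicate_headers is hypothetical and this is not necessarily how ov.io.read_csv performs the check.

```python
import io

import pandas as pd


def find_duplicate_headers(csv_text, sep=","):
    """Return header names that occur more than once in the raw first row."""
    header = pd.read_csv(io.StringIO(csv_text), sep=sep, header=None, nrows=1)
    names = header.iloc[0].astype(str)
    return sorted(names[names.duplicated()].unique())


raw = "gene,score,score,score\nA,1,2,3\n"

# Default parsing silently renames the repeated columns.
mangled_columns = list(pd.read_csv(io.StringIO(raw)).columns)
duplicates = find_duplicate_headers(raw)
```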
Datasets & utilities#
ov.datasets.visium_lymph_node loader: first-class reference dataset for the RCTD tutorial.
ov.utils.preflight_alignment (latest): sample-metadata alignment pre-flight helper; checks that the obs table you’re about to merge will actually align with the AnnData index before you do it. Registered with the function registry.
ov.bulk: pyWGCNA / readWGCNA registry-discoverable shim (PR #700) so the WGCNA path is discoverable from ov.Agent without importing the heavy backend at top level.
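A pre-flight alignment check of the kind described for ov.utils.preflight_alignment might look roughly like this. The function name, arguments, and report keys below are illustrative assumptions; the real helper's signature and output are not documented in these notes.

```python
import pandas as pd


def preflight_alignment_sketch(obs_names, meta_index):
    """Summarise how a metadata index lines up with AnnData obs names."""
    obs_names = pd.Index(obs_names)
    meta_index = pd.Index(meta_index)
    return {
        "n_matched": int(obs_names.isin(meta_index).sum()),
        "missing_in_meta": obs_names.difference(meta_index).tolist(),
        "extra_in_meta": meta_index.difference(obs_names).tolist(),
        "meta_has_duplicates": bool(meta_index.has_duplicates),
    }


# Stand-ins for adata.obs_names and the index of the table to merge.
report = preflight_alignment_sketch(["c1", "c2", "c3"], ["c2", "c3", "c4"])
```

Running such a check before the merge surfaces barcode mismatches as an explicit report rather than as NaN-filled columns afterwards.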
OVAgent / Jarvis / registry#
registry_lookup improvements (PR #691):
Shows docstrings + visual dividers in registry_lookup output (was a bare table before).
_registry includes keyword-only args in AST-derived signatures (was previously dropping them).
Caches RegistryScanner + skill registry across calls (perf win on long agent sessions).
@register_function backfill on docstring-backfilled APIs (PR #695): every metabol / micro / single API whose docstring was filled in from omicverse-skills 0.3.0 grounding is now registry-discoverable.
OVAgent resilience + live LLM trace + auto-download (PR #688): better recovery on transient backend failures, live trace of the LLM call as it streams, and auto-download of small dependencies on first use.
Hardcoded user-specific paths removed (PR #689): possible_paths fallback lists deleted in favour of explicit configuration.
omicverse-skills bumped to >=0.3.0 (PR #696) to pick up the v0.3 docstring corpus.
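The keyword-only fix in the AST-derived signatures comes down to reading args.kwonlyargs in addition to args.args. A small standard-library sketch of that extraction is shown below; the function signature_from_source and the example source string (including its parameter names) are hypothetical, and defaults and the positional-only "/" marker are deliberately omitted to keep it short.

```python
import ast


def signature_from_source(source, func_name):
    """Build a name-only signature string, keeping keyword-only arguments."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            args = node.args
            parts = [a.arg for a in args.posonlyargs + args.args]
            if args.vararg:
                parts.append("*" + args.vararg.arg)
            elif args.kwonlyargs:
                parts.append("*")  # bare marker before keyword-only names
            parts.extend(a.arg for a in args.kwonlyargs)
            if args.kwarg:
                parts.append("**" + args.kwarg.arg)
            return f"{func_name}({', '.join(parts)})"
    raise LookupError(f"function {func_name!r} not found")


# Made-up function definition used only to exercise the parser.
example_source = "def qc(adata, *, doublets='scrublet', mt_prefix=None):\n    pass\n"
signature = signature_from_source(example_source, "qc")
```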
Bug fixes#
#706: ov.pp.preprocess(mode='shiftlog|seurat') no longer raises UnboundLocalError (PR #709).
#683: ov.pp.tsne no longer forwards n_components to the scanpy backend (PR #698).
#685: bulk/single enrichment logp now matches the −log10 colour-bar label (PR #697).
#681: SCENIC cal_grn passes gene_names to grnboost2 / genie3 (PR #699).
#708: Xenium V2 / Prime morphology_focus_NNNN.ome.tif channels load correctly (PR #716).
#612: in bulk2single/bayesprism, ThetaPost columns are cell types, not genes (PR #672).
#712: volcano no longer misclassifies significant genes when the sig column uses non-standard values (community PR).
bulk2single TAPE (PR #724): high-res mode now exposes signature matrices via self.signature_matrix.
CCC circle (latest): circle aggregation now aligns with significant interactions (community PR #715).
MOFA: raw regex for factor parsing.
GPU connectivity device: lazily resolved (no spurious CUDA imports).
Removed / deprecated#
ov.fm namespace dropped: use SCLLMManager as the canonical foundation-model entry (PR #704). The tests/fm/ directory was removed; architecture tests were updated to pin the bulk lazy route through _wgcna instead of _Gene_module.
xgboost CPU path for ov.es.mdt / ov.es.udt: replaced by the pure-torch GBDT (83× / 158× faster).
possible_paths hardcoded fallback lists: removed in favour of explicit configuration.