Trajectory Inference with Palantir#

This tutorial uses pancreas endocrine development data to demonstrate Palantir pseudotime inference, branch inspection, gene-trend summaries, and a branch-aware pseudotime stream plot built with ov.pl.branch_streamplot.

Method background#

Following the Palantir documentation and the original Nature Biotechnology paper, Palantir models differentiation as a stochastic process on a diffusion manifold.

Its core logic is:

  • compute diffusion components from a denoised low-dimensional representation

  • choose an early/root cell and order cells along pseudotime

  • identify terminal states and estimate branch-specific fate probabilities

  • summarize gene expression trends along pseudotime and across terminal branches

This makes Palantir especially useful when we care about both a shared developmental trunk and gradual commitment toward multiple endpoints.

Why use the pancreas dataset here?#

Pancreatic endocrine development is a compact teaching example for Palantir because it contains a clear progenitor-to-endocrine progression and multiple endocrine fates. That makes it easy to inspect pseudotime, branch probabilities, and branch-aware gene trends in one tutorial.

Preprocess data#

As an example, we apply trajectory inference to pancreas development.

import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import omicverse as ov
ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2
🔬 Starting plot initialization...
Using already downloaded Arial font from: /var/folders/rv/3jnfbs0d6r7d0c5bfj7ft5k00000gn/T/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ Apple Silicon MPS detected
    • [MPS] Apple Silicon GPU - Metal Performance Shaders available

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.3rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.
adata=ov.datasets.pancreatic_endocrinogenesis()
⚠️ File ./data/endocrinogenesis_day15.h5ad already exists
 Loading data from ./data/endocrinogenesis_day15.h5ad
✅ Successfully loaded: 3696 cells × 27998 genes
adata=ov.pp.preprocess(adata,mode='shiftlog|pearson',n_HVGs=3000,)
adata.raw = adata
adata = adata[:, adata.var.highly_variable_features]
ov.pp.scale(adata)
ov.pp.pca(adata,layer='scaled',n_pcs=50)
🔍 [2026-04-28 15:44:19] Running preprocessing in 'cpu' mode...
Begin robust gene identification
    After filtration, 17750/27998 genes are kept.
    Among 17750 genes, 16426 genes are robust.
✅ Robust gene identification completed successfully.
Begin size normalization: shiftlog and HVGs selection pearson

🔍 Count Normalization:
   Target sum: 500000.0
   Exclude highly expressed: True
   Max fraction threshold: 0.2
⚠️ Excluding 1 highly-expressed genes from normalization computation
   Excluded genes: ['Ghrl']

✅ Count Normalization Completed Successfully!
   ✓ Processed: 3,696 cells × 16,426 genes
   ✓ Runtime: 0.08s

🔍 Highly Variable Genes Selection (Experimental):
   Method: pearson_residuals
   Target genes: 3,000
   Theta (overdispersion): 100
✅ Experimental HVG Selection Completed Successfully!
   ✓ Selected: 3,000 highly variable genes out of 16,426 total (18.3%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'highly_variable_rank': Float vector (adata.var)
     • 'highly_variable_nbatches': Int vector (adata.var)
     • 'highly_variable_intersection': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'variances': Float vector (adata.var)
     • 'residual_variances': Float vector (adata.var)
    Time to analyze data in cpu: 0.46 seconds.
✅ Preprocessing completed successfully.
    Added:
        'highly_variable_features', boolean vector (adata.var)
        'means', float vector (adata.var)
        'variances', float vector (adata.var)
        'residual_variances', float vector (adata.var)
        'counts', raw counts layer (adata.layers)
    End of size normalization: shiftlog and HVGs selection pearson

╭─ SUMMARY: preprocess ──────────────────────────────────────────────╮
  Duration: 0.5635s                                                 
  Shape:    3,696 x 27,998 -> 3,696 x 16,426                        
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   VAR    highly_variable (bool)                               
 highly_variable_features (bool)                      
 highly_variable_rank (float)                         
 means (float)                                        
 n_cells (int)                                        
 percent_cells (float)                                
 residual_variances (float)                           
 robust (bool)                                        
 variances (float)                                    
                                                                    
   UNS    REFERENCE_MANU                                       
 _ov_provenance                                       
 history_log                                          
 hvg                                                  
 log1p                                                
 status                                               
 status_args                                          
                                                                    
   LAYERS counts (sparse matrix, 3696x16426)                   
                                                                    
╰────────────────────────────────────────────────────────────────────╯
╭─ SUMMARY: scale ───────────────────────────────────────────────────╮
  Duration: 0.2963s                                                 
  Shape:    3,696 x 3,000 (Unchanged)                               
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   LAYERS scaled (array, 3696x3000)                            
                                                                    
╰────────────────────────────────────────────────────────────────────╯
computing PCA🔍
    with n_comps=50
   🖥️ Using sklearn PCA for CPU computation
   🖥️ sklearn PCA backend: CPU computation
   📊 PCA input data type: ArrayView, shape: (3696, 3000), dtype: float64
🔧 PCA solver used: covariance_eigh
    finished✅ (57.96s)

╭─ SUMMARY: pca ─────────────────────────────────────────────────────╮
  Duration: 57.9656s                                                
  Shape:    3,696 x 3,000 (Unchanged)                               
                                                                    
  CHANGES DETECTED                                                  
  ────────────────                                                  
   UNS    scaled|original|cum_sum_eigenvalues                  
 scaled|original|pca_var_ratios                       
                                                                    
   OBSM   scaled|original|X_pca (array, 3696x50)               
                                                                    
╰────────────────────────────────────────────────────────────────────╯

Let us inspect the contribution of single PCs to the total variance in the data. This gives us information about how many PCs we should consider in order to compute the neighborhood relations of cells. In our experience, often a rough estimate of the number of PCs does fine.

ov.utils.plot_pca_variance_ratio(adata, n_pcs=15)
ov.pl.umap(
    adata,
    color='clusters'
)
X_umap converted to UMAP to visualize and saved to adata.obsm['UMAP']
if you want to use X_umap, please set convert=False
../_images/354001f7eb832d53ea8686de6661e19b9c0c363bc5ca9699a4c61145cced1157.png

Palantir#

Palantir can be run by specifying an approxiate early cell.

Palantir can automatically determine the terminal states as well. In this dataset, we know the terminal states and we will set them using the terminal_states parameter

Here, we used ov.single.TrajInfer to construct a Trajectory Inference object.

Traj=ov.single.TrajInfer(
    adata,
    basis='X_umap',
    groupby='clusters',
    use_rep='scaled|original|X_pca',
    n_comps=50
)
Traj.set_origin_cells('Ductal')
Traj.set_terminal_cells(["Alpha","Beta","Delta","Epsilon"])
Traj.inference(method='palantir',num_waypoints=500)
**finished identifying marker genes by COSG**
Sampling and flocking waypoints...
Time for determining waypoints: 0.00044803619384765626 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.15296376943588258 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9998
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
<omicverse.external.palantir.presults.PResults at 0x164ccadd0>

Palantir results can be visualized on the tSNE or UMAP using the plot_palantir_results function

Traj.palantir_plot_pseudotime(
    embedding_basis='X_umap',
    cmap='RdBu_r',
    s=3,
    n_cols=4,
    figsize=(8, 6),
)

Once the cells are selected, it’s often helpful to visualize the selection on the pseudotime trajectory to ensure we’ve isolated the correct cells for our specific trend. We can do this using the plot_branch_selection function:

Traj.palantir_cal_branch(
    eps=0,
    plot_kwargs={
        'figsize': (6, 4),
        'selected_color': '#1f77b4',
        'deselected_color': '#d9d9d9',
        's': 4,
    },
)
ov.external.palantir.plot.plot_trajectory(
    adata,
    "Alpha",
    cell_color="palantir_entropy",
    n_arrows=10,
    color="red",
    scanpy_kwargs=dict(cmap="RdBu_r"),
)
[2026-04-28 15:45:48,028] [INFO    ] Using sparse Gaussian Process since n_landmarks (50) < n_samples (805) and rank = 1.0.
[2026-04-28 15:45:48,029] [INFO    ] Using covariance function Matern52(ls=1.262711524963379).
[2026-04-28 15:45:48,086] [INFO    ] Computing 50 landmarks with k-means clustering (random_state=42).
[2026-04-28 15:45:49,654] [INFO    ] Sigma interpreted as element-wise standard deviation.
<Axes: title={'center': 'Branch: Alpha'}, xlabel='UMAP1', ylabel='UMAP2'>
../_images/16f8470162bc7bf4448d41440ccec3d9aabf8345419aea1f467184b2f51b1adc.png

Branch-aware pseudotime stream plot#

After computing palantir_pseudotime, we can summarize cluster occupancy along pseudotime with KDE-smoothed ribbons and lay them out on a simple branch skeleton. This gives a compact trajectory-level overview that is useful for publication-style pseudotime figures.

fig, ax = ov.pl.branch_streamplot(
    adata,
    group_key='clusters',
    pseudotime_key='palantir_pseudotime',
    show=False,
)
plt.show()

Palantir uses Mellon Function Estimator to determine the gene expression trends along different lineages. The marker trends can be determined using the following snippet. This computes the trends for all lineages. A subset of lineages can be used using the lineages parameter.

adata.layers['lognorm'] = adata.X.copy()
# MAGIC currently conflicts with the NumPy version in the dev environment,
# so we keep a stable smoothed-expression placeholder layer for downstream trends/heatmaps.
adata.layers['MAGIC_imputed_data'] = adata.layers['lognorm'].copy()
gene_trends = Traj.palantir_cal_gene_trends(
    layers="MAGIC_imputed_data",
)
Delta
[2026-04-28 15:45:50,680] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (624) and rank = 1.0.
[2026-04-28 15:45:50,681] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-04-28 15:45:51,718] [INFO    ] Sigma interpreted as element-wise standard deviation.
Beta
[2026-04-28 15:45:51,937] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (939) and rank = 1.0.
[2026-04-28 15:45:51,938] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-04-28 15:45:52,346] [INFO    ] Sigma interpreted as element-wise standard deviation.
Alpha
[2026-04-28 15:45:52,475] [INFO    ] Using sparse Gaussian Process since n_landmarks (500) < n_samples (805) and rank = 1.0.
[2026-04-28 15:45:52,475] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-04-28 15:45:52,862] [INFO    ] Sigma interpreted as element-wise standard deviation.
Epsilon
[2026-04-28 15:45:52,993] [INFO    ] Using non-sparse Gaussian Process since n_landmarks (500) >= n_samples (324) and rank = 1.0.
[2026-04-28 15:45:52,994] [INFO    ] Using covariance function Matern52(ls=1.0).
[2026-04-28 15:45:53,438] [INFO    ] Sigma interpreted as element-wise standard deviation.
Traj.palantir_plot_gene_trends(
    ['Pax4', 'Ins2'],
    layers='MAGIC_imputed_data',
    figsize=(4.5, 3),
    compare_groups=True,
    linewidth=2.2,
)
plt.show()
🔍 Dynamic feature analysis:
   Views: 4 | Features: 2
   Pseudotime: palantir_pseudotime
   Layer: MAGIC_imputed_data
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 8/8
   ✓ Fitted rows: 1600

🔍 Dynamic trend plotting:
   Features: 2 | Groups: 4
   compare_features=False | compare_groups=True
✅ Dynamic trend plotting completed!
../_images/13a6e986c7cbe5f9198ac99c5c0bbfb0b8f54338d658ade7fad1f5ee3e0b7db1.png
Traj.palantir_plot_gene_trends(
    ['Pax4', 'Ins2'],
    lineages=['Beta', 'Alpha'],
    layers='MAGIC_imputed_data',
    figsize=(4.5, 3),
    compare_groups=True,
    linewidth=2.2,
)
plt.show()
🔍 Dynamic feature analysis:
   Views: 2 | Features: 2
   Pseudotime: palantir_pseudotime
   Layer: MAGIC_imputed_data
   GAM: normal-identity | splines=8
✅ Dynamic feature analysis completed!
   ✓ Successful fits: 4/4
   ✓ Fitted rows: 800

🔍 Dynamic trend plotting:
   Features: 2 | Groups: 2
   compare_features=False | compare_groups=True
✅ Dynamic trend plotting completed!
../_images/a6d6972e08b7f482accdd22d8e3be0926e5fb21733311e706fb0e9d9dd4ec259.png

OV trajectory graph overlay#

After PAGA is computed with Palantir pseudotime as the time prior, ov.pl.trajectory gives a compact method-independent graph overlay on the UMAP embedding.

ov.pl.trajectory(
    adata,
    method='paga',
    basis='X_umap',
    groups='clusters',
    color='clusters',
    title='Palantir trajectory graph',
)
plt.show()

OV trajectory overlay#

ov.pl.trajectory_overlay adds the PAGA backbone to an existing UMAP embedding.

fig, ax = plt.subplots(figsize=(4, 4))
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=50,
)
ov.pl.trajectory_overlay(
    adata,
    ax=ax,
    method='paga',
    basis='X_umap',
    groups='clusters',
)
ax.set_title('Palantir trajectory overlay')
plt.show()