Velocity-guided CellRank Analysis#

This notebook demonstrates a velocity-informed CellRank workflow on a pancreas RNA-velocity dataset. It covers velocity preprocessing, terminal-state prediction, fate probabilities, branch-aware pseudotime visualization, and marker-gene dynamics.

Load data#

We use the pancreas developmental dataset distributed with scvelo. It contains spliced and unspliced layers required for RNA velocity analysis.

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import omicverse as ov
import scanpy as sc
import scvelo as scv
import cellrank as cr
import numpy as np
import matplotlib.pyplot as plt

ov.plot_set(font_path='Arial')

%load_ext autoreload
%autoreload 2
🔬 Starting plot initialization...
Using already downloaded Arial font from: /var/folders/rv/3jnfbs0d6r7d0c5bfj7ft5k00000gn/T/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
🚫 PyTorch not available - GPU detection skipped

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.3rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.
adata = scv.datasets.pancreas()
adata
AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    frameon='small',
)

Estimate RNA velocity#

We keep the preprocessing intentionally compact: normalize shared counts, compute PCA/neighbors, calculate moments, estimate deterministic RNA velocity, and build the velocity graph. We also derive a development-oriented pseudotime for the downstream branch and marker-program visualizations.

scv.pp.filter_and_normalize(adata, min_shared_counts=20)
sc.pp.pca(adata)
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)
scv.tl.velocity(adata, mode='deterministic')
scv.tl.velocity_graph(adata, n_jobs=1)
scv.tl.velocity_pseudotime(adata)

root_mask = adata.obs['clusters'].astype(str).eq('Ductal')
if not root_mask.any():
    root_mask = adata.obs['clusters'].astype(str).eq('Ngn3 low EP')
root_cells = np.flatnonzero(root_mask.to_numpy())
root_center = np.asarray(adata.obsm['X_pca'][root_cells]).mean(axis=0)
root_index = int(
    root_cells[
        np.argmin(
            np.linalg.norm(adata.obsm['X_pca'][root_cells] - root_center, axis=1)
        )
    ]
)
adata.uns['iroot'] = root_index
sc.tl.diffmap(adata)
sc.tl.dpt(adata, n_dcs=10)
adata.obs['development_pseudotime'] = adata.obs['dpt_pseudotime']
Filtered out 20801 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
computing moments based on connectivities
finished (0:00:01) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
computing velocities
finished (0:00:00) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/12 cores)
finished (0:00:05) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing terminal states
WARNING: Uncertain or fuzzy root cell identification. Please verify.
    identified 1 region of root cells and 1 region of end points .
    finished (0:00:00) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
scv.pl.velocity_embedding_stream(
    adata,
    basis='umap',
    color='clusters',
    legend_loc='right',
    title='RNA velocity stream',
)
computing velocity embedding
    finished (0:00:00) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
../_images/b59521a7aa86a7682c5928546fe57d61e291a5908c21b36349f9d6b2b4a8e9e6.png

Build CellRank kernels#

A common setup is to blend velocity-based transitions with neighborhood connectivity so the downstream fate analysis uses both local graph structure and RNA velocity direction.

vk = cr.kernels.VelocityKernel(adata).compute_transition_matrix()
ck = cr.kernels.ConnectivityKernel(adata).compute_transition_matrix()
kernel = 0.8 * vk + 0.2 * ck
kernel
(0.8 * VelocityKernel[n=3696, model='deterministic', similarity='correlation', softmax_scale=12.543] + 0.2 * ConnectivityKernel[n=3696, dnorm=True, key='connectivities'])

Visualize the CellRank velocity kernel#

After constructing the CellRank velocity kernel, we project its transition directions onto UMAP to inspect the directionality used by the fate analysis.

vk.plot_projection(
    basis='umap',
    color='clusters',
    legend_loc='right',
    title='CellRank velocity-kernel projection',
    stream=True,
    size=80,
)

OV trajectory graph overlay#

The development-oriented pseudotime can be used as a PAGA prior, giving a graph overlay that complements the velocity stream and CellRank kernel projection.

ov.utils.cal_paga(
    adata,
    use_time_prior='development_pseudotime',
    vkey='velocity',
    groups='clusters',
)
ov.pl.trajectory(
    adata,
    method='paga',
    basis='X_umap',
    groups='clusters',
    color='clusters',
    title='CellRank development trajectory graph',
)
plt.show()
running PAGA using priors: ['development_pseudotime']
finished
added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns)
    'paga/transitions_confidence', velocity transitions (adata.uns)
../_images/7d4e29365878de53850ac9133913da2b716fba4bd306fce1a4bd0e2944843a3e.png

OV trajectory overlay#

ov.pl.trajectory_overlay adds the PAGA backbone to an existing UMAP embedding.

fig, ax = plt.subplots(figsize=(4, 4))
ov.pl.embedding(
    adata,
    basis='X_umap',
    color='clusters',
    ax=ax,
    show=False,
    size=50,
)
ov.pl.trajectory_overlay(
    adata,
    ax=ax,
    method='paga',
    basis='X_umap',
    groups='clusters',
)
ax.set_title('CellRank development trajectory overlay')
plt.show()
g = cr.estimators.GPCCA(kernel)
g.compute_macrostates(n_states=6, cluster_key='clusters')
list(g.macrostates.cat.categories)
WARNING: Unable to import `petsc4py` or `slepc4py`. Using `method='brandts'`
WARNING: For `method='brandts'`, dense matrix is required. Densifying
['Epsilon', 'Alpha', 'Ductal_1', 'Ductal_2', 'Ngn3 high EP', 'Beta']
g.plot_macrostates(which='all', legend_loc='right', size=90)

Identify terminal states and fate probabilities#

For this pancreas example, CellRank can automatically propose terminal states from the fitted macrostates. We then compute lineage probabilities for every cell.

g.predict_terminal_states(n_states=4)
list(g.terminal_states.cat.categories)
['Epsilon', 'Alpha', 'Beta']
g.plot_macrostates(which='terminal', legend_loc='right', size=100)
g.compute_fate_probabilities(solver='direct', use_petsc=False)
g.plot_fate_probabilities(same_plot=True)
adata.obs.groupby('clusters', observed=True)[
    ['velocity_pseudotime', 'development_pseudotime']
].median().sort_values('development_pseudotime')
               velocity_pseudotime  development_pseudotime
clusters                                                  
Ngn3 low EP               0.911242                0.039220
Ductal                    0.908973                0.043246
Ngn3 high EP              0.937826                0.347985
Pre-endocrine             0.944476                0.634166
Beta                      0.950648                0.774505
Epsilon                   0.633200                0.796823
Alpha                     0.911853                0.819963
Delta                     0.898328                0.845001

Branch-aware stream plot along development pseudotime#

ov.pl.branch_streamplot can use the root-oriented development pseudotime together with the pancreas cluster labels. The shared endocrine progenitor states form the trunk, while the terminal endocrine fates bend into branch ribbons, giving a compact bridge between the UMAP velocity field and the CellRank fate probabilities.

fig, ax = ov.pl.branch_streamplot(
    adata,
    group_key='clusters',
    pseudotime_key='development_pseudotime',
    trunk_groups=['Ductal', 'Ngn3 low EP', 'Ngn3 high EP', 'Pre-endocrine'],
    branch_center=0.62,
    figsize=(6, 4),
    xlabel='Development pseudotime',
    show=False,
)
plt.show()
adata.obs['beta_fate'] = np.asarray(g.fate_probabilities['Beta']).ravel()
ov.pl.embedding(
    adata,
    basis='X_umap',
    color=['clusters', 'beta_fate'],
    cmap='Reds',
    frameon='small',
)

Summarize marker programs with dynamic_heatmap#

The development pseudotime can also drive ov.pl.dynamic_heatmap. Here we group marker genes by endocrine program and read smoothed expression from the Ms layer generated during the scVelo moments step, so the heatmap complements the single-gene trend panels above.

cellrank_marker_modules = {
    'Endocrine progenitor': ['Sox9', 'Neurog3', 'Fev'],
    'Alpha fate': ['Gcg', 'Arx'],
    'Beta fate': ['Pax4', 'Ins1', 'Ins2', 'Pdx1'],
    'Delta fate': ['Sst', 'Hhex'],
}
cellrank_marker_modules = {
    program: [gene for gene in genes if gene in adata.var_names]
    for program, genes in cellrank_marker_modules.items()
}
cellrank_marker_modules = {
    program: genes for program, genes in cellrank_marker_modules.items() if genes
}

cellrank_heatmap = ov.pl.dynamic_heatmap(
    adata,
    var_names=cellrank_marker_modules,
    pseudotime='development_pseudotime',
    layer='Ms',
    use_cell_columns=False,
    cell_annotation='clusters',
    cell_bins=180,
    smooth_window=17,
    fitted_window=31,
    figsize=(5, 4),
    standard_scale='var',
    cmap='RdBu_r',
    use_fitted=True,
    show_row_names=True,
    border=False,
    show=False,
)
🔍 Dynamic heatmap:
   Candidate features: 11
   Pseudotime: development_pseudotime
   Cell annotation: clusters
   use_fitted=True | cell_bins=180 | cmap=RdBu_r
✅ Dynamic heatmap completed!
✓ Matrix shape: 11 features × 176 columns
../_images/e5554b30fd11f256117b31a963443d3ba38a86b3f9f724ef00a2ac43c6f12e0f.png