Analyze 10x Atera (WTA Preview) FFPE breast cancer data

Analyze 10x Atera (WTA Preview) FFPE breast cancer data#

Atera is 10x Genomics’ next-generation, in-situ, whole-transcriptome assay built on top of the Xenium platform. Compared with Xenium’s gene-panel chemistry, Atera v1 expands coverage to ~18,000 human gene targets while keeping the cell-segmentation, polygon, and OME-TIFF morphology imaging that Xenium tutorials are written against.

An Atera outs/ bundle ships a Xenium-format core plus three additions:

File / folder	Origin	Notes
`cell_feature_matrix.h5`	shared	gene × cell sparse counts (10x HDF5)
`cells.parquet`	shared	per-cell metadata + centroids (microns)
`cell_boundaries.parquet`	shared	per-cell polygon vertices
`experiment.xenium`	shared	pipeline metadata (JSON)
`nucleus_boundaries.parquet`	Atera-only	per-cell nucleus polygon vertices
`morphology_focus/ch####_<tag>.ome.tif`	Atera-only	named multi-stain images (DAPI, ATP1A1+CD45+E-Cad, 18S, αSMA+Vim)
`*_cell_groups.csv`	optional	vendor-shipped cell-type classifier (cell_id → group → display color)
`_he_image.ome.tif` + `_he_alignment.csv`	optional	registered H&E whole-slide image and 3×3 affine

OmicVerse’s ov.io.spatial.read_atera mirrors read_xenium plus loaders for the four Atera-only items. This notebook walks through:

loading an Atera FFPE breast-cancer bundle into a single AnnData,
inspecting the four morphology stain channels,
plotting the vendor-supplied cell-type segmentation in spatial coords,
zooming into a small region with ov.pl.spatialseg to render cell polygons,
running standard preprocessing (filter, normalize, HVG, PCA),
plotting per-gene spatial expression for canonical breast-cancer markers.

Environment setup#

import omicverse as ov
ov.style(font_path='Arial')

%load_ext autoreload
%autoreload 2

🔬 Starting plot initialization...
Using already downloaded Arial font from: /tmp/omicverse_arial.ttf
Registered as: Arial
🧬 Detecting GPU devices…
✅ NVIDIA CUDA GPUs detected: 1
    • [CUDA 0] NVIDIA H100 80GB HBM3
      Memory: 79.1 GB | Compute: 9.0

   ____            _     _    __                  
  / __ \____ ___  (_)___| |  / /__  _____________ 
 / / / / __ `__ \/ / ___/ | / / _ \/ ___/ ___/ _ \ 
/ /_/ / / / / / / / /__ | |/ /  __/ /  (__  )  __/ 
\____/_/ /_/ /_/_/\___/ |___/\___/_/  /____/\___/                                              

🔖 Version: 2.1.2rc1   📚 Tutorials: https://omicverse.readthedocs.io/
✅ plot_set complete.

ov.settings.cpu_gpu_mixed_init()

CPU-GPU mixed mode activated
Available GPU accelerators: CUDA

Inspect the Atera bundle#

Atera ships its primary outputs as a single outs.zip. For tutorials we extract the small files (matrices, parquet, JSON) plus the four morphology focus channels — the giant morphology.ome.tif (~15 GB, full multi-z stack) and transcripts.parquet (~10 GB) are not needed for cell-level downstream analysis.

Companion files (cell_groups CSV, H&E image + alignment CSV) sit next to outs/ in the public 10x dataset page rather than inside it.

from pathlib import Path

ATERA = Path("data/atera_breast_cancer")

# Show top-level files + the morphology_focus channel TIFFs (skip GB-scale OME-TIFFs).
for p in sorted(ATERA.iterdir()):
    if p.is_dir():
        print(f"  {p.name}/")
        for k in sorted(p.iterdir()):
            print(f"      {k.name}  ({k.stat().st_size / 1024**2:,.1f} MB)")
    else:
        size_mb = p.stat().st_size / 1024**2
        # Skip the multi-GB H&E OME-TIFF and outs.zip from the printout.
        if size_mb < 500:
            print(f"  {p.name}  ({size_mb:,.1f} MB)")
        else:
            print(f"  {p.name}  ({size_mb / 1024:,.1f} GB)  # large, not loaded directly")

WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv  (7.2 MB)
  WTA_Preview_FFPE_Breast_Cancer_he_alignment.csv  (0.0 MB)
  WTA_Preview_FFPE_Breast_Cancer_he_image.ome.tif  (16.5 GB)  # large, not loaded directly
  WTA_Preview_FFPE_Breast_Cancer_keypoints.csv  (0.0 MB)
  WTA_Preview_FFPE_Breast_Cancer_outs.zip  (51.4 GB)  # large, not loaded directly
  analysis.tar.gz  (72.3 MB)
  cell_boundaries.parquet  (15.4 MB)
  cell_feature_matrix.h5  (379.7 MB)
  cells.parquet  (3.4 MB)
  experiment.xenium  (0.0 MB)
  metrics_summary.csv  (0.0 MB)
  morphology_focus/
      ch0000_dapi.ome.tif  (1,224.5 MB)
      ch0001_atp1a1_cd45_e-cadherin.ome.tif  (1,309.9 MB)
      ch0002_18s.ome.tif  (1,324.1 MB)
      ch0003_alphasma_vimentin.ome.tif  (1,169.8 MB)
  nucleus_boundaries.parquet  (14.1 MB)

Load the dataset with `read_atera`#

read_atera returns an AnnData with:

X: cells × genes sparse counts (control probes / codewords are dropped automatically; only Gene Expression features are kept).
obsm['spatial']: cell centroids in microns.
obs: cells.parquet metadata, plus geometry (cell polygon WKT), nucleus_geometry (nucleus polygon WKT), and — when cell_groups_csv is passed — cell_group and cell_group_color columns from the vendor’s classifier.
uns['spatial'][library_id]: images['hires'] (the chosen morphology channel), scalefactors that map microns → image pixels, and the full experiment.xenium metadata dict.

We start by selecting the dapi channel for the morphology image. Atera multi-channel selection accepts either a semantic tag ('dapi', 'boundary', 'rna', 'stroma'), a substring ('cd45', '18s'), or an integer-as-string index ('0'–'3').

adata = ov.io.spatial.read_atera(
    ATERA,
    image_key='dapi',
    image_max_dim=2048,
    cell_groups_csv=ATERA / 'WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv',
    cache_file=ATERA / 'atera_dapi.h5ad',
)
adata

[Atera] Reading Atera data from: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer
[Atera] Loaded morphology channel ch0000_dapi.ome.tif (876, 1454) (downsample 0.0312)
[Atera] Loaded cell polygons for 170057/170057 cells
[Atera] Loaded nucleus polygons for 168335/170057 cells
[Atera] Merged cell_groups: 170057/170057 cells from WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv
[Atera] Wrote cache AnnData to: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer/atera_dapi.h5ad
[Atera] Done (n_obs=170057, n_vars=18028, library_id=WTA Breast Cancer (FFPE))

AnnData object with n_obs × n_vars = 170057 × 18028
    obs: 'transcript_counts', 'control_probe_counts', 'genomic_control_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'nucleus_count', 'segmentation_method', 'geometry', 'nucleus_geometry', 'cell_group', 'cell_group_color'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial', 'omicverse_io'
    obsm: 'spatial'

The experiment.xenium metadata is preserved verbatim under uns['spatial']['<library>']['metadata']. The pixel_size field (in microns) defines how spatial centroids are converted into image-pixel coordinates downstream.

library_id = list(adata.uns['spatial'].keys())[0]
meta = adata.uns['spatial'][library_id]['metadata']
for k in ['run_name', 'region_name', 'preservation_method', 'panel_name',
          'panel_num_targets_predesigned', 'chemistry_version', 'pixel_size',
          'num_cells', 'transcripts_per_cell']:
    print(f"  {k:30s} {meta.get(k)}")

run_name                       WTA Preview
  region_name                    WTA Breast Cancer (FFPE)
  preservation_method            FFPE
  panel_name                     Human WTA (pre-release)
  panel_num_targets_predesigned  18028
  chemistry_version              Atera v1
  pixel_size                     0.2125
  num_cells                      170057
  transcripts_per_cell           2116

Quick QC: per-cell distributions#

Atera’s cells.parquet already carries per-cell transcript_counts, cell_area, and nucleus_count. A first sanity check is to verify the distributions look reasonable: a mean of ~2,000 transcripts/cell with cell areas in the tens of µm² is typical for Atera v1.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fig, axes = plt.subplots(1, 4, figsize=(15, 3.2))
for ax, (col, log) in zip(axes, [
    ('transcript_counts', True),
    ('cell_area',        False),
    ('nucleus_area',     False),
    ('nucleus_count',    False),
]):
    vals = adata.obs[col].astype(float).to_numpy()
    if log:
        vals = np.log10(vals + 1)
        ax.set_xlabel(f'log10({col} + 1)')
    else:
        ax.set_xlabel(col)
    ax.hist(vals, bins=60, color='#4477aa', edgecolor='white', linewidth=0.3)
    ax.set_ylabel('cells')
    ax.spines[['right', 'top']].set_visible(False)
fig.suptitle(f'Per-cell QC ({adata.n_obs:,} cells)', y=1.02)
fig.tight_layout()
plt.show()

../_images/e4342e9a178e98d71f7ea72a32f254a12447df7a81e039c9a92ff0cbe0d0b739.png

Inspect the four morphology channels#

Atera ships morphology imaging as four separate stain channels rather than a single composite. Each channel is a standalone OME-TIFF pyramid:

Channel	Stain	Purpose
`ch0000_dapi.ome.tif`	DAPI	nucleus
`ch0001_atp1a1_cd45_e-cadherin.ome.tif`	ATP1A1 + CD45 + E-Cadherin	cell boundary
`ch0002_18s.ome.tif`	18S rRNA	RNA / cell mass
`ch0003_alphasma_vimentin.ome.tif`	αSMA + Vimentin	stromal cells

We re-read the bundle with each channel in turn. read_atera only loads the morphology image — the matrix and metadata reuse from cache_file are not triggered when re-reading directly from the source path with a different image_key, but image loading is cheap (~15 s per channel at 2048 px max) so we just call it four times.

channel_keys = [('dapi',     'DAPI (nucleus)'),
                ('boundary', 'ATP1A1 / CD45 / E-Cadherin (cell boundary)'),
                ('rna',      '18S rRNA'),
                ('stroma',   'αSMA / Vimentin (stroma)')]

channel_imgs = {}
for key, _ in channel_keys:
    a = ov.io.spatial.read_atera(
        ATERA, image_key=key, image_max_dim=2048,
        load_boundaries=False, load_nucleus_boundaries=False,
    )
    channel_imgs[key] = a.uns['spatial'][library_id]['images']['hires']

# Register every channel under a semantic key so ov.pl.spatialseg can use any
# of them as background via `img_key=...`. ov.pl.to_rgb_grayscale converts each
# 2-D channel to a contrast-clipped RGB stack — without this matplotlib's
# default viridis colormap kicks in and washes the per-channel structure into
# a uniform purple background. All four channels share the same
# `tissue_hires_scalef` (loaded at the same `image_max_dim`).
spatial_block = adata.uns['spatial'][library_id]
scalef = spatial_block['scalefactors']['tissue_hires_scalef']
for key, _ in channel_keys:
    spatial_block['images'][key] = ov.pl.to_rgb_grayscale(channel_imgs[key])
    spatial_block['scalefactors'][f'tissue_{key}_scalef'] = scalef
spatial_block['images']['hires'] = spatial_block['images']['dapi']
print('available background channels:', list(spatial_block['images']))

[Atera] Reading Atera data from: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer
[Atera] Loaded morphology channel ch0000_dapi.ome.tif (876, 1454) (downsample 0.0312)
[Atera] Done (n_obs=170057, n_vars=18028, library_id=WTA Breast Cancer (FFPE))
[Atera] Reading Atera data from: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer
[Atera] Loaded morphology channel ch0001_atp1a1_cd45_e-cadherin.ome.tif (876, 1454) (downsample 0.0312)
[Atera] Done (n_obs=170057, n_vars=18028, library_id=WTA Breast Cancer (FFPE))
[Atera] Reading Atera data from: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer
[Atera] Loaded morphology channel ch0002_18s.ome.tif (876, 1454) (downsample 0.0312)
[Atera] Done (n_obs=170057, n_vars=18028, library_id=WTA Breast Cancer (FFPE))
[Atera] Reading Atera data from: /scratch/users/steorra/analysis/omicverse_dev/omicverse-test/notebooks/data/atera_breast_cancer
[Atera] Loaded morphology channel ch0003_alphasma_vimentin.ome.tif (876, 1454) (downsample 0.0312)
[Atera] Done (n_obs=170057, n_vars=18028, library_id=WTA Breast Cancer (FFPE))
available background channels: ['hires', 'dapi', 'boundary', 'rna', 'stroma']

fig, axes = plt.subplots(2, 2, figsize=(12, 12))
for ax, (key, title) in zip(axes.flat, channel_keys):
    img = channel_imgs[key]
    # Per-channel 99th-percentile contrast clip — Atera stains have a long tail
    # of bright outliers that flatten the rest of the image if not clipped.
    vmax = np.percentile(img[img > 0], 99) if (img > 0).any() else img.max()
    ax.imshow(img, cmap='gray', vmin=0, vmax=vmax)
    ax.set_title(f'{key}: {title}', fontsize=10)
    ax.set_xticks([]); ax.set_yticks([])
fig.suptitle(f'Atera morphology focus channels ({img.shape[1]}×{img.shape[0]} px)',
             y=0.94, fontsize=11)
fig.tight_layout()
plt.show()

../_images/1d72fefa1fb00102db9c48f23046e3698ade2b551b927117e1805be83a5e7df7.png

Vendor cell-group spatial map#

Atera ships a CSV mapping every cell_id to a curated cell-type label and a display color (*_cell_groups.csv). The 10x team produces these labels with a downstream classifier on top of the segmentation, and they make a useful sanity reference for our own clustering later.

We plot every cell as a tiny dot using its centroid and the vendor color directly.

print(adata.obs['cell_group'].value_counts().head(10))
print()
print(f"{adata.obs['cell_group'].nunique()} groups in total")

cell_group
11q13 Invasive Tumor Cells           64036
CAFs, DCIS Associated                24442
Luminal-like Amorphous DCIS Cells    13028
Basal-like Structured DCIS Cells      8760
Endothelial Cells                     8624
Myoepithelial Cells                   8438
Pericytes                             8087
T Lymphocytes                         8018
CXCL14+ Fibroblasts                   4936
Dendritic Cells                       4008
Name: count, dtype: int64

20 groups in total

# Pull vendor display colours into uns['cell_group_colors'] so ov.pl.spatial
# uses them as the categorical palette automatically.
ov.pl.sync_categorical_palette(adata, key='cell_group', color_obs='cell_group_color')

ov.pl.spatial(
    adata,
    color='cell_group',
    img_key=None,
    size=10,
    show=False,
)
plt.gcf().suptitle(f'Vendor cell-group classifier ({adata.n_obs:,} cells, '
                   f"{adata.obs['cell_group'].nunique()} groups)",
                   y=1.02, fontsize=11)
plt.show()

../_images/b66e4e404a2347321aac3c5a9de4268ca133f157c7b60f5d38bce62685f143f3.png

Render cell polygons in a small region#

ov.pl.spatialseg reads the WKT polygons stored in obs['geometry'] and renders each cell as its segmentation outline. Plotting all 170k polygons at once would dwarf the visible features, so we subset to a 1 mm × 1 mm window first.

Background image selection: by registering all four morphology channels under semantic keys ('dapi', 'boundary', 'rna', 'stroma') in uns['spatial'][lib]['images'], ov.pl.spatialseg can render polygons over any channel via img_key=.... For the cell-segmentation view the boundary channel (ATP1A1+CD45+E-Cadherin membrane stain) is the most informative — the polygons trace exactly the structures the stain lights up.

# Pick a 1 mm × 1 mm window centred on the densest tumour patch and reuse
# the same window for both the cell subset and the spatialseg crop_coord —
# this keeps the rendered background image aligned to the polygon extent.
x0, y0 = np.median(adata.obsm['spatial'], axis=0)
crop_window = (x0 - 500, x0 + 500, y0 - 500, y0 + 500)  # (x0, x1, y0, y1)
bdata = ov.space.subset_window(adata,
                                xlim=(crop_window[0], crop_window[1]),
                                ylim=(crop_window[2], crop_window[3]))
print(f'Subset window: {bdata.n_obs:,} cells')

Subset window: 3,231 cells

fig, axes = plt.subplots(2, 2, figsize=(14, 14))
for i, (ax, (key, label)) in enumerate(zip(axes.flat, channel_keys)):
    ov.pl.spatialseg(
        bdata,
        color='cell_group',
        img_key=key,
        crop_coord=crop_window,
        edges_color='white',
        edges_width=0.4,
        alpha=0.35,         # let the morphology dominate — was 0.85
        alpha_img=1.0,
        ax=ax,
        legend=(i == len(channel_keys) - 1),
        show=False,
    )
    ax.set_title(f'{key}: {label}', fontsize=10)
fig.suptitle('Cell polygons (vendor cell_group) over each morphology channel',
             y=0.94, fontsize=12)
plt.tight_layout()
plt.show()

../_images/120faf675f02f5f02ad9a3db82b671bbe4cd4fc947a9b0a4509b286f759646c3.png

Standard preprocessing#

We follow the Visium HD / Xenium recipe: filter very-low-count cells, normalize counts to a fixed total, log-transform, and select highly-variable genes. The Atera matrix is dense enough (median ~2k counts/cell) that a count-based filter is gentle.

adata.layers['counts'] = adata.X.copy()
ov.pp.filter_cells(adata, min_counts=20)
ov.pp.filter_genes(adata, min_cells=10)
ov.pp.normalize_total(adata)
ov.pp.log1p(adata)
print(adata)

🔍 Filtering cells...
   Parameters: min_counts≥20
✓ Filtered: 92 cells removed

╭─ SUMMARY: filter_cells ────────────────────────────────────────────╮
│  Duration: 7.8415s                                                 │
│  Shape:    170,057 x 18,028 -> 169,965 x 18,028                    │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
╰────────────────────────────────────────────────────────────────────╯
🔍 Filtering genes...
   Parameters: min_cells≥10
✓ Filtered: 6 genes removed

╭─ SUMMARY: filter_genes ────────────────────────────────────────────╮
│  Duration: 8.7565s                                                 │
│  Shape:    169,965 x 18,028 -> 169,965 x 18,022                    │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
╰────────────────────────────────────────────────────────────────────╯

🔍 Count Normalization:
   Target sum: median
   Exclude highly expressed: False
✅ Count Normalization Completed Successfully!
   ✓ Processed: 169,965 cells × 18,022 genes
   ✓ Runtime: 1.01s
AnnData object with n_obs × n_vars = 169965 × 18022
    obs: 'transcript_counts', 'control_probe_counts', 'genomic_control_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'nucleus_count', 'segmentation_method', 'geometry', 'nucleus_geometry', 'cell_group', 'cell_group_color'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial', 'omicverse_io', 'cell_group_colors', 'history_log', 'log1p'
    obsm: 'spatial'
    layers: 'counts'

ov.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key=None)
adata.raw = adata
adata = adata[:, adata.var['highly_variable']].copy()
print(f'Kept {adata.n_vars} HVGs')

🔍 Highly Variable Genes Selection:
   Method: seurat
   Target genes: 2,000
⚠️ Gene indices [ 2347  7356  9855 10554] fell into a single bin: normalized dispersion set to 1
   💡 Consider decreasing `n_bins` to avoid this effect
   📊 Top 2,000 genes correspond to normalized dispersion cutoff: 1.1648

✅ HVG Selection Completed Successfully!
   ✓ Selected: 2,000 highly variable genes out of 18,022 total (11.1%)
   ✓ Results added to AnnData object:
     • 'highly_variable': Boolean vector (adata.var)
     • 'means': Float vector (adata.var)
     • 'dispersions': Float vector (adata.var)
     • 'dispersions_norm': Float vector (adata.var)

╭─ SUMMARY: highly_variable_genes ───────────────────────────────────╮
│  Duration: 3.8829s                                                 │
│  Shape:    169,965 x 18,022 (Unchanged)                            │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● VAR    │ ✚ dispersions (float)                                  │
│           │ ✚ dispersions_norm (float)                             │
│           │ ✚ highly_variable (bool)                               │
│           │ ✚ means (float)                                        │
│                                                                    │
│  ● UNS    │ ✚ _ov_provenance                                       │
│           │ ✚ hvg                                                  │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
Kept 2000 HVGs

ov.pp.scale(adata)
ov.pp.pca(adata, layer='scaled', n_pcs=50)
adata

╭─ SUMMARY: scale ───────────────────────────────────────────────────╮
│  Duration: 4.2843s                                                 │
│  Shape:    169,965 x 2,000 (Unchanged)                             │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● UNS    │ ✚ REFERENCE_MANU                                       │
│           │ ✚ status                                               │
│           │ ✚ status_args                                          │
│                                                                    │
│  ● LAYERS │ ✚ scaled (array, 169965x2000)                          │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
🚀 Using GPU to calculate PCA...
NVIDIA CUDA GPUs detected:
📊 [CUDA 0] NVIDIA H100 80GB HBM3
    |||||||||||||----------------- 37305/81559 MiB (45.7%)
computing PCA🔍
with n_comps=50
Using CUDA device: NVIDIA H100 80GB HBM3
✅ Using built-in torch_pca for GPU-accelerated PCA
   🚀 Using torch_pca PCA for CUDA GPU acceleration
   🚀 torch_pca PCA backend: CUDA GPU acceleration (supports sparse matrices)
   📊 PCA input data type: ArrayView, shape: (169965, 2000), dtype: float64
   🔧 solver_used_in_uns (planned): covariance_eigh
🔧 PCA solver used: covariance_eigh
    finished✅ (7.02s)

╭─ SUMMARY: pca ─────────────────────────────────────────────────────╮
│  Duration: 7.3136s                                                 │
│  Shape:    169,965 x 2,000 (Unchanged)                             │
│                                                                    │
│  CHANGES DETECTED                                                  │
│  ────────────────                                                  │
│  ● UNS    │ ✚ pca                                                  │
│           │ └─ params: {'zero_center': True, 'use_highly_variable': Tr...│
│           │ ✚ scaled|original|cum_sum_eigenvalues                  │
│           │ ✚ scaled|original|pca_var_ratios                       │
│                                                                    │
│  ● OBSM   │ ✚ X_pca (array, 169965x50)                             │
│           │ ✚ scaled|original|X_pca (array, 169965x50)             │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯

AnnData object with n_obs × n_vars = 169965 × 2000
    obs: 'transcript_counts', 'control_probe_counts', 'genomic_control_counts', 'control_codeword_counts', 'unassigned_codeword_counts', 'deprecated_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'nucleus_count', 'segmentation_method', 'geometry', 'nucleus_geometry', 'cell_group', 'cell_group_color'
    var: 'gene_ids', 'feature_types', 'genome', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'spatial', 'omicverse_io', 'cell_group_colors', 'history_log', 'log1p', 'hvg', '_ov_provenance', 'status', 'status_args', 'REFERENCE_MANU', 'pca', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues'
    obsm: 'spatial', 'X_pca', 'scaled|original|X_pca'
    varm: 'PCs', 'scaled|original|pca_loadings'
    layers: 'counts', 'scaled'

Spatial maps for canonical breast-cancer markers#

With the matrix log-normalised, the same obsm['spatial'] is the natural axis for visualising any gene. We pick four genes that should split cleanly between the cell types in the vendor classifier:

KRT8: luminal-epithelial keratin (DCIS / luminal-like cells).
PTPRC: CD45 — pan-immune.
COL1A1: collagen — CAF / stromal.
PECAM1: CD31 — endothelial.

We use .raw so we can plot any gene that wasn’t selected as HVG.

marker_genes = ['KRT8', 'PTPRC', 'COL1A1', 'PECAM1']
available = [g for g in marker_genes if g in adata.raw.var_names]
print('Available markers:', available)

ov.pl.spatial(
    adata,
    color=available,
    use_raw=True,
    img_key=None,
    size=10,
    vmax='p99.2',     # scanpy parses 'p<N>' as the N-th percentile per panel
    cmap='magma',
    show=False,
)
plt.gcf().suptitle('Spatial expression (log-normalised) of canonical markers',
                   y=1.02, fontsize=11)
plt.show()

Available markers: ['KRT8', 'PTPRC', 'COL1A1', 'PECAM1']

../_images/a77eb31ed0b5311b04d36906fc0b42d25dcb96aaea50fe30017b101db0fbe347.png

Marker spatialseg over each morphology channel#

Re-creating the 1 mm × 1 mm subset on the post-preprocessing matrix lets us overlay marker expression on top of the cell polygons. Instead of fixing the background to DAPI, we pair each marker with a different morphology channel — every panel shows the same polygon set with the same zoom window but a different image behind it, so the multi-channel flexibility of ov.pl.spatialseg(..., img_key=...) is visible at a glance.

Channel pairing (rationale):

Marker	Background channel	Why
`KRT8` (luminal keratin)	`dapi` (nuclei)	epithelial nuclei light up where KRT8 is expressed
`PTPRC` / CD45 (immune)	`boundary` (ATP1A1+CD45+E-Cad)	CD45 lives in the boundary stain itself
`COL1A1` (collagen)	`stroma` (αSMA+Vim)	both highlight stromal compartment
`PECAM1` / CD31 (endothelial)	`rna` (18S)	endothelial cells stand out against generic RNA

ov.pl.spatialseg doesn’t parse the 'p99.2' percentile string (only ov.pl.spatial does), so we compute the per-gene 99.2-th percentile up front and pass it through as a float vmax.

bdata = ov.space.subset_window(adata,
                                xlim=(crop_window[0], crop_window[1]),
                                ylim=(crop_window[2], crop_window[3]))
print(f'Subset for spatialseg: {bdata.n_obs:,} cells')

# Re-attach the multi-channel uns so spatialseg can find each `img_key`.
bdata.uns['spatial'] = adata.uns['spatial']

# Materialise per-gene log-normalised expression onto bdata.obs so spatialseg
# can colour polygons directly. p99.2 vmax per gene clips the bright tail.
raw = adata.raw.to_adata()
raw_b = raw[bdata.obs_names].copy()
for gene in available:
    bdata.obs[f'{gene}_expr'] = raw_b[:, gene].X.toarray().ravel()

# Pair each marker with a different morphology channel — see the markdown
# above for the rationale. Order matches `available`: KRT8/PTPRC/COL1A1/PECAM1.
pairings = list(zip(available, ['dapi', 'boundary', 'stroma', 'rna']))

fig, axes = plt.subplots(2, 2, figsize=(14, 14))
for ax, (gene, ch) in zip(axes.flat, pairings):
    expr = bdata.obs[f'{gene}_expr'].to_numpy()
    nz = expr[expr > 0]
    vmax_val = float(np.percentile(nz, 99.2)) if nz.size else float(expr.max() or 1.0)
    ov.pl.spatialseg(
        bdata,
        color=f'{gene}_expr',
        img_key=ch,
        crop_coord=crop_window,
        edges_color='white',
        edges_width=0.4,
        alpha=0.55,
        alpha_img=1.0,
        cmap='magma',
        vmax=vmax_val,
        ax=ax,
        show=False,
    )
    ax.set_title(f'{gene} on {ch} (vmax≈{vmax_val:.2f})', fontsize=10)
fig.suptitle('Marker expression on cell polygons across morphology channels',
             y=0.94, fontsize=12)
plt.tight_layout()
plt.show()

Subset for spatialseg: 3,230 cells

../_images/bb09e7273ee27a0211be74fcb3598933de2290dffc7bc5d67b991af30208176f.png

Summary#

In this notebook we used omicverse.io.spatial.read_atera to:

load a 170,057-cell × 18,028-gene Atera v1 FFPE breast-cancer dataset into a single AnnData,
inspect Atera’s four-channel morphology focus stack (DAPI / boundary / 18S / stroma),
merge the vendor cell_groups.csv classifier directly into obs,
render cell polygons via ov.pl.spatialseg for region-of-interest views,
run a standard normalize → HVG → PCA preprocessing pipeline,
map canonical breast-cancer markers (KRT8, PTPRC, COL1A1, PECAM1) in physical space.

Atera’s outs/ layout is a strict superset of Xenium’s, so any downstream OmicVerse spatial workflow (ov.space.svg, ov.pl.spatial, neighborhood graphs, leiden clustering, cell-cell communication) works without modification — read_atera is the only thing that has to know about the extras (nucleus polygons, channel-named morphology, cell-group CSV, optional H&E + alignment).