omicverse.micro.MMvec

omicverse.micro.MMvec(n_latent: int = 3, lr: float = 0.05, epochs: int = 1000, val_frac: float = 0.1, patience: int = 100, l2: float = 0.001, seed: int = 0, device: str | None = None)

MMvec (Morton et al. 2019) in ~80 lines of PyTorch.

The objective is the exact expected multinomial log-likelihood

\[\ell \;=\; \sum_{i,j} W_{ij} \,\log \mathrm{softmax}(u_i \cdot V^\top + \beta)_j\]

where \(W_{ij} = \sum_s c_{s,i} \cdot m_{s,j} / M_s\) is the co-occurrence weight matrix: for each sample \(s\), the count of microbe \(i\) times the fraction of metabolite \(j\) (its abundance \(m_{s,j}\) divided by the sample total \(M_s\)), summed over the cohort. For tutorial-scale data this implementation uses the full softmax; the upstream mmvec package uses negative sampling to scale to thousands of features.
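The objective can be checked by hand on a toy dataset. The following is a minimal NumPy sketch, not the class's PyTorch code: the helper names (`cooccurrence_weights`, `log_softmax`, `mmvec_loglik`) are hypothetical, \(M_s\) is assumed to be the per-sample metabolite total, and the L2 penalty is left out.

```python
import numpy as np

def cooccurrence_weights(microbes, metabolites):
    """W[i, j] = sum_s c[s, i] * m[s, j] / M_s (M_s assumed to be the row total)."""
    microbes = np.asarray(microbes, dtype=float)        # (n_samples, n_microbes)
    metabolites = np.asarray(metabolites, dtype=float)  # (n_samples, n_metabolites)
    totals = metabolites.sum(axis=1, keepdims=True)     # M_s, assumed > 0
    return microbes.T @ (metabolites / totals)          # (n_microbes, n_metabolites)

def log_softmax(z):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mmvec_loglik(W, U, V, beta):
    """sum_{i,j} W[i, j] * log softmax(u_i V^T + beta)_j."""
    logits = U @ V.T + beta                             # (n_microbes, n_metabolites)
    return float((W * log_softmax(logits)).sum())

# Toy data: 20 samples, 4 microbes, 6 metabolites, K = 3 latent dimensions.
rng = np.random.default_rng(0)
c = rng.poisson(5.0, size=(20, 4))
m = rng.poisson(10.0, size=(20, 6)) + 1                 # +1 keeps every M_s positive
W = cooccurrence_weights(c, m)

U = rng.normal(size=(4, 3))                             # microbe embeddings
V = rng.normal(size=(6, 3))                             # metabolite embeddings
beta = np.zeros(6)                                      # metabolite intercepts
ll = mmvec_loglik(W, U, V, beta)
```

Because the metabolite fractions of each sample sum to one, row \(i\) of `W` sums to the total count of microbe \(i\) over the cohort, which makes a convenient sanity check. Training would maximize `ll` (or minimize its negative) with respect to `U`, `V` and `beta`.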

Parameters:
  • n_latent (default: 3) – Embedding dimensionality K.

  • lr (default: 0.05) – Adam learning rate.

  • epochs (default: 1000) – Maximum training epochs.

  • val_frac (default: 0.1) – Fraction of samples held out for the validation loss curve / early stopping. Set to 0 to skip validation.

  • patience (default: 100) – Early-stopping patience on validation loss (epochs without improvement before training halts).

  • l2 (default: 0.001) – Weight-decay on U / V / beta.

  • seed (default: 0) – Torch RNG seed.

  • device (default: None) – 'cpu' / 'cuda' / None (auto-pick based on availability).
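To show how `val_frac` and `patience` interact, here is a small pure-Python sketch of a hold-out split and a patience rule. It illustrates the general technique under stated assumptions and is not taken from the omicverse source: `split_samples` and `stopping_epoch` are hypothetical helpers, and the rounding of the validation-set size is an assumption.

```python
import random

def split_samples(n_samples, val_frac, seed=0):
    """Shuffle sample indices and hold out a val_frac share for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n_samples * val_frac))  # assumed rounding rule
    return idx[n_val:], idx[:n_val]           # (train indices, validation indices)

def stopping_epoch(val_losses, patience):
    """Return the epoch at which training halts under a patience rule.

    Training stops once `patience` consecutive epochs pass without the
    validation loss dropping below its best value so far.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1                # ran through every epoch

train_idx, val_idx = split_samples(n_samples=10, val_frac=0.1)
halt = stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.05], patience=3)
```

In this toy loss curve the best validation loss is reached at epoch 2, and the three following epochs bring no improvement, so the rule halts at epoch 5. With `val_frac=0` the validation set is empty, which matches the documented way to skip validation.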