omicverse.bulk.pyWGCNA


omicverse.bulk.pyWGCNA(name='WGCNA', TPMcutoff=1, powers=None, RsquaredCut=0.9, MeanCut=100, networkType='signed hybrid', TOMType='signed', minModuleSize=50, naColor='grey', cut=inf, MEDissThres=0.2, species=None, level='gene', anndata=None, geneExp=None, geneExpPath=None, sep=',', geneInfo=None, sampleInfo=None, save=False, outputPath=None, figureType='pdf')

Weighted Gene Co-expression Network Analysis.

Identifies modules of highly co-expressed genes and relates them to clinical traits or other sample metadata. The standard WGCNA workflow:

Preprocess — remove low-expressed genes (TPM cutoff) and outlier samples (Euclidean distance to mean).
Soft-thresholding — pick a power that yields scale-free topology in the gene-gene correlation network.
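Adjacency + TOM — adjacency is a power of the gene-gene correlation (|cor|^power for an unsigned network; the signed variants transform cor before raising it to the power); the topological overlap matrix (TOM) measures how much of their neighbourhood two genes share.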
Dynamic tree cut — hierarchical clustering on 1 - TOM; tree cut yields gene modules (named by colour).
Module eigengenes — first principal component of each module’s expression matrix.
Module-trait correlation — Pearson correlation of each module eigengene against numeric sample traits, with FDR-corrected p-values.
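The adjacency, TOM and eigengene steps above can be sketched in plain NumPy. This is an illustrative sketch on synthetic data, not the code that pyWGCNA runs internally. The function names are hypothetical, the adjacency formulas are the standard WGCNA definitions for each network type, and the TOM shown is the unsigned form for a non-negative adjacency.

```python
import numpy as np

def adjacency_from_expression(expr, power, network_type="signed hybrid"):
    """expr: samples x genes array. Returns a genes x genes adjacency matrix."""
    cor = np.corrcoef(expr, rowvar=False)  # gene-gene Pearson correlation
    if network_type == "unsigned":
        return np.abs(cor) ** power
    if network_type == "signed":
        return ((1.0 + cor) / 2.0) ** power
    if network_type == "signed hybrid":
        # Negative correlations are set to zero before raising to the power.
        return np.where(cor > 0, cor, 0.0) ** power
    raise ValueError(f"unknown network type: {network_type!r}")

def tom_similarity(adj):
    """Unsigned topological overlap for a symmetric adjacency in [0, 1]."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)
    shared = a @ a                     # sum over u of a[i, u] * a[u, j]
    k = a.sum(axis=1)                  # connectivity of each gene
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

def module_eigengene(expr_module):
    """First principal component of a samples x genes module matrix."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0]                     # one value per sample; sign is arbitrary

rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 6))        # 20 samples, 6 genes of synthetic data
adj = adjacency_from_expression(expr, power=6)
tom = tom_similarity(adj)
diss_tom = 1.0 - tom                   # dissimilarity passed to hierarchical clustering
eigengene = module_eigengene(expr[:, :3])
```

The eigengene is a unit-length vector over samples, which is why it generally differs from a simple per-sample mean of the module's genes.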

Parameters:

name (str) – Analysis label, used for output file names.
species (str) – Organism (e.g. "mus musculus", "homo sapiens").
geneExp (pandas.DataFrame) – Expression matrix shaped (genes × samples). Sample identifiers are the column names, gene identifiers are the index. Note this is the TRANSPOSE of the typical samples × genes layout used by AnnData.
TPMcutoff (float, default 1) – Per-gene TPM threshold; genes whose maximum across samples falls below this are dropped during preprocess.
powers (list[int], optional) – Candidate soft-threshold powers. Defaults to a 1–30 sweep.
networkType ({"signed", "unsigned", "signed hybrid"}) – How adjacency is computed from correlation.
minModuleSize (int, default 50) – Smallest module size kept by the dynamic tree cut.
save (bool, default False) – Whether to persist results to disk.
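The TPMcutoff rule described above can be sketched with pandas on a toy genes × samples table. The helper name filter_low_expressed and the toy values are hypothetical, and the sketch only mirrors the rule as stated here (drop a gene when its maximum across samples is below the cutoff); it is not the library's own implementation.

```python
import pandas as pd

def filter_low_expressed(gene_exp, tpm_cutoff=1.0):
    """Keep genes (rows) whose maximum TPM across samples reaches the cutoff."""
    keep = gene_exp.max(axis=1) >= tpm_cutoff
    return gene_exp.loc[keep]

# Toy genes x samples table with made-up TPM values.
toy = pd.DataFrame(
    {"s1": [0.2, 5.0, 0.0], "s2": [0.9, 3.1, 1.0]},
    index=["geneA", "geneB", "geneC"],
)
filtered = filter_low_expressed(toy, tpm_cutoff=1.0)
print(list(filtered.index))  # ['geneB', 'geneC']
```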

Notes

Wide expression CSVs are usually shaped samples × genes; remember to pass data.T so the constructor receives genes × samples.
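A small orientation check can make the transpose explicit before the table is handed to the constructor. The helper to_genes_by_samples below is hypothetical and assumes the sample identifiers are known in advance.

```python
import pandas as pd

def to_genes_by_samples(table, sample_ids):
    """Return the table oriented genes x samples, given the known sample identifiers."""
    samples = set(sample_ids)
    if set(table.columns) == samples:
        return table            # already genes x samples
    if set(table.index) == samples:
        return table.T          # samples x genes, so transpose
    raise ValueError("sample_ids match neither the rows nor the columns")

# Toy wide table: rows are samples, columns are genes.
wide = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=["sample1", "sample2"],
    columns=["geneA", "geneB", "geneC"],
)
gene_exp = to_genes_by_samples(wide, ["sample1", "sample2"])
print(gene_exp.shape)  # (3, 2)
```

Failing loudly on a mismatch is a deliberate choice here: a silently mis-oriented matrix would still run but would cluster samples instead of genes.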

Methods (call in this order — each step populates the attributes named in its entry). Use the high-level runWGCNA() to chain everything end-to-end, or call the explicit methods below one by one for finer control:

preprocess() — drop low-TPM genes, drop outlier samples (updates self.datExpr).
calculate_soft_threshold() — scans candidate powers for scale-free fit; sets self.power (an int; the attribute is not named self.softPower) and self.sft (DataFrame with R²/slope/k per power).
calculating_adjacency_matrix() — sets self.adjacency.
calculating_TOM_similarity_matrix() — sets self.TOM.
calculate_geneTree() — sets self.geneTree (linkage matrix).
calculate_dynamicMods(kwargs_function={...}) — sets self.dynamicMods and self.datExpr.var['dynamicColors'].
calculate_gene_module(kwargs_function={...}) — merges close modules, sets self.datExpr.var['moduleColors'], self.datExpr.var['moduleLabels'], self.MEs, self.datME.
findModules() — convenience that runs the soft-threshold + adjacency + TOM + tree + module merge as one call (preferred).
runWGCNA() — runs preprocess() then findModules().
analyseWGCNA(geneList=None) — module–trait correlation; sets self.moduleTraitCor and self.moduleTraitPvalue. Requires sample metadata (set via updateSampleInfo(...) or passed via sampleInfo at construction).
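The explicit call order listed above can be captured in a small driver. This is a sketch: run_explicit_steps and step_kwargs are hypothetical names, and calling a method with no arguments assumes it has usable defaults. The contents of kwargs_function are elided in this page, so they are left for the caller to supply.

```python
# Method names in the order given by the list above.
EXPLICIT_STEPS = (
    "preprocess",
    "calculate_soft_threshold",
    "calculating_adjacency_matrix",
    "calculating_TOM_similarity_matrix",
    "calculate_geneTree",
    "calculate_dynamicMods",
    "calculate_gene_module",
)

def run_explicit_steps(wgcna, step_kwargs=None):
    """Call each explicit step on `wgcna` in order.

    step_kwargs optionally maps a method name to the keyword arguments
    for that call, e.g. {"calculate_dynamicMods": {"kwargs_function": ...}}.
    """
    step_kwargs = step_kwargs or {}
    for name in EXPLICIT_STEPS:
        method = getattr(wgcna, name)
        method(**step_kwargs.get(name, {}))
    return wgcna
```

After the driver returns, analyseWGCNA() can be called once sample metadata has been attached.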

Attributes (state machine — populated in this order). The class is a thin shim that delegates to the upstream PyWGCNA implementation; these are the actual attribute names on the returned instance, which agents commonly mis-spell:

self.geneExpr — AnnData holding the original input expression (samples on .obs, genes on .var, following the AnnData convention).
self.datExpr — AnnData with the same orientation, filtered after preprocess(). Per-gene module annotations live on self.datExpr.var.
self.power (int) — chosen soft-threshold power. The attribute is ``power``, NOT ``softPower``. Set after calculate_soft_threshold() or findModules(); before that it is 0.
self.sft (pandas.DataFrame) — scale-free fit table per candidate power (columns: Power, SFT.R.sq, slope, mean(k), …). Set together with self.power.
self.adjacency (pandas.DataFrame) — gene-gene weighted adjacency. None until calculating_adjacency_matrix() / findModules() runs.
self.TOM (numpy.ndarray) — topological overlap matrix. None until calculating_TOM_similarity_matrix() / findModules() runs.
self.geneTree — scipy linkage matrix from 1 - TOM.
self.dynamicMods — initial dynamic-tree-cut module integer labels per gene.
self.datExpr.var['dynamicColors'] — initial module colour per gene (string, e.g. 'turquoise').
self.datExpr.var['moduleColors'] — final module colour per gene (after merging close modules). Use this column for downstream analysis.
self.datExpr.var['moduleLabels'] — integer label per gene aligned to moduleColors.
self.MEs (pandas.DataFrame) — module eigengenes, samples × modules. Do not compute this manually — the class already provides it; manual mean-by-mask is not equivalent (eigengene = first PC of the module’s expression, not the mean).
self.datME — pre-merge eigengene matrix; usually self.MEs is what you want.
self.moduleTraitCor (pandas.DataFrame) — module × trait Pearson correlations. None until analyseWGCNA() runs.
self.moduleTraitPvalue (pandas.DataFrame) — parallel p-value table. None until analyseWGCNA() runs.
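Two of the attributes above are plain pandas tables, so they can be summarised with ordinary DataFrame operations. The sketch below uses toy stand-ins for self.datExpr.var and self.moduleTraitCor; the helper names, gene identifiers, trait names and the "ME" row labels are all hypothetical.

```python
import pandas as pd

def genes_by_module(gene_annotations, column="moduleColors"):
    """Map each module colour to the list of gene identifiers assigned to it."""
    groups = gene_annotations.groupby(column).groups
    return {color: list(genes) for color, genes in groups.items()}

def strongest_trait_per_module(module_trait_cor):
    """For each module (row), the trait (column) with the largest |correlation|."""
    return module_trait_cor.abs().idxmax(axis=1)

# Toy stand-in for wgcna.datExpr.var
var = pd.DataFrame(
    {"moduleColors": ["turquoise", "blue", "turquoise", "grey"]},
    index=["g1", "g2", "g3", "g4"],
)
# Toy stand-in for wgcna.moduleTraitCor (modules x traits)
cor = pd.DataFrame(
    {"age": [0.1, -0.8], "weight": [0.6, 0.3]},
    index=["MEblue", "MEturquoise"],
)

modules = genes_by_module(var)
top_traits = strongest_trait_per_module(cor)
```

With a real instance, the same helpers would take wgcna.datExpr.var and wgcna.moduleTraitCor once the corresponding steps have run.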

Examples

>>> import pandas as pd, omicverse as ov
>>> data = pd.read_csv('expressionList.csv', index_col=0)
>>> wgcna = ov.bulk.pyWGCNA(
...     name='5xFAD',
...     species='mus musculus',
...     geneExp=data.T,            # transpose to genes × samples
...     TPMcutoff=1,
...     networkType='signed hybrid',
... )
>>> wgcna.preprocess()
>>> wgcna.findModules()
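After calls like those above, it can help to confirm which results are populated before reading them. The check below relies only on the initial values stated in the attribute list (power is 0, the other results are None); the helper unset_results is hypothetical, and a SimpleNamespace stands in for a real instance.

```python
from types import SimpleNamespace

def unset_results(wgcna):
    """Return the names of result attributes that are still at their initial value."""
    unset = []
    if getattr(wgcna, "power", 0) in (None, 0):
        unset.append("power")
    for name in ("adjacency", "TOM", "moduleTraitCor", "moduleTraitPvalue"):
        if getattr(wgcna, name, None) is None:
            unset.append(name)
    return unset

# Stand-in mimicking an instance after findModules() but before analyseWGCNA().
after_find_modules = SimpleNamespace(
    power=9,
    adjacency=object(),
    TOM=object(),
    moduleTraitCor=None,
    moduleTraitPvalue=None,
)
print(unset_results(after_find_modules))  # ['moduleTraitCor', 'moduleTraitPvalue']
```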
