omicverse.bulk.pyWGCNA
- omicverse.bulk.pyWGCNA(name='WGCNA', TPMcutoff=1, powers=None, RsquaredCut=0.9, MeanCut=100, networkType='signed hybrid', TOMType='signed', minModuleSize=50, naColor='grey', cut=inf, MEDissThres=0.2, species=None, level='gene', anndata=None, geneExp=None, geneExpPath=None, sep=',', geneInfo=None, sampleInfo=None, save=False, outputPath=None, figureType='pdf')
Weighted Gene Co-expression Network Analysis.
Identifies highly co-expressed gene modules and relates them to clinical traits / sample metadata. Standard WGCNA workflow:
Preprocess: remove low-expressed genes (TPM cutoff) and outlier samples (Euclidean distance to mean).
Soft-thresholding: pick a power that yields scale-free topology in the gene-gene correlation network.
Adjacency + TOM: adjacency = |cor|^power; the topological overlap matrix (TOM) measures shared neighbourhood.
Dynamic tree cut: hierarchical clustering on 1 - TOM; the tree cut yields gene modules (named by colour).
Module eigengenes: first principal component of each module's expression matrix.
Module-trait correlation: Pearson correlation of each module eigengene against numeric sample traits, with FDR-corrected p-values.
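The adjacency and TOM steps above can be sketched in NumPy. This is a minimal illustration, not the omicverse or PyWGCNA implementation: it assumes an unsigned network (adjacency = |cor|^power, as in the workflow summary) and the standard unsigned topological-overlap formula. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def unsigned_adjacency(expr, power):
    """Return |cor|^power for a genes x samples array (diagonal set to 0)."""
    cor = np.corrcoef(expr)            # rows are genes, so this is gene-gene Pearson r
    adj = np.abs(cor) ** power
    np.fill_diagonal(adj, 0.0)         # ignore self-connections
    return adj

def tom_similarity(adj):
    """Unsigned topological overlap of a symmetric adjacency with zero diagonal."""
    shared = adj @ adj                 # sum over u of a_iu * a_uj (shared neighbours)
    k = adj.sum(axis=1)                # connectivity of each gene
    denom = np.minimum.outer(k, k) + 1.0 - adj
    tom = (shared + adj) / denom
    np.fill_diagonal(tom, 1.0)
    return tom

# Toy data: 4 genes x 5 samples (made-up numbers).
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],         # perfectly anti-correlated with the first gene
    [1.0, 3.0, 2.0, 5.0, 4.0],
])
adj = unsigned_adjacency(expr, power=6)
tom = tom_similarity(adj)
diss_tom = 1.0 - tom                   # dissimilarity passed to hierarchical clustering
```

Because the network is unsigned here, the first and third genes get an adjacency of about 1 even though they are anti-correlated; signed and signed-hybrid networks treat that pair differently.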
- Parameters:
name (str) – Analysis label, used for output file names.
species (str) – Organism (e.g. "mus musculus", "homo sapiens").
geneExp (pandas.DataFrame) – Expression matrix shaped (genes × samples). Sample identifiers are the column names, gene identifiers are the index. Note this is the TRANSPOSE of the typical samples × genes layout used by AnnData.
TPMcutoff (float, default 1) – Per-gene TPM threshold; genes whose maximum across samples falls below this are dropped during preprocess.
powers (list[int], optional) – Candidate soft-threshold powers. Defaults to a 1–30 sweep.
networkType ({"signed", "unsigned", "signed hybrid"}) – How adjacency is computed from correlation.
minModuleSize (int, default 50) – Smallest module size kept by the dynamic tree cut.
save (bool, default False) – Whether to persist results to disk.
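The TPMcutoff rule described above (drop a gene when its maximum TPM across samples falls below the threshold) can be sketched as follows. This is an illustration on a plain NumPy array, not the library's code; the helper name and the toy values are hypothetical, and how the library treats a gene whose maximum equals the cutoff exactly is not stated in this page.

```python
import numpy as np

def filter_low_tpm(tpm, gene_ids, cutoff=1.0):
    """Keep genes (rows) whose maximum TPM across samples reaches the cutoff."""
    keep = tpm.max(axis=1) >= cutoff          # one boolean per gene
    kept_ids = [g for g, flag in zip(gene_ids, keep) if flag]
    return tpm[keep], kept_ids

# Toy genes x samples matrix (made-up TPM values).
tpm = np.array([
    [0.2, 0.5, 0.1],   # never reaches 1 TPM, so it is dropped
    [3.0, 0.0, 7.5],   # expressed in two samples, kept
    [0.0, 0.9, 1.4],   # expressed in one sample, kept
])
gene_ids = ["geneA", "geneB", "geneC"]
filtered, kept_ids = filter_low_tpm(tpm, gene_ids, cutoff=1.0)
```

Filtering on the per-gene maximum keeps genes that are expressed in only a subset of samples, which a mean-based filter could discard.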
Notes
Wide expression CSVs are usually shaped samples × genes; remember to pass data.T so the constructor receives genes × samples.
Methods (call in this order; each step populates the attributes listed under it). Use the high-level runWGCNA() to chain everything end-to-end, or the explicit methods below for finer control:
preprocess(): drop low-TPM genes, drop outlier samples (updates self.datExpr).
calculate_soft_threshold(): scale-free fit power scan; sets self.power (int, not self.softPower) and self.sft (DataFrame with R²/slope/k per power).
calculating_adjacency_matrix(): sets self.adjacency.
calculating_TOM_similarity_matrix(): sets self.TOM.
calculate_geneTree(): sets self.geneTree (linkage matrix).
calculate_dynamicMods(kwargs_function={...}): sets self.dynamicMods and self.datExpr.var['dynamicColors'].
calculate_gene_module(kwargs_function={...}): merges close modules, sets self.datExpr.var['moduleColors'], self.datExpr.var['moduleLabels'], self.MEs, self.datME.
findModules(): convenience that runs the soft-threshold + adjacency + TOM + tree + module merge as one call (preferred).
runWGCNA(): runs preprocess() then findModules().
analyseWGCNA(geneList=None): module–trait correlation; sets self.moduleTraitCor and self.moduleTraitPvalue. Requires sample metadata (set via updateSampleInfo(...) or passed via sampleInfo at construction).
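The power scan in calculate_soft_threshold() ends by choosing one power from the fit table. The sketch below shows one plausible selection rule, assumed from the parameter names RsquaredCut and MeanCut in the signature: take the smallest candidate whose scale-free fit R² reaches the cut-off and whose mean connectivity does not exceed the mean cut-off. The function name, the tuple layout, and the fit values are hypothetical, and the library's fallback when no candidate qualifies may differ.

```python
import math

def pick_power(fit_rows, r2_cut=0.9, mean_cut=100.0):
    """Return the smallest power whose fit passes both cut-offs, or None.

    fit_rows: iterable of (power, r_squared, mean_connectivity) tuples.
    """
    for power, r2, mean_k in sorted(fit_rows):
        if r2 >= r2_cut and mean_k <= mean_cut:
            return power
    return None  # nothing qualified; a real implementation may fall back differently

# Made-up fit table: R² rises and mean connectivity falls as the power grows.
fit_rows = [
    (1, 0.20, 900.0),
    (4, 0.75, 210.0),
    (6, 0.91, 140.0),
    (8, 0.93, 80.0),
    (10, 0.95, 40.0),
]
chosen = pick_power(fit_rows)                               # 8
chosen_no_mean_cut = pick_power(fit_rows, mean_cut=math.inf)  # 6
```

Power 6 already passes the R² cut-off but still has a mean connectivity above 100, so the stricter rule moves on to 8.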
Attributes (state machine, populated in this order). The class is a thin shim that delegates to the upstream PyWGCNA implementation; these are the actual attribute names on the returned instance, which agents commonly mis-spell:
self.geneExpr: AnnData (genes × samples) holding the original input expression.
self.datExpr: AnnData (genes × samples), filtered after preprocess(). Per-gene module annotations live on self.datExpr.var.
self.power (int): chosen soft-threshold power. The attribute is power, NOT softPower. Set after calculate_soft_threshold() or findModules(); before that it is 0.
self.sft (pandas.DataFrame): scale-free fit table per candidate power (columns: Power, SFT.R.sq, slope, mean(k), …). Set together with self.power.
self.adjacency (pandas.DataFrame): gene-gene weighted adjacency. None until calculating_adjacency_matrix() / findModules() runs.
self.TOM (numpy.ndarray): topological overlap matrix. None until calculating_TOM_similarity_matrix() / findModules() runs.
self.geneTree: scipy linkage matrix from 1 - TOM.
self.dynamicMods: initial dynamic-tree-cut module integer labels per gene.
self.datExpr.var['dynamicColors']: initial module colour per gene (string, e.g. 'turquoise').
self.datExpr.var['moduleColors']: final module colour per gene (after merging close modules). Use this for downstream analysis.
self.datExpr.var['moduleLabels']: integer label per gene aligned to moduleColors.
self.MEs (pandas.DataFrame): module eigengenes, samples × modules. Do not compute this manually; the class already provides it, and a manual mean-by-mask is not equivalent (eigengene = first PC of the module's expression, not the mean).
self.datME: pre-merge eigengene matrix; usually self.MEs is what you want.
self.moduleTraitCor (pandas.DataFrame): module × trait Pearson correlations. None until analyseWGCNA() runs.
self.moduleTraitPvalue (pandas.DataFrame): parallel p-value table. None until analyseWGCNA() runs.
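To see why a module eigengene differs from a per-sample mean, the sketch below computes the first principal component of a small module with NumPy. It is an illustration of the definition given for self.MEs, not the library's code: it assumes genes are standardised before the SVD and that the sign is oriented to agree with average expression (the sign of a principal component is otherwise arbitrary). The function name and data are hypothetical.

```python
import numpy as np

def module_eigengene(expr):
    """First principal component (one value per sample) of a genes x samples module."""
    # Standardise each gene across samples so every gene contributes on the same scale.
    centred = expr - expr.mean(axis=1, keepdims=True)
    z = centred / expr.std(axis=1, ddof=1, keepdims=True)
    # Rows of vt are unit-length vectors over samples; the first is the leading PC.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # Flip the sign if it runs against the average standardised expression.
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Toy module: 3 genes x 5 samples (made-up values).
module = np.array([
    [1.0, 2.0, 3.0, 4.0, 6.0],
    [2.0, 2.5, 4.0, 5.0, 9.0],
    [10.0, 3.0, 8.0, 1.0, 5.0],
])
eigengene = module_eigengene(module)
mean_profile = module.mean(axis=0)      # the "mean-by-mask" shortcut, for comparison
```

The eigengene is a unit-length, zero-centred vector over samples, whereas the mean profile stays in expression units and weights every gene equally, so the two are not interchangeable.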
Examples
>>> import pandas as pd, omicverse as ov
>>> data = pd.read_csv('expressionList.csv', index_col=0)
>>> wgcna = ov.bulk.pyWGCNA(
...     name='5xFAD',
...     species='mus musculus',
...     geneExp=data.T,  # transpose to genes × samples
...     TPMcutoff=1,
...     networkType='signed hybrid',
... )
>>> wgcna.preprocess()
>>> wgcna.findModules()
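Once modules exist, analyseWGCNA() relates eigengenes to sample traits. The core of that step, a Pearson correlation of every module eigengene against every numeric trait, can be sketched on plain arrays as below. This is not the library's implementation: the function name and numbers are hypothetical, and the p-values and FDR correction mentioned in the workflow are left out.

```python
import numpy as np

def module_trait_correlation(eigengenes, traits):
    """Pearson r of each module (column of eigengenes) against each trait (column of traits).

    Both inputs are samples x features arrays; the result is modules x traits.
    """
    n_samples = eigengenes.shape[0]
    # Standardise columns with the population standard deviation (ddof=0),
    # so the mean of the products below is exactly Pearson r.
    me = (eigengenes - eigengenes.mean(axis=0)) / eigengenes.std(axis=0)
    tr = (traits - traits.mean(axis=0)) / traits.std(axis=0)
    return me.T @ tr / n_samples

# Made-up eigengenes for 6 samples and 2 modules.
mes = np.array([
    [0.1, 0.5],
    [0.4, -0.2],
    [-0.3, 0.3],
    [0.2, -0.6],
    [-0.5, 0.1],
    [0.1, -0.1],
])
# Two synthetic traits: one rises with module 0, the other falls with module 1.
traits = np.column_stack([2.0 * mes[:, 0] + 1.0, -mes[:, 1]])
cor = module_trait_correlation(mes, traits)
```

Here cor[0, 0] is about 1 and cor[1, 1] is about -1 by construction; the off-diagonal entries show how each module relates to the other trait.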