omicverse.micro.fetch_franzosa_ibd_2019
- omicverse.micro.fetch_franzosa_ibd_2019(data_dir, overwrite=False, microbe_count_scale=1000000.0)
Download and parse the Franzosa et al. 2019 paired microbiome/metabolome IBD dataset.
Files are fetched from the Borenstein lab's curated microbiome-metabolome-curated-data GitHub repository: three TSVs (genera.tsv, mtb.tsv, metadata.tsv) totalling about 30 MB. Once the files exist in data_dir, the function runs offline.
- Parameters:
  - data_dir (str) – Absolute path under which the three TSVs are cached. There is no $HOME fallback: you choose the location (a scratch directory is recommended).
  - overwrite (bool, default: False) – Re-download even if the files already exist.
  - microbe_count_scale (float, default: 1000000.0) – The Borenstein TSV delivers per-sample relative abundances. To make the tables look like familiar 16S count matrices (integer counts in the range 10⁰–10⁵), the function multiplies by this scale and rounds, giving pseudo-counts per million by default. Pass 1.0 to keep proportions (most useful if you plan to CLR-transform immediately and don't need integer counts). All downstream ov.micro APIs (filter_by_prevalence, paired_spearman, paired_cca, MMvec) work on either, but min_count filters expect counts ≥ 1.
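The download-once behaviour described above (fetch the three TSVs into data_dir, then work offline unless overwrite=True) can be sketched as a small cache check. This is a minimal sketch, not omicverse's implementation: the helper ensure_cached and the injected fetch callable are hypothetical, and the stand-in downloader only writes stub files so the example runs without network access.

```python
import os
import tempfile
from typing import Callable, Dict, List

# File names taken from the docstring above.
FILES = ("genera.tsv", "mtb.tsv", "metadata.tsv")


def ensure_cached(data_dir: str,
                  fetch: Callable[[str, str], None],
                  overwrite: bool = False) -> Dict[str, str]:
    """Hypothetical helper: return local paths, fetching only what is missing."""
    os.makedirs(data_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for name in FILES:
        dest = os.path.join(data_dir, name)
        # Only touch the network if the file is absent or a refresh is forced.
        if overwrite or not os.path.exists(dest):
            fetch(name, dest)
        paths[name] = dest
    return paths


# Stand-in for a real downloader: records each request and writes a stub file.
calls: List[str] = []


def fake_fetch(name: str, dest: str) -> None:
    calls.append(name)
    with open(dest, "w") as fh:
        fh.write("stub\n")


with tempfile.TemporaryDirectory() as tmp:
    ensure_cached(tmp, fake_fetch)                   # first run: fetches all three
    ensure_cached(tmp, fake_fetch)                   # cached: no fetches
    ensure_cached(tmp, fake_fetch, overwrite=True)   # forced refresh: three more

print(len(calls))  # 6
```

Injecting the fetcher keeps the caching rule (skip existing files unless a refresh is forced) separate from how the files are actually downloaded.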
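The effect of microbe_count_scale can be illustrated on one sample's relative abundances. This is a minimal sketch under stated assumptions: scale_to_counts and clr are hypothetical helpers (not ov.micro functions), the rounding is assumed to be Python's round() (round-half-even), and the four-genus sample is made up.

```python
import math
from typing import List


def scale_to_counts(rel_abund: List[float], scale: float = 1e6) -> List[int]:
    """Multiply relative abundances by `scale` and round to integer pseudo-counts."""
    return [round(x * scale) for x in rel_abund]


def clr(values: List[float], pseudocount: float = 0.0) -> List[float]:
    """Centred log-ratio of one sample; use pseudocount > 0 if it contains zeros."""
    logs = [math.log(v + pseudocount) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Relative abundances of four genera in one hypothetical sample.
proportions = [0.7, 0.25, 0.049999, 0.000001]

counts = scale_to_counts(proportions)   # default scale: pseudo-counts per million
print(counts)                           # [700000, 250000, 49999, 1]

# Without zeros, CLR cancels a constant scale factor, so proportions
# (microbe_count_scale=1.0) and scaled counts give the same CLR values.
print(all(math.isclose(a, b, abs_tol=1e-9)
          for a, b in zip(clr(proportions), clr(counts))))  # True

# A very rare taxon rounds to zero and would fall below a count >= 1 threshold.
print(scale_to_counts([4e-7]))          # [0]
```

This is why the scale matters mainly for count-based filters: log-ratio methods see the same relative structure either way, while rounding can push very rare taxa to zero.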
- Return type:
  tuple[AnnData, AnnData]
- Returns:
  (adata_microbe, adata_metabolite): two AnnData objects sharing obs_names (the same 220 samples, in the same order). The microbe var carries the parsed GTDB 7-rank taxonomy (domain / phylum / class / order / family / genus / species) plus the raw GTDB string as taxonomy. The metabolite var carries the cluster ID and, where annotated, the HMDB name (name column; NaN for unannotated clusters). Both obs frames carry the same cohort metadata from metadata.tsv: Study.Group (CD / UC / Control), Subject, Age, Fecal.Calprotectin and drug covariates.
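Splitting a GTDB lineage string into the seven rank columns described above can be sketched as follows. This is an illustrative parser, not the one omicverse uses: it assumes semicolon-separated tokens with the standard GTDB prefixes (d__, p__, c__, o__, f__, g__, s__), treats empty ranks such as a bare s__ as missing, and the example lineage string is made up.

```python
from typing import Dict, Optional

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")
# GTDB rank prefixes mapped to the column names listed in the docstring.
PREFIXES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
            "f": "family", "g": "genus", "s": "species"}


def parse_gtdb(taxonomy: str) -> Dict[str, Optional[str]]:
    """Split a 'd__...;p__...;...' string into one value per rank.

    Missing or empty ranks become None; the raw string is kept under 'taxonomy'.
    """
    parsed: Dict[str, Optional[str]] = {rank: None for rank in RANKS}
    for token in taxonomy.split(";"):
        prefix, sep, name = token.strip().partition("__")
        if sep and prefix in PREFIXES and name:
            parsed[PREFIXES[prefix]] = name
    parsed["taxonomy"] = taxonomy
    return parsed


# Made-up genus-level lineage: the species slot is present but empty.
row = parse_gtdb("d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;"
                 "f__Lachnospiraceae;g__Blautia;s__")
print(row["genus"], row["species"])  # Blautia None
```

Applying such a parser to every feature name and collecting the dictionaries into a table would yield one column per rank plus the raw string, matching the layout described for the microbe var.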