Meta-Analysis

The credtools meta command performs meta-analysis of summary statistics and LD matrices across multiple ancestries or studies. This step combines evidence from different populations to improve fine-mapping resolution and power.

Overview

Meta-analysis in credtools integrates genetic evidence across populations while accounting for different LD structures and effect sizes. The meta-analysis process:

  • Combines summary statistics using inverse-variance weighted fixed-effects meta-analysis
  • Merges LD matrices using sample-size weighted averaging
  • Computes heterogeneity metrics before combining data
  • Creates unified datasets for downstream fine-mapping
  • Supports flexible strategies — combine all, by population, or keep separate
  • Preserves ancestry-specific information when needed
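
The IVW combination in the first bullet can be sketched in a few lines of Python (a minimal illustration of the standard fixed-effects formula, not the credtools implementation):

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one SNP.

    Each cohort's weight is 1/SE^2; the meta effect is the weighted mean
    of the per-cohort effects, and the meta SE is 1/sqrt(sum of weights).
    """
    weights = [1.0 / se**2 for se in ses]
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se_meta = math.sqrt(1.0 / sum(weights))
    # Two-sided p-value from the normal approximation
    z = beta_meta / se_meta
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta_meta, se_meta, p

# Two cohorts with similar effects; the meta SE shrinks below either input SE
beta, se, p = ivw_meta([0.10, 0.12], [0.02, 0.03])
```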

Why meta-analyze?

Multi-ancestry fine-mapping leverages diverse LD patterns across populations to narrow credible sets. Even when effect sizes are similar, different LD structures help distinguish causal variants from their tagging neighbors.

Quick Start

# Combine all cohorts and ancestries per locus (default)
credtools meta chunked/loci_list.txt meta_output/ --meta-method meta_all

# Meta-analyze within each population separately
credtools meta chunked/loci_list.txt meta_output/ --meta-method meta_by_population

# Intersect sumstats and LD per cohort without combining
credtools meta chunked/loci_list.txt meta_output/ --meta-method no_meta

Try It with Test Data

credtools meta exampledata/test_meta/loci_list.txt /tmp/meta_output/ \
  --meta-method meta_all \
  --threads 2

Input Format

The input is a tab-delimited loci_list.txt file produced by the credtools chunk step. It must contain the following columns:

| Column | Type | Description |
| --- | --- | --- |
| locus_id | str | Locus identifier (e.g., chr1_1000_3000) |
| prefix | str | File path prefix for sumstats/LD files |
| popu | str | Population code (e.g., EUR, AFR, EAS) |
| cohort | str | Cohort or study name |
| sample_size | int | Sample size for this cohort |
| chr | int | Chromosome number |
| start | int | Locus start position |
| end | int | Locus end position |

Each locus_id can have multiple rows representing different cohorts/populations. Rows with the same locus_id must share the same chr, start, and end values.
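
For illustration, a minimal loci_list.txt for one locus with two cohorts might look like this (the paths, cohort names, and sample sizes are hypothetical):

```
locus_id	prefix	popu	cohort	sample_size	chr	start	end
chr1_1000_3000	chunked/chr1_1000_3000/EUR_UKB	EUR	UKB	350000	1	1000	3000
chr1_1000_3000	chunked/chr1_1000_3000/AFR_MVP	AFR	MVP	120000	1	1000	3000
```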

Tip

The input file is typically chunked/loci_list.txt generated by credtools chunk. You do not need to create it manually.

Command Reference

credtools meta [OPTIONS] INPUTS OUTDIR

Arguments:

| Argument | Description |
| --- | --- |
| INPUTS | Path to loci list file (tab-delimited) |
| OUTDIR | Output directory for meta-analysis results |

Options:

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| --meta-method | -m | Meta-analysis method (meta_all, meta_by_population, no_meta) | meta_all |
| --threads | -t | Number of parallel threads | 1 |
| --calculate-lambda-s | -cls | Calculate lambda_s parameter using estimate_s_rss | False |
| --log-file | -l | Write log output to a file | None |

Meta-Analysis Methods

meta_all (default)

Combines all ancestries into a single meta-analyzed dataset per locus.

  • Maximizes statistical power by pooling all available data
  • Summary statistics combined via inverse-variance weighting (IVW)
  • LD matrices combined via sample-size weighted averaging
  • Population and cohort labels joined with + (e.g., AFR+EUR)
  • Best for traits with consistent effects across ancestries
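
The sample-size weighted LD averaging above amounts to a per-entry weighted mean of the cohort matrices; a pure-Python sketch (credtools itself operates on the .ld.npz matrices):

```python
def weighted_ld_average(ld_matrices, sample_sizes):
    """Combine per-cohort LD matrices by sample-size weighted averaging.

    ld_matrices: square matrices (lists of lists) over the same variant
    set and ordering; sample_sizes: matching cohort sample sizes.
    """
    total_n = sum(sample_sizes)
    dim = len(ld_matrices[0])
    return [
        [
            sum(n * ld[i][j] for ld, n in zip(ld_matrices, sample_sizes)) / total_n
            for j in range(dim)
        ]
        for i in range(dim)
    ]

# Two cohorts: the larger cohort pulls the meta LD toward its values
ld_a = [[1.0, 0.8], [0.8, 1.0]]
ld_b = [[1.0, 0.2], [0.2, 1.0]]
meta_ld = weighted_ld_average([ld_a, ld_b], [300, 100])
# Off-diagonal entry: (300*0.8 + 100*0.2) / 400 = 0.65
```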

meta_by_population

Performs meta-analysis within each population separately.

  • Groups input loci by population code
  • Multi-cohort populations are meta-analyzed (same as meta_all within group)
  • Single-cohort populations are intersected without meta-analysis
  • Preserves population-specific LD patterns
  • Suitable when effect sizes differ between populations

no_meta

Processes each cohort independently — no combining.

  • Each input locus is intersected (sumstats ∩ LD) but not combined
  • Preserves all population- and cohort-specific information
  • Required for multi-ancestry fine-mapping tools (e.g., SuSiEx, MuSuSiE)
  • Useful for comparing results across populations

Output Format

Output Files

Each locus produces a directory containing meta-analyzed data and heterogeneity assessments:

meta_output/
├── loci_info.txt                                    # Updated loci info for downstream steps
├── heterogeneity.txt.gz                             # Global heterogeneity summary
├── chr1_1000_3000/                                  # Per-locus directory
│   ├── AFR+EUR_meta2cohorts_a1b2c3d4.sumstats.gz   # Meta-analyzed summary statistics
│   ├── AFR+EUR_meta2cohorts_a1b2c3d4.ld.npz        # Meta-analyzed LD matrix (float16)
│   ├── AFR+EUR_meta2cohorts_a1b2c3d4.ldmap.gz      # LD variant map
│   ├── heterogeneity.txt.gz                         # Per-locus heterogeneity summary
│   ├── ld_4th_moment.txt.gz                         # LD 4th moment metric
│   ├── ld_decay.txt.gz                              # LD decay analysis
│   ├── cochran_q.txt.gz                             # Cochran's Q (multi-cohort only)
│   └── snp_missingness.txt.gz                       # SNP missingness (multi-cohort only)
└── chr2_5000_8000/
    └── ...

The file prefix follows the pattern {popu}_{cohort} for single cohorts, or {popu}_meta{N}cohorts_{hash} for meta-analyzed results.
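
A hypothetical sketch of how such a prefix could be assembled (the 8-character hash shown here, an md5 of the sorted cohort labels, is an assumption; the document does not specify how credtools derives it):

```python
import hashlib

def meta_prefix(popus, cohorts):
    """Build an output prefix following the documented pattern.

    Single cohort -> "{popu}_{cohort}"; multiple cohorts ->
    "{popu}_meta{N}cohorts_{hash}". NOTE: the hash scheme below is
    illustrative only, not the one credtools actually uses.
    """
    if len(cohorts) == 1:
        return f"{popus[0]}_{cohorts[0]}"
    popu = "+".join(sorted(set(popus)))
    tag = hashlib.md5("+".join(sorted(cohorts)).encode()).hexdigest()[:8]
    return f"{popu}_meta{len(cohorts)}cohorts_{tag}"

single = meta_prefix(["EUR"], ["UKB"])           # "EUR_UKB"
combined = meta_prefix(["AFR", "EUR"], ["MVP", "UKB"])
```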

Output Columns (sumstats.gz)

Meta-analyzed summary statistics contain the standard credtools schema:

| Column | Type | Description |
| --- | --- | --- |
| SNPID | str | Unique variant identifier (chr-bp-allele1-allele2) |
| CHR | int8 | Chromosome |
| BP | int32 | Base pair position |
| EA | str | Effect allele |
| NEA | str | Non-effect allele |
| EAF | float32 | Sample-size weighted effect allele frequency |
| BETA | float32 | IVW meta-analysis effect size |
| SE | float32 | Meta-analysis standard error |
| P | float64 | Meta-analysis p-value |

Heterogeneity Analysis

Before meta-analysis combines data, credtools automatically computes heterogeneity metrics to assess consistency across input cohorts. This helps identify loci where meta-analysis may be inappropriate.

When is heterogeneity computed?

Heterogeneity is always computed on the original per-cohort data before any combining. The results are saved even when using no_meta mode.

Per-Locus Heterogeneity Files

| File | Description | When produced |
| --- | --- | --- |
| ld_4th_moment.txt.gz | 4th moment of LD matrix entries per cohort — indicates LD distribution shape | Always |
| ld_decay.txt.gz | LD decay rate per cohort — how LD decreases with genomic distance | Always |
| cochran_q.txt.gz | Cochran's Q test for effect size heterogeneity across cohorts | Multi-cohort only |
| snp_missingness.txt.gz | SNP presence/absence matrix across cohorts | Multi-cohort only |

Heterogeneity Summary Table

The heterogeneity.txt.gz file (both per-locus and global) contains one row per cohort:

| Column | Type | Description |
| --- | --- | --- |
| locus_id | str | Locus identifier |
| popu | str | Population code (e.g., EUR, AFR) |
| cohort | str | Cohort name (e.g., UKB, MVP) |
| ld_4th_moment_mean | float | Mean 4th moment of LD matrix for this cohort |
| ld_decay_rate | float | Exponential decay rate of LD with distance |
| missing_rate | float | Fraction of SNPs missing relative to the union |
| cochran_q_median | float | Median Cochran's Q statistic across SNPs |
| i_squared_median | float | Median I² heterogeneity index across SNPs |
| n_het_snps | int | Number of SNPs with significant heterogeneity (Q p-value < 0.05) |
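
The cochran_q_median and i_squared_median columns summarize per-SNP statistics of this form (a pure-Python sketch of the standard fixed-effects formulas, not credtools internals):

```python
def cochran_q(betas, ses):
    """Cochran's Q and I-squared for one SNP across cohorts.

    Q sums the squared, weight-scaled deviations of each cohort's effect
    from the fixed-effects meta estimate; I^2 expresses the excess of Q
    over its degrees of freedom as a fraction of Q.
    """
    weights = [1.0 / se**2 for se in ses]
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_meta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i_squared

# Concordant effects give low Q and I^2; discordant effects give high values
q_lo, i2_lo = cochran_q([0.10, 0.11], [0.02, 0.02])
q_hi, i2_hi = cochran_q([0.10, -0.10], [0.02, 0.02])
```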

Interpreting heterogeneity

  • High ld_decay_rate differences between cohorts suggest divergent LD structures
  • Large n_het_snps indicates many variants with inconsistent effect sizes
  • High missing_rate for a cohort means poor variant overlap — consider filtering or using no_meta

Choosing the Right Method

| Scenario | Recommended Method |
| --- | --- |
| Effect sizes consistent across populations | meta_all |
| Maximum statistical power needed | meta_all |
| Multiple studies per ancestry, some cross-ancestry heterogeneity | meta_by_population |
| Studies within ancestry are more homogeneous than across | meta_by_population |
| Effect sizes differ substantially between populations | no_meta |
| Using multi-ancestry tools (SuSiEx, MuSuSiE) | no_meta |
| Comparing ancestry-specific signals | no_meta |

Integration with Workflow

Meta-analysis fits into the credtools pipeline after chunking:

graph LR
    A[Munged GWAS files] -->|credtools chunk| B[Per-locus files + LD matrices]
    B -->|credtools meta| C[Meta-analyzed loci + heterogeneity]
    C -->|credtools qc| D[QC'd loci]
    D -->|credtools finemap| E[Credible sets]

# Step 1: Identify loci, chunk data, and extract LD matrices
credtools chunk munged/sumstat_info_updated.txt chunked/

# Step 2: Perform meta-analysis
credtools meta chunked/loci_list.txt meta/ --meta-method meta_all

# Step 3: Run quality control (optional)
credtools qc meta/loci_info.txt qc_results/

# Step 4: Perform fine-mapping
credtools finemap meta/loci_info.txt finemap_results/

Troubleshooting

Memory issues with large datasets

Reduce the number of threads or process subsets of loci separately. Large LD matrices can consume significant RAM — each N×N float32 matrix uses ~4N² bytes.
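
The RAM estimate above is easy to compute ahead of time (a back-of-the-envelope helper, not part of credtools):

```python
def ld_matrix_bytes(n_variants, bytes_per_entry=4):
    """Approximate in-memory size of one dense N x N LD matrix.

    float32 uses 4 bytes per entry (~4*N^2 total); the float16 matrices
    credtools writes to .ld.npz use 2 bytes per entry.
    """
    return bytes_per_entry * n_variants**2

# A 10,000-variant locus: ~400 MB as float32, ~200 MB as float16
mb_f32 = ld_matrix_bytes(10_000) / 1e6
mb_f16 = ld_matrix_bytes(10_000, bytes_per_entry=2) / 1e6
```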

Inconsistent ancestry labels

Ensure ancestry identifiers match exactly between summary statistics and LD matrices. Labels are case-sensitive (EUR ≠ eur).

Missing input files

Check that all required files from the credtools chunk step are present. Each row in the loci list requires {prefix}.sumstats.gz, {prefix}.ld.npz, and {prefix}.ldmap.gz.

ValueError: All input loci must have the same start/end position

This occurs when loci grouped by locus_id have different boundary coordinates. Ensure the start and end columns are consistent within each locus_id group.

Some loci produce warnings about curve fitting

The LD decay analysis uses exponential curve fitting, which may warn when the data does not fit well. This is informational and does not affect meta-analysis results.
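
For intuition, an exponential decay rate like the one reported in ld_decay.txt.gz can be estimated with a simple log-linear least-squares fit (a sketch assuming r² ≈ exp(-rate × distance); credtools' own nonlinear curve fitting may differ):

```python
import math

def ld_decay_rate(distances, r2):
    """Estimate an exponential LD decay rate by a log-linear fit.

    Fits ln(r^2) against distance with ordinary least squares and
    returns the negated slope as the decay rate. Illustrative only.
    """
    ys = [math.log(v) for v in r2]
    n = len(distances)
    mx = sum(distances) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(distances, ys)) / sum(
        (x - mx) ** 2 for x in distances
    )
    return -slope

# Synthetic data with a true decay rate of 0.001 per base pair
d = [0, 1000, 2000, 4000]
r2 = [math.exp(-0.001 * x) for x in d]
rate = ld_decay_rate(d, r2)  # ~0.001
```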

Best Practices

Recommendations

  1. Review heterogeneity first — check heterogeneity.txt.gz to identify problematic loci before proceeding to fine-mapping
  2. Start with meta_all for maximum power, then compare with meta_by_population if heterogeneity is high
  3. Use --log-file for a detailed audit trail of all processing steps
  4. Match thread count to cores — generally use 1 thread per CPU core, reduce for memory-intensive analyses
  5. Use fast storage — place the output directory on SSD when possible for large studies
  6. Keep intermediate results — meta-analysis outputs are needed for reproducibility
  7. Use consistent identifiers — maintain consistent locus and ancestry naming throughout your workflow