Meta-Analysis¶
The credtools meta
command performs meta-analysis of summary statistics and LD matrices across multiple ancestries or studies. This step combines evidence from different populations to improve fine-mapping resolution and power.
Overview¶
Meta-analysis in credtools integrates genetic evidence across populations while accounting for different LD structures and effect sizes. The meta-analysis process:
- Combines summary statistics using inverse variance weighting or other methods
- Merges LD matrices appropriately across ancestries
- Handles population-specific effects and heterogeneity
- Creates unified datasets for downstream fine-mapping
- Supports flexible strategies for different study designs
- Preserves ancestry-specific information when needed
When to Use¶
Use credtools meta
when you have:
- Fine-mapping data from multiple ancestries or populations
- Studies with overlapping genetic signals that could benefit from combined analysis
- Need to increase statistical power through meta-analysis
- Want to identify ancestry-specific vs shared causal variants
- Prepared locus files from multiple populations ready for integration
Basic Usage¶
Standard Meta-Analysis¶
Population-Specific Meta-Analysis¶
Skip Meta-Analysis¶
Command Options¶
Option | Description | Default |
---|---|---|
--meta-method / -m |
Meta-analysis method | meta_all |
--threads / -t |
Number of parallel threads | 1 |
Meta-Analysis Methods¶
meta_all¶
Combines all ancestries into a single meta-analyzed dataset for each locus. This approach: - Maximizes statistical power by pooling all available data - Assumes similar effect sizes across populations - Creates one unified summary statistic per locus - Best for traits with consistent effects across ancestries
meta_by_population¶
Performs meta-analysis within ancestry groups while keeping populations separate. This approach: - Maintains population-specific effect estimates - Allows for heterogeneity between ancestries - Creates separate datasets for each major ancestry group - Suitable when effect sizes differ significantly between populations
no_meta¶
Processes each ancestry/study separately without meta-analysis. This approach: - Preserves all population-specific information - Enables ancestry-specific fine-mapping - Useful for identifying population-specific causal variants - Required when populations have very different LD structures
Expected Output¶
The meta-analysis process creates:
- Meta-analyzed summary statistics: Combined effect estimates for each method
- Merged LD matrices: Appropriately weighted correlation matrices
- Loci information files: Updated metadata for downstream analysis
- Processing logs: Detailed information about meta-analysis steps
Output Structure¶
meta_output/
├── meta_all/ # Results from meta_all method
│ ├── locus_1/
│ │ ├── sumstats.txt
│ │ ├── ldmatrix.txt
│ │ └── locus_info.json
│ └── locus_2/
├── meta_by_population/ # Results from meta_by_population method
│ ├── EUR/
│ ├── ASN/
│ └── AFR/
└── no_meta/ # Results from no_meta method
├── study1/
├── study2/
└── study3/
Examples¶
Example 1: Standard Multi-Ancestry Meta-Analysis¶
# Perform meta-analysis across all ancestries
credtools meta prepared/final_loci_list.txt meta_results/ \
--meta-method meta_all \
--threads 4
# Output: Combined datasets maximizing power across populations
Example 2: Population-Specific Analysis¶
# Keep ancestry groups separate but meta-analyze within groups
credtools meta prepared/final_loci_list.txt meta_results/ \
--meta-method meta_by_population \
--threads 8
# Useful when you have multiple studies per ancestry
Example 3: No Meta-Analysis (Individual Studies)¶
# Process each study/ancestry independently
credtools meta prepared/final_loci_list.txt meta_results/ \
--meta-method no_meta \
--threads 2
# Preserves all study-specific information
Example 4: High-Performance Processing¶
# Use maximum threads for large datasets
credtools meta prepared/final_loci_list.txt meta_results/ \
--meta-method meta_all \
--threads 16
# Recommended for genome-wide studies with many loci
Integration with Workflow¶
Meta-analysis fits into the credtools workflow after preparation:
# 1. Prepare LD matrices and inputs
credtools prepare chunked/chunk_info.txt genotype_config.json prepared/
# 2. Perform meta-analysis
credtools meta prepared/final_loci_list.txt meta/ --meta-method meta_all
# 3. Run quality control (optional)
credtools qc meta/meta_all/loci_list.txt qc_results/
# 4. Perform fine-mapping
credtools finemap meta/meta_all/loci_list.txt finemap_results/
Choosing the Right Method¶
Use meta_all when:¶
- Effect sizes are consistent across populations
- You want maximum statistical power
- Trait biology suggests similar mechanisms across ancestries
- Sample sizes are relatively balanced
Use meta_by_population when:¶
- You have multiple studies per ancestry
- Some heterogeneity in effect sizes is expected
- You want to preserve some population structure
- Studies within ancestry are more homogeneous than across ancestries
Use no_meta when:¶
- Effect sizes differ substantially between populations
- You want to identify population-specific signals
- Studies have very different designs or phenotype definitions
- You plan to compare results across populations
Troubleshooting¶
Common Issues¶
Memory issues with large datasets: Reduce the number of threads or process subsets of loci separately.
Inconsistent ancestry labels: Ensure ancestry identifiers match between summary statistics and LD matrices.
Missing files: Check that all required input files from the prepare step are present and accessible.
Convergence issues: Some loci may fail meta-analysis due to data quality issues - these will be logged and skipped.
Quality Checks¶
The meta command automatically performs several quality checks: - Validates input file formats and required columns - Checks for matching variants between summary statistics and LD matrices - Identifies and handles allele mismatches - Reports summary statistics for each locus processed
Performance Optimization¶
- Use appropriate thread counts: Generally use 1 thread per CPU core, but reduce for memory-intensive analyses
- Process by chromosome: For very large studies, consider splitting by chromosome first
- Monitor memory usage: Large LD matrices can consume significant RAM
- Use fast storage: Place output directory on fast SSD storage when possible
Tips for Success¶
- Validate inputs first: Always run the prepare step successfully before meta-analysis
- Check ancestry matching: Ensure genetic ancestry matches between summary stats and reference panels
- Document your choices: Record which meta-analysis method you used and why
- Save intermediate results: Keep meta-analysis outputs for reproducibility
- Plan for heterogeneity: Consider population-specific analysis if you observe large effect differences
- Use consistent identifiers: Maintain consistent locus and ancestry naming throughout your workflow