Quick Start Guide
This guide gets you running your first CREDTOOLS analysis in a few minutes using the credtools pipeline command, the easiest way to perform end-to-end multi-ancestry fine-mapping.
What is credtools pipeline?
The credtools pipeline command runs the complete CREDTOOLS workflow in a single step:
- Quality control
- Meta-analysis
- Fine-mapping
- Results aggregation
Input Data Format
CREDTOOLS requires a tab-separated file describing your loci and studies. Here's the required format:
Column | Description | Example |
---|---|---|
chr | Chromosome | 8 |
start | Start position (bp) | 41242482 |
end | End position (bp) | 42492482 |
popu | Population/ancestry | EUR, AFR, SAS, HIS |
sample_size | Sample size | 337465 |
cohort | Cohort/study name | UKBB, MVP |
prefix | File path prefix | /path/to/data/EUR.UKBB.chr8_41242482_42492482 |
locus_id | Locus identifier | chr8_41242482_42492482 |
File Structure
For each prefix, CREDTOOLS expects these files:
- {prefix}.sumstats - Summary statistics
- {prefix}.ld or {prefix}.ld.npz - LD matrix
- {prefix}.ldmap - LD matrix variant map
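The file layout above can be checked before launching a run. The following is a minimal sketch, not part of CREDTOOLS itself; the helper name missing_files is hypothetical, and it only tests that the files exist, not that their contents are valid.

```python
from pathlib import Path


def missing_files(prefix: str) -> list[str]:
    """Return the expected input files that are absent for one prefix."""
    missing = []
    # Summary statistics and the LD variant map are each a single file.
    for suffix in (".sumstats", ".ldmap"):
        if not Path(prefix + suffix).is_file():
            missing.append(prefix + suffix)
    # The LD matrix may be stored either as .ld or as .ld.npz.
    if not any(Path(prefix + s).is_file() for s in (".ld", ".ld.npz")):
        missing.append(prefix + ".ld or " + prefix + ".ld.npz")
    return missing


if __name__ == "__main__":
    # Prefix taken from the example table in this guide.
    for path in missing_files("data/EUR.UKBB.chr8_41242482_42492482"):
        print("missing:", path)
```

Running this for every prefix in the input table catches most "file not found" problems before any analysis starts.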
Example Input File
chr start end popu sample_size cohort prefix locus_id
8 41242482 42492482 AFR 89499 MVP data/AFR.MVP.chr8_41242482_42492482 chr8_41242482_42492482
8 41242482 42492482 EUR 337465 MVP data/EUR.MVP.chr8_41242482_42492482 chr8_41242482_42492482
8 41242482 42492482 EUR 442817 UKBB data/EUR.UKBB.chr8_41242482_42492482 chr8_41242482_42492482
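A loci table like the one above can be sanity-checked with a few lines of Python. This is a sketch under assumptions: the function read_loci_table is hypothetical, and the checks on start/end ordering and positive sample size are added here for illustration rather than taken from CREDTOOLS.

```python
import csv
import io

# Column names as listed in the input format table of this guide.
REQUIRED_COLUMNS = [
    "chr", "start", "end", "popu",
    "sample_size", "cohort", "prefix", "locus_id",
]


def read_loci_table(handle):
    """Parse a tab-separated loci table and run basic consistency checks."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        # Illustrative checks (assumptions, not documented CREDTOOLS rules).
        if int(row["start"]) >= int(row["end"]):
            raise ValueError(f"line {line_no}: start must be smaller than end")
        if int(row["sample_size"]) <= 0:
            raise ValueError(f"line {line_no}: sample_size must be positive")
    return rows


# First data row of the example input file, joined with tabs.
example = "\t".join(REQUIRED_COLUMNS) + "\n" + "\t".join([
    "8", "41242482", "42492482", "AFR", "89499", "MVP",
    "data/AFR.MVP.chr8_41242482_42492482", "chr8_41242482_42492482",
]) + "\n"

rows = read_loci_table(io.StringIO(example))
print(rows[0]["popu"], rows[0]["locus_id"])
```

For a real file, pass an open file handle instead of the in-memory example, for instance `open("my_loci.txt", newline="")`.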
Basic Usage
Simple Cross-Ancestry Analysis
This command:
- Combines all studies across ancestries (meta_all)
- Uses multi-input strategy with MultiSuSiE
- Outputs results to output_dir/
Population-Specific Analysis
credtools pipeline my_loci.txt output_dir \
--tool susie \
--threads 4 \
--max-causal 5 \
--credible-level 0.95
This command:
- Meta-analyzes within each ancestry separately (meta_by_population)
- Runs SuSiE on each population, then combines results (post_hoc_combine)
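When the pipeline is launched from a script rather than a terminal, the command line can be assembled programmatically. The sketch below is an illustration only: build_pipeline_command is a hypothetical helper, and it uses just the flags that appear in this guide (--tool, --meta-method, --threads, --max-causal, --credible-level, --skip-qc).

```python
import shlex


def build_pipeline_command(loci_file, output_dir, tool=None, meta_method=None,
                           threads=None, max_causal=None, credible_level=None,
                           skip_qc=False):
    """Assemble the argument list for a `credtools pipeline` call."""
    cmd = ["credtools", "pipeline", loci_file, output_dir]
    options = {
        "--tool": tool,
        "--meta-method": meta_method,
        "--threads": threads,
        "--max-causal": max_causal,
        "--credible-level": credible_level,
    }
    # Only pass options that were set, so tool defaults stay in effect.
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, str(value)]
    if skip_qc:
        cmd.append("--skip-qc")
    return cmd


cmd = build_pipeline_command(
    "my_loci.txt", "output_dir",
    tool="susie", threads=4, max_causal=5, credible_level=0.95,
)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.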
Understanding the Output
After running credtools pipeline, you'll find these files in your output directory:
Meta-Analysis Results
output_dir/
├── {locus_id}.{popu}.{cohort}.sumstat # Meta-analyzed summary stats
├── {locus_id}.{popu}.{cohort}.ld.npz # Meta-analyzed LD matrix
└── {locus_id}.{popu}.{cohort}.ldmap # LD variant mapping
Quality Control Reports
output_dir/
├── s_estimate.txt # Inconsistency parameter estimates
├── kriging_rss.txt # Allele switch detection
├── maf_comparison.txt # MAF consistency across studies
├── cochran_q.txt # Heterogeneity testing
└── ld_structure.txt # LD matrix eigenanalysis
Fine-Mapping Results
output_dir/
├── pips.txt # Posterior inclusion probabilities
└── creds.json # Credible sets information
Interpreting Results
Posterior Inclusion Probabilities (PIPs)
The pips.txt file contains PIPs for each variant:
- Values range from 0 to 1
- Higher values indicate stronger evidence for causality
- Typically, variants with PIP > 0.1 are considered noteworthy
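The PIP > 0.1 rule of thumb can be applied with a short filter. The exact layout of pips.txt is not described in this guide, so the sketch below assumes one variant per line with the variant ID in the first whitespace-separated field and the PIP in the second; adjust the field positions to match your actual file. The helper name and the sample values are made up for illustration.

```python
def noteworthy_variants(lines, threshold=0.1):
    """Return (variant, pip) pairs with pip above threshold, highest first."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            pip = float(fields[1])
        except ValueError:
            continue  # header line or non-numeric value
        if pip > threshold:
            hits.append((fields[0], pip))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Made-up values in the assumed two-column layout.
sample = [
    "8-41235680-A-G\t0.34",
    "8-41235678-C-T\t0.62",
    "8-41240000-G-A\t0.01",
]
top = noteworthy_variants(sample)
for variant, pip in top:
    print(variant, pip)
```

With a real file, pass the open file object directly, for example `noteworthy_variants(open("output_dir/pips.txt"))`.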
Credible Sets
The creds.json file contains credible sets: groups of variants that together have a high probability of containing the causal variant:
{
  "credible_sets": {
    "cs1": {
      "variants": ["8-41235678-C-T", "8-41235680-A-G"],
      "coverage": 0.95,
      "total_pip": 0.96
    }
  }
}
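A file with this layout can be summarized with the standard json module. The sketch below assumes exactly the keys shown in the example above (credible_sets, variants, coverage, total_pip); a real creds.json may carry additional fields, and the function name is hypothetical.

```python
import json


def summarize_credible_sets(doc):
    """Map each credible set name to its size, coverage, and total PIP."""
    summary = {}
    for name, cs in doc["credible_sets"].items():
        summary[name] = {
            "n_variants": len(cs["variants"]),
            "coverage": cs["coverage"],
            "total_pip": cs["total_pip"],
        }
    return summary


# The example document from this guide, embedded as a string.
example = json.loads("""
{
  "credible_sets": {
    "cs1": {
      "variants": ["8-41235678-C-T", "8-41235680-A-G"],
      "coverage": 0.95,
      "total_pip": 0.96
    }
  }
}
""")

summary = summarize_credible_sets(example)
for name, info in summary.items():
    print(name, info["n_variants"], info["coverage"], info["total_pip"])
```

To read the pipeline output instead, replace the embedded string with `json.load(open("output_dir/creds.json"))`.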
Common Options
Meta-Analysis Methods
# Combine all studies regardless of ancestry
--meta-method meta_all
# Combine studies within each ancestry separately
--meta-method meta_by_population
# Keep all studies separate (no meta-analysis)
--meta-method no_meta
Fine-Mapping Tools
# General purpose, robust
--tool susie
# Multi-ancestry designed tools
--tool multisusie
--tool susiex
# Bayesian model averaging
--tool finemap
# Simple Bayes factors
--tool abf
Quality Control
# Skip QC (faster but not recommended)
--skip-qc
# Include QC (default, recommended)
# No flag needed - QC runs by default
Troubleshooting
Common Issues
- File not found errors
  - Check that your file paths in the input table are correct
  - Ensure summary statistics and LD files exist for each prefix
- Memory errors
  - Large LD matrices can consume significant memory
  - Consider analyzing loci one at a time for very large regions
- Tool-specific errors
  - Some tools have specific requirements (see tool documentation)
  - Try SuSiE first as it's the most robust default option
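To see why large LD matrices are a memory concern, note that a dense matrix over n variants holds n × n values. The sketch below is a back-of-the-envelope estimate that assumes dense storage at 8 bytes per value (float64); how CREDTOOLS actually holds the matrix in memory is not stated in this guide, and compressed .ld.npz files will be smaller on disk.

```python
def ld_matrix_bytes(n_variants, bytes_per_value=8):
    """Memory for a dense n x n LD matrix at the given precision."""
    return n_variants * n_variants * bytes_per_value


for n in (1_000, 10_000, 50_000):
    gigabytes = ld_matrix_bytes(n) / 1e9
    print(f"{n:>6} variants: {gigabytes:.2f} GB")
```

Because the cost grows with the square of the variant count, halving a region's variant count cuts the matrix memory to roughly a quarter.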
Performance Tips
- Start with smaller regions to test your setup
- Use --tool susie for initial exploration (fastest, most reliable)
- Save QC results to identify problematic studies before fine-mapping
Next Steps
Once you've run your first analysis:
- Single-Input Fine-Mapping - Learn about analyzing individual studies
- Multi-Input Fine-Mapping - Deep dive into multi-ancestry analysis
- Advanced Topics - Customize parameters and understand tool options
Example with Real Data
Using the included example data:
# Navigate to example data directory
cd exampledata/
# Run pipeline on example locus
credtools pipeline test_loci.txt results/ \
--tool susie \
--threads 4
This will analyze the multi-ancestry example data and produce results in the results/ directory.