Raw GWAS to Results
Use this tutorial when your starting point is whole-genome summary statistics. You will clean the files, split them into loci, build LD inputs, and run fine-mapping.
What You Need
For each population or cohort:
- one GWAS summary statistics file,
- sample size,
- population label,
- cohort name,
- LD reference files in PLINK format (.bed, .bim, .fam).
The example below uses the small mock data in exampledata/test_mock_data.
Step 1: Write a Population Config
Create a tab-separated file with one row per study.
cat > population_config.tsv <<'EOF'
popu	cohort	sample_size	path	ld_ref
EUR	cohort1	10000	exampledata/test_mock_data/EUR_all_loci.sumstats	exampledata/test_mock_data/EUR_all_loci
AFR	cohort1	8000	exampledata/test_mock_data/AFR_all_loci.sumstats	exampledata/test_mock_data/AFR_all_loci
EAS	cohort1	12000	exampledata/test_mock_data/EAS_all_loci.sumstats	exampledata/test_mock_data/EAS_all_loci
EOF
The path column points to raw summary statistics. The ld_ref column is the
PLINK prefix, without .bed, .bim, or .fam.
Why this file matters
CREDTOOLS carries this metadata forward, so later outputs still record which population (EUR, AFR, EAS) and cohort each row came from.
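Because every later step reads this file, it can be worth checking it before you start. The sketch below is one possible check, not part of CREDTOOLS: check_population_config is a hypothetical helper that uses only standard shell tools. It assumes the five-column layout shown above and verifies that each row is tab-separated and that the summary statistics file and the three PLINK files exist.

```shell
# Sketch: sanity-check a population config before munging.
# check_population_config is a hypothetical helper, not a CREDTOOLS command.
check_population_config() {
  config="$1"
  status=0

  # Every row, header included, should have exactly five tab-separated fields.
  bad_rows=$(awk -F'\t' 'NF != 5 { print NR }' "$config")
  if [ -n "$bad_rows" ]; then
    echo "rows without 5 tab-separated fields:" $bad_rows >&2
    status=1
  fi

  {
    # Consume the header line, then test the files named in each data row.
    read -r _header
    while IFS="$(printf '\t')" read -r popu cohort sample_size path ld_ref \
        || [ -n "$popu" ]; do
      # Malformed rows were already reported above; skip them here.
      [ -n "$ld_ref" ] || continue
      [ -f "$path" ] || { echo "missing sumstats: $path" >&2; status=1; }
      for ext in bed bim fam; do
        [ -f "$ld_ref.$ext" ] || { echo "missing LD file: $ld_ref.$ext" >&2; status=1; }
      done
    done
  } < "$config"

  return "$status"
}
```

Running check_population_config population_config.tsv returns a non-zero status and prints one line per problem to standard error if anything is missing.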
Step 2: Munge the Summary Statistics
Munging does three things:
- renames common columns into the CREDTOOLS format,
- removes obvious bad rows,
- creates a stable SNPID from chromosome, position, and alleles.
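To see why an ID built from chromosome, position, and alleles can be stable across files, consider the sketch below. It is an illustration only: make_variant_id is a hypothetical helper, and the CHR-BP-allele-allele layout with alphabetically sorted alleles is an assumption, not the documented CREDTOOLS SNPID format.

```shell
# Sketch only: build an illustrative variant ID from four columns.
# The layout and the allele sorting are assumptions for illustration,
# not the format that CREDTOOLS itself writes.
make_variant_id() {
  # Input on stdin: tab-separated chromosome, position, allele1, allele2.
  awk -F'\t' '{
    a1 = toupper($3); a2 = toupper($4)
    # Sorting the alleles makes the ID independent of which one is listed first.
    if (a1 > a2) { tmp = a1; a1 = a2; a2 = tmp }
    print $1 "-" $2 "-" a1 "-" a2
  }'
}
```

With this scheme, a variant listed as G/A in one study and A/G in another gets the same ID, which is what lets rows from different cohorts be matched.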
After this step, check:
This updated config points to the munged files and keeps the original metadata.
Step 3: Chunk the Genome Into Loci
Chunking finds significant regions, cuts the summary statistics to each region, and extracts LD matrices from the PLINK reference panels.
The handoff file is the loci list that Step 4 passes to the pipeline (work/chunks/loci_list.txt in this example).
Open it once. It should have columns like:
The prefix column is important. CREDTOOLS uses it to find:
Step 4: Run the Full Pipeline
credtools pipeline \
work/chunks/loci_list.txt \
work/results \
--tool susie \
--meta-method meta_all
For each locus_id, the pipeline creates a subdirectory under the output directory, for example work/results/locus_1.
Inside each locus directory, look for:
Step 5: Make a Quick Plot
credtools plot \
work/results/locus_1 \
--type summary \
--output work/results/locus_1/qc_summary.png
Use plots as a fast check. Use the tables when you need exact values.
When Something Fails
Start with run_summary.log and overall_run_summary.log. Most failures are
path or input-shape problems:
- a prefix points to files that do not exist,
- LD and summary statistics do not share enough variants,
- an external tool is not installed,
- a locus is too large for available memory.
See Troubleshooting for the common fixes.
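For the variant-overlap failure in particular, a rough check with standard tools can help before you rerun anything. The sketch below is not a CREDTOOLS command: count_shared_positions is a hypothetical helper, and the CHR and BP header names are assumptions about your summary statistics file, so adjust them to match. It relies only on the standard PLINK .bim layout, where column 1 is the chromosome and column 4 is the base-pair position.

```shell
# Sketch: count sumstats rows whose chromosome:position also appears in a .bim.
# count_shared_positions is a hypothetical helper; the CHR and BP header names
# are assumptions about the summary statistics file.
count_shared_positions() {
  bim="$1"
  sumstats="$2"
  awk '
    # First file (.bim): column 1 is chromosome, column 4 is position.
    NR == FNR { seen[$1 ":" $4] = 1; next }
    # Sumstats header: locate the CHR and BP columns by name.
    FNR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "CHR") c = i
        if ($i == "BP") b = i
      }
      next
    }
    # Data rows: count positions that were seen in the .bim.
    ($c ":" $b) in seen { shared++ }
    END { print shared + 0 }
  ' "$bim" "$sumstats"
}
```

The count matches on position only and ignores alleles, so treat it as an upper bound. A count near zero often points to a genome-build mismatch or to chromosome labels that differ between files (for example chr1 versus 1).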