Known Limitations and Gotchas

Most CREDTOOLS runs fail for simple reasons: a placeholder in loci_list.txt, an LD file that was never created, or an external tool that is not on PATH. This page lists the current traps so you can check them before starting a long run.

Read this before a genome-wide run

Run the first locus end to end before launching hundreds of loci. It is much cheaper to find a schema or environment problem on one locus.

Current Limits

| Area | What happens now | What to do |
| --- | --- | --- |
| VCF LD extraction | credtools chunk --ld-format vcf is accepted by the CLI, but VCF LD extraction is not implemented. | Use PLINK .bed/.bim/.fam references and keep --ld-format plink. |
| Custom chunks | --custom-chunks reads chr, start, and end, then assigns the internal ancestry label custom. This may not match your real population labels. | For now, prefer an explicit loci_list.txt when you already know the regions. |
| Auto-created sample size | chunk can write sample_size=50000 as a placeholder in loci_list.txt. | Replace it with the real cohort sample size before meta, qc, finemap, or pipeline. |
| Auto-created cohort label | chunk may set cohort equal to the ancestry label. | Edit cohort if you need study-level labels such as UKBB, MVP, or BBJ. |
| Direct chunk input | Passing raw file paths to chunk skips LD extraction because no ld_ref is available. | Use a population config with ld_ref, run credtools prepare with a genotype config, or provide pre-generated LD files yourself. |
| ABF without LD | The ABF method itself can run without LD in Python, but the current CLI locus loader expects {prefix}.ld or {prefix}.ld.npz and a matching {prefix}.ldmap. | Use the Python API for a true no-LD ABF run, or provide LD files for CLI workflows. |
| FINEMAP MAF | FINEMAP requires a MAF column after CREDTOOLS loads the locus. | Make sure EAF is present so CREDTOOLS can derive MAF, or provide MAF in prepared inputs. |
| Multi-input tools | susiex, multisusie, and mesusie analyze all rows in a locus together. | Keep rows for the same locus_id aligned to the same chromosome, start, and end. |
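The last row of the table can be checked mechanically. The following is a minimal sketch, not part of CREDTOOLS: it assumes loci_list.txt is tab-delimited with the column names shown later on this page, and the helper name find_misaligned_loci is hypothetical. The example rows, including the second cohort's sample size, are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def find_misaligned_loci(handle):
    """Return locus_ids whose rows disagree on chr, start, or end."""
    coords = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        coords[row["locus_id"]].add((row["chr"], row["start"], row["end"]))
    # A locus is misaligned when its rows carry more than one coordinate triple.
    return sorted(locus for locus, seen in coords.items() if len(seen) > 1)


# Illustrative two-cohort loci list; locusB has a shifted end coordinate.
example = (
    "locus_id\tchr\tstart\tend\tpopu\tcohort\tsample_size\tprefix\n"
    "locusA\t1\t1000000\t1500000\tEUR\tUKBB\t400000\tdata/EUR_UKBB_locusA\n"
    "locusA\t1\t1000000\t1500000\tEAS\tBBJ\t170000\tdata/EAS_BBJ_locusA\n"
    "locusB\t1\t5000000\t5600000\tEUR\tUKBB\t400000\tdata/EUR_UKBB_locusB\n"
    "locusB\t1\t5000000\t5650000\tEAS\tBBJ\t170000\tdata/EAS_BBJ_locusB\n"
)
misaligned = find_misaligned_loci(io.StringIO(example))
print(misaligned)
```

For a real file, pass an open file handle instead of the in-memory example. An empty list means every locus_id has a single consistent region.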

The loci_list.txt Check

Before running pipeline, open the generated loci list:

head -n 5 work/chunks/loci_list.txt

Check these columns first:

| Column | Check |
| --- | --- |
| sample_size | not the placeholder 50000 unless that is really correct |
| popu | matches the population label you want in output files |
| cohort | matches the study or cohort name you want in reports |
| prefix | points to files that actually exist |
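The first three checks can be scripted when the list is long. This is a rough sketch under the assumption that loci_list.txt is tab-delimited with the column names used on this page. The function name flag_suspect_rows is hypothetical, and the warnings are only hints: a cohort of 50000 samples, or a cohort named after its ancestry, can be legitimate.

```python
import csv
import io

# Placeholder value that chunk can write, as described on this page.
PLACEHOLDER_SAMPLE_SIZE = "50000"


def flag_suspect_rows(handle):
    """Return (line_number, message) pairs for rows that look auto-filled."""
    suspect = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(handle, delimiter="\t"), start=2):
        if row["sample_size"].strip() == PLACEHOLDER_SAMPLE_SIZE:
            suspect.append((line_no, "sample_size is 50000; confirm it is real"))
        if row["cohort"] == row["popu"]:
            suspect.append((line_no, "cohort equals popu; may be auto-filled"))
    return suspect


# Illustrative input: the first data row looks auto-generated, the second was edited.
example = (
    "locus_id\tchr\tstart\tend\tpopu\tcohort\tsample_size\tprefix\n"
    "chr1_1000000_1500000\t1\t1000000\t1500000\tEUR\tEUR\t50000\tdata/EUR_chr1\n"
    "chr1_5000000_5600000\t1\t5000000\t5600000\tEUR\tUKBB\t400000\tdata/UKBB_chr1\n"
)
suspect_rows = flag_suspect_rows(io.StringIO(example))
for line_no, message in suspect_rows:
    print(f"line {line_no}: {message}")
```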

Use a quick file check:

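# Take the prefix (column 8) from the first data row (line 2, after the header).
prefix=$(awk 'NR==2 {print $8}' work/chunks/loci_list.txt)
# ls exits non-zero and names the file if any of the three inputs is missing.
ls "${prefix}.sumstats.gz" "${prefix}.ld.npz" "${prefix}.ldmap.gz"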

If this fails, fix the input files before running the full pipeline.
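The shell check above looks only at the first data row. To cover every row, something like the sketch below may help. It assumes a tab-delimited loci list with a prefix column and reuses the three suffixes from the ls command; the function name missing_locus_files is hypothetical, and relative prefixes are resolved against the current working directory. The demo builds a throwaway directory so the block runs on its own.

```python
import csv
import tempfile
from pathlib import Path

# Suffixes taken from the ls check on this page.
SUFFIXES = (".sumstats.gz", ".ld.npz", ".ldmap.gz")


def missing_locus_files(loci_list_path, suffixes=SUFFIXES):
    """Return every expected input file that does not exist, across all rows."""
    missing = []
    with open(loci_list_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for suffix in suffixes:
                path = Path(row["prefix"] + suffix)
                if not path.is_file():
                    missing.append(str(path))
    return missing


# Self-contained demo: one locus with its .ldmap.gz deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "EUR_UKBB_chr1_1000000_1500000")
    Path(prefix + ".sumstats.gz").touch()
    Path(prefix + ".ld.npz").touch()
    loci_list = Path(tmp) / "loci_list.txt"
    loci_list.write_text(
        "locus_id\tchr\tstart\tend\tpopu\tcohort\tsample_size\tprefix\n"
        f"chr1_1000000_1500000\t1\t1000000\t1500000\tEUR\tUKBB\t400000\t{prefix}\n"
    )
    demo_missing = missing_locus_files(loci_list)

print([Path(p).name for p in demo_missing])
```

On real data, call missing_locus_files("work/chunks/loci_list.txt") and treat a non-empty result as a reason to stop before launching the pipeline.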

Custom Regions

Custom region files use this shape:

chr start   end
1   1000000 1500000
1   5000000 5600000

At the moment, this path is best for region discovery and manual inspection, not for a fully automatic LD-prepared pipeline. If you already know the regions and want a reliable production run, make a loci_list.txt directly:

locus_id    chr start   end popu    cohort  sample_size prefix
chr1_1000000_1500000    1   1000000 1500000 EUR UKBB    400000  data/EUR_UKBB_chr1_1000000_1500000

That direct loci list is the most explicit handoff into qc, finemap, and pipeline.
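If the regions already live in a custom region file, the loci list can be generated rather than typed by hand. The sketch below is one possible way to do that and is not a CREDTOOLS feature. The function name regions_to_loci_rows is hypothetical, and the locus_id and prefix naming simply mirror the example row above, which may not be a required convention. The population, cohort, and sample size are supplied explicitly so no placeholder slips in.

```python
import csv
import io

COLUMNS = ["locus_id", "chr", "start", "end", "popu", "cohort", "sample_size", "prefix"]


def regions_to_loci_rows(lines, popu, cohort, sample_size, data_dir):
    """Turn whitespace-separated chr/start/end lines into loci list rows."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header line.
        if not fields or fields[0] == "chr":
            continue
        chrom, start, end = fields[:3]
        locus_id = f"chr{chrom}_{start}_{end}"
        rows.append({
            "locus_id": locus_id,
            "chr": chrom,
            "start": start,
            "end": end,
            "popu": popu,
            "cohort": cohort,
            "sample_size": str(sample_size),
            "prefix": f"{data_dir}/{popu}_{cohort}_{locus_id}",
        })
    return rows


# The two regions from the custom region example on this page.
regions = ["chr start end", "1 1000000 1500000", "1 5000000 5600000"]
rows = regions_to_loci_rows(regions, "EUR", "UKBB", 400000, "data")

# Write tab-delimited text; swap the StringIO for an open file in real use.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
loci_list_text = buffer.getvalue()
print(loci_list_text)
```

The generated prefixes still have to point at real summary statistics and LD files, so this only covers the bookkeeping side of the handoff.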

Empty Results Are Not Always Errors

Most fine-mapping wrappers check whether any variant passes --significant-threshold before doing expensive work. If no variant passes, the result can be a valid empty credible set:

  • n_cs = 0
  • all PIPs set to zero
  • no lead SNPs

This usually means the locus did not pass the significance threshold used for fine-mapping. It is different from a tool crash.

First Run Checklist

  • Run one locus with --log-file first_locus.log.
  • Confirm pips.txt.gz and credible_sets_summary.txt.gz are written.
  • Confirm run_summary.log has Failed: 0.
  • Plot one locus with credtools plot.
  • Only then scale to all loci.
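The two file-based items in the checklist can be verified with a short script. This is a sketch under assumptions: the output file names and the "Failed: N" text come from the checklist, but where those files sit relative to each other is not stated here, so both locations are passed in as arguments. The function names are hypothetical, and the demo uses a throwaway directory with empty stand-in files.

```python
import re
import tempfile
from pathlib import Path

# Output names taken from the checklist on this page.
EXPECTED_OUTPUTS = ("pips.txt.gz", "credible_sets_summary.txt.gz")


def failed_count(log_text):
    """Return N from the first 'Failed: N' in the text, or None if absent."""
    match = re.search(r"Failed:\s*(\d+)", log_text)
    return int(match.group(1)) if match else None


def first_run_problems(locus_dir, summary_log):
    """List what is wrong with a first-locus run; an empty list means it looks fine."""
    problems = [
        f"missing {name}"
        for name in EXPECTED_OUTPUTS
        if not (Path(locus_dir) / name).is_file()
    ]
    log_path = Path(summary_log)
    count = failed_count(log_path.read_text()) if log_path.is_file() else None
    if count is None:
        problems.append("no 'Failed: N' line found in run summary")
    elif count > 0:
        problems.append(f"run summary reports Failed: {count}")
    return problems


# Demo: one expected output is absent and the summary reports no failures.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pips.txt.gz").touch()
    log = Path(tmp) / "run_summary.log"
    log.write_text("Failed: 0\n")
    demo_problems = first_run_problems(tmp, log)

print(demo_problems)
```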