
Core Concepts

This page explains the words used throughout the docs. You do not need to be a fine-mapping expert to use CREDTOOLS, but these terms make the commands easier to understand.

Summary Statistics

Summary statistics are the results table produced by a GWAS. Each row is one variant. The most important columns are usually:

Column   Meaning
CHR      chromosome
BP       base-pair position
EA       effect allele
NEA      non-effect allele
BETA     effect size
SE       standard error
P        p-value
EAF      effect allele frequency, if available
N        sample size, if available

Different tools and cohorts often use different column names. credtools munge turns them into one common format.
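To make the idea concrete, here is a minimal sketch of header harmonization. The alias table and the `harmonize_header` function are hypothetical illustrations, not the rules `credtools munge` actually applies. Allele columns in particular mean different things in different GWAS tools, so check each cohort's documentation before mapping them.

```python
# Hypothetical alias table: maps some common header spellings to the
# standard column names listed above. Extend it for your own cohorts.
COLUMN_ALIASES = {
    "CHR": ["CHR", "CHROM", "#CHROM", "chromosome"],
    "BP": ["BP", "POS", "position", "base_pair_location"],
    "EA": ["EA", "effect_allele"],
    "NEA": ["NEA", "other_allele", "non_effect_allele"],
    "BETA": ["BETA", "effect"],
    "SE": ["SE", "standard_error", "stderr"],
    "P": ["P", "PVAL", "p_value"],
    "EAF": ["EAF", "effect_allele_frequency"],
    "N": ["N", "sample_size"],
}


def harmonize_header(header):
    """Return {original_name: standard_name} for every recognized column.

    Matching is case-insensitive; unrecognized columns are left out.
    """
    lookup = {}
    for standard, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            lookup[alias.lower()] = standard
    mapping = {}
    for name in header:
        standard = lookup.get(name.lower())
        if standard is not None:
            mapping[name] = standard
    return mapping


print(harmonize_header(["CHROM", "POS", "effect_allele", "beta", "INFO"]))
```

Unrecognized columns such as `INFO` are simply skipped here; a real munging step would also need to decide what to do with them.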

LD Matrix

LD means linkage disequilibrium. In practical terms, it tells CREDTOOLS how correlated nearby variants are.
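For intuition, LD between two variants is commonly measured as the Pearson correlation r between their allele dosages in a reference panel. The sketch below computes such a matrix from a made-up dosage matrix with NumPy. It only illustrates what the numbers mean; this page describes CREDTOOLS reading a precomputed matrix rather than computing one this way.

```python
import numpy as np

# Toy dosage matrix (made-up data): rows are individuals, columns are
# variants, values count copies of each variant's effect allele (0, 1 or 2).
dosages = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [1, 1, 1],
    [0, 1, 2],
], dtype=float)

# rowvar=False treats each column (variant) as one variable, giving a
# variants-by-variants matrix of pairwise Pearson correlations.
ld = np.corrcoef(dosages, rowvar=False)
print(np.round(ld, 2))
```

The first and third variants are coded in opposite directions (one is always 2 minus the other), so their correlation is -1. This is why allele coding has to agree between the LD panel and the summary statistics.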

CREDTOOLS expects an LD matrix plus a map file:

my_locus.ld.npz
my_locus.ldmap

The .ldmap file tells CREDTOOLS which variant belongs to each row and column of the matrix. The order must match.
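What "the order must match" means can be sketched as follows. The hypothetical `align_ld` function takes the variant IDs in matrix order (the information a .ldmap file provides) and reorders the matrix to follow the summary statistics, dropping variants the LD panel lacks. The `chr:pos:ref:alt` ID format and the function itself are assumptions for illustration, not the CREDTOOLS loader.

```python
import numpy as np


def align_ld(ld, ld_ids, target_ids):
    """Subset and reorder a square LD matrix so it follows target_ids.

    ld_ids lists the variant for each row/column of ld, in matrix order.
    Variants in target_ids that are missing from ld_ids are dropped.
    Returns the kept IDs and the reordered matrix.
    """
    position = {vid: i for i, vid in enumerate(ld_ids)}
    kept = [vid for vid in target_ids if vid in position]
    idx = np.array([position[vid] for vid in kept], dtype=int)
    # np.ix_ selects the same rows and columns, keeping the matrix square.
    return kept, ld[np.ix_(idx, idx)]


ld = np.array([
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
ld_ids = ["1:100:A:G", "1:200:C:T", "1:300:G:A"]
sumstats_ids = ["1:300:G:A", "1:100:A:G", "1:999:T:C"]  # last one not in LD
kept, sub = align_ld(ld, ld_ids, sumstats_ids)
print(kept)
print(sub)
```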

LD and summary statistics must describe the same variants

Fine-mapping can fail or produce misleading results when the LD matrix and summary statistics use different alleles, positions, or variant ordering. CREDTOOLS checks and intersects them, but the cleaner your inputs are, the better.
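One common part of such a check is allele harmonization. The sketch below, built around a hypothetical `harmonize_beta` function, flips the sign of BETA when the effect and non-effect alleles are swapped relative to the LD panel and drops the variant when the alleles do not match. It assumes the LD panel counts its first listed allele, and it does not handle strand-ambiguous A/T or C/G variants, which real pipelines usually treat separately.

```python
def harmonize_beta(beta, ea, nea, ld_a1, ld_a2):
    """Express a GWAS effect relative to the LD panel's allele coding.

    Assumption: the LD panel counts ld_a1 as its effect allele.
    Returns the (possibly sign-flipped) beta, or None if the alleles
    do not correspond at all and the variant should be dropped.
    """
    ea, nea = ea.upper(), nea.upper()
    ld_a1, ld_a2 = ld_a1.upper(), ld_a2.upper()
    if (ea, nea) == (ld_a1, ld_a2):
        return beta   # same coding: keep the effect as is
    if (ea, nea) == (ld_a2, ld_a1):
        return -beta  # swapped coding: flip the sign of the effect
    return None       # different alleles: cannot be harmonized


print(harmonize_beta(0.2, "A", "G", "G", "A"))
```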

Locus

A locus is one genomic region, such as chr9:21900000-22100000.

CREDTOOLS uses loci because fine-mapping is local. You do not usually fine-map the whole genome as one huge matrix. You split the genome into regions, then analyze each region.
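Selecting the variants of one locus amounts to parsing the region string and filtering by chromosome and position. The `parse_locus` and `variants_in_locus` helpers below are hypothetical. Treating both ends of the region as inclusive is an assumption, as is representing each variant as a `(chromosome, position)` pair.

```python
import re

# Accepts "chr9:21900000-22100000" or "9:21900000-22100000".
LOCUS_PATTERN = re.compile(r"^(?:chr)?(\w+):(\d+)-(\d+)$")


def parse_locus(text):
    """Split 'chr9:21900000-22100000' into ('9', 21900000, 22100000)."""
    match = LOCUS_PATTERN.match(text.replace(",", ""))
    if match is None:
        raise ValueError(f"not a locus string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return chrom, start, end


def variants_in_locus(variants, locus):
    """Keep (chrom, bp) pairs inside the locus; both ends are inclusive."""
    chrom, start, end = parse_locus(locus)
    return [v for v in variants if str(v[0]) == chrom and start <= v[1] <= end]


variants = [(9, 21950000), ("9", 22200000), (10, 22000000)]
print(variants_in_locus(variants, "chr9:21900000-22100000"))
```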

Locus Set

A locus set is the same locus measured in one or more studies.

For example, the same region may appear in:

Population   Cohort   Sample size
EUR          UKBB     400000
AFR          MVP      90000
EAS          BBJ      180000

CREDTOOLS can meta-analyze these rows, run them separately, or pass them to a multi-input fine-mapping tool.
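Meta-analyzing the rows of a locus set is often done per variant with fixed-effect inverse-variance weighting. This page does not say which estimator CREDTOOLS uses, so treat the `ivw_meta` sketch below as an illustration of the standard technique rather than the tool's implementation.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled BETA and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


print(ivw_meta([0.10, 0.20], [0.05, 0.10]))
```

For example, `ivw_meta([0.10, 0.20], [0.05, 0.10])` weights the more precise first study four times as heavily and returns a pooled BETA of 0.12 with an SE of about 0.045.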

PIP

PIP means posterior inclusion probability. It is the model's estimated probability that a variant is causal.

The value is between 0 and 1:

  • 0.90 means strong evidence for that variant.
  • 0.10 means weaker but still worth checking.
  • 0.00 means little support in the current model.

PIP is not a p-value. It is a fine-mapping probability after considering the variants in the locus and the LD pattern.
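A simple model shows how PIP differs from a p-value. Under the classic assumption that a locus has exactly one causal variant and every variant has the same prior probability, each variant's PIP is its Bayes factor divided by the sum of Bayes factors in the locus. The sketch uses Wakefield's approximate Bayes factor with an illustrative prior effect-size standard deviation of 0.2. This is not SuSiE: it ignores LD, which is only defensible under the single-causal-variant assumption, while SuSiE and the other tools use the LD matrix to model several signals.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Wakefield's approximate log Bayes factor (causal vs. null)."""
    v = se ** 2          # sampling variance of the estimate
    w = prior_sd ** 2    # prior variance of the true effect (illustrative)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z ** 2


def single_causal_pips(betas, ses, prior_sd=0.2):
    """PIPs assuming exactly one causal variant and equal priors."""
    logs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(logs)
    # Subtract the maximum before exponentiating (log-sum-exp trick)
    # so large Bayes factors do not overflow.
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


print(single_causal_pips([0.30, 0.28, 0.02], [0.05, 0.05, 0.05]))
```

In this example the first two variants both reach genome-wide significance (z = 6.0 and 5.6), yet the first receives most of the PIP, because the probability is shared among the variants in the locus.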

Credible Set

A credible set is a group of variants that should contain a causal variant with a chosen probability, called the coverage, often 95%.

If a credible set has 95% coverage, the model is saying: "given the data and assumptions, this set should contain the causal variant with probability 0.95."

Small credible sets are easier to interpret. Large credible sets usually mean the data or LD structure cannot separate the variants well.
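Given PIPs from a model with one causal signal, a credible set can be built greedily: sort variants by PIP and add them until the cumulative PIP reaches the coverage. The `credible_set` function below is a sketch of that rule under that assumption, not CREDTOOLS code. SuSiE applies the same idea separately to each effect it fits, and the susieR package additionally filters sets by an LD-based purity threshold.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose PIPs sum to at least coverage.

    Variants are added in order of decreasing PIP until the cumulative
    PIP reaches the coverage (or every variant has been added).
    """
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen


print(credible_set([0.02, 0.70, 0.27, 0.01]))
```

For PIPs `[0.02, 0.70, 0.27, 0.01]`, a 95% set needs the second and third variants (indices 1 and 2), while a 50% set needs only the second. A flat PIP profile would pull in many more variants, which is the large-credible-set case described above.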

Meta-Analysis Method

The --meta-method flag controls how CREDTOOLS combines studies before fine-mapping:

Method               What it does                               Use when
meta_all             combine all rows into one analysis input   you want maximum power and expect shared effects
meta_by_population   combine cohorts within each population     you want population-level results
no_meta              keep every row separate                    you want to preserve each cohort or use multi-input tools
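The three options differ mainly in how the rows of a locus set are grouped before fine-mapping. The hypothetical `group_for_meta` function below mirrors the table only; it is not CREDTOOLS code, and `CohortX` is an invented second European cohort added so that the population grouping has something to merge.

```python
from collections import defaultdict


def group_for_meta(studies, method):
    """Group (population, cohort) rows as each --meta-method option is
    described in the table. Each group would be combined into one input."""
    if method == "meta_all":
        # Everything goes into a single combined input.
        return {"all": list(studies)}
    if method == "meta_by_population":
        # One combined input per population.
        groups = defaultdict(list)
        for population, cohort in studies:
            groups[population].append((population, cohort))
        return dict(groups)
    if method == "no_meta":
        # Every row stays on its own.
        return {f"{p}_{c}": [(p, c)] for p, c in studies}
    raise ValueError(f"unknown meta method: {method}")


studies = [("EUR", "UKBB"), ("EUR", "CohortX"), ("AFR", "MVP")]
for method in ("meta_all", "meta_by_population", "no_meta"):
    print(method, group_for_meta(studies, method))
```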

Fine-Mapping Tool

The --tool flag chooses the statistical engine. A safe first choice is susie. For multi-ancestry joint analysis, look at multisusie and susiex.

You do not need to pick the perfect tool on day one. Start with SuSiE, inspect QC, then compare tools if the locus matters.

QC

QC checks whether the summary statistics and LD make sense together. CREDTOOLS looks for issues such as:

  • allele or sign mismatches,
  • outlier variants,
  • missing variants across studies,
  • unusual LD structure,
  • heterogeneity across cohorts.

QC does not prove the result is correct. It tells you where to look before you trust the result.