File Schemas

This page is the strict reference for the CREDTOOLS input and output file formats. Use it when you create files outside CREDTOOLS or debug a failed run.

Population Config

Used by credtools munge:

popu    cohort  sample_size path
EUR UKBB    400000  /data/EUR.sumstats.gz

Used by credtools chunk when LD extraction is needed:

popu    cohort  sample_size path    ld_ref
EUR UKBB    400000  work/munged/EUR_UKBB.munged.txt.gz  /ref/EUR

| Column | Type | Required | Meaning |
| --- | --- | --- | --- |
| popu | string | yes | population or ancestry label |
| cohort | string | yes | cohort or study label |
| sample_size | integer | yes | cohort sample size |
| path | path | yes | summary statistics file |
| ld_ref | PLINK prefix | for chunk LD extraction | path prefix for .bed/.bim/.fam |

Raw Summary Statistics Aliases

credtools munge can recognize common raw headers.

| CREDTOOLS column | Common aliases |
| --- | --- |
| CHR | CHROM, #CHROM, chromosome, Chromosome |
| BP | POS, Position, position, base_pair_location, pos |
| SNPID | SNP, MarkerName, variant, ID |
| EA | A1, effect_allele, ALT, Allele1 |
| NEA | A2, other_allele, REF, Allele2 |
| EAF | FRQ, FREQ, frequency, Freq1 |
| MAF | MAF |
| BETA | beta, Beta, effect, Effect |
| SE | StdErr, stderr, standard_error |
| P | PVAL, P_BOLT_LMM, pvalue, P-value, p_value |
| N | n, sample_size, NMISS |
| INFO | info, imputation_quality |
| Z | STAT, zscore, z_score |
| RSID | rsid, rs |

When aliases are not enough, pass a JSON mapping to munge.
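
Before running munge on a new file, it can help to see which raw headers the alias table covers and which would need an explicit mapping. The sketch below copies the alias table from this page into a lookup. The function name `match_headers` and the example header `LOG10P` are hypothetical, and exact case-sensitive matching is an assumption of this sketch; CREDTOOLS itself may match more leniently.

```python
# Alias table copied from this page. Exact, case-sensitive matching is an
# assumption of this sketch, not documented CREDTOOLS behavior.
ALIASES = {
    "CHR": ["CHROM", "#CHROM", "chromosome", "Chromosome"],
    "BP": ["POS", "Position", "position", "base_pair_location", "pos"],
    "SNPID": ["SNP", "MarkerName", "variant", "ID"],
    "EA": ["A1", "effect_allele", "ALT", "Allele1"],
    "NEA": ["A2", "other_allele", "REF", "Allele2"],
    "EAF": ["FRQ", "FREQ", "frequency", "Freq1"],
    "MAF": ["MAF"],
    "BETA": ["beta", "Beta", "effect", "Effect"],
    "SE": ["StdErr", "stderr", "standard_error"],
    "P": ["PVAL", "P_BOLT_LMM", "pvalue", "P-value", "p_value"],
    "N": ["n", "sample_size", "NMISS"],
    "INFO": ["info", "imputation_quality"],
    "Z": ["STAT", "zscore", "z_score"],
    "RSID": ["rsid", "rs"],
}

# Reverse lookup: raw header -> CREDTOOLS column name.
LOOKUP = {alias: col for col, names in ALIASES.items() for alias in names}
# The CREDTOOLS names themselves are also accepted as-is.
LOOKUP.update({col: col for col in ALIASES})


def match_headers(raw_headers):
    """Split raw headers into recognized (mapped) and unrecognized ones."""
    mapped, unknown = {}, []
    for header in raw_headers:
        if header in LOOKUP:
            mapped[header] = LOOKUP[header]
        else:
            unknown.append(header)
    return mapped, unknown


mapped, unknown = match_headers(
    ["#CHROM", "POS", "ALT", "REF", "Effect", "StdErr", "LOG10P"]
)
print(mapped)
print(unknown)  # headers that would need an explicit JSON mapping
```

Any header that ends up in `unknown` is a candidate for the JSON mapping mentioned above.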

Munged Summary Statistics

credtools munge writes:

CHR BP  SNPID   EA  NEA EAF BETA    SE  P   N   RSID

Prepared locus files loaded by fine-mapping may also include MAF, which is derived from EAF when CREDTOOLS loads and munges the locus file.

| Column | Required for fine-mapping | Notes |
| --- | --- | --- |
| SNPID | yes | generated as chr-bp-sortedAllele1-sortedAllele2 |
| CHR, BP | yes | chromosome and base-pair position |
| EA, NEA | yes | effect and non-effect allele |
| EAF | strongly recommended | used to derive MAF |
| MAF | required by FINEMAP | derived during loading when EAF exists |
| BETA, SE, P | yes | model inputs |
| N | recommended | sample size also comes from loci_list.txt |
| RSID | optional | carried into reports when present |
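
As a rough illustration of the two derived columns, the sketch below builds an SNPID in the chr-bp-sortedAllele1-sortedAllele2 pattern and derives MAF from EAF. The function names are hypothetical. Plain lexicographic sorting of the alleles is an assumption, and MAF is computed with the standard definition min(EAF, 1 - EAF) rather than taken from the CREDTOOLS source.

```python
def make_snpid(chrom, bp, ea, nea):
    """Build chr-bp-sortedAllele1-sortedAllele2 (lexicographic sort assumed)."""
    allele1, allele2 = sorted([ea, nea])
    return f"{chrom}-{bp}-{allele1}-{allele2}"


def maf_from_eaf(eaf):
    """Minor allele frequency is the smaller of EAF and 1 - EAF."""
    if not 0.0 <= eaf <= 1.0:
        raise ValueError(f"EAF must be between 0 and 1, got {eaf}")
    return min(eaf, 1.0 - eaf)


# Sorting the alleles makes the ID independent of which allele is the effect allele.
print(make_snpid(1, 50000123, "G", "A"))
print(make_snpid(1, 50000123, "A", "G"))
print(maf_from_eaf(0.75))
```

Because the alleles are sorted, the same variant gets the same SNPID in every cohort even when the effect allele differs between files.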

Loci List

Used by prepare, meta, qc, finemap, and pipeline.

locus_id    chr start   end popu    cohort  sample_size prefix
locus_1 1   50000000    50500000    EUR UKBB    400000  data/EUR_UKBB_locus_1
locus_1 1   50000000    50500000    AFR MVP 90000   data/AFR_MVP_locus_1

| Column | Type | Rule |
| --- | --- | --- |
| locus_id | string | rows with the same value are analyzed together |
| chr | integer | must be the same for all rows in one locus_id |
| start | integer | must be positive |
| end | integer | must be greater than start |
| popu | string | population label |
| cohort | string | cohort label |
| sample_size | integer | must be positive |
| prefix | path prefix | no file extension |

Each popu + cohort + locus_id combination must be unique.
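
When a loci list is written by hand, the rules above can be checked before a run. The following is a minimal sketch using only the standard library. The function name `validate_loci` is hypothetical, the file is assumed to be tab-separated, and the error messages are made up for this example; they are not CREDTOOLS messages.

```python
import csv
import io
from collections import defaultdict

# The example loci list from this page, assumed to be tab-separated.
LOCI_TSV = (
    "locus_id\tchr\tstart\tend\tpopu\tcohort\tsample_size\tprefix\n"
    "locus_1\t1\t50000000\t50500000\tEUR\tUKBB\t400000\tdata/EUR_UKBB_locus_1\n"
    "locus_1\t1\t50000000\t50500000\tAFR\tMVP\t90000\tdata/AFR_MVP_locus_1\n"
)


def validate_loci(text):
    """Return a list of rule violations for a tab-separated loci list."""
    errors = []
    chroms_by_locus = defaultdict(set)
    seen = set()
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(rows, start=2):  # the header is line 1
        start, end = int(row["start"]), int(row["end"])
        if start <= 0:
            errors.append(f"line {line_no}: start must be positive")
        if end <= start:
            errors.append(f"line {line_no}: end must be greater than start")
        if int(row["sample_size"]) <= 0:
            errors.append(f"line {line_no}: sample_size must be positive")
        key = (row["popu"], row["cohort"], row["locus_id"])
        if key in seen:
            errors.append(f"line {line_no}: duplicate popu + cohort + locus_id {key}")
        seen.add(key)
        chroms_by_locus[row["locus_id"]].add(row["chr"])
    for locus_id, chroms in chroms_by_locus.items():
        if len(chroms) > 1:
            errors.append(f"{locus_id}: rows disagree on chr {sorted(chroms)}")
    return errors


print(validate_loci(LOCI_TSV))  # an empty list means no violations were found
```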

Genotype Config

Used by credtools prepare.

JSON:

{
  "EUR": "/ref/ukb_eur",
  "AFR": "/ref/1kg_afr"
}

TSV:

popu    ld_ref
EUR /ref/ukb_eur
AFR /ref/1kg_afr

| Column | Type | Rule |
| --- | --- | --- |
| popu | string | must match popu in the prepare input |
| ld_ref | path prefix | PLINK prefix without .bed, .bim, or .fam |
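
The JSON and TSV forms carry the same information, a mapping from popu to a PLINK prefix. The sketch below parses either form into one dictionary and lists the three PLINK files each prefix implies. The function names are hypothetical, and detecting JSON by a leading `{` is an assumption of this sketch; CREDTOOLS may decide by file extension instead.

```python
import csv
import io
import json

JSON_TEXT = '{"EUR": "/ref/ukb_eur", "AFR": "/ref/1kg_afr"}'
TSV_TEXT = "popu\tld_ref\nEUR\t/ref/ukb_eur\nAFR\t/ref/1kg_afr\n"


def load_genotype_config(text):
    """Parse a genotype config given as JSON or as a popu/ld_ref TSV."""
    if text.lstrip().startswith("{"):  # assumption: JSON configs are objects
        return dict(json.loads(text))
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["popu"]: row["ld_ref"] for row in rows}


def plink_files(ld_ref):
    """List the .bed/.bim/.fam paths that a PLINK prefix refers to."""
    return [f"{ld_ref}{ext}" for ext in (".bed", ".bim", ".fam")]


config = load_genotype_config(TSV_TEXT)
print(config)
print(plink_files(config["EUR"]))
```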

Files Behind prefix

CREDTOOLS checks these names:

| Data | Accepted names |
| --- | --- |
| summary statistics | {prefix}.sumstat, {prefix}.sumstats.gz |
| LD matrix | {prefix}.ld, {prefix}.ld.npz |
| LD map | {prefix}.ldmap, {prefix}.ldmap.gz |

Prefix is not a folder

If prefix is data/EUR_locus_1, CREDTOOLS reads data/EUR_locus_1.sumstats.gz, not data/EUR_locus_1/sumstats.gz.
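
To see exactly which paths a given prefix expands to, a small sketch like the following can help. The suffixes come from the accepted names listed above. The function names are hypothetical, and returning the first existing candidate is an assumption about precedence that this page does not state.

```python
from pathlib import Path

# Suffixes taken from the accepted names on this page.
ACCEPTED_SUFFIXES = {
    "sumstats": [".sumstat", ".sumstats.gz"],
    "ld": [".ld", ".ld.npz"],
    "ldmap": [".ldmap", ".ldmap.gz"],
}


def candidate_paths(prefix):
    """Expand a prefix into every accepted file name, grouped by data kind."""
    # Plain string concatenation: the prefix is not a folder, and it may
    # contain dots, so Path.with_suffix would be the wrong tool here.
    return {
        kind: [f"{prefix}{suffix}" for suffix in suffixes]
        for kind, suffixes in ACCEPTED_SUFFIXES.items()
    }


def find_inputs(prefix):
    """Return the first existing file per kind, or None when none exists."""
    return {
        kind: next((p for p in paths if Path(p).exists()), None)
        for kind, paths in candidate_paths(prefix).items()
    }


for kind, paths in candidate_paths("data/EUR_locus_1").items():
    print(kind, paths)
```

A `None` in the result of `find_inputs` points at the missing file kind when a run fails to load a locus.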

LD Matrix

Text LD files use lower-triangular rows:

1
0.12    1
-0.03   0.25    1

.npz LD files store a square matrix. CREDTOOLS loads the first array in the archive and replaces missing values with zero.
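
For files created outside CREDTOOLS, the lower-triangular text layout can be expanded into a full symmetric matrix as sketched below. This is a standard-library illustration, not the CREDTOOLS loader; the function name `read_lower_triangular` is hypothetical and whitespace-separated values are assumed.

```python
# The example LD file from this page.
LD_TEXT = "1\n0.12\t1\n-0.03\t0.25\t1\n"


def read_lower_triangular(text):
    """Expand lower-triangular rows into a full symmetric matrix."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        # Row i (0-based) must hold exactly i + 1 values.
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} has {len(row)} values, expected {i + 1}")
        for j, token in enumerate(row):
            value = float(token)
            matrix[i][j] = value
            matrix[j][i] = value  # mirror into the upper triangle
    return matrix


for row in read_lower_triangular(LD_TEXT):
    print(row)
```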

LD Map

Minimum columns:

CHR BP  A1  A2
1   50000123    A   G
1   50000456    C   T

Optional but useful:

| Column | Meaning |
| --- | --- |
| SNPID | if absent, CREDTOOLS creates one |
| AF2 | allele frequency used by MAF comparison and SuSiEx preparation |

The number of LD map rows must match the number of LD matrix rows.
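
A quick consistency check for a hand-built LD map and LD matrix might look like the sketch below. The function names are hypothetical, the LD map is assumed to be tab-separated, and the matrix is passed as a list of rows.

```python
import csv
import io

REQUIRED = ("CHR", "BP", "A1", "A2")

# The example LD map from this page, assumed to be tab-separated.
LDMAP_TEXT = "CHR\tBP\tA1\tA2\n1\t50000123\tA\tG\n1\t50000456\tC\tT\n"


def read_ldmap(text):
    """Parse an LD map and confirm the minimum columns are present."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"LD map is missing columns: {missing}")
    return list(reader)


def check_dimensions(ldmap_rows, ld_matrix):
    """Raise if the LD map and the LD matrix disagree on the variant count."""
    if len(ldmap_rows) != len(ld_matrix):
        raise ValueError(
            f"LD map has {len(ldmap_rows)} rows, LD matrix has {len(ld_matrix)}"
        )
    for i, row in enumerate(ld_matrix):
        if len(row) != len(ld_matrix):
            raise ValueError(f"LD matrix row {i + 1} is not square")


rows = read_ldmap(LDMAP_TEXT)
check_dimensions(rows, [[1.0, 0.12], [0.12, 1.0]])  # passes silently
print(len(rows))
```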

Fine-Mapping Outputs

pips.txt.gz always includes:

| Column | Meaning |
| --- | --- |
| SNPID | variant identifier |
| PIP | posterior inclusion probability |
| CRED | credible set index; 0 means not assigned |

For one input row, it also includes available summary-statistic columns such as CHR, BP, RSID, EA, NEA, EAF, MAF, BETA, SE, P, and R2.

For multiple input rows, study-specific columns are prefixed:

EUR_UKBB_P
EUR_UKBB_R2
AFR_MVP_P
AFR_MVP_R2
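
The prefixed names above follow a popu_cohort_column pattern. The sketch below generates them for a list of studies, which can be handy when selecting columns from a multi-study pips.txt.gz. The function name `study_columns` is hypothetical, and the pattern is inferred from the four example names rather than from the CREDTOOLS source.

```python
def study_columns(studies, columns=("P", "R2")):
    """Build popu_cohort_column names for each (popu, cohort) pair."""
    return [
        f"{popu}_{cohort}_{column}"
        for popu, cohort in studies
        for column in columns
    ]


print(study_columns([("EUR", "UKBB"), ("AFR", "MVP")]))
```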

Other common result files:

| File | Contents |
| --- | --- |
| credible_sets_summary.txt.gz | one row per credible set |
| causal_variants.txt.gz | variants with CRED != 0 |
| parameters.json | tool, settings, and run metadata |
| run_summary.log | success, failure, and parameter summary |
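
The relationship between pips.txt.gz and causal_variants.txt.gz can be pictured as a filter on the CRED column. The rows below are toy values invented for this illustration, not output from a real run, and the function name is hypothetical. A tab-separated layout is assumed.

```python
import csv
import io

# Toy rows invented for illustration only; not output from a real run.
PIPS_TEXT = (
    "SNPID\tPIP\tCRED\n"
    "1-50000123-A-G\t0.91\t1\n"
    "1-50000456-C-T\t0.02\t0\n"
)


def variants_in_credible_sets(text):
    """Keep rows whose CRED is non-zero, i.e. assigned to a credible set."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in rows if int(row["CRED"]) != 0]


for row in variants_in_credible_sets(PIPS_TEXT):
    print(row["SNPID"], row["PIP"], row["CRED"])
```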

QC Outputs

Global and per-locus QC summaries use the same schema:

popu    cohort  n_snps  n_1e-5  n_5e-8  maf_corr    lambda_s    n_lambda_s_outlier  n_dentist_s_outlier

When C1b is enabled, n_c1b_outlier is appended.

Detailed QC files:

| File | Key columns |
| --- | --- |
| expected_z.txt.gz | SNPID, z, condmean, condvar, z_std_diff, logLR, lambda_s, cohort |
| dentist_s.txt.gz | SNPID, t_dentist_s, -log10p_dentist_s, r2, cohort |
| compare_maf.txt.gz | SNPID, MAF_sumstats, MAF_ld, cohort |
| cleaned/outlier_snps.txt.gz | SNPID, C1_ld_mismatch, C2_marginal, C3_dentist_s, optional C1b_high_z_residual |
| cleaned/cleaned_loci_info.txt.gz | loci list pointing to cleaned files |

Heterogeneity Outputs

meta and pipeline can write heterogeneity summaries:

| File | Key columns |
| --- | --- |
| heterogeneity.txt.gz | popu, cohort, ld_4th_moment_mean, ld_decay_rate, missing_rate, cochran_q_median, i_squared_median, n_het_snps |
| ld_4th_moment.txt.gz | per-variant LD fourth-moment values by cohort |
| ld_decay.txt.gz | distance_kb, r2_avg, decay_rate, cohort |
| cochran_q.txt.gz | SNPID, Q, Q_pvalue, I_squared, k |
| snp_missingness.txt.gz | variant presence or absence by cohort |
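
For readers interpreting the Q, I_squared, and k columns, the sketch below computes the textbook fixed-effect Cochran's Q and I-squared for one variant from per-cohort effect sizes and standard errors. It uses the standard definitions and is not taken from the CREDTOOLS source. The function name is hypothetical, I-squared is returned as a fraction between 0 and 1 (CREDTOOLS may report a percentage), and Q_pvalue is left out because it needs a chi-square distribution function.

```python
def cochran_q(betas, ses):
    """Return (Q, I_squared, k) for one variant across k cohorts."""
    k = len(betas)
    if k != len(ses) or k < 2:
        raise ValueError("need matching betas and ses for at least two cohorts")
    # Inverse-variance weights and the fixed-effect pooled estimate.
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = k - 1
    # I-squared is the share of Q in excess of its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i_squared, k


q, i_squared, k = cochran_q([0.1, 0.3], [0.1, 0.1])
print(round(q, 6), round(i_squared, 6), k)
```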