Validate LD reference

Validating the LD reference prepares it for LD matrix calculation. If you only plan to use LD-free fine-mapping methods, you can skip this step.

The validate-ldref command in EasyFinemap validates an LD reference in PLINK bfile format (.bed/.bim/.fam). The main steps are:

  • If the input is not already separated by chromosome, the PLINK bfile is split by chromosome.
  • Variants with a minor allele count below a threshold (set with -m; default: 10) are filtered out.
  • Multiallelic variants are removed.
  • Duplicate variants are removed.
  • Variant IDs are converted to unique SNP IDs so that summary statistics can be matched to variants in the LD reference. EasyFinemap uses the format chr-bp-sorted(EA,NEA): the chromosome, the base-pair position, and the effect and non-effect alleles in sorted order.
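
The following minimal Python sketch shows the bookkeeping behind the chromosome split, the multiallelic and duplicate filters, and the unique SNP ID conversion. It operates on toy .bim-style rows (chromosome, ID, cM, position, allele 1, allele 2). This is not EasyFinemap's implementation. The function names `unique_snpid` and `clean_bim` are hypothetical. The sketch also assumes three things: the ID fields are joined with "-", a position carrying more than one distinct allele pair counts as multiallelic, and the first copy of a duplicated variant is kept. MAC filtering needs genotype data and is not covered here.

```python
from collections import defaultdict

# A .bim row: (chrom, variant_id, cM, bp, allele1, allele2)
BimRow = tuple[str, str, float, int, str, str]


def unique_snpid(chrom: str, bp: int, a1: str, a2: str) -> str:
    """Build a chr-bp-sorted(EA,NEA) ID; the '-' separator is an assumption."""
    first, second = sorted((a1, a2))
    return f"{chrom}-{bp}-{first}-{second}"


def clean_bim(rows: list[BimRow]) -> dict[str, list[tuple[str, BimRow]]]:
    """Hypothetical helper: group rows by chromosome, drop multiallelic
    positions and duplicate variants, and pair each kept row with its ID."""
    # Distinct (sorted) allele pairs observed at each (chrom, bp) position.
    pairs_at = defaultdict(set)
    for chrom, _, _, bp, a1, a2 in rows:
        pairs_at[(chrom, bp)].add(tuple(sorted((a1, a2))))

    by_chrom = defaultdict(list)
    seen = set()
    for row in rows:
        chrom, _, _, bp, a1, a2 = row
        # Assumption: more than one allele pair at a position means multiallelic.
        if len(pairs_at[(chrom, bp)]) > 1:
            continue
        snpid = unique_snpid(chrom, bp, a1, a2)
        # Assumption: keep the first copy of a duplicated variant.
        if snpid in seen:
            continue
        seen.add(snpid)
        by_chrom[chrom].append((snpid, row))
    return dict(by_chrom)


# Toy rows for illustration only.
rows = [
    ("21", "v1", 0.0, 1000, "C", "A"),
    ("21", "v1b", 0.0, 1000, "A", "C"),  # same variant, alleles swapped
    ("21", "v2", 0.0, 2000, "A", "G"),
    ("21", "v3", 0.0, 2000, "A", "T"),   # second ALT allele: multiallelic
    ("22", "v4", 0.0, 3000, "G", "A"),
]
for chrom, variants in clean_bim(rows).items():
    print(chrom, [snpid for snpid, _ in variants])
# 21 ['21-1000-A-C']
# 22 ['22-3000-A-G']
```

Sorting the alleles makes the ID independent of which allele a file labels as effect or non-effect. This is why "C/A" and "A/C" at the same position collapse to one ID and are treated as duplicates.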

Let's demonstrate the usage of validate-ldref with an example dataset.

Download the example data:

git clone https://github.com/Jianhua-Wang/easyfinemap.git
cd easyfinemap/exampledata
ls

The exampledata directory contains three sets of bfiles: EUR.chr21 (chromosome 21), EUR.chr22 (chromosome 22), and EUR.chr21-22 (chromosomes 21 and 22 merged into a single file).

validate-ldref accepts both split-by-chromosome bfiles (such as those converted directly from the per-chromosome 1000 Genomes VCF files) and non-split bfiles. For a non-split bfile, pass its prefix followed by the output prefix:

easyfinemap validate-ldref ./EUR.chr21-22 EUR.valid

For split-by-chromosome bfiles, put the {chrom} wildcard where the chromosome number appears in the prefix:
easyfinemap validate-ldref ./EUR.chr{chrom} EUR.valid
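
The following is a rough Python sketch of how a {chrom} template can be resolved into per-chromosome bfile prefixes. It substitutes each chromosome number and keeps only prefixes whose .bed, .bim and .fam files all exist. The helper `expand_chrom_wildcard` is hypothetical and is not EasyFinemap's code. Scanning autosomes 1 to 22 by default is also an assumption.

```python
from pathlib import Path

BFILE_SUFFIXES = (".bed", ".bim", ".fam")


def expand_chrom_wildcard(template: str, chroms=range(1, 23)) -> dict[int, str]:
    """Hypothetical helper: substitute each chromosome number for {chrom}
    and keep the prefixes whose .bed/.bim/.fam files all exist."""
    if "{chrom}" not in template:
        raise ValueError("template must contain the {chrom} wildcard")
    found = {}
    for chrom in chroms:
        # str.replace avoids str.format tripping over other braces in the path.
        prefix = template.replace("{chrom}", str(chrom))
        if all(Path(prefix + suffix).is_file() for suffix in BFILE_SUFFIXES):
            found[chrom] = prefix
    return found


if __name__ == "__main__":
    # Intended to be run from easyfinemap/exampledata.
    print(expand_chrom_wildcard("./EUR.chr{chrom}"))
```

Run from the exampledata directory, this sketch should pick up the EUR.chr21 and EUR.chr22 prefixes and skip chromosomes with no files. A chromosome with an incomplete .bed/.bim/.fam set is skipped rather than reported.
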

To list all options, run:

$ easyfinemap validate-ldref -h
────────────────────────────────── EasyFinemap ───────────────────────────────────
                                  Version: 0.3.9
                               Author: Jianhua Wang
                          Email: jianhua.mert@gmail.com

 Usage: easyfinemap validate-ldref [OPTIONS] LDREF_PATH OUTPREFIX

 Validate the LD reference file.

╭─ Arguments ────────────────────────────────────────────────────────────────────╮
│ *    ldref_path      TEXT  The path to the LD reference file. [default: None]  │
│                            [required]                                          │
│ *    outprefix       TEXT  The output prefix. [default: None] [required]       │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --file-type  -f      TEXT     The file type of the LD reference file.          │
│                               [default: plink]                                 │
│ --mac        -m      INTEGER  The minor allele count threshold. [default: 10]  │
│ --threads    -t      INTEGER  The number of threads. [default: 1]              │
│ --help       -h               Show this message and exit.                      │
╰────────────────────────────────────────────────────────────────────────────────╯

The -m option sets the minor allele count (MAC) threshold used for filtering. Keep it above 0. A variant with a MAC of 0 is monomorphic in the reference panel, so its genotypes have zero variance and its correlation with any other variant is undefined, which causes errors when the LD matrix is calculated.

The -t option sets the number of threads; more threads can speed up validation. Parallelization is performed by chromosome, so -t should not exceed the number of chromosomes in the input: extra threads would have no chromosome to work on.
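
The following minimal Python sketch illustrates both points using toy 0/1/2 genotype dosages. The helpers `minor_allele_count`, `passes_mac`, `genotype_r` and `effective_workers` are hypothetical and not part of EasyFinemap. Here r is computed as the Pearson correlation of dosages, one common definition of LD, which may differ from how EasyFinemap or PLINK computes it. The `>=` comparison assumes that variants exactly at the threshold are kept.

```python
import math


def minor_allele_count(genotypes: list[int]) -> int:
    """MAC from 0/1/2 allele dosages; missing calls are assumed removed."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return min(alt, ref)


def passes_mac(genotypes: list[int], mac: int = 10) -> bool:
    """Keep a variant only if its MAC is not below the threshold (cf. -m)."""
    return minor_allele_count(genotypes) >= mac


def genotype_r(x: list[int], y: list[int]) -> float:
    """Pearson r between two dosage vectors; NaN if either has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # a monomorphic variant makes r equal to 0/0
    return sxy / math.sqrt(sxx * syy)


def effective_workers(threads: int, n_chroms: int) -> int:
    """With one task per chromosome, threads beyond n_chroms have nothing to do."""
    return max(1, min(threads, n_chroms))


monomorphic = [0, 0, 0, 0, 0, 0]  # toy data, MAC = 0
common = [0, 1, 2, 1, 0, 2]       # toy data, MAC = 6
print(minor_allele_count(monomorphic), minor_allele_count(common))  # 0 6
print(genotype_r(monomorphic, common))  # nan
print(effective_workers(8, 2))  # 2
```

With the default threshold of 10, the toy `common` variant (MAC 6) would also be filtered out. Lowering the threshold with -m keeps more rare variants, but any value of 0 would let zero-variance variants through.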