Validate LD reference
Validating the LD reference prepares it for LD matrix calculation. If you only intend to use LD-free fine-mapping methods, you can skip this step.
The validate-ldref command in EasyFinemap quickly validates an LD reference in PLINK bfile format. The main steps are as follows:
- If the input is not already separated by chromosome, the PLINK bfile is split by chromosome.
- Variants with a minor allele count below a specified threshold (default is 10) are filtered out.
- Multiallelic variants are removed.
- Duplicate variants are removed.
- Variant IDs are converted to Unique SNP IDs for easy matching of summary statistics with variants in the LD reference. In EasyFinemap, the Unique SNP ID format used is chr-bp-sorted(EA,NEA).
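The Unique SNP ID idea from the last step can be sketched in a few lines of Python. This is only an illustration, not EasyFinemap's own code: the function name unique_snp_id, the "-" separator between the two alleles, and the upper-casing of alleles are assumptions. The point is that sorting the allele pair makes the ID independent of which allele a study reports as the effect allele.

```python
def unique_snp_id(chrom, bp, ea, nea, sep="-"):
    """Build a chr-bp-sorted(EA,NEA) style identifier.

    The separator and the upper-casing of alleles are assumptions made for
    illustration; EasyFinemap's exact string layout may differ.
    """
    a1, a2 = sorted([ea.upper(), nea.upper()])
    return sep.join([str(chrom), str(bp), a1, a2])


# The same variant reported with swapped effect/non-effect alleles
# maps to the same identifier.
id_sumstat = unique_snp_id(21, 9411245, "T", "C")
id_ldref = unique_snp_id("21", 9411245, "C", "T")
print(id_sumstat, id_ldref, id_sumstat == id_ldref)
```

Because both inputs resolve to the same string, summary statistics and LD reference variants can be joined on this ID without a separate allele-flipping step.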
Let's demonstrate the usage of validate-ldref with an example dataset.
Download the example data:
git clone https://github.com/Jianhua-Wang/easyfinemap.git
cd easyfinemap/exampledata
ls
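validate-ldref supports both split-by-chromosome bfiles (such as those converted directly from the chromosome-separated VCF files of 1000 Genomes) and non-split bfiles. The first command below takes a non-split bfile and the second a split-by-chromosome one, with {chrom} standing in for the chromosome number: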
easyfinemap validate-ldref ./EUR.chr21-22 EUR.valid
easyfinemap validate-ldref ./EUR.chr{chrom} EUR.valid
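Before running the split-by-chromosome form, it can help to confirm that every per-chromosome bfile is complete. The sketch below is a hypothetical pre-flight check, not part of EasyFinemap: the helper names are invented, and the way {chrom} is substituted here (plain string replacement) is an assumption about how the placeholder is meant to be read.

```python
from pathlib import Path

PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def expand_bfile_prefixes(pattern, chroms):
    """Return one bfile prefix per chromosome for a pattern such as
    './EUR.chr{chrom}'; a pattern without the placeholder is returned once."""
    if "{chrom}" not in pattern:
        return [pattern]
    return [pattern.replace("{chrom}", str(c)) for c in chroms]


def missing_files(prefix):
    """List the .bed/.bim/.fam files that are absent for a bfile prefix."""
    return [prefix + s for s in PLINK_SUFFIXES if not Path(prefix + s).exists()]


prefixes = expand_bfile_prefixes("./EUR.chr{chrom}", [21, 22])
print(prefixes)
for p in prefixes:
    print(p, "missing:", missing_files(p))
```

Run from the exampledata directory, an empty "missing" list for each prefix suggests the inputs are at least complete as PLINK filesets.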
$ easyfinemap validate-ldref -h
────────────────────────────────── EasyFinemap ───────────────────────────────────
Version: 0.3.9
Author: Jianhua Wang
Email: jianhua.mert@gmail.com
Usage: easyfinemap validate-ldref [OPTIONS] LDREF_PATH OUTPREFIX
Validate the LD reference file.
╭─ Arguments ────────────────────────────────────────────────────────────────────╮
│ * ldref_path TEXT The path to the LD reference file. [default: None] │
│ [required] │
│ * outprefix TEXT The output prefix. [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────╮
│ --file-type -f TEXT The file type of the LD reference file. │
│ [default: plink] │
│ --mac -m INTEGER The minor allele count threshold. [default: 10] │
│ --threads -t INTEGER The number of threads. [default: 1] │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────╯
The -m option allows you to change the minor allele count threshold used for filtering. Set it to a value greater than 0, because variants with a minor allele count of 0 (monomorphic in the reference panel) can cause errors in LD matrix calculations.
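To see why a minor allele count of 0 is a problem, recall that LD is commonly measured as the correlation between genotype vectors. The sketch below is a plain-Python illustration under that assumption and is not how EasyFinemap or PLINK computes LD; the function name genotype_r is hypothetical. A variant with no minor alleles has zero variance, so the correlation has a zero denominator and is undefined.

```python
import math


def genotype_r(x, y):
    """Pearson correlation between two genotype vectors (0/1/2 dosages).

    Returns NaN when either variant has zero variance, which is the case
    for a variant with a minor allele count of 0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


polymorphic = [0, 1, 2, 1, 0, 2]
monomorphic = [0, 0, 0, 0, 0, 0]  # minor allele count of 0

print(genotype_r(polymorphic, polymorphic))  # 1.0
print(genotype_r(polymorphic, monomorphic))  # nan
```

A single undefined entry like this is enough to make an LD matrix unusable for downstream fine-mapping, which is why such variants are filtered out beforehand.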
The -t option specifies the number of threads. A higher value can speed up the process. Parallelization is performed by chromosome, so the value should not exceed the number of chromosomes in the LD reference.
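The per-chromosome parallelism can be pictured with the following sketch. It is an assumption-laden illustration rather than EasyFinemap's implementation: validate_chromosome is a hypothetical stand-in for the real work, and whether the tool uses threads or processes internally is not stated here. It shows why workers beyond the number of chromosomes would simply sit idle.

```python
from concurrent.futures import ThreadPoolExecutor


def effective_workers(threads, chroms):
    """Cap the worker count at the number of chromosomes (at least 1)."""
    return max(1, min(threads, len(chroms)))


def validate_chromosome(chrom):
    """Hypothetical stand-in for the per-chromosome validation work."""
    return f"chr{chrom} validated"


chroms = [21, 22]
workers = effective_workers(threads=8, chroms=chroms)
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(validate_chromosome, chroms))

print(workers)   # 2
print(results)   # ['chr21 validated', 'chr22 validated']
```

With the two-chromosome example data, requesting 8 threads gives the same two units of parallel work as requesting 2.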