cojo
Wrapper for COJO.
conditional_selection(locus, p_cutoff=5e-08, collinear_cutoff=0.9, window_size=10000000, maf_cutoff=0.01, diff_freq_cutoff=0.2)
Perform conditional selection on the locus using the COJO method.
Parameters
- locus : Locus
  The locus to perform conditional selection on. Must contain summary statistics and LD matrix data.
- p_cutoff : float, optional
  The p-value cutoff for the conditional selection, by default 5e-8. If no SNPs pass this threshold, it will be relaxed to 1e-5.
- collinear_cutoff : float, optional
  The collinearity cutoff for the conditional selection, by default 0.9. SNPs with LD correlation above this threshold are considered collinear.
- window_size : int, optional
  The window size in base pairs for the conditional selection, by default 10000000. SNPs within this window are considered for conditional analysis.
- maf_cutoff : float, optional
  The minor allele frequency cutoff for the conditional selection, by default 0.01. SNPs with MAF below this threshold are excluded.
- diff_freq_cutoff : float, optional
  The cutoff for the allele frequency difference between the summary statistics and the reference panel, by default 0.2. SNPs with frequency differences above this threshold are excluded.
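To make the two frequency parameters concrete, here is a minimal sketch of how `maf_cutoff` and `diff_freq_cutoff` could be applied as filters. It is one plausible reading of the descriptions above, not the credtools implementation: the helper name `frequency_filter_mask`, its arguments `eaf` and `ref_freq`, and the choice of inclusive comparisons are all assumptions made for illustration.

```python
import numpy as np


def frequency_filter_mask(eaf, ref_freq=None, maf_cutoff=0.01, diff_freq_cutoff=0.2):
    """Return a boolean mask of SNPs that survive both frequency filters.

    Hypothetical helper, not part of credtools.
    eaf      : effect-allele frequencies from the summary statistics (assumed input)
    ref_freq : matching allele frequencies from the reference panel, or None
    """
    eaf = np.asarray(eaf, dtype=float)
    # Minor allele frequency is the smaller of the two allele frequencies.
    maf = np.minimum(eaf, 1.0 - eaf)
    keep = maf >= maf_cutoff
    # The frequency comparison is only possible when reference frequencies exist.
    if ref_freq is not None:
        diff = np.abs(eaf - np.asarray(ref_freq, dtype=float))
        keep &= diff <= diff_freq_cutoff
    return keep


mask = frequency_filter_mask(
    eaf=[0.005, 0.20, 0.30, 0.995],
    ref_freq=[0.010, 0.25, 0.60, 0.990],
)
print(mask.tolist())
```

In this toy input the first and last SNPs fall below the MAF cutoff, and the third differs from the reference panel by 0.3, so only the second SNP is kept.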
Returns
- pd.DataFrame
  The conditional selection results containing independently associated variants, with columns including SNP identifiers, effect sizes, and conditional p-values.
Warnings
If no SNPs pass the initial p-value cutoff, the threshold is automatically relaxed to 1e-5 and a warning is logged.
If AF2 (reference allele frequency) is not available in the LD matrix, a warning is logged and frequency checking is disabled.
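The threshold fallback described in the first warning can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code in credtools/cojo.py: the function name `effective_p_cutoff`, the logger name, and the use of `<=` for "passes the threshold" are hypothetical.

```python
import logging

logger = logging.getLogger("cojo_sketch")  # hypothetical logger name


def effective_p_cutoff(p_values, p_cutoff=5e-8, relaxed_cutoff=1e-5):
    """Return p_cutoff if any SNP passes it, otherwise the relaxed cutoff.

    Hypothetical helper, not part of credtools. Whether the real comparison
    is strict (<) or inclusive (<=) is an assumption here.
    """
    if any(p <= p_cutoff for p in p_values):
        return p_cutoff
    logger.warning(
        "No SNPs pass p_cutoff=%g; relaxing the threshold to %g",
        p_cutoff,
        relaxed_cutoff,
    )
    return relaxed_cutoff


print(effective_p_cutoff([1e-9, 0.5]))   # a SNP passes, cutoff unchanged
print(effective_p_cutoff([1e-6, 0.5]))   # nothing passes, cutoff relaxed
```

Logging a warning instead of raising lets the analysis continue on loci with weaker signals while leaving a trace of the change in the log.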
Notes
COJO (Conditional and Joint analysis) performs stepwise conditional analysis to identify independently associated variants at a locus. The method:
- Identifies the most significant SNP
- Performs conditional analysis on remaining SNPs
- Iteratively adds independently associated SNPs
- Continues until no more SNPs meet significance criteria
The algorithm accounts for linkage disequilibrium patterns and helps distinguish truly independent signals from those in LD with lead variants.
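The steps listed above can be sketched with marginal z-scores and an LD correlation matrix. This is a deliberately simplified forward selection, not the full COJO procedure of Yang et al. and not the credtools wrapper: it assumes standardized genotypes and a residual variance of about 1, it omits the backward-elimination check, the sample-size and variance adjustments, and the distance window, and it interprets `collinear_cutoff` as a limit on the multiple-regression R² between a candidate and the already selected SNPs. The function names are hypothetical.

```python
import math

import numpy as np


def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stepwise_select(z, R, p_cutoff=5e-8, collinear_cutoff=0.9):
    """Simplified forward stepwise selection (hypothetical sketch).

    z : marginal z-scores, one per SNP
    R : LD correlation matrix between the same SNPs
    Returns the indices of the selected SNPs in order of selection.
    """
    z = np.asarray(z, dtype=float)
    R = np.asarray(R, dtype=float)
    selected = []
    while True:
        best_j, best_p = None, p_cutoff
        for j in range(len(z)):
            if j in selected:
                continue
            if selected:
                R_SS = R[np.ix_(selected, selected)]
                R_jS = R[j, selected]
                w = np.linalg.solve(R_SS, R_jS)  # R_SS^{-1} R_Sj
                r2 = float(R_jS @ w)             # variance of SNP j explained by the selected set
                if r2 > collinear_cutoff:
                    continue                     # too collinear with the selected SNPs
                # Conditional z-score of SNP j given the selected SNPs.
                z_cond = (z[j] - w @ z[selected]) / math.sqrt(1.0 - r2)
            else:
                z_cond = z[j]                    # first pass: plain marginal test
            p = two_sided_p(z_cond)
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None:                       # nothing left below the cutoff
            return selected
        selected.append(best_j)


# SNP 1 is in LD (r = 0.85) with SNP 0; SNP 2 is independent of both.
R = [[1.0, 0.85, 0.0],
     [0.85, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(stepwise_select([8.0, 7.0, 6.5], R))  # [0, 2]
```

In the toy input, SNP 1 is strongly significant on its own, but its conditional z-score after accounting for SNP 0 drops to about 0.38, so it is not reported as an independent signal.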
Reference: Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369-375 (2012).
Examples
Basic conditional selection
    >>> results = conditional_selection(locus)
    >>> print(f"Found {len(results)} independent signals")
    Found 3 independent signals
With custom thresholds
    >>> results = conditional_selection(
    ...     locus,
    ...     p_cutoff=1e-6,
    ...     maf_cutoff=0.05
    ... )
    >>> print(results[['SNP', 'b', 'se', 'p']])
            SNP     b     se        p
    0  rs123456  0.15  0.025  1.2e-08
    1  rs789012 -0.08  0.020  4.5e-07
Source code in credtools/cojo.py