ldmatrix
Functions for reading and converting lower triangle matrices.
LDMatrix
Class to store the LD matrix and the corresponding Variant IDs.
Parameters
map_df : pd.DataFrame
    DataFrame containing the Variant IDs.
r : np.ndarray
    LD matrix.
Attributes
map : pd.DataFrame
    DataFrame containing the Variant IDs.
r : np.ndarray
    LD matrix.
Raises
ValueError
    If the number of rows in the map file does not match the number of rows in the LD matrix.
Source code in credtools/ldmatrix.py
__check_length()
Check if the number of rows in the map file matches the number of rows in the LD matrix.
Raises
ValueError
    If the number of rows in the map file does not match the number of rows in the LD matrix.
__init__(map_df, r)
Initialize the LDMatrix object.
Parameters
map_df : pd.DataFrame
    DataFrame containing the Variant IDs.
r : np.ndarray
    LD matrix.
Raises
ValueError
    If the number of rows in the map file does not match the number of rows in the LD matrix.
__repr__()
Return a string representation of the LDMatrix object.
Returns
str
    String representation showing the shapes of map and r.
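The container behaviour described above (a row-count check on construction and a shape-based string representation) can be sketched as follows. This is a minimal sketch, not the credtools source: the class name `LDMatrixSketch`, the single-underscore `_check_length` helper and the exact `__repr__` format are all hypothetical.

```python
import numpy as np
import pandas as pd


class LDMatrixSketch:
    """Hypothetical stand-in for LDMatrix: pairs a variant map with an LD matrix."""

    def __init__(self, map_df: pd.DataFrame, r: np.ndarray) -> None:
        self.map = map_df
        self.r = r
        self._check_length()

    def _check_length(self) -> None:
        # One map row per LD matrix row; anything else is a mismatch.
        if self.map.shape[0] != self.r.shape[0]:
            raise ValueError(
                "The number of rows in the map file does not match "
                "the number of rows in the LD matrix."
            )

    def __repr__(self) -> str:
        # Summarise by shape only; this output format is an assumption.
        return f"LDMatrixSketch(map={self.map.shape}, r={self.r.shape})"


map_df = pd.DataFrame({"SNPID": ["1-1000-A-G", "1-2000-C-T"]})
ld = LDMatrixSketch(map_df, np.eye(2))
print(ld)  # LDMatrixSketch(map=(2, 1), r=(2, 2))
```

Validating in the constructor means a mismatched map and matrix can never coexist in one object, so downstream code can index both by the same row position.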
load_ld(ld_path, map_path, delimiter='\t', if_sort_alleles=True)
Read an LD matrix and its Variant IDs from files and pair them in a single LDMatrix object.
Parameters
ld_path : str
    Path to the input text file containing the lower triangle matrix, or to an .npz file.
map_path : str
    Path to the input text file containing the Variant IDs.
delimiter : str, optional
    Delimiter used in the input file, by default "\t".
if_sort_alleles : bool, optional
    Sort alleles in the LD map in alphabetical order and change the sign of the LD matrix if the alleles are swapped, by default True.
Returns
LDMatrix
    Object containing the LD matrix and the Variant IDs.
Raises
ValueError
    If the number of variants in the map file does not match the number of rows in the LD matrix.
Notes
Future enhancements planned:
- Support for npz files (partially implemented)
- Support for plink bin4 format
- Support for ldstore bcor format
The function validates that the LD matrix and map file have consistent dimensions and optionally sorts alleles for consistent representation.
Examples
>>> ld_matrix = load_ld('data.ld', 'data.ldmap')
>>> print(f"Loaded LD matrix with {ld_matrix.r.shape[0]} variants")
Loaded LD matrix with 1000 variants
load_ld_map(map_path, delimiter='\t')
Read Variant IDs from a file.
Parameters
map_path : str
    Path to the input text file containing the Variant IDs.
delimiter : str, optional
    Delimiter used in the input file, by default "\t".
Returns
pd.DataFrame
    DataFrame containing the Variant IDs with columns CHR, BP, A1, A2, and SNPID.
Raises
ValueError
    If the input file is empty or does not contain the required columns.
Notes
This function assumes that the input file contains the required columns:
- Chromosome (CHR)
- Base pair position (BP)
- Allele 1 (A1)
- Allele 2 (A2)
The function performs data cleaning including:
- Converting chromosome and position to appropriate types
- Validating alleles are valid DNA bases (A, C, G, T)
- Removing variants where A1 == A2
- Creating unique SNPID identifiers
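The cleaning steps listed above could look roughly like the following pandas sketch. It is not credtools' implementation: the function name `clean_ld_map` is hypothetical, it assumes numeric chromosome codes, it treats "unique" as dropping duplicate SNPIDs, and the SNPID format `CHR-BP-<allele>-<allele>` with alleles in alphabetical order is inferred from the SNPID values in this function's example rather than confirmed by the source.

```python
import io

import pandas as pd


def clean_ld_map(map_df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a raw map table (sketch)."""
    required = ["CHR", "BP", "A1", "A2"]
    missing = [col for col in required if col not in map_df.columns]
    if map_df.empty or missing:
        raise ValueError(f"Map is empty or lacks required columns: {missing}")
    df = map_df[required].copy()
    # Numeric chromosome codes are assumed here (no "X"/"Y" handling).
    df["CHR"] = df["CHR"].astype(int)
    df["BP"] = df["BP"].astype(int)
    for col in ("A1", "A2"):
        df[col] = df[col].astype(str).str.upper()
    # Keep single-base A/C/G/T alleles and drop variants where A1 == A2.
    bases = ["A", "C", "G", "T"]
    keep = df["A1"].isin(bases) & df["A2"].isin(bases) & (df["A1"] != df["A2"])
    df = df[keep].reset_index(drop=True)
    # Assumed SNPID format: CHR-BP-<first allele>-<second allele>, alphabetical.
    a1_first = df["A1"] <= df["A2"]
    first = df["A1"].where(a1_first, df["A2"])
    second = df["A2"].where(a1_first, df["A1"])
    snpid = df["CHR"].astype(str) + "-" + df["BP"].astype(str) + "-" + first + "-" + second
    df.insert(0, "SNPID", snpid)
    # Assumed meaning of "unique": keep the first row for each SNPID.
    return df.drop_duplicates(subset="SNPID").reset_index(drop=True)


text = "CHR\tBP\tA1\tA2\n1\t1000\tA\tG\n1\t2000\tC\tT\n2\t3000\tT\tC\n2\t4000\tA\tA\n"
raw = pd.read_csv(io.StringIO(text), sep="\t")
df = clean_ld_map(raw)
print(df["SNPID"].tolist())  # ['1-1000-A-G', '1-2000-C-T', '2-3000-C-T']
```

Note that the A1/A2 columns keep their original order here; only the SNPID uses sorted alleles, which is what lets a `T/C` variant and a `C/T` variant at the same position share one identifier.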
Examples
>>> # Create sample map file
>>> contents = "CHR\tBP\tA1\tA2\n1\t1000\tA\tG\n1\t2000\tC\tT\n2\t3000\tT\tC"
>>> with open('map.txt', 'w') as file:
...     file.write(contents)
>>> df = load_ld_map('map.txt')
>>> print(df)
        SNPID  CHR    BP A1 A2
0  1-1000-A-G    1  1000  A  G
1  1-2000-C-T    1  2000  C  T
2  2-3000-C-T    2  3000  T  C
load_ld_matrix(file_path, delimiter='\t')
Convert a lower triangle matrix from a file to a symmetric square matrix.
Parameters
file_path : str
    Path to the input text file containing the lower triangle matrix.
delimiter : str, optional
    Delimiter used in the input file, by default "\t".
Returns
np.ndarray
    Symmetric square matrix with diagonal filled with 1.
Raises
ValueError
    If the input file is empty or does not contain a valid lower triangle matrix.
FileNotFoundError
    If the specified file does not exist.
Notes
This function assumes that the input file contains a valid lower triangle matrix, with each row on a new line and elements separated by the specified delimiter. For .npz files, it loads the array stored under the first key in the archive.
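Mirroring a lower triangle into a symmetric matrix, and picking the first array out of an .npz archive, can be sketched with NumPy as below. Both function names are hypothetical and this is not the credtools source; `np.tril`, `np.fill_diagonal`, `np.savez` and `np.load` are standard NumPy calls.

```python
import os
import tempfile

import numpy as np


def lower_to_symmetric(lower: np.ndarray) -> np.ndarray:
    """Mirror a square lower-triangular array and set its diagonal to 1."""
    tri = np.tril(np.asarray(lower, dtype=float))  # drop anything above the diagonal
    sym = tri + tri.T  # mirror off-diagonal values; the diagonal is doubled here
    np.fill_diagonal(sym, 1.0)  # overwrite the doubled diagonal with 1
    return sym


def load_first_npz_array(path: str) -> np.ndarray:
    """Return the array stored under the first key of an .npz archive."""
    with np.load(path) as archive:
        return archive[archive.files[0]]


tri = np.array([[1.0, 0.0, 0.0],
                [0.1, 1.0, 0.0],
                [0.2, 0.4, 1.0]])
full = lower_to_symmetric(tri)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ld.npz")
    np.savez(path, ld=full)  # stored under the key "ld"
    loaded = load_first_npz_array(path)

print(np.array_equal(loaded, full))  # True
```

Adding the transpose and then resetting the diagonal avoids an explicit double loop over (i, j) pairs.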
Examples
>>> # Assuming 'lower_triangle.txt' contains:
>>> # 1.0
>>> # 0.1 1.0
>>> # 0.2 0.4 1.0
>>> # 0.3 0.5 0.6 1.0
>>> matrix = load_ld_matrix('lower_triangle.txt')
>>> matrix
array([[1. , 0.1, 0.2, 0.3],
       [0.1, 1. , 0.4, 0.5],
       [0.2, 0.4, 1. , 0.6],
       [0.3, 0.5, 0.6, 1. ]])
read_lower_triangle(file_path, delimiter='\t')
Read a lower triangle matrix from a file.
Parameters
file_path : str
    Path to the input text file containing the lower triangle matrix.
delimiter : str, optional
    Delimiter used in the input file, by default "\t".
Returns
np.ndarray
    Lower triangle matrix.
Raises
ValueError
    If the input file is empty or does not contain a valid lower triangle matrix.
FileNotFoundError
    If the specified file does not exist.
Notes
This function reads a ragged lower triangle matrix in which row i (counting from 0) contains i + 1 elements: the values from the first column up to and including the diagonal.
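A minimal sketch of reading such a ragged file into a square array, assuming one row per line and no header. The function name `read_lower_triangle_sketch` and the zero padding above the diagonal are assumptions, not confirmed behaviour of credtools.

```python
import os
import tempfile

import numpy as np


def read_lower_triangle_sketch(file_path: str, delimiter: str = "\t") -> np.ndarray:
    """Read a ragged lower-triangle text file into a zero-padded square array."""
    with open(file_path) as handle:  # raises FileNotFoundError if missing
        rows = [line.strip().split(delimiter) for line in handle if line.strip()]
    if not rows:
        raise ValueError("The input file is empty.")
    n = len(rows)
    out = np.zeros((n, n))
    for i, row in enumerate(rows):
        # Row i must hold exactly i + 1 values: columns 0..i.
        if len(row) != i + 1:
            raise ValueError(f"Row {i} has {len(row)} values, expected {i + 1}.")
        out[i, : i + 1] = [float(x) for x in row]
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lower_triangle.txt")
    with open(path, "w") as f:
        f.write("1.0\n0.1\t1.0\n0.2\t0.4\t1.0\n")
    tri = read_lower_triangle_sketch(path)

print(tri)
```

Checking each row's length against its index is what distinguishes a valid lower triangle from a truncated or square file.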
sort_alleles(ld)
Sort alleles in the LD map in alphabetical order. Change the sign of the LD matrix if the alleles are swapped.
Parameters
ld : LDMatrix
    LDMatrix object containing the Variant IDs and the LD matrix.
Returns
LDMatrix
    LDMatrix object containing the Variant IDs and the LD matrix with alleles sorted.
Notes
This function ensures consistent allele ordering by:
- Sorting alleles alphabetically (A1 <= A2)
- Flipping the sign of LD correlations for variants where alleles were swapped
- Maintaining diagonal elements as 1.0
This is important for consistent merging across different datasets.
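The sign rule can be written compactly: with s_i = -1 for a swapped variant and +1 otherwise, the sorted matrix is r'_ij = s_i * s_j * r_ij, so a pair changes sign only when exactly one of its two variants was swapped. The sketch below assumes a map with `A1`/`A2` columns and a square NumPy matrix; `sort_alleles_sketch` is a hypothetical name and this is not the library's code.

```python
import numpy as np
import pandas as pd


def sort_alleles_sketch(map_df: pd.DataFrame, r: np.ndarray):
    """Order alleles so A1 <= A2 and flip LD signs for swapped variants."""
    df = map_df.copy()
    swapped = (df["A1"] > df["A2"]).to_numpy()
    # Swap A1/A2 where they were out of alphabetical order.
    df.loc[swapped, ["A1", "A2"]] = df.loc[swapped, ["A2", "A1"]].to_numpy()
    # +1 for unchanged variants, -1 for swapped ones.
    sign = np.where(swapped, -1.0, 1.0)
    # r_ij * s_i * s_j flips only pairs with exactly one swapped variant.
    r_sorted = r * np.outer(sign, sign)
    # s_i * s_i == 1 already preserves the diagonal; set it explicitly to 1.0.
    np.fill_diagonal(r_sorted, 1.0)
    return df, r_sorted


map_df = pd.DataFrame({"A1": ["A", "T"], "A2": ["G", "C"]})
r = np.array([[1.0, 0.1], [0.1, 1.0]])
new_map, new_r = sort_alleles_sketch(map_df, r)
print(new_map["A1"].tolist(), new_map["A2"].tolist())  # ['A', 'C'] ['G', 'T']
print(new_r)
```

Using an outer product of the sign vector applies the row flip and the column flip in one vectorised step, which keeps the result symmetric.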
Examples
>>> map_df = pd.DataFrame({
...     'SNPID': ['1-1000-A-G', '1-2000-C-T'],
...     'CHR': [1, 1],
...     'BP': [1000, 2000],
...     'A1': ['A', 'T'],
...     'A2': ['G', 'C']
... })
>>> r_matrix = np.array([[1. , 0.1],
...                      [0.1, 1. ]])
>>> ld = LDMatrix(map_df, r_matrix)
>>> sorted_ld = sort_alleles(ld)
>>> print(sorted_ld.map)
        SNPID  CHR    BP A1 A2
0  1-1000-A-G    1  1000  A  G
1  1-2000-C-T    1  2000  C  T
>>> sorted_ld.r
array([[ 1. , -0.1],
       [-0.1,  1. ]])