Use variant positions to find the nearest gene. Uses the snpsettest function `map_snp_to_gene` to identify genes from the GENCODE database (https://github.com/HimesGroup/snpsettest).
Usage
get_nearest_gene(
variants,
detect_headers = TRUE,
snp_col = "SNP",
chr_col = "CHR",
pos_col = "BP",
build = 37,
n_bases = 1e+05
)
Arguments
- variants
A data.frame. Contains the variants (e.g., in summary statistics).
- detect_headers
Logical. Default=TRUE. Check the input headers to see whether the input is in BOLT-LMM, SAIGE, REGENIE, or GWAS Catalog format, so the user does not need to provide the column names.
- snp_col
A string. Default="SNP". The RSID/variantID column name.
- chr_col
A string. Default="CHR". The chromosome column name.
- pos_col
A string. Default="BP". The base pair/position column name.
- build
An integer. Default=37. Genome build to use (can only be 37 or 38).
- n_bases
An integer. Default=1e5. The maximum distance in base pairs between a variant and a gene for the gene to be annotated.
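As a rough illustration of the arguments above, the sketch below builds a small input whose column names differ from the defaults and passes them explicitly. The column names `rsid`, `chrom`, and `position` and the object `my_variants` are made up for this example; the variant IDs and positions are copied from the Examples output. The call is guarded so the snippet also runs when the package is not loaded, and it assumes that explicitly supplied column names are honored while `detect_headers` is left at its default.

```r
# Hypothetical input: same variants as in the Examples output,
# but with non-default column names (an assumption for illustration).
my_variants <- data.frame(
  rsid     = c("rs12046439", "rs3861909", "rs12740374"),
  chrom    = c(1, 1, 1),
  position = c(107536799, 107544176, 109817590),
  stringsAsFactors = FALSE
)

# Only call the function if the package providing it is loaded.
if (exists("get_nearest_gene", mode = "function")) {
  my_genes <- get_nearest_gene(
    my_variants,
    snp_col = "rsid",      # RSID/variant ID column
    chr_col = "chrom",     # chromosome column
    pos_col = "position",  # base-pair position column
    build   = 37,          # positions above are build 37
    n_bases = 5e4          # narrower window than the 1e5 default
  )
  print(my_genes)
}
```

With the narrower `n_bases`, variants farther than 50,000 bases from any gene would be expected to get an NA `dist`, following the description in the Value section.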
Value
Returns a data frame of variant IDs mapped to genes (with distance).
If `dist` is positive, the variant is intergenic, and this is the distance to the closest gene. If `dist` is negative, the variant is within a gene, and this is the distance to the start of the gene. If `dist` is NA, the variant is not within `n_bases` of a gene in GENCODE.
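The sign convention for `dist` can be turned into a label with a few lines of base R. This is a minimal sketch, not part of the package: the toy data frame `res` copies four lead-variant rows from the Examples output and adds one hypothetical unannotated row (`rs_hypothetical`), and the `context` column name is an assumption for illustration.

```r
# Toy result: SNP, gene and dist values copied from the Examples output.
res <- data.frame(
  SNP  = c("rs111232683", "rs114254196", "rs115292790", "rs12740374"),
  gene = c("PRMT6", "SLC25A24", "STXBP3", "CELSR2"),
  dist = c(33118, 41258, -21432, -24949),
  stringsAsFactors = FALSE
)

# Hypothetical variant with no gene within n_bases (dist is NA).
res <- rbind(
  res,
  data.frame(SNP = "rs_hypothetical", gene = NA_character_,
             dist = NA_real_, stringsAsFactors = FALSE)
)

# Label each variant by the sign of dist:
#   NA       -> no gene within n_bases
#   negative -> variant lies within the gene
#   positive -> intergenic, dist is the distance to the closest gene
# (dist == 0 is not covered by the description; it falls into
#  "intergenic" here, which is an assumption.)
res$context <- ifelse(
  is.na(res$dist), "no gene within n_bases",
  ifelse(res$dist < 0, "within gene", "intergenic")
)

print(res)
```

The same labeling could be applied directly to the data frame returned by `get_nearest_gene()`, since it carries a `dist` column.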
Examples
gwas_loci = get_loci(gwas_example)
#>
#> Locus size (bases) = 5e+05
#> P-value threshold = 5e-08
#>
#> N variants = 319732
#> N variants p<threshold = 4132
#> N loci = 15
gwas_loci_genes = get_nearest_gene(gwas_loci)
#> Using human genome build 37
#> Getting nearest gene for 4131 unique variants
#> (Removed 1 duplicated or missing variant IDs/positions)
head(gwas_loci_genes)
#> SNP CHR BP A1 A2 MAF BETA SE P locus
#> 1 rs12046439 1 107536799 T C 0.248 0.00997159 0.00170546 5.01e-09 1
#> 2 rs143849791 1 107537916 CATG C 0.325 0.01283200 0.00164361 5.85e-15 1
#> 3 rs113329442 1 107539252 A G 0.330 0.01109240 0.00149706 1.27e-13 1
#> 4 rs3861909 1 107544176 G A 0.327 0.01187220 0.00150837 3.52e-15 1
#> 5 rs17496332 1 107546375 A G 0.331 0.01110260 0.00148844 8.70e-14 1
#> 6 rs2878349 1 107549245 G A 0.327 0.01182020 0.00149200 2.33e-15 1
#> lead gene dist
#> 1 FALSE PRMT6 62468
#> 2 FALSE PRMT6 61351
#> 3 FALSE PRMT6 60015
#> 4 FALSE PRMT6 55091
#> 5 FALSE PRMT6 52892
#> 6 FALSE PRMT6 50022
head(gwas_loci_genes[ gwas_loci_genes$lead==TRUE , ])
#> SNP CHR BP A1 A2 MAF BETA SE P
#> 13 rs111232683 1 107566149 G C 0.34300 0.01352040 0.00161401 5.43e-17
#> 55 rs114254196 1 108635400 C T 0.00848 -0.04481140 0.00818473 4.38e-08
#> 92 rs115292790 1 109310728 G A 0.01360 -0.05639270 0.00608890 2.01e-20
#> 672 rs12740374 1 109817590 G T 0.21900 -0.14822800 0.00166391 4.73e-305
#> 1542 rs140266316 1 110326545 G A 0.01630 -0.05770880 0.00597770 4.73e-22
#> 1564 rs657801 1 111736389 T C 0.31500 0.00905412 0.00150713 1.88e-09
#> locus lead gene dist
#> 13 1 TRUE PRMT6 33118
#> 55 2 TRUE SLC25A24 41258
#> 92 3 TRUE STXBP3 -21432
#> 672 4 TRUE CELSR2 -24949
#> 1542 5 TRUE GSTM5 8495
#> 1564 6 TRUE DENND2D -6593