
Use variant position to find the nearest gene. Uses the snpsettest function `map_snp_to_gene` to identify genes from GENCODE databases (https://github.com/HimesGroup/snpsettest).

Usage

get_nearest_gene(
  variants,
  detect_headers = TRUE,
  snp_col = "SNP",
  chr_col = "CHR",
  pos_col = "BP",
  build = 37,
  n_bases = 1e+05
)

Arguments

variants

A data.frame. Contains the variants (e.g., in summary statistics).

detect_headers

Logical. Default=TRUE. Search the input headers to detect BOLT-LMM, SAIGE, REGENIE, or GWAS Catalog formats, so the user does not need to provide the column names.

snp_col

A string. Default="SNP". The RSID/variantID column name.

chr_col

A string. Default="CHR". The chromosome column name.

pos_col

A string. Default="BP". The base pair/position column name.

build

An integer. Default=37. Genome build to use (can only be 37 or 38).

n_bases

An integer. Default=1e5. The maximum distance in base pairs between a variant and a gene for the gene to be annotated.
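
As a hedged illustration of the arguments above, the sketch below builds a small data frame whose column names differ from the defaults and passes those names explicitly. The object `my_variants` and the column names `rsid`, `chrom`, and `pos` are hypothetical; the three variants are copied from the Examples output on this page. Setting `detect_headers = FALSE` assumes that explicit column names should be used as given, which is not confirmed here. The call is guarded so the block still runs when the package is not loaded.

```r
# Hypothetical input: variants from the Examples output, but with
# non-default column names (rsid/chrom/pos are made up for illustration).
my_variants <- data.frame(
  rsid  = c("rs111232683", "rs114254196", "rs115292790"),
  chrom = c(1, 1, 1),
  pos   = c(107566149, 108635400, 109310728),
  stringsAsFactors = FALSE
)

# Only call the function if it is available in the current session.
if (exists("get_nearest_gene")) {
  my_genes <- get_nearest_gene(
    my_variants,
    detect_headers = FALSE,  # assumption: skip format auto-detection
    snp_col = "rsid",
    chr_col = "chrom",
    pos_col = "pos",
    build   = 37,            # these positions are build 37 coordinates
    n_bases = 5e4            # only annotate genes within 50 kb
  )
  head(my_genes)
}
```

A smaller `n_bases` narrows the search window, so more variants may come back with `dist` set to NA.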

Value

Returns a data frame of variant IDs mapped to genes (with distance).

If `dist` is positive, the variant is intergenic, and this is the distance to the closest gene. If `dist` is negative, the variant is within a gene, and this is the distance to the start of the gene. If `dist` is NA, the variant is not within `n_bases` of a gene in GENCODE.
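
The sign convention above can be turned into a readable label with base R. This is a minimal sketch, not part of the package: the helper `classify_dist` and the column `context` are names introduced here, the first three rows reuse `gene` and `dist` values from the Examples output, and the last row is a hypothetical variant with no gene in range.

```r
# Classify variants by the sign of `dist` (helper name is hypothetical).
# dist == 0 is not covered by the description above; it is grouped with
# "intergenic" here purely as an assumption.
classify_dist <- function(dist) {
  ifelse(is.na(dist), "no gene within n_bases",
         ifelse(dist < 0, "within gene", "intergenic"))
}

res <- data.frame(
  SNP  = c("rs111232683", "rs115292790", "rs140266316", "rs_hypothetical"),
  gene = c("PRMT6", "STXBP3", "GSTM5", NA),
  dist = c(33118, -21432, 8495, NA),
  stringsAsFactors = FALSE
)

# Add a label column describing where each variant sits relative to its gene.
res$context <- classify_dist(res$dist)
res
```

The same helper could be applied to the `dist` column of the data frame returned by `get_nearest_gene`, for example to count lead variants that fall inside a gene.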

Author

Luke Pilling

Examples

gwas_loci = get_loci(gwas_example)
#> 
#> Locus size (bases) = 5e+05
#> P-value threshold = 5e-08
#> 
#> N variants = 319732
#> N variants p<threshold = 4132
#> N loci = 15

gwas_loci_genes = get_nearest_gene(gwas_loci)
#> Using human genome build 37
#> Getting nearest gene for 4131 unique variants
#> (Removed 1 duplicated or missing variant IDs/positions)

head(gwas_loci_genes)
#>           SNP CHR        BP   A1 A2   MAF       BETA         SE        P locus
#> 1  rs12046439   1 107536799    T  C 0.248 0.00997159 0.00170546 5.01e-09     1
#> 2 rs143849791   1 107537916 CATG  C 0.325 0.01283200 0.00164361 5.85e-15     1
#> 3 rs113329442   1 107539252    A  G 0.330 0.01109240 0.00149706 1.27e-13     1
#> 4   rs3861909   1 107544176    G  A 0.327 0.01187220 0.00150837 3.52e-15     1
#> 5  rs17496332   1 107546375    A  G 0.331 0.01110260 0.00148844 8.70e-14     1
#> 6   rs2878349   1 107549245    G  A 0.327 0.01182020 0.00149200 2.33e-15     1
#>    lead  gene  dist
#> 1 FALSE PRMT6 62468
#> 2 FALSE PRMT6 61351
#> 3 FALSE PRMT6 60015
#> 4 FALSE PRMT6 55091
#> 5 FALSE PRMT6 52892
#> 6 FALSE PRMT6 50022

head(gwas_loci_genes[ gwas_loci_genes$lead==TRUE , ])
#>              SNP CHR        BP A1 A2     MAF        BETA         SE         P
#> 13   rs111232683   1 107566149  G  C 0.34300  0.01352040 0.00161401  5.43e-17
#> 55   rs114254196   1 108635400  C  T 0.00848 -0.04481140 0.00818473  4.38e-08
#> 92   rs115292790   1 109310728  G  A 0.01360 -0.05639270 0.00608890  2.01e-20
#> 672   rs12740374   1 109817590  G  T 0.21900 -0.14822800 0.00166391 4.73e-305
#> 1542 rs140266316   1 110326545  G  A 0.01630 -0.05770880 0.00597770  4.73e-22
#> 1564    rs657801   1 111736389  T  C 0.31500  0.00905412 0.00150713  1.88e-09
#>      locus lead     gene   dist
#> 13       1 TRUE    PRMT6  33118
#> 55       2 TRUE SLC25A24  41258
#> 92       3 TRUE   STXBP3 -21432
#> 672      4 TRUE   CELSR2 -24949
#> 1542     5 TRUE    GSTM5   8495
#> 1564     6 TRUE  DENND2D  -6593