Get tidy model output of exposure(s) on outcome(s)

To easily get tidy model output (from linear or logistic GLM, or CoxPH) for a categorical or continuous exposure, including sample size (and N cases if logistic), outcome, and model info. Idea is to make quick PheWAS trivially easy.

For all exposures, it gets the N. For categorical exposures, the N is split by group, and a row is included for the reference category

Can provide multiple exposures and/or outcomes

Usage

get_assoc(
  d,
  x,
  y,
  z = "",
  model = "lm",
  af = FALSE,
  af_base = FALSE,
  note = "",
  get_fit = FALSE,
  extreme_ps = FALSE,
  scale_x = FALSE,
  scale_y = FALSE,
  inv_norm_x = FALSE,
  inv_norm_y = FALSE,
  winsorize_x = FALSE,
  winsorize_y = FALSE,
  winsorize_n = 5,
  return_all_terms = FALSE,
  interacts_with = "",
  progress = TRUE,
  verbose = FALSE,
  ...
)

Arguments

d: A data.frame or tibble. The data.
x: A string or vector of strings. The exposure variable name(s), found in `d`. (character)
y: A string. The outcome variable name, found in `d` -- if model is 'coxph' then paste the survival object here e.g., 'Surv(time, event)' where `time` and `event` are variables in `d`. (character)
z: A string. The covariate formula (e.g., " + age + sex"), found in `d`. Default is no covariates. default="" (character)
model: A string. The type of model to perform. Can be "lm", "logistic" or "coxph". default="lm" (character)
af: Logical. Is `x` categorical? I.e., include in formula like `haven::as_factor(x)`. To use base R `as.factor()` instead set option "af_base" to TRUE default=FALSE
af_base: Logical. Use base R `as.factor()` instead of `haven::as_factor(x)`? default=FALSE
note: A string. If you want to include a note like "All", "Males", "C282Y homozygotes" to describe the model or sample. default="" (character)
get_fit: Logical. Default is FALSE. Get model fit? (for each model: lm=R2, logistic=McFadden's pseudo-R2, coxph=Harrell's c-statistic). default=FALSE
extreme_ps: Logical. Default is FALSE. If p==0 then return "extreme p-values" as strings. default=FALSE
scale_x: Logical. Default is FALSE. Apply scale() function to exposure? default=FALSE
scale_y: Logical. Default is FALSE. Apply scale() function to outcome? default=FALSE
inv_norm_x: Logical. Apply inv_norm() function to exposure? default=FALSE
inv_norm_y: Logical. Apply inv_norm() function to outcome? default=FALSE
winsorize_x: Logical. Apply Winzorization to exposure? default=FALSE
winsorize_y: Logical. Apply Winzorization to outcome? default=FALSE
winsorize_n: Numeric. Standard deviations from the mean to Winzorize. I.e., participants with values beyond this N will be set to N. default=5
return_all_terms: Logical. Return estimates for all independent variables (terms) in the model? If TRUE, includes an adddition column 'term' default=FALSE
interacts_with: A string. A variable found in `d`. Will add to regression formula like `x*i` and catch output default="" (character)
progress: Logical. Show progress bar from purrr `map()` function (useful when multiple exposures/outcomes provided). default=TRUE
verbose: Logical. Be verbose, default=FALSE
...: Other `tidy_ci()` options

Value

Returns a tibble - summary statistics from a model

Author

Luke Pilling

Examples

# for one outcome, equivalent to `tidy_ci(glm(sbp ~ bmi +age+sex, d=example_data))` - with added `n`
get_assoc(x="bmi", y="sbp", z="+age+sex", d=example_data)
#> # A tibble: 1 × 10
#>   outcome exposure estimate std.error statistic  p.value conf.low conf.high
#>   <chr>   <chr>       <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 sbp     bmi          1.49    0.0730      20.4 5.42e-89     1.35      1.64
#> # ℹ 2 more variables: n <int>, model <chr>

# categorical exposure, binary outcome, and stratified analysis (with note)
#  - note that data can be passed using the pipe if desired
example_data |> dplyr::filter(sex==1) |>
  get_assoc(x="bmi_cat", y="event", z="+age", model="logistic", af=TRUE, note="Males only")  |> print(width=500)
#> # A tibble: 3 × 12
#>   outcome exposure           estimate std.error statistic    p.value conf.low
#>   <chr>   <chr>                 <dbl>     <dbl>     <dbl>      <dbl>    <dbl>
#> 1 event   bmi_cat-Normal        NA      NA          NA    NA            NA   
#> 2 event   bmi_cat-Overweight     1.50    0.0986      4.09  0.0000426     1.23
#> 3 event   bmi_cat-Obese          1.93    0.154       4.27  0.0000194     1.43
#>   conf.high     n n_cases model    note      
#>       <dbl> <dbl>   <dbl> <chr>    <chr>     
#> 1     NA      803     268 logistic Males only
#> 2      1.82  1018     443 logistic Males only
#> 3      2.60   226     111 logistic Males only

# multiple exposures and/or outcomes - get pseudo R^2
x_vars = c("bmi","sbp","dbp","scl")
y_vars = c("event","sex")
get_assoc(x=x_vars, y=y_vars, z="+age", d=example_data, model="logistic", get_fit=TRUE)  |> print(width=500)
#> # A tibble: 8 × 13
#>   outcome exposure estimate std.error statistic  p.value conf.low conf.high
#>   <chr>   <chr>       <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 event   bmi         1.07   0.00791       9.07 1.19e-19    1.06       1.09
#> 2 sex     bmi         1.04   0.00736       5.09 3.54e- 7    1.02       1.05
#> 3 event   sbp         1.01   0.00149       8.86 7.80e-19    1.01       1.02
#> 4 sex     sbp         0.998  0.00141      -1.12 2.61e- 1    0.996      1.00
#> 5 event   dbp         1.03   0.00259       9.73 2.33e-22    1.02       1.03
#> 6 sex     dbp         1.01   0.00240       4.98 6.35e- 7    1.01       1.02
#> 7 event   scl         1.01   0.000749      9.47 2.69e-21    1.01       1.01
#> 8 sex     scl         0.999  0.000692     -1.08 2.82e- 1    0.998      1.00
#>       n n_cases fit_stat fit_stat_se model   
#>   <int>   <int>    <dbl> <lgl>       <chr>   
#> 1  4690    1472 0.0317   NA          logistic
#> 2  4690    2047 0.00454  NA          logistic
#> 3  4699    1473 0.0310   NA          logistic
#> 4  4699    2049 0.000666 NA          logistic
#> 5  4699    1473 0.0338   NA          logistic
#> 6  4699    2049 0.00435  NA          logistic
#> 7  4666    1466 0.0331   NA          logistic
#> 8  4666    2042 0.000649 NA          logistic

# if desired, can also return estimates for other independent variables (terms) in the model
x_vars = c("bmi","sbp","dbp")
get_assoc(x=x_vars, y="event", z="+age+sex", d=example_data, model="logistic", return_all_terms=TRUE)  |> print(width=500)
#> # A tibble: 9 × 12
#>   outcome exposure term  estimate std.error statistic  p.value conf.low
#>   <chr>   <chr>    <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>
#> 1 event   bmi      bmi       1.07   0.00801      8.66 4.74e-18     1.06
#> 2 event   bmi      age       1.04   0.00388      8.95 3.47e-19     1.03
#> 3 event   bmi      sex       2.10   0.0654      11.3  1.02e-29     1.84
#> 4 event   sbp      sbp       1.01   0.00152      9.47 2.86e-21     1.01
#> 5 event   sbp      age       1.03   0.00412      6.15 7.89e-10     1.02
#> 6 event   sbp      sex       2.23   0.0658      12.2  4.15e-34     1.96
#> 7 event   dbp      dbp       1.02   0.00261      9.24 2.43e-20     1.02
#> 8 event   dbp      age       1.03   0.00394      8.07 7.03e-16     1.02
#> 9 event   dbp      sex       2.10   0.0655      11.3  1.01e-29     1.85
#>   conf.high     n n_cases model   
#>       <dbl> <int>   <int> <chr>   
#> 1      1.09  4690    1472 logistic
#> 2      1.04    NA      NA logistic
#> 3      2.38    NA      NA logistic
#> 4      1.02  4699    1473 logistic
#> 5      1.03    NA      NA logistic
#> 6      2.53    NA      NA logistic
#> 7      1.03  4699    1473 logistic
#> 8      1.04    NA      NA logistic
#> 9      2.39    NA      NA logistic