To easily get tidy model output (from linear or logistic GLM, or CoxPH) for a categorical or continuous exposure, including sample size (and N cases if logistic), outcome, and model info. Idea is to make quick PheWAS trivially easy.
For all exposures, it gets the N. For categorical exposures, the N is split by group, and a row is included for the reference category
Can provide multiple exposures and/or outcomes
Usage
get_assoc(
d,
x,
y,
z = "",
model = "lm",
af = FALSE,
af_base = FALSE,
note = "",
get_fit = FALSE,
extreme_ps = FALSE,
scale_x = FALSE,
scale_y = FALSE,
inv_norm_x = FALSE,
inv_norm_y = FALSE,
winsorize_x = FALSE,
winsorize_y = FALSE,
winsorize_n = 5,
return_all_terms = FALSE,
interacts_with = "",
progress = TRUE,
verbose = FALSE,
...
)
Arguments
- d
A
data.frame
ortibble
. The data.- x
A string or vector of strings. The exposure variable name(s), found in `d`. (character)
- y
A string. The outcome variable name, found in `d` -- if model is 'coxph' then paste the survival object here e.g., 'Surv(time, event)' where `time` and `event` are variables in `d`. (character)
- z
A string. The covariate formula (e.g., " + age + sex"), found in `d`. Default is no covariates.
default=""
(character)- model
A string. The type of model to perform. Can be "lm", "logistic" or "coxph".
default="lm"
(character)- af
Logical. Is `x` categorical? I.e., include in formula like `haven::as_factor(x)`. To use base R `as.factor()` instead set option "af_base" to TRUE
default=FALSE
- af_base
Logical. Use base R `as.factor()` instead of `haven::as_factor(x)`?
default=FALSE
- note
A string. If you want to include a note like "All", "Males", "C282Y homozygotes" to describe the model or sample.
default=""
(character)- get_fit
Logical. Default is FALSE. Get model fit? (for each model: lm=R2, logistic=McFadden's pseudo-R2, coxph=Harrell's c-statistic).
default=FALSE
- extreme_ps
Logical. Default is FALSE. If p==0 then return "extreme p-values" as strings.
default=FALSE
- scale_x
Logical. Default is FALSE. Apply scale() function to exposure?
default=FALSE
- scale_y
Logical. Default is FALSE. Apply scale() function to outcome?
default=FALSE
- inv_norm_x
Logical. Apply inv_norm() function to exposure?
default=FALSE
- inv_norm_y
Logical. Apply inv_norm() function to outcome?
default=FALSE
- winsorize_x
Logical. Apply Winzorization to exposure?
default=FALSE
- winsorize_y
Logical. Apply Winzorization to outcome?
default=FALSE
- winsorize_n
Numeric. Standard deviations from the mean to Winzorize. I.e., participants with values beyond this N will be set to N.
default=5
- return_all_terms
Logical. Return estimates for all independent variables (terms) in the model? If TRUE, includes an adddition column 'term'
default=FALSE
- interacts_with
A string. A variable found in `d`. Will add to regression formula like `x*i` and catch output
default=""
(character)- progress
Logical. Show progress bar from purrr `map()` function (useful when multiple exposures/outcomes provided).
default=TRUE
- verbose
Logical. Be verbose,
default=FALSE
- ...
Other `tidy_ci()` options
Examples
# for one outcome, equivalent to `tidy_ci(glm(sbp ~ bmi +age+sex, d=example_data))` - with added `n`
get_assoc(x="bmi", y="sbp", z="+age+sex", d=example_data)
#> # A tibble: 1 × 10
#> outcome exposure estimate std.error statistic p.value conf.low conf.high
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 sbp bmi 1.49 0.0730 20.4 5.42e-89 1.35 1.64
#> # ℹ 2 more variables: n <int>, model <chr>
# categorical exposure, binary outcome, and stratified analysis (with note)
# - note that data can be passed using the pipe if desired
example_data |> dplyr::filter(sex==1) |>
get_assoc(x="bmi_cat", y="event", z="+age", model="logistic", af=TRUE, note="Males only") |> print(width=500)
#> # A tibble: 3 × 12
#> outcome exposure estimate std.error statistic p.value conf.low
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 event bmi_cat-Normal NA NA NA NA NA
#> 2 event bmi_cat-Overweight 1.50 0.0986 4.09 0.0000426 1.23
#> 3 event bmi_cat-Obese 1.93 0.154 4.27 0.0000194 1.43
#> conf.high n n_cases model note
#> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 NA 803 268 logistic Males only
#> 2 1.82 1018 443 logistic Males only
#> 3 2.60 226 111 logistic Males only
# multiple exposures and/or outcomes - get pseudo R^2
x_vars = c("bmi","sbp","dbp","scl")
y_vars = c("event","sex")
get_assoc(x=x_vars, y=y_vars, z="+age", d=example_data, model="logistic", get_fit=TRUE) |> print(width=500)
#> # A tibble: 8 × 13
#> outcome exposure estimate std.error statistic p.value conf.low conf.high
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 event bmi 1.07 0.00791 9.07 1.19e-19 1.06 1.09
#> 2 sex bmi 1.04 0.00736 5.09 3.54e- 7 1.02 1.05
#> 3 event sbp 1.01 0.00149 8.86 7.80e-19 1.01 1.02
#> 4 sex sbp 0.998 0.00141 -1.12 2.61e- 1 0.996 1.00
#> 5 event dbp 1.03 0.00259 9.73 2.33e-22 1.02 1.03
#> 6 sex dbp 1.01 0.00240 4.98 6.35e- 7 1.01 1.02
#> 7 event scl 1.01 0.000749 9.47 2.69e-21 1.01 1.01
#> 8 sex scl 0.999 0.000692 -1.08 2.82e- 1 0.998 1.00
#> n n_cases fit_stat fit_stat_se model
#> <int> <int> <dbl> <lgl> <chr>
#> 1 4690 1472 0.0317 NA logistic
#> 2 4690 2047 0.00454 NA logistic
#> 3 4699 1473 0.0310 NA logistic
#> 4 4699 2049 0.000666 NA logistic
#> 5 4699 1473 0.0338 NA logistic
#> 6 4699 2049 0.00435 NA logistic
#> 7 4666 1466 0.0331 NA logistic
#> 8 4666 2042 0.000649 NA logistic
# if desired, can also return estimates for other independent variables (terms) in the model
x_vars = c("bmi","sbp","dbp")
get_assoc(x=x_vars, y="event", z="+age+sex", d=example_data, model="logistic", return_all_terms=TRUE) |> print(width=500)
#> # A tibble: 9 × 12
#> outcome exposure term estimate std.error statistic p.value conf.low
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 event bmi bmi 1.07 0.00801 8.66 4.74e-18 1.06
#> 2 event bmi age 1.04 0.00388 8.95 3.47e-19 1.03
#> 3 event bmi sex 2.10 0.0654 11.3 1.02e-29 1.84
#> 4 event sbp sbp 1.01 0.00152 9.47 2.86e-21 1.01
#> 5 event sbp age 1.03 0.00412 6.15 7.89e-10 1.02
#> 6 event sbp sex 2.23 0.0658 12.2 4.15e-34 1.96
#> 7 event dbp dbp 1.02 0.00261 9.24 2.43e-20 1.02
#> 8 event dbp age 1.03 0.00394 8.07 7.03e-16 1.02
#> 9 event dbp sex 2.10 0.0655 11.3 1.01e-29 1.85
#> conf.high n n_cases model
#> <dbl> <int> <int> <chr>
#> 1 1.09 4690 1472 logistic
#> 2 1.04 NA NA logistic
#> 3 2.38 NA NA logistic
#> 4 1.02 4699 1473 logistic
#> 5 1.03 NA NA logistic
#> 6 2.53 NA NA logistic
#> 7 1.03 4699 1473 logistic
#> 8 1.04 NA NA logistic
#> 9 2.39 NA NA logistic