Identification of Maximal Effect Modifiers Using Stochastic Shift Interventions

The find_max_effect_mods function identifies the subpopulations in which stochastic shift interventions on exposures within a mixture have the largest differential impact. For each exposure, the method estimates the effect of shifting that exposure while controlling for the other exposures and the covariates, using ensemble machine learning and targeted maximum likelihood estimation (TMLE). Once the individual-level intervention effects and influence curve estimates have been derived for each exposure, a simple t-statistic partitioning algorithm finds the region with the maximum significant difference in intervention effects.
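
To make the partitioning step concrete, the following is a minimal sketch of a t-statistic split search over a single covariate. It is not the package's implementation: the helper best_t_split, the use of a Welch t-statistic, and the simulated effects are all assumptions chosen for illustration.

```r
# Hypothetical helper (not part of the package): scan thresholds of one
# covariate and return the split with the largest absolute Welch t-statistic.
best_t_split <- function(effects, w, min_obs = 10) {
  # Candidate thresholds are the observed values, excluding the maximum
  thresholds <- sort(unique(w))
  thresholds <- thresholds[-length(thresholds)]
  best <- list(threshold = NA_real_, t_stat = 0)
  for (thr in thresholds) {
    left <- effects[w <= thr]
    right <- effects[w > thr]
    # Skip splits that leave too few observations on either side
    if (length(left) < min_obs || length(right) < min_obs) next
    se <- sqrt(var(left) / length(left) + var(right) / length(right))
    t_stat <- (mean(left) - mean(right)) / se
    if (!is.finite(t_stat)) next
    if (abs(t_stat) > abs(best$t_stat)) {
      best <- list(threshold = thr, t_stat = t_stat)
    }
  }
  best
}

set.seed(1)
w <- runif(200)
# Simulated individual-level effects that are larger when w > 0.5
effects <- ifelse(w > 0.5, 2, 0) + rnorm(200, sd = 0.5)
split <- best_t_split(effects, w, min_obs = 10)
```

In this toy setting the selected threshold should fall close to 0.5, the point at which the simulated effect changes. The min_obs guard plays the same role as the min_obs argument documented below.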

find_max_effect_mods(
  at,
  av,
  deltas,
  a_names,
  w_names,
  outcome,
  outcome_type,
  mu_learner,
  g_learner,
  top_n = 3,
  seed,
  min_obs,
  fold,
  density_classification = TRUE,
  max_depth
)

Arguments

deltas: A named list or vector specifying the shift in exposures to define the target parameter. Each element should correspond to an exposure variable specified in a_names, detailing the amount by which that exposure is to be shifted.
a_names: A character vector specifying the names of the exposure variables within data.
w_names: A character vector specifying the names of the covariate variables within data.
outcome: The name of the outcome variable in data.
outcome_type: A character string indicating the type of the outcome variable; one of "continuous", "binary", or "count".
mu_learner: A list of Lrnr_sl learners specifying the ensemble machine learning models to be used for outcome prediction within the Super Learner framework.
g_learner: A list of Lrnr_sl learners specifying the ensemble machine learning models to be used for estimating the conditional density of the exposures.
top_n: An integer specifying the number of top positive and negative effects to return.
seed: An integer value to set the seed for reproducibility.
min_obs: Minimum number of observations in a region to warrant a split.
data: A data.frame containing all the variables needed for the analysis, including baseline covariates, exposures, and the outcome.
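
As an illustration of how deltas and a_names relate, the sketch below checks that every exposure has a matching delta and builds one shifted copy of the data per exposure. The helper shift_each_exposure and the toy column names are hypothetical, and the additive form of the shift (exposure plus delta) is an assumption rather than something this page specifies.

```r
# Hypothetical helper (not part of the package): shift one exposure at a
# time by its delta, leaving the other columns at their observed values.
shift_each_exposure <- function(data, a_names, deltas) {
  deltas <- unlist(deltas)  # accept a named list or a named vector
  missing_deltas <- setdiff(a_names, names(deltas))
  if (length(missing_deltas) > 0) {
    stop("No delta supplied for: ", paste(missing_deltas, collapse = ", "))
  }
  shifted <- lapply(a_names, function(a) {
    out <- data
    out[[a]] <- out[[a]] + deltas[[a]]  # assumed additive shift
    out
  })
  names(shifted) <- a_names
  shifted
}

toy <- data.frame(lead = c(1, 2, 3), arsenic = c(0.5, 0.7, 0.9))
shifted <- shift_each_exposure(
  toy,
  a_names = c("lead", "arsenic"),
  deltas = list(lead = 0.1, arsenic = -0.2)
)
```

Here shifted$lead holds the data with only lead increased by 0.1, and shifted$arsenic holds the data with only arsenic decreased by 0.2.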

Value

A list containing the top effects and interactions identified and estimated by the function. It includes elements for top positive and negative individual effects as well as top synergistic and antagonistic interactions.

Examples

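if (FALSE) {
# Simulate 100 observations: eight covariates, one exposure, one outcome
data <- data.frame(matrix(rnorm(100 * 10), ncol = 10))
names(data) <- c(paste0("X", 1:8), "exposure", "outcome")
deltas <- list(exposure = 0.1)
a_names <- "exposure"
w_names <- paste0("X", 1:8)
outcome <- "outcome"
outcome_type <- "continuous"
mu_learner <- list(sl3::Lrnr_mean$new(), sl3::Lrnr_glm$new())
# Conditional density learner for the exposure (an illustrative choice)
g_learner <- list(
  sl3::Lrnr_density_semiparametric$new(mean_learner = sl3::Lrnr_glm$new())
)
top_n <- 3
seed <- 123

# Name every argument after the data so each value reaches its parameter
results <- find_max_effect_mods(
  data,
  deltas = deltas, a_names = a_names, w_names = w_names,
  outcome = outcome, outcome_type = outcome_type,
  mu_learner = mu_learner, g_learner = g_learner,
  top_n = top_n, seed = seed, min_obs = 10
)
print(results)
}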