This function estimates effect modification in a randomized controlled trial (RCT) with known randomization probability \(\alpha\), for one of two target parameters selected by rct_type:

  • If rct_type = "ate", the target is the subject-level average treatment effect (ATE) contrast \(Q(1,W_i) - Q(0,W_i)\), where \(Q(a,W)\) is the outcome regression. A TMLE-style update that uses the known \(\alpha\) yields the targeted estimates and the influence function of the ATE. The function then searches data-adaptively for a partition into a region \(V\) and its complement \(V^c\) with the largest difference in ATE.

  • If rct_type = "incps", a two-stage incremental propensity shift approach moves the treatment probability from \(\alpha\) to \(\alpha + \delta\). This produces subject-level differences \(Q_{\alpha+\delta}(i) - Q_{\alpha}(i)\), and the partition search is run on this "shift effect."
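The two target parameters can be illustrated in base R on simulated data. This is a minimal sketch under stated assumptions, not the package's implementation: a single glm stands in for the sl3 ensemble in mu_learner, the outcome is continuous (so the TMLE fluctuation is linear), and the "incps" parameter is assumed to be the mean outcome when each subject is treated with probability \(\alpha + \delta\) instead of \(\alpha\). It does not reproduce the two-stage procedure. Names such as ate_i, if_ate, shift_i and if_diff are hypothetical.

```r
# Hypothetical sketch: simulated RCT with known randomization probability alpha.
set.seed(1)
n <- 500
alpha <- 0.5   # known P(A = 1) in the trial
delta <- 0.1   # hypothetical shift for the "incps" case
W1 <- rnorm(n)
A <- rbinom(n, 1, alpha)
Y <- 1 + 0.5 * W1 + A * (1 + W1) + rnorm(n)
dat <- data.frame(W1 = W1, A = A, Y = Y)

# Outcome regression Q(A, W); a plain glm stands in for mu_learner.
fit <- glm(Y ~ A * W1, data = dat, family = gaussian())
q1 <- predict(fit, newdata = transform(dat, A = 1))
q0 <- predict(fit, newdata = transform(dat, A = 0))
qa <- ifelse(dat$A == 1, q1, q0)

# ---- rct_type = "ate": TMLE-style update using the known alpha ----
# Clever covariate built from the known randomization probability (no estimated g).
h1 <- 1 / alpha
h0 <- -1 / (1 - alpha)
ha <- ifelse(dat$A == 1, h1, h0)

# Linear fluctuation for a continuous outcome: Y ~ -1 + H with offset Q(A, W).
eps <- unname(coef(lm(dat$Y ~ -1 + ha, offset = qa)))
q1_star <- q1 + eps * h1
q0_star <- q0 + eps * h0
qa_star <- qa + eps * ha

# Subject-level ATE contrasts and the ATE influence function.
ate_i <- q1_star - q0_star
psi_ate <- mean(ate_i)
if_ate <- ha * (dat$Y - qa_star) + ate_i - psi_ate

# ---- rct_type = "incps": treatment probability alpha -> alpha + delta ----
# Mean outcome if treated with probability p: p * Q(1, W) + (1 - p) * Q(0, W).
q_base  <- alpha * q1 + (1 - alpha) * q0
q_shift <- (alpha + delta) * q1 + (1 - alpha - delta) * q0
shift_i <- q_shift - q_base   # algebraically equal to delta * (q1 - q0)

# Influence functions of the two stochastic-intervention means with known g.
g_a  <- ifelse(dat$A == 1, alpha, 1 - alpha)
gs_a <- ifelse(dat$A == 1, alpha + delta, 1 - alpha - delta)
if_base  <- (dat$Y - qa) + q_base - mean(q_base)   # weight g / g = 1
if_shift <- (gs_a / g_a) * (dat$Y - qa) + q_shift - mean(q_shift)
if_diff  <- if_shift - if_base
```

Under this assumed definition, the plug-in shift contrast equals \(\delta\) times the subject-level ATE contrast \(Q(1,W_i) - Q(0,W_i)\).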

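Note: For valid inference on the discovered subpopulations, perform sample splitting or cross-validation outside this function, passing the training fold as at and the held-out fold as av. P-values from a single pass, where the same data are used both to find a region and to test it, tend to be overly optimistic.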
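External sample splitting can be as simple as the sketch below: for each fold v, the other folds form at (used to discover regions) and fold v forms av (used to estimate and test them), and v can be passed as fold. The helper make_fold_pairs is hypothetical, and the call to find_max_effect_mods_rct itself is omitted because it needs the package and a learner list.

```r
# Hypothetical helper: split a data.frame into K folds and return (at, av) pairs.
make_fold_pairs <- function(data, k = 5, seed = 123) {
  set.seed(seed)
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  lapply(seq_len(k), function(v) {
    list(
      fold = v,
      at = data[fold_id != v, , drop = FALSE],  # training fold: discover regions
      av = data[fold_id == v, , drop = FALSE]   # validation fold: estimate and test
    )
  })
}

dat <- data.frame(W1 = rnorm(100), A = rbinom(100, 1, 0.5), Y = rnorm(100))
pairs <- make_fold_pairs(dat, k = 5)
```

Each element of pairs can then be supplied as at = p$at, av = p$av, fold = p$fold, and the per-fold results combined as the analysis requires.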

find_max_effect_mods_rct(
  at,
  av,
  delta,
  a_name,
  w_names,
  outcome,
  outcome_type,
  mu_learner,
  alpha = NULL,
  top_n = 3,
  seed,
  min_obs,
  fold,
  max_depth = 2,
  pval_thresh = 0.05,
  rct_type = c("ate", "incps")
)

Arguments

at

A data.frame for the training fold, containing the columns named in w_names, a_name, and outcome.

av

A data.frame for the validation fold (or the same data as at, for a single pass).

delta

A numeric scalar \(\delta\) giving the incremental shift in treatment probability (coverage), from \(\alpha\) to \(\alpha + \delta\).

a_name

Name of the binary exposure (e.g. "A").

w_names

Character vector of baseline covariate names.

outcome

Name of the outcome variable.

outcome_type

One of "continuous", "binary", or "count"; used when building the sl3 tasks.

mu_learner

A list of sl3 learners for the outcome regression.

alpha

The known randomization probability \(\alpha\). If NULL, it is estimated from at.

top_n

Number of top rules to return from the partition search.

seed

Random seed for reproducibility.

min_obs

Minimum number of observations required in each branch of a split.

fold

Label for fold index (for cross-validation).

max_depth

Maximum depth of the partition search tree.

pval_thresh

p-value threshold for accepting a split.

rct_type

Either "ate" (subject-level average treatment effect) or "incps" (incremental propensity shift).
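The partition search is internal to the function. As a rough illustration of how top_n, min_obs and pval_thresh can interact, the sketch below runs a depth-1 search (a single threshold rule, as with max_depth = 1) over decile cut points. It keeps splits whose branches each have at least min_obs rows and whose Wald test of the region-versus-complement difference gives p < pval_thresh, then returns the top_n rules ranked by absolute difference. The helper search_splits, the decile grid, and the Wald test built from influence-function variances are assumptions, not the package's documented algorithm; q and ic stand in for av_q_estimates and av_hn_estimates.

```r
# Hypothetical depth-1 search over rules "w <= cut" (a stand-in for max_depth = 1).
search_splits <- function(w, q, ic, min_obs = 20, pval_thresh = 0.05, top_n = 3) {
  out <- list()
  for (nm in names(w)) {
    cuts <- unique(quantile(w[[nm]], probs = seq(0.1, 0.9, by = 0.1), names = FALSE))
    for (cut in cuts) {
      in_v <- w[[nm]] <= cut
      n_v <- sum(in_v)
      n_vc <- sum(!in_v)
      if (n_v < min_obs || n_vc < min_obs) next      # a branch is too small
      est <- mean(q[in_v]) - mean(q[!in_v])          # region minus complement
      se <- sqrt(var(ic[in_v]) / n_v + var(ic[!in_v]) / n_vc)
      p <- 2 * pnorm(-abs(est / se))                 # two-sided Wald test
      if (p < pval_thresh) {
        out[[length(out) + 1]] <- data.frame(
          rule = sprintf("%s <= %.3f", nm, cut), n_v = n_v,
          diff = est, se = se, p = p, stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(out) == 0) return(NULL)
  res <- do.call(rbind, out)
  head(res[order(-abs(res$diff)), ], top_n)          # largest differences first
}

set.seed(2)
n <- 400
w <- data.frame(W1 = rnorm(n), W2 = runif(n))
q <- 1 + 2 * (w$W1 > 0) + rnorm(n, sd = 0.2)  # effect is larger when W1 > 0
ic <- rnorm(n)                                 # stand-in influence-function values
top <- search_splits(w, q, ic, min_obs = 30)
```

A deeper search (max_depth = 2) would repeat the same step inside each accepted branch; that recursion is left out to keep the sketch short.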

Value

A list with:

K_fold_EM_results

Data frame with 2 rows per discovered region: one for \(V\), one for \(V^c\).

av_q_estimates

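Subject-level estimates on the validation set: the ATE contrast \(Q(1,W_i) - Q(0,W_i)\) when rct_type = "ate", or the shift difference \(Q_{\alpha+\delta}(i) - Q_{\alpha}(i)\) when rct_type = "incps".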

av_hn_estimates

The corresponding influence function value for each validation subject (for "incps", the difference of the shift-specific influence functions).

q_region_v

Vector of av_q_estimates in the discovered region \(V\) (for the first discovered partition).

q_region_vc

Vector of av_q_estimates in the complement \(V^c\).

g_region_v

Vector of av_hn_estimates in \(V\).

g_region_vc

Vector of av_hn_estimates in \(V^c\).

data_region_v

The rows in \(V\).

data_region_vc

The rows in \(V^c\).

data

The full validation set, with appended columns, for post hoc analysis.
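K_fold_EM_results reports one row for \(V\) and one for \(V^c\). The sketch below shows one plausible way to build such rows from the returned vectors (q_region_v with g_region_v, and q_region_vc with g_region_vc). It assumes that the region estimate is the mean of the subject-level estimates and that its standard error comes from the empirical variance of the influence-function values. The helper summarise_region and its column names are hypothetical and need not match the columns the function actually returns.

```r
# Hypothetical two-row summary (V and V^c), in the spirit of K_fold_EM_results.
summarise_region <- function(q_v, ic_v, q_vc, ic_vc, fold = 1) {
  one_row <- function(q, ic, label) {
    n <- length(q)
    est <- mean(q)                 # region-level estimate
    se <- sd(ic) / sqrt(n)         # influence-function-based standard error
    z <- qnorm(0.975)
    data.frame(fold = fold, region = label, n = n,
               estimate = est, se = se,
               lower = est - z * se, upper = est + z * se,
               p_value = 2 * pnorm(-abs(est / se)),
               stringsAsFactors = FALSE)
  }
  rbind(one_row(q_v, ic_v, "V"), one_row(q_vc, ic_vc, "V^c"))
}

set.seed(3)
q_v  <- rnorm(150, mean = 1.5, sd = 0.1)  # stand-in for q_region_v
ic_v <- rnorm(150)                        # stand-in for g_region_v
q_vc  <- rnorm(150, mean = 0.5, sd = 0.1) # stand-in for q_region_vc
ic_vc <- rnorm(150)                       # stand-in for g_region_vc
res <- summarise_region(q_v, ic_v, q_vc, ic_vc)
```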