simulated-rct-trialist.RmdThis article walks through a small randomized trial where the active treatment improves a binary response endpoint, but the treatment effect is much larger in participants with marker_high == 1. The goal is to show how a trialist should read the main rct = TRUE outputs:
V, V^c, and the V - V^c contrastThe example is intentionally simple. In a real trial, the endpoint definition, eligible subgroup variables, analysis role, and tuning parameters should be written in the protocol or SAP before looking at outcomes.
library(EffectXshift)
library(sl3)
set.seed(20240624)
n <- 600
marker_high <- rbinom(n, 1, 0.5)
age <- round(rnorm(n, mean = 62, sd = 9))
biomarker <- rnorm(n)
prior_tx <- rbinom(n, 1, plogis(-0.5 + 0.4 * biomarker))
# 1:1 randomized treatment
treatment <- rbinom(n, 1, 0.5)The true response risk is generated so the treatment improves response by about 35 percentage points when marker_high == 1, and by about 0 percentage points otherwise.
true_effect <- ifelse(marker_high == 1, 0.35, 0.00)
baseline_response <- plogis(
-1.2 - 0.02 * (age - 62) + 0.3 * biomarker - 0.2 * prior_tx
)
response_prob <- pmin(pmax(baseline_response + true_effect * treatment, 0.01), 0.99)
response <- rbinom(n, 1, response_prob)
w <- data.frame(
marker_high = marker_high,
age = age,
biomarker = biomarker,
prior_tx = prior_tx
)
a <- data.frame(treatment = treatment)
aggregate(true_effect ~ marker_high, data = data.frame(marker_high, true_effect), mean)
#> marker_high true_effect
#> 1 0 0.00
#> 2 1 0.35For a trial, pass the known allocation probability through alpha. Here alpha = 0.5 because this is a 1:1 randomized design.
mu_learner <- list(
Lrnr_ranger$new(num.trees = 150),
Lrnr_glm$new()
)
trial_results <- EffectXshift(
w = w,
a = a,
y = response,
deltas = 0,
outcome_type = "binary",
mu_learner = mu_learner,
n_folds = 2,
parallel = FALSE,
seed = 12,
rct = TRUE,
rct_type = "ate",
alpha = 0.5,
min_obs = 50,
max_depth = 1,
pval_thresh = 0.3
)pval_thresh = 0.3 is intentionally permissive for this small teaching simulation. In applied work, treat the split rule as a tuning choice and state it before running the analysis.
kfold <- trial_results$`Effect Modification K-Fold Results`
kfold$Rule <- sub("&\\s*$", "", kfold$Rule)
kfold
#> Exposure RegionType Rule Effect SE
#> 1 treatment V marker_high!=0 0.43989127 0.07028117
#> 2 treatment Vc NOT(marker_high!=0 & ) 0.12031517 0.07072862
#> 3 treatment V marker_high!=0 0.38676548 0.07558511
#> 4 treatment Vc NOT(marker_high!=0 & ) -0.01208861 0.06619750
#> LowerCI UpperCI Fold Rank N_in_Region
#> 1 0.30214019 0.5776424 1 1 152
#> 2 -0.01831293 0.2589433 1 1 148
#> 3 0.23861866 0.5349123 2 1 141
#> 4 -0.14183570 0.1176585 2 1 159Interpretation:
RegionType == "V" is the discovered higher-effect region in that fold.RegionType == "Vc" is the held-out complement.Rule is the selected baseline subgroup rule. Here, marker_high != 0 means the selected region is the marker_high == 1 subgroup.Effect is the held-out region treatment-effect estimate on the response-risk difference scale.N_in_Region is the number of validation-fold participants in that region.The fold-level table is the first place to look. If different folds select different variables or opposite orientations, the pooled result should be reported as exploratory and interpreted cautiously.
trial_results$`Pooled Region Effects`
#> Region Psi SE Lower CI Upper CI P-value N
#> 1 V 0.41432562 0.05143612 0.3135 0.5151 7.938162e-16 293
#> 2 V^c 0.05174123 0.04842216 -0.0432 0.1466 2.852750e-01 307
#> 3 V - V^c 0.36258439 0.07064262 0.2241 0.5010 2.856788e-07 600Interpretation:
V is the treatment effect in the selected subgroup.V^c is the treatment effect in everyone outside that subgroup.V - V^c is the effect-modification contrast.In this simulation, the selected V region has a large positive response-risk difference, while V^c is near zero. A positive V - V^c contrast indicates that treatment benefit is larger in the selected subgroup than in its complement.
trial_results$`Trial Region Diagnostics`
#> Region N N_Control N_Treated N_Missing_Treatment N_Missing_Outcome
#> 1 V 293 145 148 0 0
#> 2 V^c 307 156 151 0 0
#> Prop_Treated Outcome_Mean Outcome_Mean_Control Outcome_Mean_Treated
#> 1 0.5051195 0.3959044 0.1862069 0.6013514
#> 2 0.4918567 0.2280130 0.2051282 0.2516556
#> Observed_Mean_Diff_Treated_Minus_Control Flag_Small_Arm_N
#> 1 0.41514445 FALSE
#> 2 0.04652742 FALSEThese diagnostics are descriptive, not adjusted effect estimates. Use them to check whether the discovered regions have usable arm counts and whether the observed outcomes broadly agree with the adjusted results.
Important columns:
N_Control, N_Treated: arm counts in the selected regionProp_Treated: observed treatment fraction; in a randomized trial this should be close to the design allocation probability, allowing for random variationOutcome_Mean_Control, Outcome_Mean_Treated: observed endpoint means by armObserved_Mean_Diff_Treated_Minus_Control: unadjusted descriptive differenceFlag_Small_Arm_N: quick flag for regions where one arm is too small to interpret comfortably
diagnose_selection(trial_results)$stability
#> Rank Folds Unique_Exposures Most_Common_Exposure Most_Common_Exposure_Folds
#> 1 1 2 1 treatment 2
#> Unique_Rules Most_Common_Rule Most_Common_Rule_Folds
#> 1 1 marker_high!=0 & 2This summarizes whether the same treatment and rule were selected across folds. Stable fold-level selection supports a more interpretable finding. Unstable selection does not invalidate the held-out estimation procedure, but it does mean the discovered subgroup should be framed as exploratory.
For this simulated example, an interpretation could read:
In a 1:1 randomized trial with a binary response endpoint, EffectXshift selected the baseline subgroup
marker_high == 1as the higher-benefit region. The pooled held-out treatment effect was larger inVthan inV^c, and theV - V^ccontrast was positive. Arm counts in both selected regions were adequate, and fold-level selection was stable across the two folds.
For a real trial, also report whether this was prespecified or exploratory, the eligible baseline variables, the endpoint time point, censoring or missingness handling, alpha, and all subgroup-search tuning parameters.