
This guide explains the learner options exposed through formatArguments(). There are two separate nuisance-learning tasks:

  • treatment assignment models, supplied through the SuperLearner package
  • censoring and event hazard models, supplied as a candidate hazard library

The hazard library currently works as a cross-validated discrete selector: for each censoring or event type, concrete evaluates the candidate hazard learners and uses the learner with the lowest validation loss.
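
The selection rule can be sketched in a few lines of base R. The loss matrix below is made up for illustration, and averaging fold-level losses is an assumption about how the validation loss is aggregated; concrete's internal loss and bookkeeping may differ.

```r
# Hypothetical validation losses: one row per CV fold, one column per candidate.
cv_loss <- rbind(
  fold1 = c(Cox = 24.1, Coxnet = 23.6, RSF = 23.9),
  fold2 = c(Cox = 23.5, Coxnet = 23.2, RSF = 23.8),
  fold3 = c(Cox = 23.9, Coxnet = 23.4, RSF = 23.1)
)

# Aggregate over folds (assumed here to be a simple mean).
cv_risk <- colMeans(cv_loss)

# Discrete selection: weight 1 for the lowest-risk candidate, 0 for the rest.
selected <- names(which.min(cv_risk))
weights <- setNames(as.numeric(names(cv_risk) == selected), names(cv_risk))
weights
```

A discrete selector never blends candidates, so exactly one weight is 1 and the rest are 0.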

The code snippets below assume a trial data.table as built in the Trialist quickstart.
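
If the quickstart data are not at hand, a simulated stand-in with the same column names lets the snippets be tried end to end. Everything below is hypothetical: the column names (id, time, event, arm, age, sex, albumin) are taken from the calls in this guide, while the sample size, rates, and effect sizes are arbitrary and are not the quickstart's data.

```r
set.seed(2024)
n <- 300

trial <- data.frame(
  id      = seq_len(n),
  arm     = rbinom(n, 1, 0.5),            # 1:1 randomization
  age     = round(rnorm(n, 60, 10)),
  sex     = rbinom(n, 1, 0.5),
  albumin = round(rnorm(n, 3.5, 0.4), 1)
)

# Latent times in days: event of interest, competing event, censoring.
t_event <- rexp(n, rate = 8e-4 * exp(-0.3 * trial$arm))
t_comp  <- rexp(n, rate = 4e-4)
t_cens  <- pmin(rexp(n, rate = 5e-4), 1000)  # administrative censoring at day 1000

# Observed time is the earliest of the three.
# Event codes: 0 = censored, 1 = event of interest, 2 = competing event.
trial$time  <- pmin(t_event, t_comp, t_cens)
trial$event <- ifelse(trial$time == t_cens, 0L,
                      ifelse(trial$time == t_event, 1L, 2L))

# Convert to a data.table when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  trial <- data.table::as.data.table(trial)
}
```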

Conservative first library

For first use in a trial, start with simple and stable learners.

Model <- list(
  arm = c("SL.mean", "SL.glm"),
  "0" = list(Censor = survival::Surv(time, event == 0) ~ arm + age + sex),
  "1" = list(Event = survival::Surv(time, event == 1) ~ arm + age + sex)
)

For competing risks, add one hazard model list for each positive event code.

Model <- list(
  arm = c("SL.mean", "SL.glm"),
  "0" = list(Censor = survival::Surv(time, event == 0) ~ arm + age + sex),
  "1" = list(Death = survival::Surv(time, event == 1) ~ arm + age + sex),
  "2" = list(Competing = survival::Surv(time, event == 2) ~ arm + age + sex)
)
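
With several event codes, the hazard entries can also be built in a loop so that no code is forgotten. This is a minimal sketch, assuming the same covariates for every event type; `event_codes` and `covars` are illustrative names, and in practice the codes would come from something like sort(unique(trial$event)).

```r
event_codes <- c(0L, 1L, 2L)   # 0 = censoring, positive codes = event types
covars <- "arm + age + sex"    # assumed common covariate set

hazard_models <- lapply(event_codes, function(j) {
  # Build the formula from text; it is parsed here but not evaluated.
  list(Cox = as.formula(
    sprintf("survival::Surv(time, event == %d) ~ %s", j, covars)
  ))
})
names(hazard_models) <- as.character(event_codes)

Model <- c(list(arm = c("SL.mean", "SL.glm")), hazard_models)
str(Model, max.level = 1)
```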

Treatment Super Learner

For randomized trials, the treatment model is often simple. If treatment was randomized 1:1 and the analysis is intent-to-treat, a simple library is a good starting point:

Model <- list(
  arm = c("SL.mean", "SL.glm")
)
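
To see why a small library suffices under 1:1 randomization, note that the true assignment probability is 0.5 regardless of covariates. The simulated example below (hypothetical data, base R only) compares a marginal-mean estimate, the kind of fit SL.mean provides, with a logistic regression on a covariate.

```r
set.seed(1)
n <- 500
age <- rnorm(n, 60, 10)
arm <- rbinom(n, 1, 0.5)   # randomized independently of age

p_mean <- mean(arm)                                    # marginal mean
p_glm  <- fitted(glm(arm ~ age, family = binomial()))  # covariate-adjusted

# Both sets of estimates should sit close to 0.5.
round(c(mean = p_mean, glm_min = min(p_glm), glm_max = max(p_glm)), 3)
```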

For observational or covariate-adaptive settings, add flexible learners already available through SuperLearner.

Model <- list(
  arm = c("SL.mean", "SL.glm", "SL.glmnet", "SL.xgboost")
)

Only include treatment learners whose packages are installed and appropriate for the sample size.

Hazard learner aliases

Hazard learners are specified inside Model[["0"]], Model[["1"]], and other event-specific entries.

Alias                          Learner                          Package
Cox formula                    Cox proportional hazards         survival
"coxnet"                       Penalized Cox model              glmnet
"rsf" or "randomForestSRC"     Random survival forest           randomForestSRC
"aareg" or "additive_hazards"  Additive hazards                 survival
"hal" or "hal9001"             HAL pooled discrete-time hazard  hal9001

Optional packages:

install.packages(c("glmnet", "randomForestSRC", "hal9001"))
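
Before adding an alias to a hazard library, it can help to check that its backing package is installed. The lookup below simply restates the alias table; `alias_pkg` and `alias_available()` are illustrative helpers, not part of concrete.

```r
# Alias -> package, restating the alias table.
alias_pkg <- c(
  coxnet = "glmnet",
  rsf = "randomForestSRC", randomForestSRC = "randomForestSRC",
  aareg = "survival", additive_hazards = "survival",
  hal = "hal9001", hal9001 = "hal9001"
)

# TRUE when the package behind an alias can be loaded.
alias_available <- function(alias) {
  requireNamespace(alias_pkg[[alias]], quietly = TRUE)
}

wanted <- c("coxnet", "rsf", "aareg", "hal")
usable <- wanted[vapply(wanted, alias_available, logical(1))]
usable
```

Only the aliases in `usable` would then go into the event-specific Model entries.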

To test optional hazard learners on a small built-in example before trying your own data:

Sys.setenv(CONCRETE_RUN_OPTIONAL_LEARNERS = "true")
source(system.file("examples", "trialist-smoke-test.R", package = "concrete"))

When optional learners are installed, the smoke test prints a summary like this:

analysis          status  elapsed_sec  converged  step  max_ratio  failing_components
cox_only          ok      1.2          TRUE       4     0.743      0
additive_hazards  ok      1.2          TRUE       4     0.899      0
coxnet            ok      2.3          TRUE       13    0.838      0
rsf               ok      1.1          TRUE       4     0.748      0
hal               ok      1.5          TRUE       4     0.747      0

This table is not a benchmark. It is a quick installation and learner-path check. On real trial data, compare estimates, convergence diagnostics, and runtime across the learner ladder.

Cox plus machine-learning hazards

This example gives each event type a small candidate library. The selected hazard learner can differ across censoring, the event of interest, and competing events.

Model <- list(
  arm = c("SL.mean", "SL.glm", "SL.glmnet"),
  "0" = list(
    Cox = survival::Surv(time, event == 0) ~ arm + age + sex + albumin,
    Coxnet = "coxnet",
    Aalen = "aareg"
  ),
  "1" = list(
    Cox = survival::Surv(time, event == 1) ~ arm + age + sex + albumin,
    RSF = "rsf",
    HAL = "hal"
  ),
  "2" = list(
    Cox = survival::Surv(time, event == 2) ~ arm + age + sex + albumin,
    RSF = "rsf"
  )
)

ConcreteArgs <- formatArguments(
  DataTable = trial,
  EventTime = "time",
  EventType = "event",
  Treatment = "arm",
  ID = "id",
  Intervention = makeITT(),
  TargetTime = c(365, 730),
  TargetEvent = 1,
  CVArg = list(V = 5),
  Model = Model,
  UpdateMethod = "adaptive",
  EICStopRule = "absolute",
  EICStopAbsTol = 0.02 / sqrt(nrow(trial)),
  Verbose = FALSE
)

ConcreteEst <- doConcrete(ConcreteArgs)
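
The EICStopAbsTol expression in the call above ties the stopping tolerance to the sample size. As a quick arithmetic check of that expression alone (it says nothing about how concrete applies the tolerance internally), the tolerance halves each time the sample size quadruples:

```r
n <- c(100, 400, 1600)
tol <- 0.02 / sqrt(n)

# tol is 0.002, 0.001, and 0.0005 for n = 100, 400, and 1600.
data.frame(n = n, tol = tol)
```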

Suggested libraries by trial size

These are starting points, not rules.

Setting                           Suggested treatment library  Suggested hazard library
Small trial or rare event         SL.mean, SL.glm              Cox formulas, additive hazards
Moderate trial                    SL.mean, SL.glm, SL.glmnet   Cox, Coxnet, additive hazards
Larger trial with nonlinear risk  add tree/boosting learners   Cox, Coxnet, RSF, HAL
First convergence debugging       SL.mean, SL.glm              Cox only

Inspect selected hazard learners

When ReturnModels = TRUE, the estimate object keeps the initial nuisance fits, which can be retrieved from its "InitFits" attribute.

fits <- attr(ConcreteEst, "InitFits")

# Treatment model Super Learner weights.
fits[["arm"]]

# Hazard model selection risks are stored on each fitted hazard object.
lapply(fits[setdiff(names(fits), "arm")], function(fit) {
  attr(fit, "HazSL")
})

The hazard learner output records cross-validated risks and the selected candidate for each event type. A simplified example looks like:

$`1`
$`1`$SupLrnCVRisks
      Cox    Coxnet       RSF       HAL
   118.20    116.75    117.40    119.05

$`1`$SLCoef
   Cox Coxnet    RSF    HAL
     0      1      0      0

Here, Coxnet was selected for event type 1. The exact object names depend on the names you used in Model.
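
To pull out just the winning candidate per event type, the one-hot SLCoef vector can be matched back to its names. The sketch below runs on a mock list shaped like the simplified output above; the real attribute may carry additional fields.

```r
# Mock of the simplified structure (values copied from the example output).
haz_sl <- list(
  "1" = list(
    SupLrnCVRisks = c(Cox = 118.20, Coxnet = 116.75, RSF = 117.40, HAL = 119.05),
    SLCoef = c(Cox = 0, Coxnet = 1, RSF = 0, HAL = 0)
  )
)

# For each event type, return the name of the candidate with coefficient 1.
selected <- vapply(
  haz_sl,
  function(x) names(x$SLCoef)[x$SLCoef == 1],
  character(1)
)
selected
```

In this mock, the selected name also matches the candidate with the smallest entry in SupLrnCVRisks.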

Practical testing sequence

During a trial testing week, hold the estimand and data set fixed and run a ladder of increasingly flexible learner libraries.

model_cox <- list(
  arm = c("SL.mean", "SL.glm"),
  "0" = list(Cox = survival::Surv(time, event == 0) ~ .),
  "1" = list(Cox = survival::Surv(time, event == 1) ~ .)
)

model_penalized <- list(
  arm = c("SL.mean", "SL.glm", "SL.glmnet"),
  "0" = list(Cox = survival::Surv(time, event == 0) ~ ., Coxnet = "coxnet"),
  "1" = list(Cox = survival::Surv(time, event == 1) ~ ., Coxnet = "coxnet")
)

model_flexible <- list(
  arm = c("SL.mean", "SL.glm", "SL.glmnet"),
  "0" = list(Cox = survival::Surv(time, event == 0) ~ ., Coxnet = "coxnet"),
  "1" = list(
    Cox = survival::Surv(time, event == 1) ~ .,
    Coxnet = "coxnet",
    RSF = "rsf",
    Aalen = "aareg",
    HAL = "hal"
  )
)

Compare:

  • point estimates and confidence intervals
  • selected hazard learners
  • convergence diagnostics
  • runtime
  • whether the result is stable to removing the most flexible learners
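
One way to organize the comparison is a small runner that fits each library in turn and records status and runtime. This is a sketch: `run_ladder()`, `fit_fun`, and `fake_fit()` are hypothetical names, and the stand-in fitting function only mimics success or failure. With concrete, `fit_fun` would instead wrap the formatArguments() and doConcrete() calls shown earlier, taking a Model list as its input.

```r
run_ladder <- function(models, fit_fun) {
  rows <- lapply(names(models), function(nm) {
    res <- NULL
    # Time the fit and capture any error instead of stopping the whole ladder.
    elapsed <- system.time(
      res <- tryCatch(fit_fun(models[[nm]]), error = function(e) e)
    )[["elapsed"]]
    data.frame(
      analysis = nm,
      status = if (inherits(res, "error")) "error" else "ok",
      elapsed_sec = elapsed,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Stand-in fit: fails when a library lists an alias it does not recognize.
fake_fit <- function(model) {
  aliases <- unlist(Filter(is.character, model[["1"]]))
  if (any(!aliases %in% c("coxnet", "rsf"))) stop("unknown learner")
  invisible(length(model[["1"]]))
}

ladder <- list(
  cox = list("1" = list(Cox = y ~ x)),
  penalized = list("1" = list(Cox = y ~ x, Coxnet = "coxnet")),
  broken = list("1" = list(Cox = y ~ x, Typo = "coxnte"))
)
ladder_summary <- run_ladder(ladder, fake_fit)
ladder_summary
```

Catching errors per rung keeps one failing library from hiding the results of the simpler ones.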