Trialist quickstart

This article is written for trial analysts who want to test concrete on a randomized trial or trial-like observational data set. The core output is a covariate-adjusted marginal absolute risk at prespecified follow-up times. From those risks, concrete reports risk differences and risk ratios.

What question is being answered?

For a binary baseline treatment, concrete estimates counterfactual absolute risks such as:

risk of event 1 by day 365 if everyone followed the active arm
risk of event 1 by day 365 if everyone followed the control arm
active minus control risk difference
active divided by control risk ratio

For competing risks, the estimate is the cumulative incidence of the target event in the presence of the competing events. For a standard right-censored survival endpoint with no competing risks, use one positive event code and code censoring as 0.

Data checklist

Your analysis data set should have one row per participant for the current package interface.

Item	Requirement
Subject id	Unique participant id column
Time	Observed event or censoring time, numeric, positive
Event type	`0` for censoring, positive integers for events
Treatment	Binary baseline treatment coded `0` and `1`
Covariates	Baseline covariates measured before treatment; `NA`s are allowed and are imputed (median / mode) with a missingness indicator added per affected column
Randomization strata (optional)	If randomization was stratified (permuted blocks, biased coin), keep the stratum column(s) and pass them as `Strata` so the standard errors reflect the design
Switch times (optional)	If participants crossed over to the other arm, a per-subject switch-time column (`NA` if never) can be passed as `Crossover` for the hypothetical no-switching estimand

The id, time, event-type, and treatment columns must be complete — only baseline covariates may carry NAs.

Check these before fitting:

trial[, .N, by = event]
trial[, .N, by = .(arm, event)]
summary(trial$time)
stopifnot(all(trial$event >= 0))
stopifnot(all(trial$arm %in% c(0, 1)))

Current scope:

baseline binary treatment
one row per participant
right-censored event time data
optional competing risks
baseline covariate adjustment (missing baseline values imputed automatically)
stratified / covariate-adaptive randomization (Strata; corrected standard errors)
treatment switching via a crossover hazard (Crossover; hypothetical no-switching estimand)
time-varying post-randomization covariates in the censoring model (CensoringTV)

Currently out of scope:

longitudinal treatment regimes
recurrent events
left truncation or delayed entry
non-binary treatment without custom intervention work

The intercurrent-event and design features are walked through in the Trial-design and regulatory toolkit article.

Example data

The examples below use survival::pbc only to provide a complete data shape. Replace this block with your trial data.

library(concrete)
library(data.table)

trial <- as.data.table(survival::pbc)
trial <- trial[!is.na(trt), .(id, time, status, trt, age, sex, albumin, bili)]

# Example treatment coding: 1 = active arm, 0 = control arm.
trial[, arm := as.integer(trt == 2)]

# Example competing-risk coding:
# 0 = censored, 1 = death, 2 = transplant.
trial[, event := data.table::fifelse(
  status == 2, 1L,
  data.table::fifelse(status == 1, 2L, 0L)
)]

trial <- trial[, .(id, time, event, arm, age, sex, albumin, bili)]

Before fitting your own data, you can run the installed smoke test:

source(system.file("examples", "trialist-smoke-test.R", package = "concrete"))

This verifies that the package can run a small competing-risk analysis and prints the same kinds of outputs and diagnostics you should inspect in your trial.

First intent-to-treat analysis

Start with a conservative analysis before adding flexible hazard learners. This uses the default treatment Super Learner and default Cox candidate hazards.

ConcreteArgs <- formatArguments(
  DataTable = trial,
  EventTime = "time",
  EventType = "event",
  Treatment = "arm",
  ID = "id",
  Intervention = makeITT(),
  TargetTime = c(365, 730, 1095),
  TargetEvent = 1,
  CVArg = list(V = 5),
  Verbose = FALSE
)

ConcreteEst <- doConcrete(ConcreteArgs)
print(ConcreteEst)

makeITT() creates two static interventions:

A=1: everyone assigned to the active arm
A=0: everyone assigned to the control arm

This is the usual starting point for an intent-to-treat comparison when the treatment column is the randomized treatment assignment.

Output for a trial report

ConcreteOut <- getOutput(
  ConcreteEst,
  Estimand = c("Risk", "RD", "RR"),
  Intervention = c(1, 2),
  GComp = TRUE,
  Simultaneous = TRUE
)

ConcreteOut[Event == 1]

The table below is the actual TMLE output from this PBC example (event of interest, selected target times). The exact values will change with your data, target times, learner library, and confidence-interval settings.

	Time	Event	Estimand	Intervention	Estimator	Pt Est	se	CI Low	CI Hi
9	730	1	Abs Risk	A=0	tmle	0.097	0.022	0.054	0.140
11	730	1	Abs Risk	A=1	tmle	0.112	0.027	0.061	0.164
13	730	1	Rel Risk	[A=1] / [A=0]	tmle	1.156	0.378	0.416	1.896
15	730	1	Risk Diff	[A=1] - [A=0]	tmle	0.015	0.034	-0.052	0.083
17	1460	1	Abs Risk	A=0	tmle	0.241	0.034	0.174	0.307
19	1460	1	Abs Risk	A=1	tmle	0.244	0.035	0.176	0.313
21	1460	1	Rel Risk	[A=1] / [A=0]	tmle	1.015	0.204	0.615	1.416
23	1460	1	Risk Diff	[A=1] - [A=0]	tmle	0.004	0.049	-0.092	0.099
25	2190	1	Abs Risk	A=0	tmle	0.325	0.040	0.247	0.402
27	2190	1	Abs Risk	A=1	tmle	0.304	0.038	0.229	0.379
29	2190	1	Rel Risk	[A=1] / [A=0]	tmle	0.937	0.165	0.614	1.259
31	2190	1	Risk Diff	[A=1] - [A=0]	tmle	-0.021	0.055	-0.129	0.088

plot(ConcreteOut) draws the same estimates as covariate-adjusted absolute-risk curves by arm, with pointwise error bars and a shaded simultaneous confidence band:

plot(ConcreteOut, ask = FALSE)

Adjusted absolute risk by arm

Important columns:

Time: target follow-up time
Event: target event code
Estimand: absolute risk, risk difference, or relative risk
Intervention: intervention being estimated or compared
Estimator: tmle or optional gcomp plug-in estimate
Pt Est: point estimate
CI Low, CI Hi: confidence interval

For risk differences, positive values mean the first intervention listed in Intervention = c(1, 2) has higher risk than the second. With makeITT(), that is A=1 minus A=0.

getOutput() also reports a two-sided Wald pValue for the risk difference and risk ratio, and can run a one-sided non-inferiority assessment against a margin:

getOutput(
  ConcreteEst, Estimand = "RD", Intervention = c(1, 2),
  NIMargin = 0.05, NIDirection = "upper"   # active is non-inferior if its excess risk stays below 0.05
)

Restricted mean survival time and life-years lost

Because the hazard ratio is non-collapsible and depends on proportional hazards, many trials report a restricted mean survival time (RMST) instead. getRMST() integrates the targeted cumulative incidence over the follow-up horizon, so request a reasonably dense TargetTime grid when you plan to use it.

# RMST (event-free time) and cause-specific life-years lost, with the
# between-arm difference in restricted mean times:
getRMST(ConcreteEst, Horizon = 1095, Intervention = c(1, 2))

The RMST Diff row is the difference in mean event-free time up to the horizon (e.g. “extra days alive and event-free under the active arm”); LYL Diff rows are the cause-specific differences in time lost.

getRMST() integrates the pointwise-targeted risks, so it needs a reasonably dense TargetTime grid. If you only fit a few target times, or you have rare events or a long horizon, targetRMST() targets the RMST estimating equation directly (fluctuating the hazards for the integrated clever covariate) and is better conditioned:

targetRMST(ConcreteEst, Horizon = 1095, Intervention = c(1, 2))

How much did covariate adjustment buy you?

Covariate adjustment does not change the estimand in a randomized trial, but it can tighten the confidence intervals. Fit an unadjusted (treatment-only) version and compare with getRelativeEfficiency():

unadj_args <- formatArguments(
  DataTable = trial, EventTime = "time", EventType = "event", Treatment = "arm",
  ID = "id", Intervention = makeITT(), TargetTime = c(365, 730, 1095),
  TargetEvent = 1, CVArg = list(V = 5),
  Model = list(arm = "SL.mean",
               "0" = list(Cox = survival::Surv(time, event == 0) ~ arm),
               "1" = list(Cox = survival::Surv(time, event == 1) ~ arm))
)
unadj_est <- doConcrete(unadj_args)

getRelativeEfficiency(
  Adjusted   = getOutput(ConcreteEst, Estimand = c("Risk", "RD"), Intervention = c(1, 2)),
  Unadjusted = getOutput(unadj_est,   Estimand = c("Risk", "RD"), Intervention = c(1, 2))
)

RelEfficiency above 1 and a positive VarReductionPct mean adjustment was worth it: a value of 1.25 says the adjusted analysis on these n participants has the precision of an unadjusted analysis on 25% more.

Compare with familiar analyses

Use standard analyses as context, not as exact substitutes. A Cox hazard ratio is not the same estimand as a marginal risk difference or cumulative incidence risk ratio.

# Event counts by arm are always the first check.
trial[, .N, by = .(arm, event)]

# Cause-specific Cox model for the event of interest.
cox_event1 <- survival::coxph(
  survival::Surv(time, event == 1) ~ arm + age + sex + albumin + bili,
  data = trial
)
summary(cox_event1)

# Kaplan-Meier style context for a non-competing-risk endpoint.
# For competing risks, prefer a cumulative incidence estimator in your usual
# trial reporting workflow and compare target times to ConcreteOut.
km_event1 <- survival::survfit(
  survival::Surv(time, event == 1) ~ arm,
  data = trial
)
summary(km_event1, times = c(365, 730, 1095))

Useful comparison questions:

Are event counts and censoring patterns plausible by treatment arm?
Are concrete absolute risks close to unadjusted estimates in a balanced trial?
Do adjusted estimates move in the expected direction when important prognostic covariates are included?
Are the TMLE estimates and g-computation plug-in estimates close?
Are confidence intervals wider in rare-event settings?

Check convergence

components <- getTmleDiagnostics(ConcreteEst, type = "components")
components[order(ratio, decreasing = TRUE)][1:10]

trace <- getTmleDiagnostics(ConcreteEst, type = "trace")
trace

For a clean fit, the component diagnostics should have check = TRUE for all targeted components under the selected stopping rule. A compact success summary from the smoke test looks like:

analysis	status	converged	step	max_ratio	failing_components
cox_only	ok	TRUE	4	0.743	0

If the fit does not converge, try the settings in the convergence diagnostics article before changing the estimand.

ConcreteArgs$UpdateMethod <- "adaptive"
ConcreteArgs$EICStopRule <- "absolute"
ConcreteArgs$EICStopAbsTol <- 0.02 / sqrt(nrow(ConcreteArgs$Data))
ConcreteArgs <- formatArguments(ConcreteArgs)

ConcreteEst <- doConcrete(ConcreteArgs)
getTmleDiagnostics(ConcreteEst, type = "components")

When sharing results with collaborators or opening a GitHub issue, include:

package version from packageVersion("concrete")
event counts by treatment arm
target event and target time
learner library used in Model
TMLE controls: UpdateMethod, EICStopRule, EICStopAbsTol
getTmleDiagnostics(ConcreteEst, type = "components")
whether the issue reproduces with the conservative Cox-only analysis