library(InterXshift)
library(devtools)
#> Loading required package: usethis
library(kableExtra)
library(sl3)

seed <- 429153
set.seed(seed)

Motivation

The motivation behind the package InterXshift is to address the limitations of traditional statistical methods in environmental epidemiology studies. Analysts are often interested in understanding the joint impact of a mixed exposure, i.e. a vector of exposures, but the most important exposures and exposure sets are unknown a priori. Traditional methods make overly simplistic assumptions, such as linear and additive relationships, and the resulting statistical quantities may not be directly applicable to public policy decisions. More flexible methods offer dose-response relationships but do not produce an easily interpretable estimate. Here, we use cross-validated targeted learning with stochastic shift interventions to identify the exposures in a mixture that have the strongest positive and negative effects. We also define interaction as a comparison of the expected outcome under a joint shift to the sum of the expected outcomes under the individual shifts. With this definition we identify the top synergistic and antagonistic relationships and provide efficient estimates of the corresponding stochastic shift interventions.

Target Parameter

InterXshift uses data-adaptive machine learning methods to identify the exposures and exposure sets that have the most synergy or antagonism. This is defined based on our target parameter:

$$
\begin{aligned}
\Psi_{\text{IE}} = \; & E[Y | A_i + \delta_i, A_j + \delta_j, \mathbf{A}_{-i,j}, W] \\
& - \big(E[Y | A_i + \delta_i, \mathbf{A}_{-i}, W] + E[Y | A_j + \delta_j, \mathbf{A}_{-j}, W]\big) \\
& + E[Y]
\end{aligned}
$$

This parameter compares the expected outcome under a joint shift of two exposures to the sum of their individual impacts. The exposure sets with the largest positive values show synergy, meaning their joint impact is more than additive. The exposure sets with the largest negative values show the most antagonism.

G-computation Approximation

To identify the most synergistic and antagonistic exposure sets we employ a g-computation framework. We first construct a super learner, an ensemble of machine learning algorithms, for the outcome. We then form all two-way exposure combinations and, for each combination, calculate our interaction parameter. We rank these results and select the top positive and negative results, which represent synergy and antagonism respectively; a minimal sketch of this step is shown below.
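
Below is a minimal sketch of the g-computation ranking step, using a plain GLM as a stand-in for the super learner. The objects a (exposures), w (covariates), y (outcome), and deltas are assumed to be set up as in the NHANES example later in this vignette; this is illustrative and not the package's internal code.

# Stack outcome, exposures and covariates; fit a simple outcome regression
# (a stand-in for the super learner)
dat <- data.frame(y = y, a, w)
fit <- glm(y ~ ., data = dat)

# Mean predicted outcome after shifting the named exposures by their deltas
shift_pred <- function(fit, dat, vars, deltas) {
  shifted <- dat
  for (v in vars) shifted[[v]] <- shifted[[v]] + deltas[[v]]
  mean(predict(fit, newdata = shifted))
}

# Interaction parameter for every two-way exposure combination
pairs <- combn(names(a), 2, simplify = FALSE)
interactions <- sapply(pairs, function(p) {
  joint <- shift_pred(fit, dat, p, deltas)
  indiv <- shift_pred(fit, dat, p[1], deltas) + shift_pred(fit, dat, p[2], deltas)
  joint - indiv + mean(dat$y) # psi_IE: joint shift vs. sum of individual shifts
})
names(interactions) <- sapply(pairs, paste, collapse = "-")

# Largest positive values suggest synergy; largest negative, antagonism
head(sort(interactions, decreasing = TRUE))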

Sample Splitting

Because we do not know a priori which exposure sets to estimate our interaction parameter on, we split the data: in the training fold we identify the exposure sets using the g-computation framework above, and in the estimation fold we efficiently estimate the interaction parameter for those exposure sets, which gives us valid confidence intervals for these data-adaptively identified interactions. This estimation is done using targeted learning.
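
As a rough illustration of the splitting scheme (not the package's internal fold construction), assuming a data frame dat holding the analysis data:

# Assign each observation to one of n_folds folds at random
n_folds <- 5
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(dat)))

for (k in seq_len(n_folds)) {
  training   <- dat[fold_id != k, ] # discover the top exposure sets here (g-computation ranking)
  estimation <- dat[fold_id == k, ] # estimate the interaction for those sets here (TMLE)
  # ... rank on `training`, then estimate with valid inference on `estimation`
}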

Targeted Learning

We use targeted minimum loss-based estimation (TMLE) to debias our initial outcome estimates under a shift in exposure such that the resulting estimator is asymptotically unbiased and has the smallest variance. This procedure involves constructing a least favorable submodel that uses a ratio of conditional exposure densities, under shift versus no shift, as a covariate to debias the initial estimates. More details on targeted learning for stochastic shift interventions are given in Díaz and van der Laan (2012).
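
An illustrative sketch of the targeting (fluctuation) step for a single shifted exposure is below. It assumes Q holds the initial outcome predictions at the observed exposures, Q_shift the predictions under the shifted exposure, Hn the estimated ratio of conditional exposure densities under shift versus no shift, and y the outcome scaled to [0, 1]; the package's internal implementation may differ.

# Least favorable submodel: logistic fluctuation with the density ratio Hn as
# the clever covariate and the initial predictions as an offset
eps_fit <- glm(y ~ -1 + offset(qlogis(Q)) + Hn, family = "quasibinomial")
eps <- coef(eps_fit)["Hn"]

# Update the counterfactual (shifted) predictions and evaluate the parameter
Q_shift_star <- plogis(qlogis(Q_shift) + eps * Hn)
psi <- mean(Q_shift_star)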

Data Adaptive Delta

For each exposure, the user supplies a respective delta; for example, for exposures M1, M2, M3 the analyst provides the deltas vector

deltas <- c("M1" = 1, "M2" = 2.3, "M3" = 1.4)

This assigns a delta shift amount to each exposure. However, because the user does not know the underlying experimentation in the data, a delta that is too large may result in positivity violations, which lead to bias and high variance in the estimator; that is, the user may be asking for a shift in the exposure density that is very unlikely given the data. To solve this, the analyst can set adaptive_delta = TRUE, which makes the shift amount a data-adaptive parameter as well. The delta is reduced until the maximum of the ratio of densities for each exposure is less than or equal to hn_trunc_thresh.
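
A rough sketch of this reduction rule for a single exposure is shown below; density_ratio() is a hypothetical helper that returns the estimated conditional density ratio under a shift of size delta for every observation, and the default threshold and step size here are arbitrary, so the package's actual rule may differ.

adapt_delta <- function(delta, density_ratio, hn_trunc_thresh = 10, step = 0.9) {
  # Shrink the proposed shift until the largest density ratio is bounded
  while (max(density_ratio(delta)) > hn_trunc_thresh && delta > 1e-3) {
    delta <- delta * step
  }
  delta
}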

Inputs and Outputs

InterXshift takes in covariates, exposures, an outcome, a list of deltas to shift each exposure by, the estimator (TMLE or one-step), the number of folds for the CV procedure, parallelization parameters, and whether the delta should be data-adaptive. The user also defines top_n, the number of ranked synergistic, antagonistic, and individual positive and negative effects to extract from the mixed exposure.

The package then outputs K-fold-specific results, the result found in each fold, and the oracle result, which is the parameter pooled across all the folds. For example, we will get the fold-specific synergy results for the rank 1 and rank 2 exposure sets found to have the largest expected outcome under a joint shift compared to the sum of individual shifts. The pooled parameter is the oracle rank 1, meaning we do a pooled TMLE for all the rank 1 findings across the folds. The same is true for the other parameters.

We get pooled rank synergy, antagonism, and positive and negative effects. If the same exposure sets are identified across the folds, the pooled estimate is interpretable and has greater power than the fold-specific estimates. If not, the analyst must investigate the k-fold-specific results and report the consistency of findings.

NHANES Data

Here we will try to replicate the analysis of Gibson et al.:

https://ehjournal.biomedcentral.com/articles/10.1186/s12940-019-0515-1

and Mitro et al., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4858394/

who used the NHANES 2001-2002 data to investigate 18 POP exposures and their association with telomere length.

Let’s first load the data and investigate the variables:

data("NHANES_eurocim")

exposures <- c("LBX074LA", # PCB74 Lipid Adj (ng/g)
               "LBX099LA", # PCB99 Lipid Adj (ng/g)
               "LBX118LA", # PCB118 Lipid Adj (ng/g)
               "LBX138LA", # PCB138 Lipid Adj (ng/g)
               "LBX153LA", # PCB153 Lipid Adj (ng/g)
               "LBX170LA", # PCB170 Lipid Adj (ng/g)
               "LBX180LA", # PCB180 Lipid Adj (ng/g)
               "LBX187LA", # PCB187 Lipid Adj (ng/g)
               "LBX194LA", # PCB194 Lipid Adj (ng/g)
               "LBXD03LA", # 1,2,3,6,7,8-hxcdd Lipid Adj (pg/g)
               "LBXD05LA", # 1,2,3,4,6,7,8-hpcdd Lipid Adj (pg/g)
               "LBXD07LA", # 1,2,3,4,6,7,8,9-ocdd Lipid Adj (pg/g)
               "LBXF03LA", # 2,3,4,7,8-pncdf Lipid Adj (pg/g)
               "LBXF04LA", # 1,2,3,4,7,8-hxcdf Lipid Adj (pg/g)
               "LBXF05LA", # 1,2,3,6,7,8-hxcdf Lipid Adj (pg/g)
               "LBXF08LA", # 1,2,3,4,6,7,8-hxcdf Lipid Adj (pg/g)
               "LBXHXCLA", # 3,3',4,4',5,5'-hxcb Lipid Adj (pg/g)
               "LBXPCBLA") # 3,3',4,4',5-pcnb Lipid Adj (pg/g)

NHANES_eurocim <- NHANES_eurocim[complete.cases(NHANES_eurocim[, exposures]), ]

outcome <- "TELOMEAN"

covariates <- c("LBXWBCSI", # White blood cell count (SI)
                "LBXLYPCT", # Lymphocyte percent (%)
                "LBXMOPCT", # Monocyte percent (%)
                "LBXEOPCT", # Eosinophils percent (%)
                "LBXBAPCT", # Basophils percent (%)
                "LBXNEPCT", # Segmented neutrophils percent (%)
                "male", # Sex
                "age_cent", # Age at Screening, centered
                "race_cat", # race
                "bmi_cat3", # Body Mass Index (kg/m**2)
                "ln_lbxcot", # Cotinine (ng/mL), log-transformed
                "edu_cat") # Education Level - Adults 20+

To improve consistency in the exposures and exposure sets we identify across folds, it is best to keep only one exposure from each highly correlated set; we do that here:

# Calculate the correlation matrix for the exposures
cor_matrix <- cor(NHANES_eurocim[, exposures], use = "complete.obs")

# Set a threshold for high correlation
threshold <- 0.8

# Find pairs of highly correlated exposures
highly_correlated_pairs <- which(abs(cor_matrix) > threshold & lower.tri(cor_matrix), arr.ind = TRUE)

# Initiate a vector to keep track of exposures to remove
exposures_to_remove <- c()

# Loop through the highly correlated pairs and decide which exposure to remove
for (pair in seq_len(nrow(highly_correlated_pairs))) {
  row <- highly_correlated_pairs[pair, "row"]
  col <- highly_correlated_pairs[pair, "col"]

  if (!(colnames(cor_matrix)[row] %in% exposures_to_remove)) {
    exposures_to_remove <- c(exposures_to_remove, colnames(cor_matrix)[row])
  }
}

# Keep only uncorrelated exposures
exposures_to_keep <- setdiff(exposures, exposures_to_remove)

Run InterXshift

deltas <- list(
  "LBX074LA" = 1, "LBX099LA" = 1, "LBXD03LA" = 1,
  "LBXD05LA" = 1, "LBXF03LA" = 1, "LBXF04LA" = 1, "LBXF08LA" = 1, 
  "LBXPCBLA" = 1
)

ptm <- proc.time()

w <- NHANES_eurocim[, covariates]
a <- NHANES_eurocim[, exposures_to_keep]
y <- NHANES_eurocim$TELOMEAN

NIEH_results <- InterXshift(
  w = w,
  a = a,
  y = y,
  deltas = deltas,
  n_folds = 5,
  outcome_type = "continuous",
  parallel = TRUE,
  parallel_type = "multi_session",
  num_cores = 5,
  seed = seed,
  adaptive_delta = FALSE,
  top_n = 1
)
#> 
#> Iter: 1 fn: 2325.8904     Pars:  0.15996 0.84004
#> Iter: 2 fn: 2325.8904     Pars:  0.15996 0.84004
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3165.9080     Pars:  0.06705 0.93295
#> Iter: 2 fn: 3165.9080     Pars:  0.06704 0.93296
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2857.9107     Pars:  0.07182 0.92818
#> Iter: 2 fn: 2857.9107     Pars:  0.07182 0.92818
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3142.8885     Pars:  0.07186 0.92814
#> Iter: 2 fn: 3142.8885     Pars:  0.07186 0.92814
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3046.6289     Pars:  0.05747 0.94253
#> Iter: 2 fn: 3046.6289     Pars:  0.05747 0.94253
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2137.4014     Pars:  0.11305 0.88695
#> Iter: 2 fn: 2137.4014     Pars:  0.11305 0.88695
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2150.0854     Pars:  0.34428 0.65572
#> Iter: 2 fn: 2150.0854     Pars:  0.34428 0.65572
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1964.8342     Pars:  0.37014 0.62986
#> Iter: 2 fn: 1964.8342     Pars:  0.37014 0.62986
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2324.8068     Pars:  0.39370 0.60630
#> Iter: 2 fn: 2324.8068     Pars:  0.39370 0.60630
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2760.7986     Pars:  0.57526 0.42474
#> Iter: 2 fn: 2760.7986     Pars:  0.57526 0.42474
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3020.5766     Pars:  0.06794 0.93206
#> Iter: 2 fn: 3020.5766     Pars:  0.06794 0.93206
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2778.2071     Pars:  0.67232 0.32768
#> Iter: 2 fn: 2778.2071     Pars:  0.67232 0.32768
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2568.9175     Pars:  0.17363 0.82637
#> Iter: 2 fn: 2568.9175     Pars:  0.17363 0.82637
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2274.4333     Pars:  0.20559 0.79441
#> Iter: 2 fn: 2274.4333     Pars:  0.20559 0.79441
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1602.7867     Pars:  0.07339 0.92661
#> Iter: 2 fn: 1602.7867     Pars:  0.07339 0.92661
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1228.9266     Pars:  0.04116 0.95884
#> Iter: 2 fn: 1228.9266     Pars:  0.04116 0.95884
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2308.6854     Pars:  0.29665 0.70335
#> Iter: 2 fn: 2308.6854     Pars:  0.29666 0.70334
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2947.1401     Pars:  0.06799 0.93201
#> Iter: 2 fn: 2947.1401     Pars:  0.06799 0.93201
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3160.0092     Pars:  0.05740 0.94260
#> Iter: 2 fn: 3160.0092     Pars:  0.05740 0.94260
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3181.2465     Pars:  0.07397 0.92603
#> Iter: 2 fn: 3181.2465     Pars:  0.07397 0.92603
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2905.0564     Pars:  0.04282 0.95718
#> Iter: 2 fn: 2905.0564     Pars:  0.04282 0.95718
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2224.6723     Pars:  0.08223 0.91777
#> Iter: 2 fn: 2224.6723     Pars:  0.08223 0.91777
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1629.4414     Pars:  0.05270 0.94730
#> Iter: 2 fn: 1629.4414     Pars:  0.05270 0.94730
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1980.5925     Pars:  0.31678 0.68322
#> Iter: 2 fn: 1980.5925     Pars:  0.31678 0.68322
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1631.8139     Pars:  0.09748 0.90252
#> Iter: 2 fn: 1631.8139     Pars:  0.09748 0.90252
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3000.6226     Pars:  0.07931 0.92069
#> Iter: 2 fn: 3000.6226     Pars:  0.07931 0.92069
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2868.0589     Pars:  0.06890 0.93110
#> Iter: 2 fn: 2868.0589     Pars:  0.06890 0.93110
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2914.1244     Pars:  0.04442 0.95558
#> Iter: 2 fn: 2914.1244     Pars:  0.04441 0.95559
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2829.1469     Pars:  0.04008 0.95992
#> Iter: 2 fn: 2829.1469     Pars:  0.04008 0.95992
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2401.3614     Pars:  0.53997 0.46003
#> Iter: 2 fn: 2401.3614     Pars:  0.53997 0.46003
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1261.2039     Pars:  0.05576 0.94424
#> Iter: 2 fn: 1261.2039     Pars:  0.05576 0.94424
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1357.7149     Pars:  0.07634 0.92366
#> Iter: 2 fn: 1357.7149     Pars:  0.07635 0.92365
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2282.2687     Pars:  0.22833 0.77167
#> Iter: 2 fn: 2282.2687     Pars:  0.22834 0.77166
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3417.7656     Pars:  0.10997 0.89003
#> Iter: 2 fn: 3417.7656     Pars:  0.10996 0.89004
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3297.2807     Pars:  0.11893 0.88107
#> Iter: 2 fn: 3297.2807     Pars:  0.11893 0.88107
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 3472.6381     Pars:  0.11233 0.88767
#> Iter: 2 fn: 3472.6381     Pars:  0.11233 0.88767
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2853.6616     Pars:  0.04173 0.95827
#> Iter: 2 fn: 2853.6616     Pars:  0.04173 0.95827
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2335.5537     Pars:  0.27978 0.72022
#> Iter: 2 fn: 2335.5537     Pars:  0.27977 0.72023
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 1892.6252     Pars:  0.25059 0.74941
#> Iter: 2 fn: 1892.6252     Pars:  0.25059 0.74941
#> solnp--> Completed in 2 iterations
#> 
#> Iter: 1 fn: 2036.2921     Pars:  0.58975 0.41025
#> Iter: 2 fn: 2036.2921     Pars:  0.58975 0.41025
#> solnp--> Completed in 2 iterations

proc.time() - ptm
#>      user    system   elapsed 
#>  1440.082   514.657 17518.854

Investigating Results

One note: for this example we will see more variability than usual because we only run 5-fold CV, due to computational time.

Let’s first look at the results for each fold:

NIEH_results$`Pos Shift Results by Rank` %>%
  kableExtra::kbl(caption = "Top Positive Effects") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Top Positive Effects
Condition Psi Variance SE Lower CI Upper CI P-value Fold Type Variables N Delta
LBXF03LA 0.0752633 0.0003448 0.0185682 0.0389 0.1117 0.0000505 1 Indiv Shift LBXF03LA 202 1
LBXF03LA -0.0063054 0.0001419 0.0119117 -0.0297 0.0170 0.5965658 2 Indiv Shift LBXF03LA 202 1
LBXF03LA 0.0323595 0.0001715 0.0130950 0.0067 0.0580 0.0134685 3 Indiv Shift LBXF03LA 201 1
LBXF04LA 0.0001602 0.0000959 0.0097917 -0.0190 0.0194 0.9869470 4 Indiv Shift LBXF04LA 201 1
LBXF03LA 0.0110933 0.0001405 0.0118534 -0.0121 0.0343 0.3493350 5 Indiv Shift LBXF03LA 201 1
Rank 1 0.0002385 0.0000644 0.0080267 -0.0155 0.0160 0.9763000 Pooled TMLE Indiv Shift Rank 1 1007 1

Here we see LBXF03LA is found in four of the five folds as the top positive effect. Estimates under Psi indicate the change in the expected outcome under a 1-unit shift intervention compared to the mean outcome under the observed exposure distribution. Fold is the fold number the estimates are made in, and the last row is the pooled TMLE estimate for all of our rank 1 estimates. Variance is estimated from the influence function for stochastic shift interventions, with the standard error and CI estimates derived from it.

N is the number of observations used in the calculation; in the pooled results this should equal the total sample size. Here, because we are interested in positive associations, if the associations are significant we expect our Psi estimates to be positive, indicating the expected outcome under the shift is greater than the observed mean outcome.

Users should look for inconsistencies in the exposure identified as the top positive effect. Inconsistencies indicate there is not a strong signal for this estimate and make the pooled rank 1 estimate difficult to interpret.

NIEH_results$`Neg Shift Results by Rank` %>%
  kableExtra::kbl(caption = "Top Negative Effects") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Top Negative Effects
Condition Psi Variance SE Lower CI Upper CI P-value Fold Type Variables N Delta
LBXD05LA 0.0330499 0.0003561 0.0188693 -0.0039 0.0700 0.0798577 1 Indiv Shift LBXD05LA 202 1
LBXF08LA -0.0427118 0.0001757 0.0132536 -0.0687 -0.0167 0.0012701 2 Indiv Shift LBXF08LA 202 1
LBXD03LA 0.0545727 0.0001489 0.0122022 0.0307 0.0785 0.0000077 3 Indiv Shift LBXD03LA 201 1
LBXD03LA 0.2675288 0.0327868 0.1810712 -0.0874 0.6224 0.1395475 4 Indiv Shift LBXD03LA 201 1
LBXD03LA 0.0439702 0.0002065 0.0143718 0.0158 0.0721 0.0022173 5 Indiv Shift LBXD03LA 201 1
Rank 1 0.0466035 0.0015466 0.0393273 -0.0305 0.1237 0.2360108 Pooled TMLE Indiv Shift Rank 1 1007 1

Here is the same type of table but for the top negative results. The output is in the same format as above for the positive results. If the Psi estimates are significant, we expect these effects to be negative, indicating the 1-unit shift reduces the expected outcome compared to the observed mean.

k_fold_synergy <- do.call(rbind, NIEH_results$`K Fold Synergy Results`)

k_fold_synergy %>%
  kableExtra::kbl(caption = "Top Synergy Effects") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Top Synergy Effects
Rank Psi Variance SE Lower CI Upper CI P-value Fold N Delta Exposure 1 Delta Exposure 2 Type
Rank 1.1 1 -0.0739546 0.0026787 0.0517558 -0.1754 0.0275 0.7451230 1 202 1 1 LBXD03LA
Rank 1.2 1 -0.0845122 0.0067418 0.0821086 -0.2454 0.0764 0.7680443 1 202 1 1 LBXD05LA
Rank 1.3 1 0.0101848 0.0000861 0.0092778 -0.0080 0.0284 0.9157903 1 202 1 1 LBXD03LA-LBXD05LA
Rank 1.4 1 0.1686516 0.0092959 0.0964154 -0.0203 0.3576 0.5870288 1 202 1 1 Interaction
Rank 1.5 1 -0.0540903 0.0001286 0.0113391 -0.0763 -0.0319 0.6114807 2 202 1 1 LBXD03LA
Rank 1.6 1 -0.0391537 0.0001620 0.0127277 -0.0641 -0.0142 0.7285498 2 202 1 1 LBXF08LA
Rank 1.7 1 -0.0323316 0.0001356 0.0116457 -0.0552 -0.0095 0.7644805 2 202 1 1 LBXD03LA-LBXF08LA
Rank 1.8 1 0.0609124 0.0001829 0.0135256 0.0344 0.0874 0.6004499 2 202 1 1 Interaction
Rank 1.9 1 -0.1252976 0.0123424 0.1110964 -0.3430 0.0924 0.7069781 3 201 1 1 LBXD03LA
Rank 1.10 1 -0.1355719 0.0180296 0.1342743 -0.3987 0.1276 0.7114006 3 201 1 1 LBXD05LA
Rank 1.11 1 0.0575058 0.0001232 0.0110977 0.0358 0.0793 0.5851505 3 201 1 1 LBXD03LA-LBXD05LA
Rank 1.12 1 0.3183752 0.0305139 0.1746823 -0.0240 0.6607 0.4462068 3 201 1 1 Interaction
Rank 1.13 1 0.2725149 0.0129081 0.1136138 0.0498 0.4952 0.4188084 4 201 1 1 LBXD03LA
Rank 1.14 1 0.0235456 0.0027286 0.0522361 -0.0788 0.1259 0.9179465 4 201 1 1 LBXD05LA
Rank 1.15 1 0.0284417 0.0022409 0.0473379 -0.0643 0.1212 0.8959947 4 201 1 1 LBXD03LA-LBXD05LA
Rank 1.16 1 -0.2676189 0.0186074 0.1364088 -0.5350 -0.0003 0.4687002 4 201 1 1 Interaction
Rank 1.17 1 -0.0002447 0.0000535 0.0073114 -0.0146 0.0141 0.9977163 5 201 1 1 LBXD03LA
Rank 1.18 1 -0.1147112 0.0150682 0.1227528 -0.3553 0.1259 0.7433588 5 201 1 1 LBXD05LA
Rank 1.19 1 -0.1123677 0.0151475 0.1230752 -0.3536 0.1289 0.7487412 5 201 1 1 LBXD03LA-LBXD05LA
Rank 1.20 1 0.0025882 0.0302730 0.1739915 -0.3384 0.3436 0.9950493 5 201 1 1 Interaction

These are now estimates for the top synergistic exposure set. The format of the table is similar, but now the Type column indicates which exposures the shift is for: where one exposure is listed, that row is an individual shift; where two are listed, it is a joint shift; and Interaction compares the expectation under the joint shift to the sum of the estimates under the individual shifts.
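
As a concrete check using the fold 2 rows above (each Psi is already reported relative to the observed mean outcome, so the E[Y] terms cancel), the Interaction estimate equals the joint-shift estimate minus the sum of the individual-shift estimates: -0.0323 - (-0.0541 - 0.0392) ≈ 0.0609.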

If the interaction estimates are significant, we would expect the Psi under interaction to be positive, indicating the joint shift is larger than the sum of individual shifts. Variance estimates are similarly derived from the influence function and delta method.

k_fold_antagonistic <- do.call(rbind, NIEH_results$`K Fold Antagonism Results`)

k_fold_antagonistic %>%
  kableExtra::kbl(caption = "Top Antagonistic Effects") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Top Antagonistic Effects
Rank Psi Variance SE Lower CI Upper CI P-value Fold N Delta Exposure 1 Delta Exposure 2 Type
Rank 1.1 1 0.0730422 0.0003387 0.0184047 0.0370 0.1091 0.5902975 1 202 1 1 LBXF03LA
Rank 1.2 1 0.0195870 0.0002035 0.0142650 -0.0084 0.0475 0.8697346 1 202 1 1 LBXF04LA
Rank 1.3 1 0.0065742 0.0002298 0.0151587 -0.0231 0.0363 0.9574162 1 202 1 1 LBXF03LA-LBXF04LA
Rank 1.4 1 -0.0860550 0.0005404 0.0232468 -0.1316 -0.0405 0.5724749 1 202 1 1 Interaction
Rank 1.5 1 -0.0102030 0.0001494 0.0122247 -0.0342 0.0138 0.9264757 2 202 1 1 LBXF03LA
Rank 1.6 1 0.0027759 0.0000643 0.0080176 -0.0129 0.0185 0.9752688 2 202 1 1 LBXF04LA
Rank 1.7 1 0.0155900 0.0002475 0.0157325 -0.0152 0.0464 0.9010833 2 202 1 1 LBXF03LA-LBXF04LA
Rank 1.8 1 0.0230170 0.0002578 0.0160567 -0.0085 0.0545 0.8558619 2 202 1 1 Interaction
Rank 1.9 1 0.0365694 0.0001913 0.0138329 0.0095 0.0637 0.7558547 3 201 1 1 LBXF03LA
Rank 1.10 1 0.0048776 0.0000917 0.0095742 -0.0139 0.0236 0.9602428 3 201 1 1 LBXF04LA
Rank 1.11 1 0.0029412 0.0002745 0.0165671 -0.0295 0.0354 0.9817695 3 201 1 1 LBXF03LA-LBXF04LA
Rank 1.12 1 -0.0385058 0.0004507 0.0212296 -0.0801 0.0031 0.7915681 3 201 1 1 Interaction
Rank 1.13 1 -0.0068210 0.0003370 0.0183576 -0.0428 0.0292 0.9598487 4 201 1 1 LBXF03LA
Rank 1.14 1 -0.0117301 0.0001073 0.0103578 -0.0320 0.0086 0.9082412 4 201 1 1 LBXF04LA
Rank 1.15 1 0.0044069 0.0001401 0.0118372 -0.0188 0.0276 0.9676905 4 201 1 1 LBXF03LA-LBXF04LA
Rank 1.16 1 0.0229581 0.0003880 0.0196985 -0.0157 0.0616 0.8700651 4 201 1 1 Interaction
Rank 1.17 1 0.0122342 0.0001369 0.0117024 -0.0107 0.0352 0.9099565 5 201 1 1 LBXF03LA
Rank 1.18 1 -0.0210257 0.0001546 0.0124327 -0.0454 0.0033 0.8504313 5 201 1 1 LBXF04LA
Rank 1.19 1 -0.0102182 0.0001578 0.0125601 -0.0348 0.0144 0.9273530 5 201 1 1 LBXF03LA-LBXF04LA
Rank 1.20 1 -0.0014267 0.0002581 0.0160662 -0.0329 0.0301 0.9910192 5 201 1 1 Interaction

This table is formatted in the same way as our synergistic results, where we compare joint shifts to individual shifts. Now, however, if the interaction effects are significant, we expect the joint shift to be less than the sum of individual shifts, and therefore the interaction effect should be negative.

Again, it is important to assess whether the same exposures are found in the synergistic and antagonistic relationships across folds. If they are, we can interpret the pooled TMLE estimates; here we show our pooled estimates for the antagonistic interaction:


NIEH_results$`Pooled Antagonism Results by Rank` %>%
  kableExtra::kbl(caption = "Pooled Antagonistic Effects") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Pooled Antagonistic Effects
Rank Psi Variance SE Lower CI Upper CI P-value Fold N Delta Exposure 1 Delta Exposure 2 Type
Rank 1 -0.0029595 6.97e-05 0.0083486 -0.0193 0.0134 0.9741608 Pooled TMLE 1007 1 1 Var 1
Rank 1 0.0009444 5.26e-05 0.0072535 -0.0133 0.0152 0.9911528 Pooled TMLE 1007 1 1 Var 2
Rank 1 0.0194557 7.29e-05 0.0085384 0.0027 0.0362 0.8332369 Pooled TMLE 1007 1 1 Joint
Rank 1 0.0214709 9.63e-05 0.0098110 0.0022 0.0407 0.8283903 Pooled TMLE 1007 1 1 Interaction

This table is formatted similarly, but now instead of having variable names under Type we have Var 1, Var 2, Joint, and Interaction. This is because we are pooling estimates for the top antagonistic interaction, and the variables used in this interaction may differ across folds. If there is consistency across all the folds, Var 1 corresponds to the first variable that is shifted, Var 2 the second, Joint is the joint shift, and Interaction is the same comparison of the joint shift to the sum of individual shifts.

Again, if the interaction is significant and the results are consistent, we would expect this estimate to be negative. By pooling, we are able to leverage the full data, which gives us narrower confidence intervals.

Comparing to Previous Findings

Positive Association

For a shift of 1 unit in all exposures, we found LBXF03LA (2,3,4,7,8-pncdf Lipid Adj (pg/g)) to have the strongest positive association. This matches the penalized regression results of Gibson et al. Similarly, this chemical had the second highest weight in WQS regression, and in BKMR this chemical showed the highest positive association in the dose-response plots.

Negative Association

We found LBXD03LA (1,2,3,6,7,8-hxcdd Lipid Adj (pg/g)) to have the strongest negative association, which is corroborated by the results from Gibson et al.; however, there are many null or negative chemical associations.

Synergy Results

LBXD03LA-LBXD05LA (1,2,3,6,7,8-hxcdd and 1,2,3,4,6,7,8-hpcdd) had the most consistency, appearing in 4 of the 5 folds in this small example. In Gibson et al., no interactions were found, and the other methods did not assess interactions. None of the interaction estimates were significant here, with CIs spanning the range from antagonism to synergy. For a synergistic effect we would expect the joint shift to be larger than the additive effect.

Antagonistic Results

LBXF03LA (2,3,4,7,8-pncdf) and LBXF04LA (1,2,3,4,7,8-hxcdf) showed the strongest antagonism, and this relationship was found across all the folds. The effect is not significant, but what we would expect to see for this result is that the joint shift is less than the additive effect.

Díaz, Iván, and Mark J. van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.