Title: | Personalized Treatment Evaluator |
---|---|
Description: | We provide inference for personalized medicine models. Namely, we answer the questions: (1) how much better does a purported personalized recommendation engine for treatments do over a business-as-usual approach and (2) is that difference statistically significant? |
Authors: | Adam Kapelner, Alina Levine & Justin Bleich |
Maintainer: | Adam Kapelner <[email protected]> |
License: | GPL-3 |
Version: | 1.7 |
Built: | 2024-11-25 05:44:27 UTC |
Source: | https://github.com/cran/PTE |
A list with two objects: (a) X, a dataframe with n rows representing clinical subjects and columns treatment, x1, x2, x3, x4 and x5, where treatment is binary, indicating the two arms of the clinical trial, and x1, ..., x5 are covariates that were collected about each subject; and (b) y, a length-n vector storing the continuous response values where, in this mock dataset, larger values indicate "better" outcomes for the subjects.
Plots a summary of the bootstrap samples.
## S3 method for class 'PTE_bootstrap_results'
plot(x, ...)
x |
A PTE_bootstrap_results model object. |
... |
Other arguments passed to plot. |
Adam Kapelner
Prints a summary of the model to the console
## S3 method for class 'PTE_bootstrap_results'
print(x, ...)
x |
A PTE_bootstrap_results model object. |
... |
Other arguments passed to print. |
Adam Kapelner
Personalized Medicine...
Adam Kapelner [email protected], Alina Levine and Justin Bleich
Kapelner, A, Bleich, J, Cohen, ZD, DeRubeis, RJ and Berk, R (2014) Inference for Treatment Regime Models in Personalized Medicine, arXiv
Runs B bootstrap samples using a prespecified model, then computes the two I estimates based on cross-validation. p values of the two I estimates are computed for a given null value (see H_0_mu_equals) and confidence intervals are provided using the basic and percentile methods by default, as well as the BCa method if desired.
PTE_bootstrap_inference(X, y, regression_type = "continuous",
  incidence_metric = "odds_ratio",
  personalized_model_build_function = NULL,
  censored = NULL,
  predict_function = function(mod, Xyleftout) { predict(mod, Xyleftout) },
  difference_function = NULL,
  cleanup_mod_function = NULL,
  y_higher_is_better = TRUE,
  verbose = FALSE, full_verbose = FALSE,
  H_0_mu_equals = NULL,
  pct_leave_out = 0.1, m_prop = 1, B = 3000, alpha = 0.05,
  run_bca_bootstrap = FALSE,
  display_adversarial_score = FALSE,
  num_cores = NULL)
X |
A dataframe with n rows, one row per clinical subject, containing a binary column named "treatment" indicating the arm of the clinical trial plus the subject-level covariates. |
y |
An n-length vector of responses. |
regression_type |
A string indicating the regression problem. Legal values are "continuous" (the default; the response is a real-valued, uncensored measurement), "incidence" (the response is binary) and "survival" (the response is a time measurement, possibly censored; see the censored argument). |
incidence_metric |
Ignored unless the regression_type is "incidence". A string denoting the comparison metric; legal values are "odds_ratio" (the default), "risk_ratio" and "probability_difference". |
personalized_model_build_function |
An R function that will be evaluated to construct the personalized medicine / recommendation model. In the formula for the model, the response is "y", the treatment vector is "treatment" and the data is "Xytrain". This function must return some type of object that can be used for prediction later via the predict_function. The default depends on regression_type:

    personalized_model_build_function = switch(regression_type,
      continuous = function(Xytrain) # default is OLS regression
        lm(y ~ . * treatment, data = Xytrain),
      incidence = function(Xytrain) # default is logistic regression
        glm(y ~ . * treatment, data = Xytrain, family = "binomial"),
      survival = function(Xytrain) # default is Weibull regression
        survreg(Surv(Xytrain$y, Xytrain$censored) ~ (. - censored) * treatment,
          data = Xytrain, dist = "weibull")
    ) |
censored |
Only required if the regression_type is "survival". A length-n binary vector where a value of 1 indicates the corresponding response in y was censored (so y only records the moment of censoring) and 0 indicates the response was fully observed. |
predict_function |
An R function that will be evaluated on left out data after the model is built with the training data. This function uses the object "mod" that is the result of the personalized_model_build_function and the left-out data "Xyleftout" to generate predictions. The default is:

    function(mod, Xyleftout)
      predict(mod, Xyleftout) |
difference_function |
A function which takes the result of one out-of-sample experiment (bootstrap or not) of all n samples and converts it into a difference that will be used as a score in a score distribution to determine if the personalization model is statistically significantly able to distinguish subjects. The function looks as follows:

    function(results, indices_1_1, indices_0_0, indices_0_1, indices_1_0){
      ...
      c(rec_vs_non_rec_diff_score, rec_vs_all_score, rec_vs_best_score)
    }

where results is a dataframe of the out-of-sample results with rows such as:

    est_true  est_counterfactual  given_tx  rec_tx  real_y  censored
       166.8               152.2         1       1     324         1
      1679.1              2072.0         1       0     160         0

The index arguments partition the rows of results by the combination of given treatment and recommended treatment. This function should return three numeric scores: the recommended vs. the non-recommended (adversarial), the recommended vs. all (all) and the recommended vs. the best average treatment (best), as a 3-dimensional vector as illustrated above. By default, this parameter is NULL, in which case the difference is computed according to the regression_type and incidence_metric. |
cleanup_mod_function |
A function that is called at the end of a cross validation iteration to cleanup the model
in some way. This is used for instance if you would like to release the memory your model is using but generally does not apply.
The default is NULL, which performs no cleanup. |
y_higher_is_better |
True if a response value being higher is clinically "better" than one that is lower (e.g. cognitive ability in a drug trial for the
mentally ill). False if the response value being lower is clinically "better" than one that is higher (e.g. amount of weight lost
in a weight-loss trial). Default is TRUE. |
verbose |
Prints out a dot for each bootstrap sample. This only works on some platforms. |
full_verbose |
Prints out full information for each cross validation model for each bootstrap sample. This only works on some platforms. |
H_0_mu_equals |
The null value of the average difference that the two I estimates are tested against. The default is NULL, which corresponds to a null of no personalization effect (a difference of zero). |
pct_leave_out |
In the cross-validation, the proportion of the original dataset left out to estimate out-of-sample metrics. The default is 0.1 which corresponds to 10-fold cross validation. |
m_prop |
Within each bootstrap sample, the proportion of the total number of rows of X to sample (the m in an m-out-of-n bootstrap). The default is 1, i.e. the usual n-out-of-n bootstrap. |
B |
The number of bootstrap samples to take. We recommend making this as high as you can tolerate given speed considerations. The default is 3000. |
alpha |
Defines the confidence interval size (1 - alpha). Defaults to 0.05. |
run_bca_bootstrap |
Do the BCa bootstrap as well. This takes roughly double the time. It defaults to FALSE. |
display_adversarial_score |
The adversarial score records the personalization metric versus the deliberate opposite of the personalization. This does not correspond
to any practical situation but it is useful for debugging. Default is FALSE. |
num_cores |
The number of cores to use in parallel to run the bootstrap samples more rapidly.
Defaults to NULL, which runs on a single core. |
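To make the difference_function contract concrete, here is a minimal sketch of a custom implementation. It assumes the results column layout illustrated in the difference_function entry above and assumes the index arguments partition rows by the (given_tx, rec_tx) combination; the function name and the mean-difference scoring rule are illustrative, not the package's default.

```r
# Hypothetical custom difference_function: raw mean-response differences.
# Assumes results has columns real_y and given_tx as illustrated above,
# and that indices_i_j selects rows with given_tx == i and rec_tx == j.
mean_response_differences = function(results, indices_1_1, indices_0_0,
                                     indices_0_1, indices_1_0){
  rec = c(indices_1_1, indices_0_0)     # given tx agrees with recommendation
  non_rec = c(indices_0_1, indices_1_0) # given tx contradicts recommendation
  avg_rec = mean(results$real_y[rec])
  avg_non_rec = mean(results$real_y[non_rec])
  avg_all = mean(results$real_y)
  avg_best = max(mean(results$real_y[results$given_tx == 1]),
                 mean(results$real_y[results$given_tx == 0]))
  c(avg_rec - avg_non_rec, # recommended vs. non-recommended (adversarial)
    avg_rec - avg_all,     # recommended vs. all
    avg_rec - avg_best)    # recommended vs. best single treatment
}
```

Such a function would be passed via the difference_function argument of PTE_bootstrap_inference.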
A results object of type "PTE_bootstrap_results" that contains much information about the observed results and the bootstrap runs, including hypothesis testing and confidence intervals.
Adam Kapelner
## Not run:
library(PTE)
B = 1000 # lower this for quicker demos

## response: continuous
data(continuous_example)
X = continuous_example$X
y = continuous_example$y
pte_results = PTE_bootstrap_inference(X, y, regression_type = "continuous", B = B)
pte_results

## response: incidence
data(continuous_example)
X = continuous_example$X
y = continuous_example$y
y = ifelse(y > quantile(y, 0.75), 1, 0) # force incidence and pretend y came to you this way
# there are three ways to assess incidence effects below:
#   odds ratio, risk ratio and probability difference
pte_results = PTE_bootstrap_inference(X, y, regression_type = "incidence", B = B)
pte_results
pte_results = PTE_bootstrap_inference(X, y, regression_type = "incidence", B = B,
  incidence_metric = "risk_ratio")
pte_results
pte_results = PTE_bootstrap_inference(X, y, regression_type = "incidence", B = B,
  incidence_metric = "probability_difference")
pte_results

## response: survival
data(survival_example)
X = survival_example$X
y = survival_example$y
censored = survival_example$censored
pte_results = PTE_bootstrap_inference(X, y, censored = censored,
  regression_type = "survival", B = 1000)
pte_results
## End(Not run)
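Custom model-building and prediction functions can be supplied instead of the defaults. A minimal sketch for the continuous setting, assuming a sparser model that interacts the treatment with only the first covariate; the function names here are illustrative, not part of the package:

```r
# Hypothetical sparser alternative to the default OLS build function:
# interact the treatment with x1 only, rather than with every covariate.
sparse_build = function(Xytrain){
  lm(y ~ x1 * treatment, data = Xytrain)
}
# Matching predict function, following the default's signature.
sparse_predict = function(mod, Xyleftout){
  predict(mod, Xyleftout)
}

# Plugged into the bootstrap inference as in the examples above:
# pte_results = PTE_bootstrap_inference(X, y, regression_type = "continuous",
#   personalized_model_build_function = sparse_build,
#   predict_function = sparse_predict, B = 1000)
```

Any build function works as long as the object it returns can be consumed by the supplied predict_function on left-out data.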
Prints a summary of the model to the console
## S3 method for class 'PTE_bootstrap_results'
summary(object, ...)
object |
A PTE_bootstrap_results model object. |
... |
Other arguments passed to summary. |
Adam Kapelner
A list with three objects: (a) X, a dataframe with n rows representing clinical subjects and columns treatment, x1, x2, x3 and x4, where treatment is binary, indicating the two arms of the clinical trial, and x1, ..., x4 are covariates that were collected about each subject; (b) y, a length-n vector storing the survival response values (a time measurement) where, in this mock dataset, smaller values indicate "better" survival outcomes for the subjects; and (c) censored, a length-n vector storing the censoring dummies, where c_16 = 1 means the response y_16 was censored (thus the true value of y_16 is unknown and y_16 only represents the moment it was censored) and c_16 = 0 means it was uncensored and y_16 is the true response value.