Title: | Optimal Rerandomization Experimental Designs |
---|---|
Description: | This is a tool to find the optimal rerandomization threshold in non-sequential experiments. We offer three procedures. |
Authors: | Adam Kapelner, Michael Sklar, Abba M. Krieger and David Azriel |
Maintainer: | Adam Kapelner <[email protected]> |
License: | GPL-3 |
Version: | 1.1 |
Built: | 2024-11-08 04:28:12 UTC |
Source: | https://github.com/kapelner/optimalrerandexpdesigns |
Implements the complete randomization design (CRD) AKA Bernoulli Trial
complete_randomization_plus_one_min_one(n, r)
n |
number of observations |
r |
number of randomized designs you would like |
a matrix where each column is one of the r designs
Adam Kapelner
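As a rough illustration of what a complete randomization design in the +1 / -1 encoding looks like, the following base R sketch draws each assignment independently with probability 1/2. The helper name `crd_sketch` is hypothetical and this is not the package's source code; it only mirrors the documented shape of the return value (one design per column).

```r
# Hypothetical stand-in for illustration only; not the package's implementation.
crd_sketch = function(n, r){
  # every entry is +1 (treatment) or -1 (control), each with probability 1/2
  matrix(sample(c(-1, 1), size = n * r, replace = TRUE), nrow = n, ncol = r)
}

set.seed(1)
W_crd = crd_sketch(n = 6, r = 4)
dim(W_crd)  # 6 rows (observations) by 4 columns (designs)
```

Because every entry is drawn independently, the two arms are generally not the same size within a column.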
Implements the balanced complete randomization design (BCRD)
complete_randomization_with_forced_balance_plus_one_min_one(n, r)
n |
number of observations |
r |
number of randomized designs you would like |
a matrix where each column is one of the r designs
Adam Kapelner
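Forced balance can be pictured as shuffling a fixed vector holding n/2 entries of +1 and n/2 entries of -1. The sketch below assumes an even n; the helper name `bcrd_sketch` is hypothetical and is not taken from the package.

```r
# Hypothetical stand-in for illustration only; assumes n is even.
bcrd_sketch = function(n, r){
  stopifnot(n %% 2 == 0)
  # replicate() binds the r shuffled vectors into an n x r matrix
  replicate(r, sample(rep(c(-1, 1), each = n / 2)))
}

set.seed(1)
W_bcrd = bcrd_sketch(n = 6, r = 4)
colSums(W_bcrd)  # every column sums to zero under forced balance
```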
Returns the objective value given a design vector as well as an objective function. This duplicates code that is already implemented in Java. This is only to be run if...
compute_objective_val_plus_one_min_one_enc( X, indic_T, objective = "abs_sum_diff", inv_cov_X = NULL )
X |
The n x p design matrix |
indic_T |
The n-length binary allocation vector |
objective |
The objective function to use. Default is |
inv_cov_X |
Optional: the inverse sample variance covariance matrix. Use this argument if you will be doing many calculations since passing this in will cache this data. |
Adam Kapelner
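To make the idea of an objective value concrete, here is a minimal base R sketch of two common imbalance measures for a +1 / -1 allocation vector: the sum of absolute differences in covariate means, and a Mahalanobis-type quadratic form. The helper `objective_sketch`, the objective label "mahal_dist" as used here, and the exact scaling are assumptions for illustration; the package's own definitions may differ by constants or standardization.

```r
# Hypothetical helper; the scaling used by the package may differ.
objective_sketch = function(X, w, objective = "abs_sum_diff", inv_cov_X = NULL){
  # difference in covariate means between the +1 arm and the -1 arm
  delta = colMeans(X[w == 1, , drop = FALSE]) - colMeans(X[w == -1, , drop = FALSE])
  if (objective == "abs_sum_diff"){
    sum(abs(delta))
  } else if (objective == "mahal_dist"){
    # computing the inverse once and passing it in avoids repeated solve() calls
    if (is.null(inv_cov_X)) inv_cov_X = solve(var(X))
    as.numeric(t(delta) %*% inv_cov_X %*% delta)
  } else {
    stop("unknown objective")
  }
}

set.seed(1)
X = matrix(rnorm(20 * 3), nrow = 20)
w = sample(rep(c(-1, 1), each = 10))
S_inv = solve(var(X))
obj_abs = objective_sketch(X, w, "abs_sum_diff")
obj_mahal = objective_sketch(X, w, "mahal_dist", inv_cov_X = S_inv)
```

Passing a precomputed inverse, as with `inv_cov_X` above, matters mainly when the objective is evaluated for many allocation vectors on the same X.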
Compute naive / vanilla squared Frobenius Norm of matrix A
frob_norm_sq(A)
A |
The matrix of interest |
Adam Kapelner
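The naive squared Frobenius norm is the sum of the squared entries of the matrix. The short base R check below uses a hypothetical helper name, `frob_norm_sq_sketch`, and compares it with two equivalent expressions.

```r
# Hypothetical helper for illustration; sums the squared entries of A.
frob_norm_sq_sketch = function(A){
  sum(A^2)
}

A = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
val = frob_norm_sq_sketch(A)   # 1 + 4 + 9 + 16 + 25 + 36 = 91
val_norm = norm(A, type = "F")^2      # base R Frobenius norm, squared
val_trace = sum(diag(t(A) %*% A))     # trace of A'A
```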
Compute debiased Frobenius Norm of matrix Sigmahat (Appendix 5.8). Note that for S <= 2, it returns the naive estimate.
frob_norm_sq_debiased( Sigmahat, s, n, frob_norm_sq_bias_correction_min_samples = 10 )
Sigmahat |
The var-cov matrix of interest |
s |
The number of vectors |
n |
The length of each vector |
frob_norm_sq_bias_correction_min_samples |
This estimate suffers from high variance when there are not enough samples. Thus, we only implement the correction beginning at this number of samples otherwise we return the naive estimate. Default is 10. |
Adam Kapelner
Compute debiased Frobenius Norm of matrix P times Sigmahat (Appendix 5.9). Note that for S <= 2, it returns the naive estimate.
frob_norm_sq_debiased_times_matrix( Sigmahat, A, s, n, frob_norm_sq_bias_correction_min_samples = 10 )
Sigmahat |
The var-cov matrix of interest |
A |
The matrix that multiplies Sigmahat |
s |
The number of vectors |
n |
The length of each vector |
frob_norm_sq_bias_correction_min_samples |
This estimate suffers from high variance when there are not enough samples. Thus, we only implement the correction beginning at this number of samples otherwise we return the naive estimate. Default is 10. |
Adam Kapelner
Generates the base vectors to be used when locating the optimal rerandomization threshold
generate_W_base_and_sort( X, max_designs = 25000, imbalance_function = "mahal_dist", r = 0, max_max_iters = 5 )
X |
The data as an |
max_designs |
The maximum number of designs. Default is 25,000. |
imbalance_function |
A string indicating the imbalance function. Currently, "abs_sum_difference" and "mahal_dist" are the options with the latter being the default. |
r |
An experimental feature that adds lower imbalance vectors
to the base set using the |
max_max_iters |
An experimental feature that adds lower imbalance vectors
to the base set using the |
A list including all arguments plus a matrix W_base_sorted whose max_designs rows are n-length allocation vectors and the allocation vectors are in
Adam Kapelner
## Not run:
n = 100
p = 10
X = matrix(rnorm(n * p), nrow = n, ncol = p)
X = apply(X, 2, function(xj){(xj - mean(xj)) / sd(xj)})
S = 25000
W_base_obj = generate_W_base_and_sort(X, max_designs = S)
W_base_obj
## End(Not run)
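The concept of a sorted base set can be sketched in base R without the package: draw many balanced allocation vectors, score each by an imbalance measure, and order them by that score. Everything below is an assumption made for illustration (ascending sort order, the Mahalanobis-type score, the variable names, and the 10% cutoff); it is not the package's algorithm and it omits the experimental `r` and `max_max_iters` features.

```r
set.seed(1)
n = 20
p = 3
S = 200
X = matrix(rnorm(n * p), nrow = n, ncol = p)
X = apply(X, 2, function(xj){(xj - mean(xj)) / sd(xj)})
S_inv = solve(var(X))

# S balanced +1 / -1 allocation vectors, one per row
W = t(replicate(S, sample(rep(c(-1, 1), each = n / 2))))

# Mahalanobis-type imbalance of each row (assumed scoring rule)
imbalance = apply(W, 1, function(w){
  delta = colMeans(X[w == 1, , drop = FALSE]) - colMeans(X[w == -1, , drop = FALSE])
  as.numeric(t(delta) %*% S_inv %*% delta)
})

# sort the rows from best balanced to worst balanced (assumed order)
ord = order(imbalance)
W_sorted = W[ord, , drop = FALSE]
imbalance_sorted = imbalance[ord]

# a rerandomization threshold then keeps only the best-balanced rows
cutoff = quantile(imbalance_sorted, 0.10)
W_kept = W_sorted[imbalance_sorted <= cutoff, , drop = FALSE]
```

With the rows sorted this way, any candidate threshold corresponds to a leading block of rows, which is what makes a search over thresholds straightforward.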
Finds the optimal rerandomization threshold based on a user-defined quantile and a function that generates the non-linear component of the response
optimal_rerandomization_exact( W_base_object, estimator = "linear", q = 0.95, skip_search_length = 1, smoothing_degree = 1, smoothing_span = 0.1, z_sim_fun, N_z = 1000, dot_every_x_iters = 100 )
W_base_object |
An object that contains the assignments to begin with sorted by |
estimator |
"linear" for the covariate-adjusted linear regression estimator (default). |
q |
The tail criterion's quantile of MSE over z's. The default is 95%. |
skip_search_length |
In the exhaustive search, how many designs are skipped? Default is 1 for
full exhaustive search through all assignments provided for in |
smoothing_degree |
The smoothing degree passed to |
smoothing_span |
The smoothing span passed to |
z_sim_fun |
This function returns vectors of numeric values of size |
N_z |
The number of times to simulate z's within each strategy. |
dot_every_x_iters |
Print out a dot every this many iterations. The default is 100. Set to
|
A list containing the optimal design threshold, strategy, and other information.
Adam Kapelner
## Not run:
n = 100
p = 10
X = matrix(rnorm(n * p), nrow = n, ncol = p)
X = apply(X, 2, function(xj){(xj - mean(xj)) / sd(xj)})
S = 25000
W_base_obj = generate_W_base_and_sort(X, max_designs = S)
design = optimal_rerandomization_exact(W_base_obj,
  z_sim_fun = function(){rnorm(n)},
  skip_search_length = 10)
design
## End(Not run)
Finds the optimal rerandomization threshold based on a user-defined quantile and a function that generates the non-linear component of the response
optimal_rerandomization_normality_assumed( W_base_object, estimator = "linear", q = 0.95, skip_search_length = 1, dot_every_x_iters = 100 )
W_base_object |
An object that contains the assignments to begin with sorted by |
estimator |
"linear" for the covariate-adjusted linear regression estimator (default). |
q |
The tail criterion's quantile of MSE over z's. The default is 95%. |
skip_search_length |
In the exhaustive search, how many designs are skipped? Default is 1 for
full exhaustive search through all assignments provided for in |
dot_every_x_iters |
Print out a dot every this many iterations. The default is 100. Set to
|
A list containing the optimal design threshold, strategy, and other information.
Adam Kapelner
## Not run:
n = 100
p = 10
X = matrix(rnorm(n * p), nrow = n, ncol = p)
X = apply(X, 2, function(xj){(xj - mean(xj)) / sd(xj)})
S = 25000
W_base_obj = generate_W_base_and_sort(X, max_designs = S)
design = optimal_rerandomization_normality_assumed(W_base_obj,
  skip_search_length = 10)
design
## End(Not run)
Finds the optimal rerandomization threshold based on a user-defined quantile and kurtosis based on an approximation of tail standard errors
optimal_rerandomization_tail_approx( W_base_object, estimator = "linear", q = 0.95, c_val = NULL, skip_search_length = 1, binary_search = FALSE, excess_kurtosis_z = 0, use_frob_norm_sq_unbiased_estimator = TRUE, frob_norm_sq_bias_correction_min_samples = 10, smoothing_degree = 1, smoothing_span = 0.1, dot_every_x_iters = 100 )
W_base_object |
An object that contains the assignments to begin with sorted by imbalance. |
estimator |
"linear" for the covariate-adjusted linear regression estimator (default). |
q |
The tail criterion's quantile of MSE over z's. The default is 95%. |
c_val |
The c value used (see Equation 8 in the paper). The default is |
skip_search_length |
In the exhaustive search, how many designs are skipped? Default is 1 for
full exhaustive search through all assignments provided for in |
binary_search |
If |
excess_kurtosis_z |
An estimate of the excess kurtosis in the measure on z. Default is 0. |
use_frob_norm_sq_unbiased_estimator |
If |
frob_norm_sq_bias_correction_min_samples |
The bias-corrected estimate suffers from high variance when there
are not enough samples. Thus, we only implement
the correction beginning at this number of vectors. Default is 10 and
this parameter is only applicable if |
smoothing_degree |
The smoothing degree passed to |
smoothing_span |
The smoothing span passed to |
dot_every_x_iters |
Print out a dot every this many iterations. The default is 100. Set to
|
A list containing the optimal design threshold, strategy, and other information.
Adam Kapelner
## Not run:
n = 100
p = 10
X = matrix(rnorm(n * p), nrow = n, ncol = p)
X = apply(X, 2, function(xj){(xj - mean(xj)) / sd(xj)})
S = 25000
W_base_obj = generate_W_base_and_sort(X, max_designs = S)
design = optimal_rerandomization_tail_approx(W_base_obj,
  skip_search_length = 10)
design
## End(Not run)
A tool to find the optimal rerandomization threshold in non-sequential experiments
Adam Kapelner [email protected]
Kapelner, A
Plots a summary of an optimal_rerandomization_obj object
## S3 method for class 'optimal_rerandomization_obj' plot(x, ...)
x |
The |
... |
The option |
Adam Kapelner
Plots a summary of the imbalances in a W_base_object object
## S3 method for class 'W_base_object' plot(x, ...)
x |
The |
... |
|
Adam Kapelner
Prints a summary of an optimal_rerandomization_obj object
## S3 method for class 'optimal_rerandomization_obj' print(x, ...)
x |
The |
... |
Other parameters to pass to the default print function |
Adam Kapelner
Prints a summary of a W_base_object object
## S3 method for class 'W_base_object' print(x, ...)
x |
The |
... |
Other parameters to pass to the default print function |
Adam Kapelner
Prints a summary of an optimal_rerandomization_obj object
## S3 method for class 'optimal_rerandomization_obj' summary(object, ...)
object |
The |
... |
Other parameters to pass to the default summary function |
Adam Kapelner
Prints a summary of a W_base_object object
## S3 method for class 'W_base_object' summary(object, ...)
object |
The |
... |
Other parameters to pass to the default summary function |
Adam Kapelner