Title: | Fast Logistic Regression Wrapper |
---|---|
Description: | Provides very fast logistic regression with coefficient inference, plus other useful methods such as a forward stepwise model generator (see the benchmarks on the GitHub page at the URL below). The inputs are flexible enough to accommodate GPU computations. Coefficient estimation employs the fastLR() method in the 'RcppNumerical' package by Yixuan Qiu et al. This package makes their work more useful to the wider community that consumes inference. |
Authors: | Adam Kapelner [aut, cre] , Beau Walker [rev, dtc] , Gabriel Mayer [fnd, dtc] |
Maintainer: | Adam Kapelner <[email protected]> |
License: | GPL-3 |
Version: | 1.2.1 |
Built: | 2025-01-06 03:16:17 UTC |
Source: | https://github.com/kapelner/fastlogisticregressionwrap |
Given a set of desired proportions of predicted outcomes, what is the error rate for each of those models?
asymmetric_cost_explorer(
  phat,
  ybin,
  steps = seq(from = 0.001, to = 0.999, by = 0.001),
  outcome_of_analysis = 0,
  proportions_desired = seq(from = 0.1, to = 0.9, by = 0.1),
  proportion_tolerance = 0.01
)
phat |
The vector of probability estimates to be thresholded to make a binary decision |
ybin |
The true binary responses |
steps |
All possible thresholds to consider, which must be a vector of numbers in (0, 1). Default is seq(from = 0.001, to = 0.999, by = 0.001). |
outcome_of_analysis |
For which class do you care about performance? Either 0 (the negative class) or 1 (the positive class). Default is 0. |
proportions_desired |
The proportions of predicted outcomes to target. Default is seq(from = 0.1, to = 0.9, by = 0.1). |
proportion_tolerance |
If the model cannot match the desired proportion within this amount, it does not return that model's performance. Default is 0.01. |
A table with column 1: proportions_desired, column 2: actual proportions (as close as possible), column 3: error rate, column 4: probability threshold.
Adam Kapelner
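This entry ships without a usage example. The sketch below is patterned after the examples of the other functions in this package (the Pima.te setup mirrors the confusion_results example) and shows how the threshold search might be invoked; it is an illustration, not from the package's own documentation.

```r
library(MASS); data(Pima.te)
ybin = as.numeric(Pima.te$type == "Yes")
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = ybin
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
#one row per attainable entry of proportions_desired: the achieved proportion,
#the error rate for outcome_of_analysis and the probability threshold used
asymmetric_cost_explorer(phat, ybin)
```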
Given a set of desired proportions of predicted outcomes, what is the error rate for each of those models?
asymmetric_cost_explorer_cross_validated(phat, ybin, K_CV = 5, ...)
phat |
The vector of probability estimates to be thresholded to make a binary decision |
ybin |
The true binary responses |
K_CV |
The number of cross-validation folds. Default is 5. |
... |
Other parameters to be passed into the asymmetric_cost_explorer function. |
A table with column 1: proportions_desired, column 2: actual proportions (as close as possible), column 3: error rate, column 4: probability threshold.
Adam Kapelner
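As with the non-cross-validated variant, no example is given; the following hypothetical invocation assumes the same Pima.te setup used by the other examples in this package, with K_CV = 5 being the signature's default.

```r
library(MASS); data(Pima.te)
ybin = as.numeric(Pima.te$type == "Yes")
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = ybin
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
#error rates are now estimated across the K_CV cross-validation folds
asymmetric_cost_explorer_cross_validated(phat, ybin, K_CV = 5)
```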
Provides a binary confusion table and error metrics
confusion_results(yhat, ybin, skip_argument_checks = FALSE)
yhat |
The binary predictions |
ybin |
The true binary responses |
skip_argument_checks |
If TRUE, skip checking of the arguments for a small speedup. Default is FALSE. |
A list of raw results
library(MASS); data(Pima.te)
ybin = as.numeric(Pima.te$type == "Yes")
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = ybin
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
confusion_results(phat > 0.5, ybin)
Computes one diagonal entry of a symmetric matrix's inverse via the Eigen library's conjugate gradient descent algorithm.
eigen_compute_single_entry_of_diagonal_matrix(M, j, num_cores = 1)
M |
The symmetric matrix which to invert (and then extract one element of its diagonal) |
j |
The index j of the diagonal entry of M's inverse to compute. |
num_cores |
The number of cores to use. Default is 1. |
The value of the (j, j) entry of M^-1
Adam Kapelner
n = 500
X = matrix(rnorm(n^2), nrow = n, ncol = n)
M = t(X) %*% X
j = 137
eigen_compute_single_entry_of_diagonal_matrix(M, j)
solve(M)[j, j] #to ensure it's the same value
Computes a determinant via the Eigen library.
eigen_det(X, num_cores = 1)
X |
A numeric matrix of size p x p |
num_cores |
The number of cores to use. Unless p is large, keep to the default of 1. |
The determinant as a scalar numeric value
p = 30
eigen_det(matrix(rnorm(p^2), nrow = p))
Computes a matrix inverse via the Eigen library.
eigen_inv(X, num_cores = 1)
X |
A numeric matrix of size p x p |
num_cores |
The number of cores to use. Unless p is large, keep to the default of 1. |
The resulting matrix
p = 10
eigen_inv(matrix(rnorm(p^2), nrow = p))
Computes X^T diag(w) X via the Eigen library.
eigen_Xt_times_diag_w_times_X(X, w, num_cores = 1)
X |
A numeric matrix of size n x p |
w |
A numeric vector of length p |
num_cores |
The number of cores to use. Unless p is large, keep to the default of 1. |
The resulting matrix
n = 100
p = 10
X = matrix(rnorm(n * p), nrow = n, ncol = p)
w = rnorm(p)
eigen_Xt_times_diag_w_times_X(t(X), w)
Returns most of what you get from glm
fast_logistic_regression(
  Xmm,
  ybin,
  drop_collinear_variables = FALSE,
  lm_fit_tol = 1e-07,
  do_inference_on_var = "none",
  Xt_times_diag_w_times_X_fun = NULL,
  sqrt_diag_matrix_inverse_fun = NULL,
  num_cores = 1,
  ...
)
Xmm |
The model.matrix for X (you must create this yourself beforehand) |
ybin |
The binary response vector |
drop_collinear_variables |
Should we drop perfectly collinear variables? Default is FALSE. |
lm_fit_tol |
When drop_collinear_variables is TRUE, the tolerance used to detect collinearity among the predictors. Default is 1e-07. |
do_inference_on_var |
The variables for which to compute approximate standard errors of the coefficients and approximate p-values for the test of no linear log-odds probability effect. Default is "none". |
Xt_times_diag_w_times_X_fun |
A custom function that computes X^T diag(w) X; its arguments are the matrix, the weight vector and the number of cores. Default is NULL, which uses the built-in implementation. |
sqrt_diag_matrix_inverse_fun |
A custom function that returns a numeric vector which is the square root of the diagonal of the inverse of the inputted matrix. Its arguments are the matrix and the number of cores. Default is NULL, which uses the built-in implementation. |
num_cores |
Number of cores to use to speed up matrix multiplication and matrix inversion (used only during inference computation). Default is 1. Unless the number of variables (i.e. ncol(Xmm)) is large, there is little benefit to using more than one core. |
... |
Other arguments to be passed to fastLR. |
A list of raw results
library(MASS); data(Pima.te)
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
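Inference is off by default (do_inference_on_var = "none"). The sketch below assumes, as a labeled guess, that passing "all" requests standard errors and p-values for every coefficient; verify the accepted values against the package documentation before relying on this.

```r
library(MASS); data(Pima.te)
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes"),
  do_inference_on_var = "all" #assumed value; "none" is the signature's default
)
summary(flr) #the coefficient table should now include approximate SEs and p-values
```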
Roughly duplicates the following glm-style code:
fast_logistic_regression_stepwise_forward(
  Xmm,
  ybin,
  mode = "aic",
  pval_threshold = 0.05,
  use_intercept = TRUE,
  verbose = TRUE,
  drop_collinear_variables = FALSE,
  lm_fit_tol = 1e-07,
  ...
)
Xmm |
The model.matrix for X (you must create this yourself beforehand). |
ybin |
The binary response vector. |
mode |
"aic" (default, fast) or "pval" (slow, but possibly yields a better model). |
pval_threshold |
The significance threshold to include a new variable. Default is 0.05. |
use_intercept |
Should we automatically begin with an intercept? Default is TRUE. |
verbose |
Print out messages during the loop? Default is TRUE. |
drop_collinear_variables |
Parameter passed to fast_logistic_regression. Default is FALSE. |
lm_fit_tol |
Parameter passed to fast_logistic_regression. Default is 1e-07. |
... |
Other arguments to be passed to fastLR. |
nullmod = glm(ybin ~ 0, data.frame(Xmm), family = binomial)
fullmod = glm(ybin ~ 0 + ., data.frame(Xmm), family = binomial)
forwards = step(nullmod, scope = list(lower = formula(nullmod), upper = formula(fullmod)), direction = "forward", trace = 0)
A list of raw results
library(MASS); data(Pima.te)
flr = fast_logistic_regression_stepwise_forward(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
Provides a confusion table and error metrics for general factor vectors. There is no need for the same levels in the two vectors.
general_confusion_results(yhat, yfac, proportions_scaled_by_column = FALSE)
yhat |
The factor predictions |
yfac |
The true factor responses |
proportions_scaled_by_column |
When returning the proportion table, scale by column? Default is FALSE. |
A list of raw results
library(MASS); data(Pima.te)
ybin = as.numeric(Pima.te$type == "Yes")
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = ybin
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
yhat = array(NA, length(ybin))
yhat[phat <= 1/3] = "no"
yhat[phat >= 2/3] = "yes"
yhat[is.na(yhat)] = "maybe"
general_confusion_results(factor(yhat, levels = c("no", "yes", "maybe")), factor(ybin))
#you want the "no" to align with 0, the "yes" to align with 1 and the "maybe" to be
#last to align with nothing
Predicts on new data, returning p-hats (probability estimates)
## S3 method for class 'fast_logistic_regression' predict(object, newdata, type = "response", ...)
object |
The object built using the fast_logistic_regression function. |
newdata |
A matrix of observations where you wish to predict the binary response. |
type |
The type of prediction required. The default is "response", i.e. probability estimates. |
... |
Further arguments passed to or from other methods |
A numeric vector of length nrow(newdata) of estimates of P(Y = 1) for each unit in newdata.
library(MASS); data(Pima.te)
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
Predicts on new data, returning p-hats (probability estimates)
## S3 method for class 'fast_logistic_regression_stepwise' predict(object, newdata, type = "response", ...)
object |
The object built using the fast_logistic_regression_stepwise_forward function. |
newdata |
A matrix of observations where you wish to predict the binary response. |
type |
The type of prediction required. The default is "response", i.e. probability estimates. |
... |
Further arguments passed to or from other methods |
A numeric vector of length nrow(newdata) of estimates of P(Y = 1) for each unit in newdata.
library(MASS); data(Pima.te)
flr = fast_logistic_regression_stepwise_forward(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
phat = predict(flr, model.matrix(~ . - type, Pima.te))
Returns the summary table a la glm
## S3 method for class 'fast_logistic_regression' print(x, ...)
x |
The object built using the fast_logistic_regression function. |
... |
Other arguments to be passed to print |
The summary as a data.frame
library(MASS); data(Pima.te)
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
print(flr)
Returns the summary table a la glm
## S3 method for class 'fast_logistic_regression_stepwise' print(x, ...)
x |
The object built using the fast_logistic_regression_stepwise_forward function. |
... |
Other arguments to be passed to print |
The summary as a data.frame
library(MASS); data(Pima.te)
flr = fast_logistic_regression_stepwise_forward(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
print(flr)
Returns the summary table a la glm
## S3 method for class 'fast_logistic_regression' summary(object, alpha_order = TRUE, ...)
object |
The object built using the fast_logistic_regression function. |
alpha_order |
Should the coefficients be ordered in alphabetical order? Default is TRUE. |
... |
Other arguments to be passed to summary. |
The summary as a data.frame
library(MASS); data(Pima.te)
flr = fast_logistic_regression(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
summary(flr)
Returns the summary table a la glm
## S3 method for class 'fast_logistic_regression_stepwise' summary(object, ...)
object |
The object built using the fast_logistic_regression_stepwise_forward function. |
... |
Other arguments to be passed to summary. |
The summary as a data.frame
library(MASS); data(Pima.te)
flr = fast_logistic_regression_stepwise_forward(
  Xmm = model.matrix(~ . - type, Pima.te),
  ybin = as.numeric(Pima.te$type == "Yes")
)
summary(flr)