| Title: | Weighted Empirical Risk Minimization for Changepoint Regression |
|---|---|
| Description: | R interface to the 'weightederm' package for 'Python', which provides 'scikit-learn'-style estimators for offline change point regression (data segmentation) via weighted empirical risk minimization. Supports least-squares, Huber, and logistic losses with fixed or cross-validated numbers of change points. Wraps 'Python' via 'reticulate'. Arpino and Venkataramanan (2026) <doi:10.48550/arXiv.2604.11746>. |
| Authors: | Gabriel Arpino [aut, cre] |
| Maintainer: | Gabriel Arpino <[email protected]> |
| License: | Apache License (>= 2.0) |
| Version: | 0.1.0 |
| Built: | 2026-05-28 14:54:41 UTC |
| Source: | https://github.com/gabrielarpino/weightederm-r |
Returns the coefficient vector of the unweighted base-loss refit on the
last detected segment (the same coefficients used by predict()).

    ## S3 method for class 'werm_fit'
    coef(object, ...)

| object | An object of class werm_fit. |
|---|---|
| ... | Ignored. |

Named numeric vector of length p.
Evaluates the unweighted base-loss model fitted on the last detected segment. For logistic estimators, returns class labels.

    ## S3 method for class 'werm_fit'
    predict(object, newdata, type = "class", ...)

| object | An object of class werm_fit. |
|---|---|
| newdata | Numeric matrix of shape (m, p). |
| type | Character. For logistic estimators, selects between class labels and probabilities. Default "class". |
| ... | Ignored. |

Numeric vector of length m (regression) or character vector /
probability matrix (logistic).
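
For regression estimators, the link between the last-segment coefficients and the predictions can be sketched in base R. This is only an illustration: mock_fit is a hypothetical stand-in for a fitted object (only the element names last_segment_coef and last_segment_intercept come from this manual), and predict_last_segment() is a made-up helper, not a package function.

```r
# Hypothetical stand-in for a fitted regression object; a real fit would come
# from werm_least_squares() or werm_huber().
mock_fit <- list(
  last_segment_coef = c(-3, 1.5),
  last_segment_intercept = NULL  # assumed NULL when no intercept is fitted
)

# Illustrative helper: linear prediction from the last-segment coefficients.
predict_last_segment <- function(fit, newdata) {
  intercept <- fit$last_segment_intercept
  if (is.null(intercept)) intercept <- 0
  drop(newdata %*% fit$last_segment_coef) + intercept
}

newdata <- rbind(c(1, 0), c(0, 1))  # m = 2 rows, p = 2 columns
preds <- predict_last_segment(mock_fit, newdata)
print(preds)
```

With a real fit, predict(fit, newdata) should be preferred; the sketch only shows why coef() and predict() are described as sharing the same coefficients.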
Summarise a fitted WERM model

    ## S3 method for class 'werm_fit'
    summary(object, ...)

| object | An object of class werm_fit. |
|---|---|
| ... | Ignored. |

Invisibly returns object. Called for its printed side-effect.
A thin wrapper around reticulate::use_python() that must be called
before any werm_* function if the default Python does not have
weightederm installed.

    weightederm_configure_python(python, required = TRUE)

| python | Path to the Python binary or virtual-environment directory. |
|---|---|
| required | Passed to reticulate::use_python(). Default TRUE. |

Invisibly returns NULL. Called for side effects only.

    if (nzchar(Sys.which("python"))) {
      weightederm_configure_python(Sys.which("python"), required = FALSE)
    }

Like werm_least_squares() but uses the Huber loss, which is more robust
to outliers than squared loss.

    werm_huber(
      X, y, num_chgpts,
      delta = 1L,
      search_method = "efficient",
      fit_intercept = TRUE,
      epsilon = 1.35,
      max_iter = 100L,
      tol = 1e-05,
      penalty = "none",
      alpha = 0
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Numeric vector of length n. |
| num_chgpts | Integer. Number of changepoints to detect. |
| delta | Integer. Minimum gap between candidate changepoints during search. Default 1L. |
| search_method | Character. Default "efficient". |
| fit_intercept | Logical. Whether each segment model includes an intercept. Default TRUE. |
| epsilon | Numeric. Huber transition parameter. Default 1.35. |
| max_iter | Integer. Maximum L-BFGS-B iterations. Default 100L. |
| tol | Numeric. Gradient-norm tolerance. Default 1e-05. |
| penalty | Character. Default "none". |
| alpha | Numeric. Penalty strength. Default 0. |

An object of class c("werm_huber", "werm_fit").
Same elements as werm_least_squares().

    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMHuber")) {
      set.seed(2)
      n <- 12L; p <- 1L; true_cp <- 6L
      X <- matrix(rnorm(n * p), n, p)
      y <- c(
        X[1:true_cp, , drop = FALSE] %*% 3,
        X[(true_cp + 1L):n, , drop = FALSE] %*% -3
      ) + rnorm(n, sd = 0.03)
      fit <- werm_huber(X, y, num_chgpts = 1L, delta = 1L,
                        fit_intercept = FALSE, max_iter = 5L, tol = 1e-3)
      fit$changepoints
    }

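
To see why the Huber loss is less sensitive to outliers than squared loss, the textbook definition can be written in a few lines of base R. This is a sketch under an assumption: it treats epsilon as the residual threshold at which the loss switches from quadratic to linear. The package may scale residuals differently, and huber_loss() is an illustrative helper, not a package function.

```r
# Textbook Huber loss: quadratic for small residuals, linear beyond epsilon.
# Assumption: epsilon is applied directly to the raw residual.
huber_loss <- function(r, epsilon = 1.35) {
  a <- abs(r)
  ifelse(a <= epsilon, 0.5 * r^2, epsilon * (a - 0.5 * epsilon))
}

residuals <- c(-10, -1, 0, 0.5, 10)
comparison <- data.frame(
  residual = residuals,
  squared  = 0.5 * residuals^2,
  huber    = huber_loss(residuals)
)
print(comparison)
```

For residuals inside the threshold the two columns agree; for large residuals the Huber column grows linearly rather than quadratically, so a single outlier has less influence on the fit.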
Fit a WERM changepoint model with Huber loss and CV selection

    werm_huber_cv(
      X, y, max_num_chgpts,
      delta = 1L,
      search_method = "efficient",
      cv = 5L,
      fit_intercept = TRUE,
      epsilon = 1.35,
      max_iter = 100L,
      tol = 1e-05,
      use_base_loss_for_cv = FALSE,
      penalty = "none",
      alpha = 0
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Numeric vector of length n. |
| max_num_chgpts | Integer. Upper bound of the CV search grid. |
| delta | Integer. Minimum gap between candidate changepoints. Default 1L. |
| search_method | Character. Default "efficient". |
| cv | Integer. Number of interleaved folds. Default 5L. |
| fit_intercept | Logical. Default TRUE. |
| epsilon | Numeric. Huber transition parameter. Default 1.35. |
| max_iter | Integer. Maximum L-BFGS-B iterations. Default 100L. |
| tol | Numeric. Gradient-norm tolerance. Default 1e-05. |
| use_base_loss_for_cv | Logical. Default FALSE. |
| penalty | Character. Passed to the inner fixed model. Default "none". |
| alpha | Numeric. Penalty strength. Default 0. |

An object of class c("werm_huber_cv", "werm_fit").

    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMHuberCV")) {
      set.seed(11)
      n <- 24L; p <- 2L
      X <- matrix(rnorm(n * p), n, p)
      y <- c(
        X[1:8, ] %*% c(4, 0),
        X[9:16, ] %*% c(-4, 0),
        X[17:24, ] %*% c(4, 0)
      ) + rnorm(n, sd = 0.02)
      fit <- werm_huber_cv(X, y, max_num_chgpts = 2L, cv = 3L, delta = 2L,
                           fit_intercept = FALSE, max_iter = 20L)
      fit$best_num_chgpts
    }

Detects num_chgpts changepoints in ordered regression data by minimising
a Weighted Empirical Risk with squared loss.

    werm_least_squares(
      X, y, num_chgpts,
      delta = 1L,
      search_method = "efficient",
      fit_intercept = TRUE,
      fit_solver = "direct",
      penalty = "none",
      alpha = 0
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Numeric vector of length n. |
| num_chgpts | Integer. Number of changepoints to detect. |
| delta | Integer. Minimum gap between candidate changepoints during search. Default 1L. |
| search_method | Character. Default "efficient". |
| fit_intercept | Logical. Whether each segment model includes an intercept. Default TRUE. |
| fit_solver | Character. Default "direct". |
| penalty | Character. Default "none". |
| alpha | Numeric. Penalty strength. Default 0. |

An object of class c("werm_least_squares", "werm_fit") with the
following named elements:

| changepoints | Integer vector of detected changepoints (1-indexed, R convention). Length equals num_chgpts. |
|---|---|
| num_chgpts | Integer. Number of detected changepoints. |
| num_signals | Integer. Number of segments (num_chgpts + 1). |
| objective | Numeric. Minimised WERM objective value. |
| last_segment_coef | Numeric vector. Coefficients of the unweighted base-loss refit on the last segment (used by predict()). |
| last_segment_intercept | Numeric or NULL. |
| signal_coefs | Matrix (num_signals x p). Stage-1 WERM coefficient estimates. |
| signal_intercepts | Numeric vector or NULL. |
| n_features_in | Integer. Number of features. |


    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMLeastSquares")) {
      set.seed(1)
      n <- 24L; p <- 2L; true_cp <- 12L
      X <- matrix(rnorm(n * p), n, p)
      y <- c(
        X[1:true_cp, ] %*% c(3, -1.5),
        X[(true_cp + 1L):n, ] %*% c(-3, 1.5)
      ) + rnorm(n, sd = 0.05)
      fit <- werm_least_squares(X, y, num_chgpts = 1L, delta = 3L,
                                fit_intercept = FALSE)
      fit$changepoints
    }

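
When Python is not configured, the problem setting itself can still be explored in base R. The sketch below is not the WERM objective: it swaps in a plain brute-force search that fits ordinary least squares on each side of every candidate split and keeps the split with the smallest total residual sum of squares. The helper segment_rss() and the variable best_split are illustrative names, and best_split counts the observations in the first segment, which may not match the package's changepoint indexing.

```r
# Same simulated data as in the example above.
set.seed(1)
n <- 24L; p <- 2L; true_cp <- 12L
X <- matrix(rnorm(n * p), n, p)
y <- c(
  X[1:true_cp, ] %*% c(3, -1.5),
  X[(true_cp + 1L):n, ] %*% c(-3, 1.5)
) + rnorm(n, sd = 0.05)

# Residual sum of squares of a no-intercept least-squares fit on rows idx.
segment_rss <- function(idx) {
  fit <- lm.fit(X[idx, , drop = FALSE], y[idx])
  sum(fit$residuals^2)
}

# Brute-force baseline: try every split leaving at least delta rows per side.
delta <- 3L
candidates <- delta:(n - delta)
total_rss <- vapply(
  candidates,
  function(k) segment_rss(1:k) + segment_rss((k + 1L):n),
  numeric(1)
)
best_split <- candidates[which.min(total_rss)]
print(best_split)
```

This baseline costs one pair of regressions per candidate split and handles only a single changepoint, so it is useful as a sanity check on small data rather than as a substitute for werm_least_squares().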
Selects the number of changepoints from {0, …, max_num_chgpts} by
interleaved cross-validation, then refits on the full data.

    werm_least_squares_cv(
      X, y, max_num_chgpts,
      delta = 1L,
      search_method = "efficient",
      cv = 5L,
      fit_intercept = TRUE,
      use_base_loss_for_cv = FALSE,
      penalty = "none",
      alpha = 0
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Numeric vector of length n. |
| max_num_chgpts | Integer. Upper bound of the CV search grid. |
| delta | Integer. Minimum gap between candidate changepoints. Default 1L. |
| search_method | Character. Default "efficient". |
| cv | Integer. Number of interleaved folds. Default 5L. |
| fit_intercept | Logical. Default TRUE. |
| use_base_loss_for_cv | Logical. Default FALSE. |
| penalty | Character. Passed to the inner fixed model. Default "none". |
| alpha | Numeric. Penalty strength. Default 0. |

An object of class c("werm_least_squares_cv", "werm_fit") with all
elements from werm_least_squares() plus:

| best_num_chgpts | Integer. CV-selected number of changepoints. |
|---|---|
| best_score | Numeric. Mean held-out score for the best model. |
| cv_results | Data frame with columns num_chgpts and mean_test_score. |
| num_chgpts_grid | Integer vector. Full CV search grid. |
| segment_bounds | List of integer pairs [start, stop] (1-indexed, half-open). |
| segment_coefs | Matrix (num_signals x p). |
| segment_intercepts | Numeric vector or NULL. |


    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMLeastSquaresCV")) {
      set.seed(10)
      n <- 30L; p <- 2L
      X <- matrix(rnorm(n * p), n, p)
      y <- c(
        X[1:10, ] %*% c(3.5, 0),
        X[11:20, ] %*% c(-3.5, 0),
        X[21:30, ] %*% c(3.5, 0)
      ) + rnorm(n, sd = 0.05)
      fit <- werm_least_squares_cv(X, y, max_num_chgpts = 2L, cv = 3L,
                                   delta = 3L, fit_intercept = FALSE)
      fit$best_num_chgpts
      fit$changepoints
      fit$cv_results
    }

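
The term "interleaved folds" can be made concrete with a short base R sketch. It assumes the common reading in which observation i goes to fold ((i - 1) mod k) + 1, so every fold spans the whole ordered sample and each segment stays represented in every training set. Whether the package assigns folds exactly this way is an assumption, and interleaved_folds() is an illustrative helper, not a package function.

```r
# Assign ordered observations 1..n to k folds in round-robin fashion.
interleaved_folds <- function(n, k) {
  ((seq_len(n) - 1L) %% k) + 1L
}

fold_id <- interleaved_folds(n = 10L, k = 3L)
print(fold_id)

# Indices held out in each fold; every fold covers the full index range.
held_out <- split(seq_len(10L), fold_id)
print(held_out)
```

Unlike random or contiguous folds, this assignment preserves the ordering of the data inside every fold, which matters because changepoint locations are defined relative to that ordering.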
Detects num_chgpts changepoints in ordered binary classification data.

    werm_logistic(
      X, y, num_chgpts,
      delta = 1L,
      search_method = "efficient",
      fit_intercept = TRUE,
      max_iter = 100L,
      tol = 1e-05,
      penalty = "l2",
      alpha = 1
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Integer or factor vector of binary labels (two unique values). |
| num_chgpts | Integer. Number of changepoints to detect. |
| delta | Integer. Minimum gap between candidate changepoints during search. Default 1L. |
| search_method | Character. Default "efficient". |
| fit_intercept | Logical. Whether each segment model includes an intercept. Default TRUE. |
| max_iter | Integer. Maximum L-BFGS-B iterations. Default 100L. |
| tol | Numeric. Gradient-norm tolerance. Default 1e-05. |
| penalty | Character. Default "l2". |
| alpha | Numeric. Default 1. |

An object of class c("werm_logistic", "werm_fit").
Contains all elements from werm_least_squares() plus:

| classes | Character vector of length 2. The two class labels in sorted order. classes[2] is the positive class. |
|---|---|


    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMLogistic")) {
      set.seed(3)
      n <- 30L; p <- 2L; true_cp <- 15L
      X <- matrix(rnorm(n * p), n, p)
      eta <- c(
        X[1:true_cp, ] %*% c(3, -3),
        X[(true_cp + 1L):n, ] %*% c(-3, 3)
      )
      y <- rbinom(n, 1L, 1 / (1 + exp(-eta)))
      fit <- werm_logistic(X, y, num_chgpts = 1L, delta = 3L,
                           fit_intercept = FALSE, max_iter = 100L)
      fit$changepoints
      fit$classes
    }

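
The role of classes[2] as the positive class can be illustrated without the package. The sketch below assumes that probabilities come from the logistic function of the linear predictor and that labels are assigned with a 0.5 threshold; both are assumptions about the usual logistic-regression convention rather than documented package behaviour. The coefficient values and variable names are made up for illustration.

```r
classes <- c("0", "1")      # sorted labels; classes[2] is the positive class
coef_last <- c(3, -3)       # illustrative last-segment coefficients

newdata <- rbind(c(1, 0), c(0, 1), c(0.2, 0.1))
eta <- drop(newdata %*% coef_last)   # linear predictor per row
prob_positive <- plogis(eta)         # P(y == classes[2]) under the assumption

# Assumed decision rule: positive class when the probability is at least 0.5.
labels <- ifelse(prob_positive >= 0.5, classes[2], classes[1])
print(data.frame(eta = eta, prob_positive = prob_positive, label = labels))
```

A positive linear predictor maps to classes[2] and a negative one to classes[1], which is why the ordering of the classes element matters when interpreting coefficient signs.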
Fit a WERM changepoint model with logistic loss and CV selection

    werm_logistic_cv(
      X, y, max_num_chgpts,
      delta = 1L,
      search_method = "efficient",
      cv = 5L,
      fit_intercept = TRUE,
      max_iter = 100L,
      tol = 1e-05,
      use_base_loss_for_cv = FALSE,
      penalty = "l2",
      alpha = 1
    )

| X | Numeric matrix of shape (n, p). |
|---|---|
| y | Integer or factor vector of binary labels. |
| max_num_chgpts | Integer. Upper bound of the CV search grid. |
| delta | Integer. Minimum gap between candidate changepoints. Default 1L. |
| search_method | Character. Default "efficient". |
| cv | Integer. Number of interleaved folds. Default 5L. |
| fit_intercept | Logical. Default TRUE. |
| max_iter | Integer. Maximum L-BFGS-B iterations. Default 100L. |
| tol | Numeric. Gradient-norm tolerance. Default 1e-05. |
| use_base_loss_for_cv | Logical. Default FALSE. |
| penalty | Character. Default "l2". |
| alpha | Numeric. Default 1. |

An object of class c("werm_logistic_cv", "werm_fit").
Contains all CV elements plus classes (character vector of length 2).

    # Limit BLAS/OpenMP threads so example CPU time stays proportional to
    # elapsed time on multicore CRAN machines.
    Sys.setenv(
      OMP_NUM_THREADS = "1",
      OPENBLAS_NUM_THREADS = "1",
      MKL_NUM_THREADS = "1",
      BLAS_NUM_THREADS = "1"
    )
    if (nzchar(Sys.getenv("RETICULATE_PYTHON")) &&
        weightederm:::.weightederm_examples_available("WERMLogisticCV")) {
      set.seed(12)
      n <- 30L; p <- 2L
      X <- matrix(rnorm(n * p), n, p)
      eta <- c(
        X[1:10, ] %*% c(3, -3),
        X[11:20, ] %*% c(-3, 3),
        X[21:30, ] %*% c(3, -3)
      )
      y <- rbinom(n, 1L, 1 / (1 + exp(-eta)))
      fit <- werm_logistic_cv(X, y, max_num_chgpts = 2L, cv = 3L, delta = 3L,
                              fit_intercept = FALSE, max_iter = 100L)
      fit$best_num_chgpts
      fit$changepoints
    }