| Title: | Bayesian Sparse Estimation of a Covariance Matrix |
|---|---|
| Description: | Bayesian estimations of a covariance matrix for multivariate normal data. Assumes that the covariance matrix is sparse or band matrix and positive-definite. Methods implemented include the beta-mixture shrinkage prior (Lee et al. (2022) <doi:10.1016/j.jmva.2022.105067>), screened beta-mixture prior (Lee et al. (2024) <doi:10.1214/24-BA1495>), and post-processed posteriors for banded and sparse covariances (Lee et al. (2023) <doi:10.1214/22-BA1333>; Lee and Lee (2023) <doi:10.1016/j.jeconom.2023.105475>). This software has been developed using funding supported by Basic Science Research Program through the National Research Foundation of Korea ('NRF') funded by the Ministry of Education ('RS-2023-00211979', 'NRF-2022R1A5A7033499', 'NRF-2020R1A4A1018207' and 'NRF-2020R1C1C1A01013338'). |
| Authors: | Kwangmin Lee [aut], Kyeongwon Lee [aut, cre], Kyoungjae Lee [aut], Seongil Jo [aut], Jaeyong Lee [ctb] |
| Maintainer: | Kyeongwon Lee <[email protected]> |
| License: | GPL-2 |
| Version: | 1.0.3 |
| Built: | 2026-05-11 07:17:20 UTC |
| Source: | https://github.com/statjs/bspcov |
Provides a post-processed posterior for Bayesian inference of a banded covariance matrix.
bandPPP(X, k, eps, prior = list(), nsample = 2000)bandPPP(X, k, eps, prior = list(), nsample = 2000)
X |
a n |
k |
a scalar value (natural number) specifying the bandwidth of covariance matrix. |
eps |
a small positive number decreasing to |
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
nsample |
a scalar value giving the number of the post-processed posterior samples. |
Lee, Lee, and Lee (2023+) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a banded covariance matrix:
Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior
where .
Post-processing step: Post-process the samples generated from the initial samples
where are the initial posterior samples,
is a small positive number decreasing to as ,
and denotes the -band operation given as
For more details, see Lee, Lee and Lee (2023+).
Sigma |
a nsample |
p |
dimension of covariance matrix. |
Kwangmin Lee
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
cv.bandPPP estimate
n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X,2,0.01,nsample=100)n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X,2,0.01,nsample=100)
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via the beta-mixture shrinkage prior.
bmspcov( X, Sigma, prior = list(), nsample = list(), nchain = 1, seed = NULL, do.parallel = FALSE, show_progress = TRUE )bmspcov( X, Sigma, prior = list(), nsample = list(), nchain = 1, seed = NULL, do.parallel = FALSE, show_progress = TRUE )
X |
a n |
Sigma |
an initial guess for Sigma. |
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
nsample |
a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
|
nchain |
number of MCMC chains to run. Default is 1. |
seed |
random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1. |
do.parallel |
logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured. |
show_progress |
logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel. |
Lee, Jo and Lee (2022) proposed the beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The beta-mixture shrinkage prior for is defined as
where is the unconstrained prior given by
For more details, see Lee, Jo and Lee (2022).
Sigma |
a nmc |
Phi |
a nmc |
p |
dimension of covariance matrix. |
mcmctime |
elapsed time for MCMC sampling. For multiple chains, this becomes a list. |
nchain |
number of chains used. |
burnin |
number of burn-in samples discarded. |
nmc |
number of MCMC samples retained for analysis. |
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
Lee, K., Jo, S., and Lee, J. (2022), "The beta-mixture shrinkage prior for sparse covariances with near-minimax posterior convergence rate", Journal of Multivariate Analysis, 192, 105067.
sbmspcov
set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X)))) post.est.m <- bspcov::estimate(fout) sqrt(mean((post.est.m - True.Sigma)^2)) sqrt(mean((cov(X) - True.Sigma)^2)) # Multiple chains example with parallel processing: # fout_multi <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))), # nchain = 4, do.parallel = TRUE) # post.est.multi <- bspcov::estimate(fout_multi) # When do.parallel = TRUE, the function automatically sets up # a multisession plan with nchain workers for parallel execution.set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X)))) post.est.m <- bspcov::estimate(fout) sqrt(mean((post.est.m - True.Sigma)^2)) sqrt(mean((cov(X) - True.Sigma)^2)) # Multiple chains example with parallel processing: # fout_multi <- bspcov::bmspcov(X = X, Sigma = diag(diag(cov(X))), # nchain = 4, do.parallel = TRUE) # post.est.multi <- bspcov::estimate(fout_multi) # When do.parallel = TRUE, the function automatically sets up # a multisession plan with nchain workers for parallel execution.
The colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
data.frame
http://genomics-pubs.princeton.edu/oncology/affydata/.
data("colon") head(colon)data("colon") head(colon)
Performs leave-one-out cross-validation (LOOCV) to calculate the predictive log-likelihood for a post-processed posterior of a banded covariance matrix and selects the optimal parameters.
cv.bandPPP(X, kvec, epsvec, prior = list(), nsample = 2000, ncores = 2)cv.bandPPP(X, kvec, epsvec, prior = list(), nsample = 2000, ncores = 2)
X |
a n |
kvec |
a vector of natural numbers specifying the bandwidth of covariance matrix. |
epsvec |
a vector of small positive numbers decreasing to |
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
nsample |
a scalar value giving the number of the post-processed posterior samples. |
ncores |
a scalar value giving the number of CPU cores. |
The predictive log-likelihood for each and is estimated as follows:
where is the ith observation, is the sth posterior sample based on , and represents the banding post-processing function.
For more details, see (3) in Lee, Lee and Lee (2023+).
elpd |
a M |
Kwangmin Lee
Lee, K., Lee, K., and Lee, J. (2023+), "Post-processes posteriors for banded covariances", Bayesian Analysis, DOI: 10.1214/22-BA1333.
Gelman, A., Hwang, J., and Vehtari, A. (2014). "Understanding predictive information criteria for Bayesian models." Statistics and computing, 24(6), 997-1016.
bandPPP
Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) kvec <- 1:2 epsvec <- c(0.01,0.05) res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4) plot(res)Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) kvec <- 1:2 epsvec <- c(0.01,0.05) res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4) plot(res)
Performs cross-validation to estimate spectral norm error for a post-processed posterior of a sparse covariance matrix.
cv.thresPPP( X, thresvec, epsvec, prior = NULL, thresfun = "hard", nsample = 2000, ncores = 2 )cv.thresPPP( X, thresvec, epsvec, prior = NULL, thresfun = "hard", nsample = 2000, ncores = 2 )
X |
a n |
thresvec |
a vector of real numbers specifying the parameter of the threshold function. |
epsvec |
a vector of small positive numbers decreasing to |
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
thresfun |
a string to specify the type of threshold function. |
nsample |
a scalar value giving the number of the post-processed posterior samples. |
ncores |
a scalar value giving the number of CPU cores. |
Given a set of train data and validation data, the spectral norm error for each and is estimated as follows:
where is the estimate for the covariance based on the train data and is the sample covariance matrix based on the validation data.
The spectral norm error is estimated by the -fold cross-validation.
For more details, see the first paragraph on page 9 in Lee and Lee (2023).
CVdf |
a M |
Kwangmin Lee
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics, 236(3), 105475.
thresPPP
Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) thresvec <- c(0.01,0.1) epsvec <- c(0.01,0.1) res <- bspcov::cv.thresPPP(X,thresvec,epsvec,nsample=100) plot(res)Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) thresvec <- c(0.01,0.1) epsvec <- c(0.01,0.1) res <- bspcov::cv.thresPPP(X,thresvec,epsvec,nsample=100) plot(res)
Compute the point estimate (mean) to describe posterior distribution. For multiple chains, combines all chains to compute a more robust estimate.
estimate(object, ...) ## S3 method for class 'bspcov' estimate(object, ...)estimate(object, ...) ## S3 method for class 'bspcov' estimate(object, ...)
object |
an object from bandPPP, bmspcov, sbmspcov, and thresPPP. |
... |
additional arguments for estimate. |
Sigma |
the point estimate (mean) of covariance matrix. For multiple chains, uses combined samples from all chains. |
Seongil Jo and Kyeongwon Lee
plot.postmean.bspcov
n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X,2,0.01,nsample=100) est <- bspcov::estimate(res)n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X,2,0.01,nsample=100) est <- bspcov::estimate(res)
Provides a trace plot of posterior samples and a plot of a learning curve for cross-validation. For multiple chains, each chain is displayed in a different color for convergence assessment.
## S3 method for class 'bspcov' plot(x, ..., cols, rows)## S3 method for class 'bspcov' plot(x, ..., cols, rows)
x |
an object from bmspcov, sbmspcov, cv.bandPPP, and cv.thresPPP. |
... |
additional arguments for ggplot2. |
cols |
a scalar or a vector including specific column indices for the trace plot. |
rows |
a scalar or a vector including specific row indices greater than or equal to columns indices for the trace plot. |
plot |
a plot for diagnostics of posterior samples x. For multiple chains, returns colored trace plots with each chain in a different color. |
Seongil Jo and Kyeongwon Lee
set.seed(1) n <- 100 p <- 20 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) plot(fout, cols = c(1, 3, 4), rows = c(1, 3, 4)) plot(fout, cols = 1, rows = 1:3) # Cross-Validation for Banded Structure Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) kvec <- 1:2 epsvec <- c(0.01,0.05) res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4) plot(res)set.seed(1) n <- 100 p <- 20 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) plot(fout, cols = c(1, 3, 4), rows = c(1, 3, 4)) plot(fout, cols = 1, rows = 1:3) # Cross-Validation for Banded Structure Sigma0 <- diag(1,50) X <- mvtnorm::rmvnorm(25,sigma = Sigma0) kvec <- 1:2 epsvec <- c(0.01,0.05) res <- bspcov::cv.bandPPP(X,kvec,epsvec,nsample=10,ncores=4) plot(res)
Create heatmap visualization for posterior mean estimate of sparse covariance matrix. Provides flexible visualization options with customizable aesthetics and labeling.
## S3 method for class 'postmean.bspcov' plot( x, title = NULL, color_limits = NULL, color_low = "black", color_high = "white", base_size = 14, legend_position = "bottom", x_label = "Variable", y_label = "Variable", show_values = FALSE, ... )## S3 method for class 'postmean.bspcov' plot( x, title = NULL, color_limits = NULL, color_low = "black", color_high = "white", base_size = 14, legend_position = "bottom", x_label = "Variable", y_label = "Variable", show_values = FALSE, ... )
x |
an object of class |
title |
character string for plot title. If NULL, auto-generated title is used. |
color_limits |
optional vector of length 2 specifying color scale limits. If NULL, computed from data. |
color_low |
color for low values in heatmap. Default is "black". |
color_high |
color for high values in heatmap. Default is "white". |
base_size |
base font size for plot theme. Default is 14. |
legend_position |
position of legend. Default is "bottom". |
x_label |
label for x-axis. Default is "Variable". |
y_label |
label for y-axis. Default is "Variable". |
show_values |
logical indicating whether to display values on tiles. Default is FALSE. |
... |
additional arguments passed to plotting functions. |
A ggplot object showing heatmap visualization of the posterior mean covariance matrix.
Seongil Jo, Kyeongwon Lee
estimate, plot.bspcov, plot.quantile.bspcov
# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100) est <- bspcov::estimate(res) # Basic plot plot(est) # Plot with custom color scheme plot(est, color_low = "blue", color_high = "red")# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100) est <- bspcov::estimate(res) # Basic plot plot(est) # Plot with custom color scheme plot(est, color_low = "blue", color_high = "red")
Create visualization plots for posterior quantiles of covariance matrices. Produces heatmap visualizations showing uncertainty across different quantile levels.
## S3 method for class 'quantile.bspcov' plot( x, type = "heatmap", titles = NULL, ncol = 3, color_limits = NULL, color_low = "black", color_high = "white", base_size = 14, legend_position = "bottom", x_label = "Variable", y_label = "Variable", width = NULL, height = 6, ... )## S3 method for class 'quantile.bspcov' plot( x, type = "heatmap", titles = NULL, ncol = 3, color_limits = NULL, color_low = "black", color_high = "white", base_size = 14, legend_position = "bottom", x_label = "Variable", y_label = "Variable", width = NULL, height = 6, ... )
x |
an object of class |
type |
character string specifying plot type. Options are "heatmap" (default), "uncertainty", or "comparison". |
titles |
character vector of titles for each quantile plot. If NULL, auto-generated titles are used. |
ncol |
number of columns in the plot layout. Default is 3. |
color_limits |
optional vector of length 2 specifying color scale limits. If NULL, computed from data. |
color_low |
color for low values in heatmap. Default is "black". |
color_high |
color for high values in heatmap. Default is "white". |
base_size |
base font size for plot theme. Default is 14. |
legend_position |
position of legend. Default is "bottom". |
x_label |
label for x-axis. Default is "Variable". |
y_label |
label for y-axis. Default is "Variable". |
width |
plot width when saving. Default is calculated based on number of quantiles. |
height |
plot height when saving. Default is 6. |
... |
additional arguments passed to plotting functions. |
A ggplot object (single quantile) or patchwork object (multiple quantiles) showing heatmap visualizations.
Kyeongwon Lee
quantile, plot.bspcov, plot.postmean.bspcov
# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100) quant <- quantile(res) # Plot uncertainty visualization plot(quant) # Plot with custom titles and labels plot(quant, titles = c("Lower Bound", "Median", "Upper Bound"), x_label = "Variable", y_label = "Variable")# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100) quant <- quantile(res) # Plot uncertainty visualization plot(quant) # Plot with custom titles and labels plot(quant, titles = c("Lower Bound", "Median", "Upper Bound"), x_label = "Variable", y_label = "Variable")
The proc_colon function preprocesses colon gene expression data by:
Log transforming the raw counts.
Performing two-sample t-tests for each gene between normal and tumor samples.
Selecting the top 50 genes by absolute t-statistic.
Returning the filtered expression matrix and sample indices/groups.
proc_colon(colon, tissues)proc_colon(colon, tissues)
colon |
A numeric matrix of raw colon gene expression values (genes × samples). Rows are genes; columns are samples. |
tissues |
A numeric vector indicating tissue type per sample: positive for normal, negative for tumor. |
A list with components:
A numeric matrix (samples x 50 genes) of selected, log‐transformed expression values.
Integer indices of normal‐tissue columns in the original data.
Integer indices of tumor‐tissue columns in the original data.
Integer vector of length ncol(colon), with 1 = normal, 2 = tumor.
data("colon") data("tissues") set.seed(1234) colon_data <- proc_colon(colon, tissues) X <- colon_data$X foo <- bmspcov(X, Sigma = cov(X)) sigmah <- estimate(foo)data("colon") data("tissues") set.seed(1234) colon_data <- proc_colon(colon, tissues) X <- colon_data$X foo <- bmspcov(X, Sigma = cov(X)) sigmah <- estimate(foo)
The proc_SP500 function preprocesses the SP500 stock data by calculating monthly
returns for selected sectors and generating idiosyncratic errors.
proc_SP500(SP500data, sectors)proc_SP500(SP500data, sectors)
SP500data |
A data frame containing SP500 stock data with columns including:
|
sectors |
A character vector specifying the sectors to include in the analysis. |
Calculates monthly returns for each stock in the specified sectors
Estimates the number of factors via hdbinseg::get.factor.model(ic="ah")
Uses POET::POET() to extract factor loadings/factors and form idiosyncratic errors
A list with components:
Uhat |
A matrix of idiosyncratic errors. |
Khat |
Estimated number of factors. |
factorparthat |
Estimated factor returns. |
sectornames |
Sector for each column of |
data("SP500") set.seed(1234) sectors <- c("Energy", "Financials", "Materials", "Real Estate", "Utilities", "Information Technology") Uhat <- proc_SP500(SP500, sectors)$Uhat PPPres <- thresPPP(Uhat, eps = 0, thres = list(value = 0.0020, fun = "hard"), nsample = 100) postmean <- estimate(PPPres) diag(postmean) <- 0 # hide color for diagonal plot(postmean)data("SP500") set.seed(1234) sectors <- c("Energy", "Financials", "Materials", "Real Estate", "Utilities", "Information Technology") Uhat <- proc_SP500(SP500, sectors)$Uhat PPPres <- thresPPP(Uhat, eps = 0, thres = list(value = 0.0020, fun = "hard"), nsample = 100) postmean <- estimate(PPPres) diag(postmean) <- 0 # hide color for diagonal plot(postmean)
Compute quantiles to describe posterior distribution. For multiple chains, combines all chains to compute more robust quantiles.
## S3 method for class 'bspcov' quantile(x, probs = c(0.025, 0.5, 0.975), ...)## S3 method for class 'bspcov' quantile(x, probs = c(0.025, 0.5, 0.975), ...)
x |
an object from bandPPP, bmspcov, sbmspcov, and thresPPP. |
probs |
numeric vector of probabilities with values in [0,1]. Default is c(0.025, 0.5, 0.975). |
... |
additional arguments for quantile. |
quantiles |
a list containing quantile matrices for each probability level. For multiple chains, uses combined samples from all chains. |
Kyeongwon Lee
estimate, plot.postmean.bspcov
# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample=100) quant <- quantile(res) # Get 95% credible intervals quant <- quantile(res, probs = c(0.025, 0.975))# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample=100) quant <- quantile(res) # Get 95% credible intervals quant <- quantile(res, probs = c(0.025, 0.975))
Convenience function to save quantile plots with appropriate dimensions.
save_quantile_plot(x, filename, width = NULL, height = 6, ...)save_quantile_plot(x, filename, width = NULL, height = 6, ...)
x |
an object of class |
filename |
filename to save the plot. |
width |
plot width. If NULL, calculated based on number of quantiles. |
height |
plot height. Default is 6. |
... |
additional arguments passed to |
# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100) quant <- quantile(res)# Example with simulated data n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::bandPPP(X, 2, 0.01, nsample = 100) quant <- quantile(res)
Provides a Bayesian sparse and positive definite estimate of a covariance matrix via screened beta-mixture prior.
sbmspcov( X, Sigma, cutoff = list(), prior = list(), nsample = list(), nchain = 1, seed = NULL, do.parallel = FALSE, show_progress = TRUE )sbmspcov( X, Sigma, cutoff = list(), prior = list(), nsample = list(), nchain = 1, seed = NULL, do.parallel = FALSE, show_progress = TRUE )
X |
a n |
Sigma |
an initial guess for Sigma. |
cutoff |
a list giving the information for the threshold.
The list includes the following parameters (with default values in parentheses):
|
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
nsample |
a list giving the MCMC parameters.
The list includes the following integers (with default values in parentheses):
|
nchain |
number of MCMC chains to run. Default is 1. |
seed |
random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1. |
do.parallel |
logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured. |
show_progress |
logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel. |
Lee, Jo, Lee, and Lee (2023+) proposed the screened beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix.
The screened beta-mixture shrinkage prior for is defined as
where is the unconstrained prior given by
where , are sample correlations, and is a threshold given by user.
For more details, see Lee, Jo, Lee and Lee (2022+).
Sigma |
a nmc |
p |
dimension of covariance matrix. |
Phi |
a nmc |
INDzero |
a list including indices of off-diagonal elements screened by sure screening. For multiple chains, this becomes a list of lists. |
cutoff |
the cutoff value specified by FNR-approach. For multiple chains, this becomes a list. |
mcmctime |
elapsed time for MCMC sampling. For multiple chains, this becomes a list. |
nchain |
number of chains used. |
burnin |
number of burn-in samples discarded. |
nmc |
number of MCMC samples retained for analysis. |
Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee
Lee, K., Jo, S., Lee, K., and Lee, J. (2023+), "Scalable and optimal Bayesian inference for sparse covariance matrices via screened beta-mixture prior", arXiv:2206.12773.
bmspcov
set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) post.est.m <- bspcov::estimate(fout) sqrt(mean((post.est.m - True.Sigma)^2)) sqrt(mean((cov(X) - True.Sigma)^2)) # Multiple chains example with parallel processing: # fout_multi <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))), # nchain = 4, seed = 123, do.parallel = TRUE) # post.est.multi <- bspcov::estimate(fout_multi) # summary(fout_multi, cols = 1, rows = 1:3) # Shows MCMC diagnostics # plot(fout_multi, cols = 1, rows = 1:3) # Shows colored trace plots # When do.parallel = TRUE, the function automatically sets up # a multisession plan with nchain workers for parallel execution.set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) post.est.m <- bspcov::estimate(fout) sqrt(mean((post.est.m - True.Sigma)^2)) sqrt(mean((cov(X) - True.Sigma)^2)) # Multiple chains example with parallel processing: # fout_multi <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))), # nchain = 4, seed = 123, do.parallel = TRUE) # post.est.multi <- bspcov::estimate(fout_multi) # summary(fout_multi, cols = 1, rows = 1:3) # Shows MCMC diagnostics # plot(fout_multi, cols = 1, rows = 1:3) # Shows colored trace plots # When do.parallel = TRUE, the function automatically sets up # a multisession plan with nchain workers for parallel execution.
The S&P 500 dataset from State Street Global Advisors with the collection period from Jan 2013 to Nov 2023.
list
State Street Global Advisors.
data("SP500") names(SP500)data("SP500") names(SP500)
Provides the summary statistics for posterior samples of covariance matrix.
## S3 method for class 'bspcov' summary(object, cols, rows, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), ...)## S3 method for class 'bspcov' summary(object, cols, rows, quantiles = c(0.025, 0.25, 0.5, 0.75, 0.975), ...)
object |
an object from bandPPP, bmspcov, sbmspcov, and thresPPP. |
cols |
a scalar or a vector including specific column indices. |
rows |
a scalar or a vector including specific row indices greater than or equal to columns indices. |
quantiles |
a numeric vector of quantiles to compute. Default is c(0.025, 0.25, 0.5, 0.75, 0.975). |
... |
additional arguments for the summary function. |
summary |
a table of summary statistics including empirical mean, standard deviation, and quantiles for posterior samples. For multiple chains, also includes effective sample size (n_eff) and R-hat diagnostics. |
If both cols and rows are vectors, they must have the same length.
Seongil Jo and Kyeongwon Lee
# Example with simulated data set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) summary(fout, cols = c(1, 3, 4), rows = c(1, 3, 4)) summary(fout, cols = 1, rows = 1:p) # Custom quantiles: summary(fout, cols = 1, rows = 1:3, quantiles = c(0.05, 0.5, 0.95))# Example with simulated data set.seed(1) n <- 20 p <- 5 # generate a sparse covariance matrix: True.Sigma <- matrix(0, nrow = p, ncol = p) diag(True.Sigma) <- 1 Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8) nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1) zeroIND = (1:(p*(p-1)/2))[-nonzeroIND] Values[zeroIND] <- 0 True.Sigma[lower.tri(True.Sigma)] <- Values True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)] if(min(eigen(True.Sigma)$values) <= 0){ delta <- -min(eigen(True.Sigma)$values) + 1.0e-5 True.Sigma <- True.Sigma + delta*diag(p) } # generate a data X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma) # compute sparse, positive covariance estimator: fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X)))) summary(fout, cols = c(1, 3, 4), rows = c(1, 3, 4)) summary(fout, cols = 1, rows = 1:p) # Custom quantiles: summary(fout, cols = 1, rows = 1:3, quantiles = c(0.05, 0.5, 0.95))
Provides a post-processed posterior (PPP) for Bayesian inference of a sparse covariance matrix.
thresPPP(X, eps, thres = list(), prior = list(), nsample = 2000)thresPPP(X, eps, thres = list(), prior = list(), nsample = 2000)
X |
a n |
eps |
a small positive number decreasing to |
thres |
a list giving the information for thresholding PPP procedure.
The list includes the following parameters (with default values in parentheses):
|
prior |
a list giving the prior information.
The list includes the following parameters (with default values in parentheses):
|
nsample |
a scalar value giving the number of the post-processed posterior samples. |
Lee and Lee (2023) proposed a two-step procedure generating samples from the post-processed posterior for Bayesian inference of a sparse covariance matrix:
Initial posterior computing step: Generate random samples from the following initial posterior obtained by using the inverse-Wishart prior
where .
Post-processing step: Post-process the samples generated from the initial samples
where are the initial posterior samples,
is a positive real number, and denotes the generalized threshodling operator given as
where is the element of and is a generalized thresholding function.
For more details, see Lee and Lee (2023).
Sigma |
a nsample |
p |
dimension of covariance matrix. |
Kwangmin Lee
Lee, K. and Lee, J. (2023), "Post-processes posteriors for sparse covariances", Journal of Econometrics.
cv.thresPPP
n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100) est <- bspcov::estimate(res)n <- 25 p <- 50 Sigma0 <- diag(1, p) X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = Sigma0) res <- bspcov::thresPPP(X, eps=0.01, thres=list(value=0.5,fun='hard'), nsample=100) est <- bspcov::estimate(res)
The tissues data of colon cancer dataset, which includes gene expression values from 22 colon tumor tissues and 40 non-tumor tissues.
numeric
http://genomics-pubs.princeton.edu/oncology/affydata/.
data("tissues") head(tissues)data("tissues") head(tissues)