Skip to content

ivanuricardo/GraphicalLasso.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

74 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GraphicalLasso.jl

Build Status codecov

This package provides efficient tools for generating sparse covariance matrices, estimating sparse precision matrices using the graphical lasso (glasso) algorithm (Friedman, Hastie, and Tibshirani 2008; Meinshausen and Bühlmann 2006), and selecting optimal regularization parameters.

Key Features:

  • Sparse Covariance Matrix Generation: Generate random sparse covariance matrices with customizable sparsity thresholds.
  • Graphical Lasso (glasso) Implementation: Apply the glasso algorithm to estimate sparse precision matrices from empirical covariance matrices.
  • Extended Bayesian Information Criterion (EBIC): Calculate EBIC for model selection from Foygel and Drton (2010), supporting edge counting with thresholding.
  • Tuning Parameter Selection: Automatically select optimal regularization parameters for the glasso algorithm using EBIC.
  • Covariance Matrix Validation: Functions to check if a matrix is a valid covariance matrix (square, symmetric, positive semi-definite).

Installation

To install this package, first enter into Pkg mode by pressing ] in the Julia REPL, then run the following command:

pkg> add GraphicalLasso

There is also the option to install the development version of this package directly from the GitHub repository:

pkg> add https://github.com/ivanuricardo/GraphicalLasso.jl

Functions Included:

  • softthresh(x, λ): Applies soft thresholding to an array.
  • cdlasso(W11, s12, λ; max_iter=100, tol=1e-5): Solves the coordinate descent Lasso problem.
  • glasso(s, obs, λ; penalizediag=true, γ=0.0, tol=1e-05, verbose=true, maxiter=100, winit=zeros(size(s))): Estimates a sparse inverse covariance matrix using the glasso algorithm.
  • countedges(x, thr): Counts the number of edges in an array exceeding a threshold.
  • ebic(θ, ll, obs, thr, γ): Calculates the EBIC for a given precision matrix.
  • critfunc(s, θ, rho; penalizediag=true): Computes the objective function for the graphical lasso.
  • tuningselect(s, obs, λ; γ=0.0): Selects the optimal regularization parameter using EBIC.
  • randsparsecov(p, thr): Generates a random sparse covariance matrix.
  • iscov(x): Checks if a matrix is a valid covariance matrix.

Example

Here is an example of how to use this package to generate a sparse covariance matrix, apply the graphical lasso algorithm, and select the optimal tuning parameter:

using LinearAlgebra, GraphicalLasso, Random, Distributions
Random.seed!(123456)

# Generate true sparse covariance matrix
p = 20
thr = 0.5
Σ = randsparsecov(p, thr)

# Check if this is a valid covariance matrix
iscov(Σ)

# Generate data from the true covariance matrix, create sample covariance matrix
obs = 100
μ = zeros(p)
unstddf = rand(MvNormal(zeros(p), Σ), obs)
df = (unstddf .- mean(unstddf, dims=2)) ./ std(unstddf, dims=2)
s = df * df' / obs

# Select the optimal tuning parameter from a range
λvalues = 0.0:0.01:2.0
optimalλ = tuningselect(s, obs, λvalues, verbose=false)
println("Optimal λ: ", optimalλ)

# Apply the graphical lasso algorithm
result = glasso(s, obs, optimalλ)

# Extract results
W = result.W
θ = result.θ
ll = result.ll
bicval = result.bicval

println("Estimated Precision Matrix: ", θ)
println("Log-Likelihood: ", ll)
println("EBIC Value: ", bicval)

# Validate if the result is a valid covariance matrix
is_valid_cov = iscov(W)
println("Is the estimated matrix a valid covariance matrix? ", is_valid_cov)

Moreover, although not the main focus of this package, we also provide a method to compute the lasso solution via coordinate descent. We demonstrate this method below with a generated data set and a sparse response vector.

using LinearAlgebra, GraphicalLasso, Random, Plots
Random.seed!(123456)
N = 100
k = 100
kzeros = 90
X = randn(N, k)
beta = ones(k)
beta[1:kzeros] .= 0
betahat = zeros(k)
y = X * beta + randn(N)

λ = 10.0
cdlassobeta = cdlasso(X'X, X'y, λ)

# We expect the last column to be dense.
heatmap(reshape(cdlassobeta, 10, 10), yflip = true)

Contribution

We welcome contributions to improve the package. If you encounter any issues or have suggestions for new features, feel free to open an issue or submit a pull request.

References

  • Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
  • Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. In Advances in Neural Information Processing Systems (pp. 604-612).
  • Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436-1462.

License

This project is licensed under the MIT License - see the LICENSE file for details.