The sparseGFM package provides methods for fitting sparse generalized factor models with various penalty functions. It is designed for high-dimensional data and adapts to weak factor scenarios, making it suitable for a wide range of applications in statistics and machine learning.
The package is now available on CRAN under the name sparseGFM.
On GitHub, the development version is hosted under the repository name sparseGFM.
install.packages("sparseGFM")
# Load the package
library(sparseGFM)
You can also install the development version of sparseGFM from GitHub:
# Install devtools if you haven't already
install.packages("devtools")
# Install sparseGFM from GitHub
devtools::install_github("zjwang1013/sparseGFM")
# Load the package
library(sparseGFM)
- Sparse Loading Matrix Estimation: Efficiently estimates row-sparse loading matrices
- Multiple Penalty Functions: Supports lasso, adaptive group lasso (aglasso), and other penalties
- Weak Factor Adaptation: Automatically adapts to scenarios with weak factor structures
- Cross-Validation: Built-in cross-validation for optimal parameter selection
- Factor Number Selection: Automatic selection of the number of factors using multiple information criteria
- sparseGFM(): Main function for fitting sparse generalized factor models
- cv.sparseGFM(): Cross-validation for lambda selection
- facnum.sparseGFM(): Information criterion-based selection of the number of factors
- add_identifiability(): Apply identifiability constraints to factor/loading matrices
- evaluate_performance(): Evaluate variable selection performance metrics
The package implements 12 different penalty functions:
Penalty | Description | Adaptive Version |
---|---|---|
lasso | L1 penalty | alasso |
SCAD | Smoothly Clipped Absolute Deviation | agSCAD |
MCP | Minimax Concave Penalty | agMCP |
group/glasso | Group Lasso | agroup/aglasso |
gSCAD | Group SCAD | agSCAD |
gMCP | Group MCP | agMCP |
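Switching between penalties only requires changing the penalty argument. A minimal sketch, assuming a numeric data matrix x such as the one simulated in the Quick Start below:

# Sketch: the same fit with the group MCP penalty instead of aglasso
fit_gmcp <- sparseGFM(x, type = "continuous", q = 2,
                      penalty = "gMCP", lambda = 0.1, C = 5)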
The package automatically handles missing values.
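For example, NA entries can be left in the data matrix as-is. A minimal sketch, again using the simulated matrix x from the Quick Start below:

# Sketch: introduce 5% missing entries at random; sparseGFM() handles
# the NAs internally, so no manual imputation is needed
x_na <- x
x_na[sample(length(x_na), size = round(0.05 * length(x_na)))] <- NA
fit_na <- sparseGFM(x_na, type = "continuous", q = 2,
                    penalty = "aglasso", lambda = 0.1, C = 5)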
library(sparseGFM)
set.seed(123)
# Parameters
n <- 200 # number of observations
p <- 200 # number of variables
a_param <- 0.9 # sparsity parameter
q <- 2 # number of factors
# Generate sparse structure
s <- ceiling(p^a_param) # number of non-zero loadings
# Generate factor matrix
FF <- matrix(runif(n * q, min = -3, max = 3), nrow = n, ncol = q)
# Generate sparse loading matrix (row-sparse)
BB <- rbind(matrix(runif(s * q, min = 1, max = 2), nrow = s, ncol = q),
matrix(0, nrow = (p - s), ncol = q))
# Generate intercepts
alpha_true <- runif(p, min = -1, max = 1)
# Add identifiability constraints
ident_res <- add_identifiability(FF, BB, alpha_true)
FF0 <- ident_res$H
BB0 <- ident_res$B
alpha0 <- ident_res$mu
# Generate data matrix
mat_para <- FF0 %*% t(BB0) + as.matrix(rep(1, n)) %*% t(as.matrix(alpha0))
x <- mat_para + matrix(rnorm(n * p), nrow = n, ncol = p)  # add unit-variance Gaussian noise
# True variable selection indicator
index_true <- numeric(p)
if (s > 0 && s <= p) {
index_true[1:s] <- 1
}
# Perform cross-validation to select optimal lambda
cv_result <- cv.sparseGFM(x,
type = "continuous",
q = 2,
penalty = "aglasso",
C = 5,
lambda_range = seq(0.1, 1, by = 0.1),
verbose = FALSE)
# Extract optimal model
optimal_model <- cv_result$optimal_model
BB_hat <- optimal_model$BB_hat
FF_hat <- optimal_model$FF_hat
alpha_hat <- optimal_model$alpha_hat
# Evaluate model performance
mat_para_hat <- FF_hat %*% t(BB_hat) + as.matrix(rep(1, n)) %*% t(as.matrix(alpha_hat))
relative_error <- base::norm((mat_para_hat - mat_para), type = "F") / base::norm(mat_para, type = "F")
print(paste("Optimal lambda:", cv_result$optimal_lambda))
print(paste("Relative estimation error:", round(relative_error, 4)))
# Variable selection performance
index_pred <- rep(1, p)
index_pred[optimal_model$index] <- 0  # index holds the zeroed (unselected) variables
performance <- evaluate_performance(index_true, index_pred)
print(performance)
# Fit sparse GFM with fixed lambda
result <- sparseGFM(x,
type = "continuous",
q = 2,
penalty = "aglasso",
lambda = 0.1,
C = 5)
# Extract results
BB_direct <- result$BB_hat
FF_direct <- result$FF_hat
alpha_direct <- result$alpha_hat
# View selected variables
selected_vars <- setdiff(1:p, result$index)
print(paste("Number of selected variables:", length(selected_vars)))
print(paste("Number of zero loadings:", length(result$index)))
# Calculate space angles for evaluation
BB_space <- eval.space(BB_direct, BB0)
FF_space <- eval.space(FF_direct, FF0)
BB_vcc <- BB_space[1]
BB_tcc <- BB_space[2]
FF_vcc <- FF_space[1]
FF_tcc <- FF_space[2]
print(paste("Loading matrix VCC:", round(BB_vcc, 4)))
print(paste("Loading matrix TCC:", round(BB_tcc, 4)))
# Select optimal number of factors using multiple criteria
facnum_result <- facnum.sparseGFM(x,
type = "continuous",
q_range = 1:5,
penalty = "aglasso",
lambda_range = c(0.1),
sic_type = "sic1",
C = 6,
verbose = FALSE)
# Extract information criteria results
df_dd <- facnum_result$df_dd
df_as <- facnum_result$df_as
# Get optimal factor numbers from different criteria
optimal_q_sic1 <- facnum_result$optimal_q
# q_range starts at 1, so which.min() returns the factor number directly
optimal_q_sic2 <- which.min(facnum_result$sic2)
optimal_q_sic3 <- which.min(facnum_result$sic3)
optimal_q_sic4 <- which.min(facnum_result$sic4)
print("Optimal number of factors:")
print(paste("SIC1:", optimal_q_sic1))
print(paste("SIC2:", optimal_q_sic2))
print(paste("SIC3:", optimal_q_sic3))
print(paste("SIC4:", optimal_q_sic4))
# Plot the information criteria
plot(1:5, facnum_result$sic1, type = "b", col = "red",
     ylim = range(facnum_result$sic1, facnum_result$sic2,
                  facnum_result$sic3, facnum_result$sic4),
     xlab = "Number of Factors", ylab = "Information Criterion",
     main = "Determine the Number of Factors")
lines(1:5, facnum_result$sic2, type = "b", col = "blue")
lines(1:5, facnum_result$sic3, type = "b", col = "green")
lines(1:5, facnum_result$sic4, type = "b", col = "purple")
legend("topright", c("SIC1", "SIC2", "SIC3", "SIC4"),
col = c("red", "blue", "green", "purple"), lty = 1)
Key arguments for sparseGFM():

- x: Data matrix (n × p)
- type: Data type ("continuous" for Gaussian data)
- q: Number of factors
- penalty: Penalty type ("lasso", "aglasso", etc.)
- lambda: Regularization parameter
- C: Constraint constant

Additional arguments for cv.sparseGFM():

- lambda_range: Range of lambda values for cross-validation
- verbose: Whether to print progress information

Additional arguments for facnum.sparseGFM():

- q_range: Range of factor numbers to consider
- sic_type: Type of information criterion for primary selection
The sparseGFM algorithm employs:
- Initialization: Using the GFM package for initial estimates
- Alternating minimization: Iteratively updating factors (F) and loadings (B); see the sketch after this list
- Sparsity induction: Applying various penalty functions to achieve variable selection
- Identifiability: Ensuring unique solutions through SVD-based constraints
- Convergence: Monitoring objective function changes until convergence
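As a conceptual sketch only (not the package's actual implementation), one alternating update for Gaussian data could look like the following, where the loading step applies row-wise group soft-thresholding to induce row sparsity:

# Conceptual sketch of one alternating update for Gaussian data
# (intercepts omitted for brevity; illustration only, not the
# package's internal code)
# X: n x p data, FF: n x q factors, lambda: penalty level
alternating_step <- function(X, FF, lambda) {
  # Loading update: least squares, then row-wise group soft-thresholding,
  # which zeroes out entire rows (variables) at once
  BB <- t(solve(crossprod(FF), crossprod(FF, X)))     # p x q
  row_norms <- sqrt(rowSums(BB^2))
  BB <- BB * pmax(0, 1 - lambda / row_norms)          # group shrinkage
  # Factor update given the new loadings (plain least squares)
  FF <- t(solve(crossprod(BB), crossprod(BB, t(X))))  # n x q
  list(FF = FF, BB = BB)
}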
The sparseGFM package is specifically designed to handle:
- High-dimensional data where the number of variables may exceed the number of observations
- Sparse loading structures with row-wise sparsity patterns
- Weak factor scenarios where factors have relatively small eigenvalues
- Flexible penalty structures including adaptive penalties that can provide better variable selection
The adaptive group lasso (aglasso) penalty is particularly effective for row-sparse loading matrices, as it can select entire rows (variables) rather than individual elements.
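This row-wise behavior is easy to see in isolation, as in the following standalone sketch:

# Standalone sketch: group shrinkage acts on row norms, so each variable's
# entire loading row is either kept or set exactly to zero
B <- rbind(c(1.5, -2.0), c(0.2, 0.1), c(0.0, 0.3))
lambda <- 0.5
B_shrunk <- B * pmax(0, 1 - lambda / sqrt(rowSums(B^2)))
B_shrunk  # rows 2 and 3 collapse to zero; row 1 is shrunk but retained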
- R (≥ 3.5.0)
- GFM
- MASS
- irlba
- stats
Please report any bugs or issues on the GitHub Issues page. When reporting, include:
- A clear description of the issue
- A reproducible code example
- Your session information (sessionInfo())
- Any relevant error messages
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss proposed modifications.
For more detailed information about the methodology and theoretical properties, please refer to the associated research papers and documentation. The related papers are currently under review; we will update the information as soon as they are accepted and published.
This package is licensed under GPL-3. See the LICENSE file for details.
For bug reports, feature requests, or questions, please visit the GitHub repository issues page.