Core Functions

For more detailed documentation see the R package documentation (PDF). Note that function signatures and exposed functions are equivalent to the R version.

Bayesian Methods

BayesianFactorZoo.BayesianFM - Function
BayesianFM(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int)

Bayesian Fama-MacBeth regression. Similar to BayesianSDF but estimates factors' risk premia rather than risk prices.

Arguments

  • f: Matrix of factors with dimension $t \times k$, where $k$ is the number of factors and $t$ is the number of periods
  • R: Matrix of test assets with dimension $t \times N$, where $t$ is the number of periods and $N$ is the number of test assets
  • sim_length: Length of MCMCs

Details

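Unlike BayesianSDF, the Fama-MacBeth regression uses factor loadings, $\beta_f$, instead of covariance exposures, $C_f$. After obtaining posterior draws of $\mu_Y$ and $\Sigma_Y$ (see BayesianSDF), we calculate $\beta_f = C_f \Sigma_f^{-1}$ and $\beta = (1_N, \beta_f)$. Conditional on each draw, the OLS risk premia are $\lambda = (\beta^\top \beta)^{-1} \beta^\top \mu_R$ and the GLS risk premia are $\lambda = (\beta^\top \Sigma_R^{-1} \beta)^{-1} \beta^\top \Sigma_R^{-1} \mu_R$.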
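A minimal sketch of this loading-based cross-sectional step, using sample moments in place of a posterior draw of $\mu_Y$ and $\Sigma_Y$. It assumes $Y = (f, R)$ with factors stacked first, loadings $\beta_f = C_f \Sigma_f^{-1}$ with a leading column of ones, and the usual OLS and GLS cross-sectional formulas. The simulated data and all variable names are illustrative and are not taken from the package source.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
t, k, N = 200, 2, 10                        # illustrative sizes
f = randn(t, k)                             # simulated factors
R = f * randn(k, N) .+ 0.5 .* randn(t, N)   # simulated test-asset returns

# Sample moments of Y = [f R] stand in for one draw of (mu_Y, Sigma_Y).
# Stacking factors first is an assumption made for this sketch.
Y = hcat(f, R)
mu_Y = vec(mean(Y, dims=1))
Sigma_Y = cov(Y)

Sigma_f = Sigma_Y[1:k, 1:k]                 # k × k factor covariance
C_f     = Sigma_Y[k+1:end, 1:k]             # N × k covariance of assets with factors
Sigma_R = Sigma_Y[k+1:end, k+1:end]         # N × N asset covariance
mu_R    = mu_Y[k+1:end]                     # mean asset returns

beta_f = C_f / Sigma_f                      # same as C_f * inv(Sigma_f)
beta   = hcat(ones(N), beta_f)              # constant column, then loadings

# Cross-sectional risk premia: first entry is the constant
lambda_ols = (beta' * beta) \ (beta' * mu_R)
lambda_gls = (beta' * (Sigma_R \ beta)) \ (beta' * (Sigma_R \ mu_R))
```

Repeating the last block for every posterior draw would give one row of lambda_ols_path and lambda_gls_path per draw.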

Returns

Returns a BayesianFMOutput struct containing:

  • lambda_ols_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing OLS risk premia estimates. First column is $\lambda_c$ for the constant term, next k columns are factor risk premia.
  • lambda_gls_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing GLS risk premia estimates.
  • R2_ols_path::Vector{Float64}: Vector of length sim_length containing OLS $R^2$ draws.
  • R2_gls_path::Vector{Float64}: Vector of length sim_length containing GLS $R^2$ draws.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)
      • sim_length::Int: Number of MCMC iterations performed

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Run Bayesian FM regression with 10,000 iterations (f is t × k, R is t × N)
results = BayesianFM(f, R, 10_000)

# Access results
ols_risk_premia = mean(results.lambda_ols_path, dims=1)  # Mean OLS risk premia
gls_r2 = mean(results.R2_gls_path)  # Mean GLS R²

BayesianFactorZoo.BayesianSDF - Function
BayesianSDF(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int=10000; 
           intercept::Bool=true, type::String="OLS", prior::String="Flat",
           psi0::Float64=5.0, d::Float64=0.5)

Bayesian estimation of Linear SDF (B-SDF).

Arguments

  • f: Matrix of factors with dimension $t \times k$
  • R: Matrix of test assets with dimension $t \times N$
  • sim_length: Length of MCMCs
  • intercept: Include intercept if true, default=true
  • type: "OLS" or "GLS", default="OLS"
  • prior: "Flat" or "Normal", default="Flat"
  • psi0: Hyperparameter for normal prior, default=5
  • d: Hyperparameter for normal prior, default=0.5

Returns

Returns a BayesianSDFOutput struct containing:

  • lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
  • R2_path::Vector{Float64}: Vector of length sim_length containing $R^2$ draws.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)
      • sim_length::Int: Number of MCMC iterations performed
      • prior::String: Prior specification used ("Flat" or "Normal")
      • estimation_type::String: Estimation type used ("OLS" or "GLS")

Notes

  • Input matrices f and R must have the same number of rows (time periods)
  • Number of test assets (N) must be larger than number of factors (k) when including intercept
  • Number of test assets (N) must be >= number of factors (k) when excluding intercept
  • The function performs no pre-standardization of inputs
  • Risk prices are estimated in the units of the input data (typically monthly returns)

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = BayesianSDF(f, R)

# Use GLS with normal prior
results_gls = BayesianSDF(f, R, 10_000; 
                        type="GLS", 
                        prior="Normal",
                        psi0=5.0,
                        d=0.5)

# Access results
risk_prices = mean(results.lambda_path, dims=1)
r2_values = mean(results.R2_path)

BayesianFactorZoo.continuous_ss_sdf - Function
continuous_ss_sdf(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int;
                psi0::Float64=1.0, r::Float64=0.001,
                aw::Float64=1.0, bw::Float64=1.0,
                type::String="OLS", intercept::Bool=true)

SDF model selection using continuous spike-and-slab prior.

Arguments

  • f: Matrix of factors with dimension $t \times k$
  • R: Matrix of test assets with dimension $t \times N$
  • sim_length: Length of MCMCs
  • psi0: Hyperparameter in prior distribution of risk prices
  • r: Hyperparameter for spike component ($\ll 1$)
  • aw,bw: Beta prior parameters for factor inclusion probability
  • type: "OLS" or "GLS"
  • intercept: Include intercept if true

Returns

Returns a ContinuousSSSDFOutput struct containing:

  • gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators.
  • lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
  • sdf_path::Matrix{Float64}: Matrix of size sim_length × t containing posterior draws of the SDF.
  • bma_sdf::Vector{Float64}: Vector of length t containing the Bayesian Model Averaged SDF.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)
      • sim_length::Int: Number of MCMC iterations performed

Notes

  • Input matrices f and R must have the same number of rows (time periods)
  • The method automatically handles both traded and non-traded factors
  • Prior parameters aw, bw control beliefs about model sparsity (the defaults aw = bw = 1 give a uniform prior on the inclusion probability and so do not favor sparse models)
  • Parameter psi0 maps into prior beliefs about achievable Sharpe ratios
  • The spike component r should be close to zero to effectively shrink irrelevant factors
  • The resulting SDF is normalized to have mean 1
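To see how aw and bw translate into sparsity beliefs, recall that a Beta(aw, bw) distribution has mean aw / (aw + bw). The sketch below computes that prior mean and the implied expected number of included factors for an illustrative k. The function name and the value of k are assumptions made for this example.

```julia
# Prior mean of a Beta(aw, bw) inclusion probability
prior_inclusion_mean(aw, bw) = aw / (aw + bw)

k = 20  # illustrative number of candidate factors

for (aw, bw) in [(1.0, 1.0), (1.0, 9.0)]
    p = prior_inclusion_mean(aw, bw)
    # Expected number of included factors under the prior is k * p
    println("aw=$aw, bw=$bw: prior inclusion probability $p, expected factors $(k * p)")
end
```

Raising bw relative to aw lowers the prior inclusion probability, which is the sense in which the second setting favors sparser models.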

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = continuous_ss_sdf(f, R, 10_000)

# Use GLS with modified priors for more aggressive selection
results_gls = continuous_ss_sdf(f, R, 10_000;
                             type="GLS",
                             psi0=0.5,     # Tighter prior
                             aw=1.0,       
                             bw=9.0)       # Prior favoring sparsity

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Posterior mean risk prices
sdf = results.bma_sdf                              # Model averaged SDF

BayesianFactorZoo.continuous_ss_sdf_v2 - Function
continuous_ss_sdf_v2(f1::Matrix{Float64}, f2::Matrix{Float64}, R::Matrix{Float64},
                   sim_length::Int; psi0::Float64=1.0, r::Float64=0.001,
                   aw::Float64=1.0, bw::Float64=1.0,
                   type::String="OLS", intercept::Bool=true)

SDF model selection with continuous spike-and-slab prior, treating tradable factors as test assets.

Arguments

  • f1: Matrix of nontradable factors with dimension $t \times k_1$
  • f2: Matrix of tradable factors with dimension $t \times k_2$
  • R: Matrix of test assets with dimension $t \times N$ (should NOT contain f2)
  • sim_length: Length of MCMCs
  • psi0, r, aw, bw, type, intercept: Same as continuous_ss_sdf

Details

Same prior structure and posterior distributions as continuous_ss_sdf, but:

  1. Treats tradable factors f2 as test assets
  2. Total dimension of test assets becomes $N + k_2$
  3. Factor loadings computed on combined test asset set

Returns

Returns a ContinuousSSSDFOutput struct containing:

  • gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators, where $k = k_1 + k_2$ (total number of factors).
  • lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
  • sdf_path::Matrix{Float64}: Matrix of size sim_length × t containing posterior draws of the SDF.
  • bma_sdf::Vector{Float64}: Vector of length t containing the Bayesian Model Averaged SDF.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors ($k_1 + k_2$)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)
      • sim_length::Int: Number of MCMC iterations performed

Notes

  • Input matrices f1, f2, and R must have the same number of rows (time periods)
  • Test assets R should not include the tradable factors f2
  • The factor selection combines both sparsity and density aspects through Bayesian Model Averaging
  • Prior parameters aw, bw control beliefs about model sparsity
  • Parameter psi0 maps into prior beliefs about achievable Sharpe ratios
  • The spike component r should be close to zero to effectively shrink irrelevant factors

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = continuous_ss_sdf_v2(f1, f2, R, 10_000)

# Use GLS with custom priors
results_gls = continuous_ss_sdf_v2(f1, f2, R, 10_000;
                                type="GLS",
                                psi0=2.0,
                                aw=2.0, 
                                bw=2.0)

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Risk price estimates
avg_sdf = results.bma_sdf                          # Model averaged SDF

BayesianFactorZoo.dirac_ss_sdf_pvalue - Function
dirac_ss_sdf_pvalue(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int,
                  lambda0::Vector{Float64}; psi0::Float64=1.0,
                  max_k::Union{Int,Nothing}=nothing)

Hypothesis testing for risk prices using Dirac spike-and-slab prior.

Arguments

  • f: Matrix of factors with dimension $t \times k$
  • R: Matrix of test assets with dimension $t \times N$
  • sim_length: Length of MCMCs
  • lambda0: $k \times 1$ vector of null hypothesis values
  • psi0: Hyperparameter in prior distribution
  • max_k: Maximum number of factors in models (optional)

Returns

Returns a DiracSSSDFOutput struct containing:

  • gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators.
  • lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing posterior draws of risk prices.
  • model_probs::Matrix{Float64}: Matrix of size M × (k+1) where M is the number of possible models. First k columns are model indices (0/1), last column contains model probabilities.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)
      • sim_length::Int: Number of MCMC iterations performed

Notes

  • Input matrices f and R must have the same number of rows (time periods)
  • The method is particularly useful for testing specific hypotheses about risk prices
  • Setting max_k allows for focused testing of sparse models
  • The Dirac spike provides a more stringent test than the continuous spike-and-slab
  • Bayesian p-values can be constructed by integrating 1-p(γ|data)
  • Model probabilities are properly normalized across the considered model space
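One reading of the p-value note above is that the column means of gamma_path estimate the posterior probability of rejecting each null, and one minus that quantity acts as a Bayesian p-value. The sketch below applies this to a simulated stand-in for results.gamma_path. The simulated draws and variable names are assumptions made for illustration.

```julia
using Statistics, Random

Random.seed!(3)
sim_length, k = 1_000, 4

# Stand-in for results.gamma_path: 0/1 draws with different inclusion rates per factor
gamma_path = Float64.(rand(sim_length, k) .< [0.9 0.5 0.1 0.02])

# Posterior probability that each factor's risk price differs from its null value
prob_reject = vec(mean(gamma_path, dims=1))

# Bayesian p-value under the interpretation stated above
bayes_pvalue = 1 .- prob_reject

println(round.(bayes_pvalue, digits=3))
```

Factors that are included in most draws get a small value, analogous to a small frequentist p-value.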

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Test if all risk prices are zero
lambda0 = zeros(size(f, 2))
results = dirac_ss_sdf_pvalue(f, R, 10_000, lambda0)

# Test specific values with max 3 factors (lambda0 needs one entry per factor; here k = 4)
lambda0_alt = [0.5, 0.3, -0.2, 0.1]
results_sparse = dirac_ss_sdf_pvalue(f, R, 10_000, lambda0_alt; max_k=3)

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Posterior mean risk prices
order = sortperm(results.model_probs[:, end], rev=true)     # Most probable models first
top_models = results.model_probs[order[1:min(10, end)], :]  # Top 10 models (fewer if M < 10)

Classical Methods

BayesianFactorZoo.SDF_gmm - Function
SDF_gmm(R::Matrix{Float64}, f::Matrix{Float64}, W::Matrix{Float64})

GMM estimation of factor risk prices under linear SDF framework.

Arguments

  • R: Matrix of test assets with dimension $t \times N$
  • f: Matrix of factors with dimension $t \times k$
  • W: Weighting matrix for GMM estimation, dimension $(N+k) \times (N+k)$

Returns

Returns a SDFGMMOutput struct containing:

  • lambda_gmm::Vector{Float64}: Vector of length k+1 containing risk price estimates (includes intercept).
  • mu_f::Vector{Float64}: Vector of length k containing estimated factor means.
  • Avar_hat::Matrix{Float64}: Matrix of size (2k+1) × (2k+1) containing asymptotic covariance matrix.
  • R2_adj::Float64: Adjusted cross-sectional $R^2$.
  • S_hat::Matrix{Float64}: Matrix of size (N+k) × (N+k) containing estimated spectral density matrix.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)

Notes

  • Input matrices R and f must have the same number of rows (time periods)
  • The weighting matrix W should match dimensions (N+k) × (N+k)
  • For tradable factors, weighting matrix should impose self-pricing restrictions
  • Implementation assumes no serial correlation in moment conditions
  • R² is adjusted for degrees of freedom
  • Standard errors are derived under the assumption of correct specification

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Hansen LP (1982). "Large sample properties of generalized method of moments estimators." Econometrica, 50(4), 1029–1054.

Examples

using BayesianFactorZoo, LinearAlgebra  # LinearAlgebra provides diag

# Construct OLS weighting matrix
W_ols = construct_weight_matrix(R, f, "OLS")

# Perform OLS estimation
results_ols = SDF_gmm(R, f, W_ols)

# Construct GLS weighting matrix
W_gls = construct_weight_matrix(R, f, "GLS")

# Perform GLS estimation
results_gls = SDF_gmm(R, f, W_gls)

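# Access results
k = size(f, 2)
risk_prices = results_ols.lambda_gmm[2:end]  # Factor risk prices (excluding intercept)
# Avar_hat is (2k+1) × (2k+1); this assumes its first k+1 rows correspond to lambda_gmm
std_errors = sqrt.(diag(results_ols.Avar_hat)[2:k+1])  # Standard errors of factor risk prices
r_squared = results_ols.R2_adj  # Adjusted R²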

See Also

  • construct_weight_matrix: Function to construct appropriate OLS/GLS weighting matrices
  • BayesianSDF: Bayesian alternative that is robust to weak factors

BayesianFactorZoo.TwoPassRegression - Function
TwoPassRegression(f::Matrix{Float64}, R::Matrix{Float64})

Classical Fama-MacBeth two-pass regression.

Arguments

  • f: Matrix of factors with dimension $t \times k$
  • R: Matrix of test assets with dimension $t \times N$

Returns

Returns a TwoPassRegressionOutput struct containing:

  • lambda::Vector{Float64}: Vector of length k+1 containing OLS risk premia estimates (includes intercept).
  • lambda_gls::Vector{Float64}: Vector of length k+1 containing GLS risk premia estimates.
  • t_stat::Vector{Float64}: Vector of length k+1 containing OLS t-statistics.
  • t_stat_gls::Vector{Float64}: Vector of length k+1 containing GLS t-statistics.
  • R2_adj::Float64: OLS adjusted R².
  • R2_adj_GLS::Float64: GLS adjusted R².
  • alpha::Vector{Float64}: Vector of length N containing OLS pricing errors.
  • t_alpha::Vector{Float64}: Vector of length N containing t-statistics for OLS pricing errors.
  • beta::Matrix{Float64}: Matrix of size N × k containing factor loadings.
  • cov_epsilon::Matrix{Float64}: Matrix of size N × N containing residual covariance.
  • cov_lambda::Matrix{Float64}: Matrix of size (k+1) × (k+1) containing OLS covariance matrix of risk premia.
  • cov_lambda_gls::Matrix{Float64}: Matrix of size (k+1) × (k+1) containing GLS covariance matrix of risk premia.
  • R2_GLS::Float64: Unadjusted GLS R².
  • cov_beta::Matrix{Float64}: Matrix of size (N(k+1)) × (N(k+1)) containing covariance matrix of beta estimates.
  • Metadata fields accessible via dot notation:
      • n_factors::Int: Number of factors (k)
      • n_assets::Int: Number of test assets (N)
      • n_observations::Int: Number of time periods (t)

Notes

  • Input matrices f and R must have the same number of rows (time periods)
  • The method is vulnerable to bias from weak and useless factors
  • Standard errors account for the EIV problem but assume serial independence
  • Both OLS and GLS estimates are computed with appropriate standard errors
  • R² values are adjusted for degrees of freedom
  • Includes corrections for using factors as test assets when applicable
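For orientation, the textbook two-pass procedure can be sketched in a few lines: time-series regressions give the loadings, and a cross-sectional regression of mean returns on those loadings gives the risk premia. This is a simplified OLS-only illustration on simulated data. It omits the GLS variant and the errors-in-variables adjustment to the standard errors, and it is not the package's implementation.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(4)
t, k, N = 240, 2, 15                          # illustrative sizes
f = randn(t, k)                               # simulated factors
true_beta = randn(N, k)                       # loadings used to simulate returns
R = 0.5 .+ f * true_beta' .+ 0.3 .* randn(t, N)

# First pass: time-series OLS of each asset on a constant and the factors
X = hcat(ones(t), f)                          # t × (k+1) regressors
coef = X \ R                                  # (k+1) × N coefficients
beta = coef[2:end, :]'                        # N × k estimated loadings

# Second pass: cross-sectional OLS of mean returns on a constant and the loadings
Z = hcat(ones(N), beta)                       # N × (k+1)
mu_R = vec(mean(R, dims=1))
lambda = Z \ mu_R                             # constant, then k risk premia
alpha = mu_R - Z * lambda                     # pricing errors
```

Here lambda and alpha play the roles of the lambda and alpha fields described above, without the accompanying inference.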

References

Fama EF, MacBeth JD (1973). "Risk, return, and equilibrium: Empirical tests." Journal of Political Economy, 81(3), 607–636.

Shanken J (1992). "On the estimation of beta-pricing models." Review of Financial Studies, 5(1), 1–33.

Examples

using BayesianFactorZoo, LinearAlgebra  # LinearAlgebra provides diag

# Perform two-pass regression
results = TwoPassRegression(f, R)

# Access OLS results
risk_premia = results.lambda[2:end]  # Factor risk premia (excluding intercept)
t_stats = results.t_stat[2:end]      # t-statistics
r2_ols = results.R2_adj              # Adjusted R²
pricing_errors = results.alpha        # Pricing errors

# Access GLS results
risk_premia_gls = results.lambda_gls[2:end]  
t_stats_gls = results.t_stat_gls[2:end]
r2_gls = results.R2_adj_GLS

# First-pass results
betas = results.beta                  # Factor loadings
std_errors_beta = sqrt.(diag(results.cov_beta))  # Standard errors for betas

See Also

  • BayesianFM: Bayesian version that is robust to weak factors
  • SDF_gmm: GMM-based alternative estimation approach