Core Functions

For more detailed documentation, see the R package documentation (PDF). The exported functions and their signatures mirror the R version.

Bayesian Methods

BayesianFactorZoo.BayesianFM — Function

BayesianFM(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int)

Bayesian Fama-MacBeth regression. Similar to BayesianSDF but estimates factors' risk premia rather than risk prices.

Arguments

f: Matrix of factors with dimension $t \times k$, where $k$ is the number of factors and $t$ is the number of periods
R: Matrix of test assets with dimension $t \times N$, where $t$ is the number of periods and $N$ is the number of test assets
sim_length: Length of MCMCs

Details

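Unlike BayesianSDF, the Fama-MacBeth regression uses factor loadings, $\beta_f$, instead of covariance exposures, $C_f$. After obtaining posterior draws of $\mu_Y$ and $\Sigma_Y$ (see BayesianSDF), we calculate $\beta_f = C_f \Sigma_f^{-1}$ and $\beta = (1_N, \beta_f)$. Conditional on each draw, the OLS risk premia are $\lambda = (\beta^\top \beta)^{-1} \beta^\top \mu_R$ and the GLS risk premia are $\lambda = (\beta^\top \Sigma_R^{-1} \beta)^{-1} \beta^\top \Sigma_R^{-1} \mu_R$.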
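
As a sketch of this step, assuming the R package's definitions ($\beta_f = C_f \Sigma_f^{-1}$, $\beta = (1_N, \beta_f)$, with OLS and GLS risk premia obtained by cross-sectional regression of $\mu_R$ on $\beta$), the calculation for a single posterior draw could look like the following. The helper name `fm_risk_premia` and the factors-first ordering of $Y$ are assumptions made for illustration, not the package's internals.

```julia
using LinearAlgebra

# Hypothetical helper, not part of BayesianFactorZoo.
# mu_Y, Sigma_Y: one posterior draw of the mean and covariance of Y = (f, R),
# ASSUMING the first k entries are factors and the remaining N are test assets.
function fm_risk_premia(mu_Y::AbstractVector{<:Real}, Sigma_Y::AbstractMatrix{<:Real}, k::Int)
    N = length(mu_Y) - k
    f_idx, R_idx = 1:k, (k + 1):(k + N)
    mu_R = mu_Y[R_idx]
    Sigma_f = Sigma_Y[f_idx, f_idx]
    Sigma_R = Sigma_Y[R_idx, R_idx]
    C_f = Sigma_Y[R_idx, f_idx]               # N × k covariances of assets with factors
    beta = hcat(ones(N), C_f / Sigma_f)       # N × (k+1): constant and factor loadings
    lambda_ols = (beta' * beta) \ (beta' * mu_R)
    SRinv_beta = Sigma_R \ beta               # Sigma_R^{-1} * beta
    lambda_gls = (beta' * SRinv_beta) \ (SRinv_beta' * mu_R)
    return lambda_ols, lambda_gls
end

# Toy draw with k = 1 factor and N = 3 assets (made-up numbers).
C_f = reshape([0.5, 1.0, 1.5], 3, 1)
Sigma_Y = vcat(hcat(1.0, C_f'), hcat(C_f, C_f * C_f' + I))
mu_Y = [0.0, 0.2, 0.3, 0.4]
lambda_ols, lambda_gls = fm_risk_premia(mu_Y, Sigma_Y, 1)
```

In this toy draw the asset means lie exactly in the span of $\beta$, so both estimators return approximately `[0.1, 0.2]`. Repeating the calculation over all draws would give the rows of a path matrix.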

Returns

Returns a BayesianFMOutput struct containing:

lambda_ols_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing OLS risk premia estimates. The first column is $\lambda_c$ for the constant term; the next k columns are factor risk premia.
lambda_gls_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing GLS risk premia estimates.
R2_ols_path::Vector{Float64}: Vector of length sim_length containing OLS $R^2$ draws.
R2_gls_path::Vector{Float64}: Vector of length sim_length containing GLS $R^2$ draws.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)
sim_length::Int: Number of MCMC iterations performed
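
The path matrices hold one posterior draw per row, so they are usually reduced to column-wise summaries. A minimal sketch, assuming only the Statistics standard library; `summarize_draws` is a hypothetical helper and the `draws` matrix is a stand-in for a field such as `lambda_ols_path`:

```julia
using Statistics

# Hypothetical helper, not part of BayesianFactorZoo:
# posterior mean and equal-tailed credible interval for each column of draws.
function summarize_draws(draws::AbstractMatrix{<:Real}; level::Float64=0.95)
    tail = (1 - level) / 2
    cols = eachcol(draws)
    return (post_mean = [mean(c) for c in cols],
            lower     = [quantile(c, tail) for c in cols],
            upper     = [quantile(c, 1 - tail) for c in cols])
end

# Stand-in for results.lambda_ols_path: 100 draws of 2 parameters.
draws = hcat(collect(1.0:100.0), fill(2.0, 100))
s = summarize_draws(draws)
s.post_mean   # posterior means, one per column
s.lower       # lower bounds of the 95% credible intervals
s.upper       # upper bounds of the 95% credible intervals
```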

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Run Bayesian FM regression with 10,000 iterations
results = BayesianFM(f, R, 10_000)

# Access results
ols_risk_premia = mean(results.lambda_ols_path, dims=1)  # Mean OLS risk premia
gls_r2 = mean(results.R2_gls_path)  # Mean GLS R²

BayesianFactorZoo.BayesianSDF — Function

BayesianSDF(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int=10000; 
           intercept::Bool=true, type::String="OLS", prior::String="Flat",
           psi0::Float64=5.0, d::Float64=0.5)

Bayesian estimation of Linear SDF (B-SDF).

Arguments

f: Matrix of factors with dimension $t \times k$
R: Matrix of test assets with dimension $t \times N$
sim_length: Length of MCMCs
intercept: Include intercept if true, default=true
type: "OLS" or "GLS", default="OLS"
prior: "Flat" or "Normal", default="Flat"
psi0: Hyperparameter for normal prior, default=5
d: Hyperparameter for normal prior, default=0.5

Returns

Returns a BayesianSDFOutput struct containing:

lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
R2_path::Vector{Float64}: Vector of length sim_length containing $R^2$ draws.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)
sim_length::Int: Number of MCMC iterations performed
prior::String: Prior specification used ("Flat" or "Normal")
estimation_type::String: Estimation type used ("OLS" or "GLS")

Notes

Input matrices f and R must have the same number of rows (time periods)
Number of test assets (N) must be larger than number of factors (k) when including intercept
Number of test assets (N) must be >= number of factors (k) when excluding intercept
The function performs no pre-standardization of inputs
Risk prices are estimated in the units of the input data (typically monthly returns)
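
The dimension requirements above can be checked before starting a long MCMC run. A minimal sketch; `check_sdf_inputs` is a hypothetical helper written for illustration, and BayesianSDF may well perform its own validation:

```julia
# Hypothetical pre-flight check, not part of BayesianFactorZoo.
function check_sdf_inputs(f::AbstractMatrix, R::AbstractMatrix; intercept::Bool=true)
    size(f, 1) == size(R, 1) ||
        throw(DimensionMismatch("f has $(size(f, 1)) rows but R has $(size(R, 1))"))
    k, N = size(f, 2), size(R, 2)
    if intercept
        N > k || throw(ArgumentError("need N > k with an intercept (N = $N, k = $k)"))
    else
        N >= k || throw(ArgumentError("need N >= k without an intercept (N = $N, k = $k)"))
    end
    return true
end

# Placeholder data: 120 periods, 3 factors, 25 test assets.
check_sdf_inputs(randn(120, 3), randn(120, 25))
```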

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = BayesianSDF(f, R)

# Use GLS with normal prior
results_gls = BayesianSDF(f, R, 10_000; 
                        type="GLS", 
                        prior="Normal",
                        psi0=5.0,
                        d=0.5)

# Access results
risk_prices = mean(results.lambda_path, dims=1)
r2_values = mean(results.R2_path)

BayesianFactorZoo.continuous_ss_sdf — Function

continuous_ss_sdf(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int;
                psi0::Float64=1.0, r::Float64=0.001,
                aw::Float64=1.0, bw::Float64=1.0,
                type::String="OLS", intercept::Bool=true)

SDF model selection using continuous spike-and-slab prior.

Arguments

f: Matrix of factors with dimension $t \times k$
R: Matrix of test assets with dimension $t \times N$
sim_length: Length of MCMCs
psi0: Hyperparameter in prior distribution of risk prices
r: Hyperparameter for spike component ($\ll 1$)
aw,bw: Beta prior parameters for factor inclusion probability
type: "OLS" or "GLS"
intercept: Include intercept if true

Returns

Returns a ContinuousSSSDFOutput struct containing:

gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators.
lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
sdf_path::Matrix{Float64}: Matrix of size sim_length × t containing posterior draws of the SDF.
bma_sdf::Vector{Float64}: Vector of length t containing the Bayesian Model Averaged SDF.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)
sim_length::Int: Number of MCMC iterations performed

Notes

Input matrices f and R must have the same number of rows (time periods)
The method automatically handles both traded and non-traded factors
Prior parameters aw, bw control beliefs about model sparsity (the defaults aw = bw = 1 give a uniform prior on the inclusion probability, which favors neither sparse nor dense models)
Parameter psi0 maps into prior beliefs about achievable Sharpe ratios
The spike component r should be close to zero to effectively shrink irrelevant factors
The resulting SDF is normalized to have mean 1
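
To see how aw and bw encode sparsity beliefs, recall that a Beta(aw, bw) distribution has mean aw / (aw + bw). The sketch below assumes, as the argument description suggests, that this Beta prior sits on a common factor inclusion probability; `prior_inclusion_mean` is a hypothetical helper, not a package function.

```julia
# Prior mean of the inclusion probability under a Beta(aw, bw) prior.
prior_inclusion_mean(aw::Real, bw::Real) = aw / (aw + bw)

p_default = prior_inclusion_mean(1.0, 1.0)   # 0.5: uniform prior, no preference
p_sparse  = prior_inclusion_mean(1.0, 9.0)   # 0.1: prior favoring sparse models

# Implied prior expected number of included factors among k candidates.
k = 20
expected_factors = k * p_sparse
```

Under this reading, raising bw relative to aw lowers the number of factors the prior expects to enter the SDF.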

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = continuous_ss_sdf(f, R, 10_000)

# Use GLS with modified priors for more aggressive selection
results_gls = continuous_ss_sdf(f, R, 10_000;
                             type="GLS",
                             psi0=0.5,     # Tighter prior
                             aw=1.0,       
                             bw=9.0)       # Prior favoring sparsity

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Posterior mean risk prices
sdf = results.bma_sdf                              # Model averaged SDF

BayesianFactorZoo.continuous_ss_sdf_v2 — Function

continuous_ss_sdf_v2(f1::Matrix{Float64}, f2::Matrix{Float64}, R::Matrix{Float64},
                   sim_length::Int; psi0::Float64=1.0, r::Float64=0.001,
                   aw::Float64=1.0, bw::Float64=1.0,
                   type::String="OLS", intercept::Bool=true)

SDF model selection with continuous spike-and-slab prior, treating tradable factors as test assets.

Arguments

f1: Matrix of nontradable factors with dimension $t \times k_1$
f2: Matrix of tradable factors with dimension $t \times k_2$
R: Matrix of test assets with dimension $t \times N$ (should NOT contain f2)
sim_length: Length of MCMCs
psi0, r, aw, bw, type, intercept: Same as continuous_ss_sdf

Details

Same prior structure and posterior distributions as continuous_ss_sdf, but:

Treats tradable factors f2 as test assets
Total dimension of test assets becomes $N + k_2$
Factor loadings computed on combined test asset set
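
The resulting dimensions can be illustrated with placeholder matrices. This is only a sketch of the shapes involved; the column order used internally by continuous_ss_sdf_v2 is an assumption here.

```julia
# Illustrative shapes only (random placeholders, not real data).
t, k1, k2, N = 120, 2, 3, 25
f1, f2, R = randn(t, k1), randn(t, k2), randn(t, N)

factors = hcat(f1, f2)        # t × (k1 + k2): all candidate factors
test_assets = hcat(R, f2)     # t × (N + k2): tradable factors join the test assets

size(factors), size(test_assets)
```

This is also why R should not already contain f2: the tradable factors would otherwise appear twice in the combined test asset set.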

Returns

Returns a ContinuousSSSDFOutput struct containing:

gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators, where $k = k_1 + k_2$ (total number of factors).
lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) if intercept=true, or sim_length × k if false. Contains posterior draws of risk prices.
sdf_path::Matrix{Float64}: Matrix of size sim_length × t containing posterior draws of the SDF.
bma_sdf::Vector{Float64}: Vector of length t containing the Bayesian Model Averaged SDF.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors ($k_1 + k_2$)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)
sim_length::Int: Number of MCMC iterations performed

Notes

Input matrices f1, f2, and R must have the same number of rows (time periods)
Test assets R should not include the tradable factors f2
The factor selection combines both sparsity and density aspects through Bayesian Model Averaging
Prior parameters aw, bw control beliefs about model sparsity
Parameter psi0 maps into prior beliefs about achievable Sharpe ratios
The spike component r should be close to zero to effectively shrink irrelevant factors

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Basic usage with default settings
results = continuous_ss_sdf_v2(f1, f2, R, 10_000)

# Use GLS with custom priors
results_gls = continuous_ss_sdf_v2(f1, f2, R, 10_000;
                                type="GLS",
                                psi0=2.0,
                                aw=2.0, 
                                bw=2.0)

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Risk price estimates
avg_sdf = results.bma_sdf                          # Model averaged SDF

BayesianFactorZoo.dirac_ss_sdf_pvalue — Function

dirac_ss_sdf_pvalue(f::Matrix{Float64}, R::Matrix{Float64}, sim_length::Int,
                  lambda0::Vector{Float64}; psi0::Float64=1.0,
                  max_k::Union{Int,Nothing}=nothing)

Hypothesis testing for risk prices using Dirac spike-and-slab prior.

Arguments

f: Matrix of factors with dimension $t \times k$
R: Matrix of test assets with dimension $t \times N$
sim_length: Length of MCMCs
lambda0: $k \times 1$ vector of null hypothesis values
psi0: Hyperparameter in prior distribution
max_k: Maximum number of factors in models (optional)

Returns

Returns a DiracSSSDFOutput struct containing:

gamma_path::Matrix{Float64}: Matrix of size sim_length × k containing posterior draws of factor inclusion indicators.
lambda_path::Matrix{Float64}: Matrix of size sim_length × (k+1) containing posterior draws of risk prices.
model_probs::Matrix{Float64}: Matrix of size M × (k+1) where M is the number of possible models. First k columns are model indices (0/1), last column contains model probabilities.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)
sim_length::Int: Number of MCMC iterations performed
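
The number of rows M in model_probs grows quickly with k, which is the practical motivation for max_k. The sketch below counts models under the assumption that the model space consists of every subset of the k factors with at most max_k members, including the empty model; the package may enumerate models differently, and `n_models` is a hypothetical helper.

```julia
# Hypothetical helper: size of the model space for k candidate factors,
# ASSUMING all subsets with at most max_k factors (empty model included).
function n_models(k::Int; max_k::Union{Int,Nothing}=nothing)
    m = max_k === nothing ? k : min(max_k, k)
    return sum(binomial(k, j) for j in 0:m)
end

m_full   = n_models(4)             # 16 = 2^4 models without a cap
m_capped = n_models(4; max_k=3)    # 15: the single 4-factor model is excluded
m_large  = n_models(20; max_k=3)   # 1351, compared with 2^20 unrestricted models
```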

Notes

Input matrices f and R must have the same number of rows (time periods)
The method is particularly useful for testing specific hypotheses about risk prices
Setting max_k allows for focused testing of sparse models
The Dirac spike provides a more stringent test than the continuous spike-and-slab
Bayesian p-values can be constructed by integrating 1-p(γ|data)
Model probabilities are properly normalized across the considered model space

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Examples

using BayesianFactorZoo, Statistics  # Statistics provides mean

# Test if all risk prices are zero
lambda0 = zeros(size(f, 2))
results = dirac_ss_sdf_pvalue(f, R, 10_000, lambda0)

# Test specific values with max 3 factors (lambda0_alt needs one entry per factor; here k = 4)
lambda0_alt = [0.5, 0.3, -0.2, 0.1]
results_sparse = dirac_ss_sdf_pvalue(f, R, 10_000, lambda0_alt; max_k=3)

# Access results
inclusion_probs = mean(results.gamma_path, dims=1)  # Factor inclusion probabilities
risk_prices = mean(results.lambda_path, dims=1)     # Posterior mean risk prices
order = sortperm(results.model_probs[:, end], rev=true)      # Most probable models first
top_models = results.model_probs[order[1:min(10, end)], :]   # Top 10 models (fewer if M < 10)

Classical Methods

BayesianFactorZoo.SDF_gmm — Function

SDF_gmm(R::Matrix{Float64}, f::Matrix{Float64}, W::Matrix{Float64})

GMM estimation of factor risk prices under linear SDF framework.

Arguments

R: Matrix of test assets with dimension $t \times N$
f: Matrix of factors with dimension $t \times k$
W: Weighting matrix for GMM estimation, dimension $(N+k) \times (N+k)$

Returns

Returns a SDFGMMOutput struct containing:

lambda_gmm::Vector{Float64}: Vector of length k+1 containing risk price estimates (includes intercept).
mu_f::Vector{Float64}: Vector of length k containing estimated factor means.
Avar_hat::Matrix{Float64}: Matrix of size (2k+1) × (2k+1) containing asymptotic covariance matrix.
R2_adj::Float64: Adjusted cross-sectional $R^2$.
S_hat::Matrix{Float64}: Matrix of size (N+k) × (N+k) containing estimated spectral density matrix.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)

Notes

Input matrices R and f must have the same number of rows (time periods)
The weighting matrix W should match dimensions (N+k) × (N+k)
For tradable factors, weighting matrix should impose self-pricing restrictions
Implementation assumes no serial correlation in moment conditions
R² is adjusted for degrees of freedom
Standard errors are derived under the assumption of correct specification

References

Bryzgalova S, Huang J, Julliard C (2023). "Bayesian solutions for the factor zoo: We just ran two quadrillion models." Journal of Finance, 78(1), 487–557.

Hansen LP (1982). "Large sample properties of generalized method of moments estimators." Econometrica, 50(4), 1029–1054.

Examples

using BayesianFactorZoo, LinearAlgebra  # LinearAlgebra provides diag

# Construct OLS weighting matrix
W_ols = construct_weight_matrix(R, f, "OLS")

# Perform OLS estimation
results_ols = SDF_gmm(R, f, W_ols)

# Construct GLS weighting matrix
W_gls = construct_weight_matrix(R, f, "GLS")

# Perform GLS estimation
results_gls = SDF_gmm(R, f, W_gls)

# Access results
risk_prices = results_ols.lambda_gmm[2:end]  # Factor risk prices (excluding intercept)
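# Standard errors of the k factor risk prices; this assumes the first k+1
# rows of Avar_hat correspond to lambda_gmm and the remaining k to mu_f
k = results_ols.n_factors
std_errors = sqrt.(diag(results_ols.Avar_hat)[2:(k + 1)])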
r_squared = results_ols.R2_adj  # Adjusted R²

See Also

construct_weight_matrix: Function to construct appropriate OLS/GLS weighting matrices
BayesianSDF: Bayesian alternative that is robust to weak factors

BayesianFactorZoo.TwoPassRegression — Function

TwoPassRegression(f::Matrix{Float64}, R::Matrix{Float64})

Classical Fama-MacBeth two-pass regression.

Arguments

f: Matrix of factors with dimension $t \times k$
R: Matrix of test assets with dimension $t \times N$

Returns

Returns a TwoPassRegressionOutput struct containing:

lambda::Vector{Float64}: Vector of length k+1 containing OLS risk premia estimates (includes intercept).
lambda_gls::Vector{Float64}: Vector of length k+1 containing GLS risk premia estimates.
t_stat::Vector{Float64}: Vector of length k+1 containing OLS t-statistics.
t_stat_gls::Vector{Float64}: Vector of length k+1 containing GLS t-statistics.
R2_adj::Float64: OLS adjusted R².
R2_adj_GLS::Float64: GLS adjusted R².
alpha::Vector{Float64}: Vector of length N containing OLS pricing errors.
t_alpha::Vector{Float64}: Vector of length N containing t-statistics for OLS pricing errors.
beta::Matrix{Float64}: Matrix of size N × k containing factor loadings.
cov_epsilon::Matrix{Float64}: Matrix of size N × N containing residual covariance.
cov_lambda::Matrix{Float64}: Matrix of size (k+1) × (k+1) containing OLS covariance matrix of risk premia.
cov_lambda_gls::Matrix{Float64}: Matrix of size (k+1) × (k+1) containing GLS covariance matrix of risk premia.
R2_GLS::Float64: Unadjusted GLS R².
cov_beta::Matrix{Float64}: Matrix of size (N(k+1)) × (N(k+1)) containing covariance matrix of beta estimates.
Metadata fields accessible via dot notation:
n_factors::Int: Number of factors (k)
n_assets::Int: Number of test assets (N)
n_observations::Int: Number of time periods (t)

Notes

Input matrices f and R must have the same number of rows (time periods)
The method is vulnerable to bias from weak and useless factors
Standard errors account for the EIV problem but assume serial independence
Both OLS and GLS estimates are computed with appropriate standard errors
R2_adj and R2_adj_GLS are adjusted for degrees of freedom; R2_GLS is reported unadjusted
Includes corrections for using factors as test assets when applicable
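
For orientation, the textbook two-pass procedure can be sketched in a few lines. This is a minimal illustration of the OLS point estimates only, without the standard errors, GLS variant, or corrections that TwoPassRegression provides; `two_pass_ols` is a hypothetical name and the data are synthetic.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical minimal two-pass estimator (OLS point estimates only).
function two_pass_ols(f::AbstractMatrix{<:Real}, R::AbstractMatrix{<:Real})
    t, N = size(R)
    size(f, 1) == t || throw(DimensionMismatch("f and R need the same number of rows"))
    # First pass: time-series regression of each asset on a constant and the factors.
    coef = hcat(ones(t), f) \ R                # (k+1) × N least-squares coefficients
    beta = permutedims(coef[2:end, :])         # N × k factor loadings
    # Second pass: cross-sectional regression of mean returns on [1, beta].
    mu_R = vec(mean(R, dims=1))
    lambda = hcat(ones(N), beta) \ mu_R        # length k+1, intercept first
    return lambda, beta
end

# Noise-free synthetic data constructed so that the true risk premia are known.
Random.seed!(1)
t, k, N = 60, 2, 6
f = randn(t, k)
B = randn(N, k)                                # true loadings
lambda_true = [0.1, 0.5, -0.3]                 # intercept, then factor risk premia
alpha = lambda_true[1] .+ B * (lambda_true[2:end] .- vec(mean(f, dims=1)))
R = ones(t) * alpha' .+ f * B'
lambda_hat, beta_hat = two_pass_ols(f, R)      # recovers lambda_true and B up to rounding
```

With noisy returns and weak factors the second-pass estimates become unreliable, which is the vulnerability noted above.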

References

Fama EF, MacBeth JD (1973). "Risk, return, and equilibrium: Empirical tests." Journal of Political Economy, 81, 607–636.

Shanken J (1992). "On the estimation of beta-pricing models." Review of Financial Studies, 5, 1–33.

Examples

using BayesianFactorZoo, LinearAlgebra  # LinearAlgebra provides diag

# Perform two-pass regression
results = TwoPassRegression(f, R)

# Access OLS results
risk_premia = results.lambda[2:end]  # Factor risk premia (excluding intercept)
t_stats = results.t_stat[2:end]      # t-statistics
r2_ols = results.R2_adj              # Adjusted R²
pricing_errors = results.alpha        # Pricing errors

# Access GLS results
risk_premia_gls = results.lambda_gls[2:end]  
t_stats_gls = results.t_stat_gls[2:end]
r2_gls = results.R2_adj_GLS

# First-pass results
betas = results.beta                  # Factor loadings
std_errors_beta = sqrt.(diag(results.cov_beta))  # Standard errors for betas

See Also

BayesianFM: Bayesian version that is robust to weak factors
SDF_gmm: GMM-based alternative estimation approach
