Factor Modeling in Finance

I discuss multi-factor modeling, which generalizes many early financial models into a common prediction and risk framework.

In the 1950s, Harry Markowitz developed modern portfolio theory (Markowitz, 1952). In particular, modern portfolio theory introduced the idea of a risk–reward trade-off between a portfolio’s expected return and its volatility. In its framing, investors want portfolios with high expected returns and low volatility in actual outcomes. This framework is sometimes called mean–variance analysis. However, Markowitz did not define “risk” beyond the notion of portfolio volatility.

Subsequently, several researchers (Sharpe, 1964; Mossin, 1966; Lintner, 1965; Treynor, 1961) extended mean–variance analysis one step further to argue that there is a single explanatory variable, often called a “factor” in both statistics and finance, that explains expected returns: the market. This model became known as the capital asset pricing model (CAPM). The CAPM argues that the only real risk is the market, and so the only real factor is exposure to the market. Formally, let $R_n$ and $R_m$ be random variables denoting the returns of asset $n$ and the market $m$, and let $r_f$ be the non-random risk-free rate of return. Then the CAPM states,

$$
\mathbb{E}[R_n] - r_f = \beta_n (\mathbb{E}[R_m] - r_f). \tag{1}
$$

In words, the expected excess return of asset $n$ is a linear function of the expected excess return of the market. The linear relationship in Equation 1 can be derived directly from the mean–variance analysis framework, in particular from the linear efficient frontier. See my post on the CAPM for this derivation. The CAPM is useful because it is both simple and intuitive. There is a single factor, the market, and a single exposure to that factor, $\beta_n$, which captures an asset’s sensitivity to the market. A higher (lower) $\beta_n$ means that when the market moves up or down, the return of asset $n$ moves up or down by more (less).
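To make the role of $\beta_n$ concrete, here is a minimal sketch that estimates it from simulated data. It relies on the standard identity $\beta_n = \text{cov}[R_n, R_m] / \mathbb{V}[R_m]$, which is the slope of a regression of asset excess returns on market excess returns. All numbers (the risk-free rate, the "true" beta, the volatilities) are made up for illustration and are not taken from any real market.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000
r_f = 0.0001       # assumed constant per-period risk-free rate
true_beta = 1.3    # hypothetical market sensitivity

# Simulated market and asset returns (illustrative numbers only).
r_m = r_f + 0.0004 + 0.01 * rng.standard_normal(T)
r_n = r_f + true_beta * (r_m - r_f) + 0.005 * rng.standard_normal(T)

# Slope of the regression of asset excess returns on market excess returns.
excess_n = r_n - r_f
excess_m = r_m - r_f
beta_hat = np.cov(excess_n, excess_m)[0, 1] / np.var(excess_m, ddof=1)
print(f"estimated beta: {beta_hat:.3f}")
```

With enough observations, the estimate should land close to the value used to generate the data.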

However, empirical evidence has not agreed with the CAPM (Fama & French, 2004). To quote (Chamberlain & Rothschild, 1982),

Few believe that asset returns are well described by their first two moments or that some well-defined set of marketable assets contains most of the investment opportunities available to individual investors. Casual observation is sufficient to refute one of the main implications of the CAPM—that everyone holds the market portfolio.

Subsequent economists have proposed many multi-factor models, meaning models with multiple explanatory variables. An early and famous multi-factor model, for example, is the Fama–French three-factor model (Fama & French, 1993). Formally, let “SMB” denote a “small minus big” factor where size is measured via market capitalization, and let “HML” denote a “high minus low” factor, meaning the book-to-market ratio (book or accounting value to market value). Then the Fama–French three-factor model is

$$
\mathbb{E}[R_n] - r_f = \beta_{n1} (\mathbb{E}[R_m] - r_f) + \beta_{n2} \text{SMB} + \beta_{n3} \text{HML}. \tag{2}
$$

Of course, one can extend this logic to any number of macroeconomic indicators, and so there are many other multi-factor models. As an aside, the factors are not indexed by $n$. The interpretation of this is that the factors are macroeconomic in nature: exposure to the market, size relative to the market, relative value, interest rates, and so on. Each asset is exposed to or loads on these factors in some quantity, defined by the parameters $\{\beta_{n1}, \beta_{n2}, \dots \}$, which are indexed by $n$.

A natural extension to multi-factor models with factors specified by some modeler would be a model with latent or unobservable factors. In other words, rather than specifying the factors in advance, can we use multivariate statistics to simply infer the factors? This is precisely what arbitrage pricing theory (APT) does (Ross, 1976). Formally, let $K$ denote the number of latent factors, let $f_k$ denote the $k$-th systematic factor that is common to all assets, and let $\beta_{nk}$ denote asset $n$’s sensitivity to, or factor loading onto, the $k$-th factor. Finally, let $\alpha_n$ denote a linear model’s intercept, and let $\varepsilon_n$ denote white noise. Then APT models risky asset returns as

$$
R_n = \alpha_n + \beta_{n1} f_1 + \beta_{n2} f_2 + \dots + \beta_{nK} f_K + \varepsilon_n. \tag{3}
$$

APT is quite general, and at this point I think it would be clear to a statistician where all this is going: a general framework for multi-factor modeling that looks a lot like linear-Gaussian models such as factor analysis or principal components analysis (PCA).

Multi-factor models

In finance, factor modeling exploits standard methods from multivariate statistics to model returns, variances, and correlations (Rosenberg & McKibben, 1973). To reiterate, a factor is an explanatory variable, and these factors can be observable or unobservable.

Let $N$ be the number of assets, $T$ be the number of time periods, and $K$ be the number of factors. Ideally, $K \ll N$. In words, there are far fewer macroeconomic factors that explain asset returns than there are unique assets. Let $r_{nt}$ and $\varepsilon_{nt}$ denote the return and white noise of asset $n$ at time $t$. Let $\mathbf{f}_t$ be a $K$-vector of factors at time $t$, and let $\boldsymbol{\beta}_n$ be a $K$-vector of “loadings” or exposures of asset $n$ onto these factors. Note that the factors are common across assets (no index $n$), while the loadings are common across time (no index $t$). Then the multi-factor model is

$$
r_{nt} = \alpha_{nt} + \boldsymbol{\beta}_n^{\top} \mathbf{f}_t + \varepsilon_{nt}. \tag{4}
$$

As with linear regression, we can push the intercept $\alpha_{nt}$ into the dot product by adding a dummy factor $f_{0t} = 1$. In finance, $\varepsilon_{nt}$ can be interpreted as the idiosyncratic return, or the return of asset $n$ at time $t$ that is not shared across assets or time. Note that all other parts of the return in Equation 4 are common to all assets via the factors.

We assume that the idiosyncratic return has zero mean, $\mathbb{E}[\varepsilon_{nt}] = 0$, and that these specific returns are uncorrelated with each other,

$$
\text{cov}[\varepsilon_{it}, \varepsilon_{js}] = \begin{cases} \sigma_{it}^2 & \text{if $j = i$ and $s = t$,} \\ 0 & \text{otherwise.} \end{cases} \tag{5}
$$

Furthermore, we assume that the factors are zero mean, $\mathbb{E}[\mathbf{f}_t] = \mathbf{0}$, and are uncorrelated with the idiosyncratic return $\varepsilon_{nt}$. This means that their covariance is zero:

$$
\text{cov}[\varepsilon_{it}, \mathbf{f}_t] = \mathbb{E}[\varepsilon_{it} \mathbf{f}_t] - \mathbb{E}[\varepsilon_{it}] \mathbb{E}[\mathbf{f}_t] = \mathbf{0}. \tag{6}
$$

Since $\mathbb{E}[\varepsilon_{it}] = 0$, this implies that the cross term must be zero as well, that

$$
\mathbb{E}[\varepsilon_{it} \mathbf{f}_t] = \mathbf{0}. \tag{7}
$$

Finally, let $\boldsymbol{\Sigma}_f$ denote the covariance of $\mathbf{f}_t$. Since $\mathbf{f}_t$ is zero mean, this implies

$$
\boldsymbol{\Sigma}_f = \text{cov}[\mathbf{f}_t, \mathbf{f}_t] = \mathbb{E}[\mathbf{f}_t \mathbf{f}_t^{\top}]. \tag{8}
$$

Ex ante, $r_{nt}$ is a random variable. These assumptions induce some probability distribution onto $r_{nt}$, and we can derive the first and second moments. Taking the intercept $\alpha_{nt}$ to be zero, the first moment of $r_{nt}$ is:

$$
\mathbb{E}[r_{nt}] = \boldsymbol{\beta}_n^{\top} \mathbb{E}[\mathbf{f}_t] = 0. \tag{9}
$$

So returns are zero mean if the factors are zero mean. The covariance between returns is

$$
\begin{aligned}
\text{cov}[r_{it}, r_{js}] &= \mathbb{E}[r_{it} r_{js}] - \mathbb{E}[r_{it}]\mathbb{E}[r_{js}] \\
&= \mathbb{E}[r_{it} r_{js}] \\
&= \mathbb{E}[(\boldsymbol{\beta}_i^{\top} \mathbf{f}_t + \varepsilon_{it})(\boldsymbol{\beta}_j^{\top} \mathbf{f}_s + \varepsilon_{js})] \\
&= \mathbb{E}[\boldsymbol{\beta}_i^{\top} \mathbf{f}_t \mathbf{f}_s^{\top} \boldsymbol{\beta}_j + \varepsilon_{it} \boldsymbol{\beta}_j^{\top} \mathbf{f}_s + \boldsymbol{\beta}_i^{\top} \mathbf{f}_t \varepsilon_{js} + \varepsilon_{it} \varepsilon_{js}] \\
&= \boldsymbol{\beta}_i^{\top} \mathbb{E}[\mathbf{f}_t \mathbf{f}_s^{\top}] \boldsymbol{\beta}_j + \boldsymbol{\beta}_j^{\top} \mathbb{E}[\varepsilon_{it} \mathbf{f}_s] + \boldsymbol{\beta}_i^{\top} \mathbb{E}[\mathbf{f}_t \varepsilon_{js}] + \mathbb{E}[\varepsilon_{it} \varepsilon_{js}] \\
&= \boldsymbol{\beta}_i^{\top} \mathbb{E}[\mathbf{f}_t \mathbf{f}_s^{\top}] \boldsymbol{\beta}_j + \text{cov}[\varepsilon_{it}, \varepsilon_{js}].
\end{aligned} \tag{10}
$$

We cannot simplify $\mathbb{E}[\mathbf{f}_t \mathbf{f}_s^{\top}]$ without some assumptions. Typically, we restrict $t = s$, so that this term is the covariance of the factors at time $t$. This is called a cross-sectional analysis, where the “cross-section” is the slice of all assets at time $t$.

As we will see, Equation 10 is a big leap forward, as it explicitly represents the riskiness of two assets through their idiosyncratic risk and their exposure to the factors’ risk. What this suggests is that not only are factors useful for predicting asset returns, they are useful for modeling portfolio volatility.
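As a sanity check on the covariance decomposition with $t = s$, the following sketch simulates returns from a small factor model and compares the sample covariance against the model-implied covariance $\mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top}$ plus a diagonal matrix of idiosyncratic variances. The dimensions, loadings, and variances are arbitrary choices for illustration, and Gaussian draws are an assumption made only for convenience.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, T = 4, 2, 200_000

B = rng.normal(size=(N, K))                   # loadings, one row per asset
Sigma_f = np.diag([0.04, 0.01])               # factor covariance (diagonal here)
sigma2 = np.array([0.02, 0.03, 0.01, 0.05])   # idiosyncratic variances

# Draw zero-mean factors and noise, then build returns r_t = B f_t + eps_t.
F = rng.multivariate_normal(np.zeros(K), Sigma_f, size=T)   # (T, K)
E = rng.normal(size=(T, N)) * np.sqrt(sigma2)               # (T, N)
R = F @ B.T + E                                             # (T, N)

implied = B @ Sigma_f @ B.T + np.diag(sigma2)
sample = np.cov(R, rowvar=False)
print("max abs difference:", np.max(np.abs(sample - implied)))
```

The difference shrinks as `T` grows, which is what we would expect if the decomposition holds.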

Inference

What parameters are we actually estimating in a multi-factor model? It depends on how we frame the problem. Let’s look at several scenarios.

Classic (implicit) factor model

In an implicit factor model, we assume the factor loadings are known and seek to estimate the unknown (implicit) factors. This approach uses a cross-sectional setup and induces the classic formulation of factor analysis. In a cross-sectional analysis, data are grouped by time period, meaning we care about differences between assets within each time period. Thus, we can rewrite Equation 4 in vector form as

$$
\mathbf{r}_t = \mathbf{B} \mathbf{f}_t + \boldsymbol{\varepsilon}_t, \tag{11}
$$

where

$$
\mathbf{r}_t = \begin{bmatrix} r_{1t} \\ \vdots \\ r_{Nt} \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} \alpha_{1} & \beta_{11} & \dots & \beta_{1K} \\ \vdots & \ddots & \ddots & \vdots \\ \alpha_{N} & \beta_{N1} & \dots & \beta_{NK} \end{bmatrix}, \quad \boldsymbol{\varepsilon}_t = \begin{bmatrix} \varepsilon_{1t} \\ \vdots \\ \varepsilon_{Nt} \end{bmatrix}. \tag{12}
$$

And we can represent the idiosyncratic return in vector form as

$$
\begin{aligned}
\mathbb{E}[\boldsymbol{\varepsilon}_t] &= \mathbf{0}, \\
\text{cov}[\boldsymbol{\varepsilon}_t] &= \boldsymbol{\Sigma}_{\varepsilon}, \quad \boldsymbol{\Sigma}_{\varepsilon} = \begin{bmatrix} \sigma^2_1 & \\ & \ddots & \\ & & \sigma_N^2 \end{bmatrix}.
\end{aligned} \tag{13}
$$

While $\boldsymbol{\varepsilon}_t$ is indexed by $t$, $\boldsymbol{\Sigma}_{\varepsilon}$ is not. This is a cross-sectional assumption: at each time period, the covariance of the noise does not change. Since $\boldsymbol{\Sigma}_{\varepsilon}$ is a diagonal matrix, the error terms are uncorrelated across assets.

Equations 9 and 10 above can be written in vector form as

$$
\begin{aligned}
\mathbb{E}[\mathbf{r}_t] &= \mathbf{0}, \\
\mathbb{V}[\mathbf{r}_t] &= \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon}.
\end{aligned} \tag{14}
$$

Typically, $\boldsymbol{\Sigma}_f$ is assumed to be a diagonal matrix. See A1 for derivations.

Finally, to estimate the factors, $\{ \mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_T \}$, we fit $T$ linear regressions using Equation 11.
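The $T$ cross-sectional regressions can be sketched as follows. This is a minimal illustration on simulated data, assuming the loadings are known exactly and using ordinary least squares via `np.linalg.lstsq`. With unequal idiosyncratic variances, a weighted regression would be the more natural choice, but plain OLS keeps the example short.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, T = 200, 3, 50

B = rng.normal(size=(N, K))          # known loadings (assumed given)
F_true = rng.normal(size=(T, K))     # unknown factors we want to recover
R = F_true @ B.T + 0.1 * rng.normal(size=(T, N))

# One OLS fit per period: regress the N returns at time t on the loadings.
F_hat = np.empty((T, K))
for t in range(T):
    F_hat[t], *_ = np.linalg.lstsq(B, R[t], rcond=None)

print("max abs error:", np.max(np.abs(F_hat - F_true)))
```

Because the loadings do not change across periods, the loop could be replaced by a single call, `np.linalg.lstsq(B, R.T, rcond=None)`, which solves all $T$ regressions at once.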

Time-series-based (explicit) factor model

In an explicit factor model, we assume the factors are known a priori (explicit) and seek to estimate the unknown factor loadings. In some sense, this is the most intuitive setup and the one most analogous to the models in the introduction. For example, in the Fama–French three-factor model, we know the three factors a priori, and estimating the loadings is equivalent to estimating the parameters $\{\beta_{n1}, \beta_{n2}, \beta_{n3}\}$ in Equation 2 but for all assets $n$.

This requires a slightly different setup, a time-series regression. Instead of Equation 11, we represent the problem as

$$
\mathbf{r}_n = \mathbf{F} \boldsymbol{\beta}_n + \boldsymbol{\varepsilon}_n. \tag{15}
$$

Each of the $K$ columns of $\mathbf{F}$ is a $T$-vector $\mathbf{f}_k$. This represents the $k$-th factor varying across time. This is a time-series regression rather than a cross-sectional regression because now the dependent variable is the returns for a single asset across time. Any distributional assumption on $\mathbf{r}_n$ is an assumption about this time series. The error terms are still uncorrelated, but now with respect to time:

$$
\text{cov}[\boldsymbol{\varepsilon}_n] = \boldsymbol{\Sigma}_{\varepsilon}, \quad \boldsymbol{\Sigma}_{\varepsilon} = \begin{bmatrix} \sigma^2_1 & \\ & \ddots & \\ & & \sigma_T^2 \end{bmatrix}. \tag{16}
$$

Now we fit $N$ linear regressions using Equation 15 to estimate $\boldsymbol{\beta}_n$ for each asset. Here, the real goal is to assess the goodness-of-fit of the model, assuming the factors $\mathbf{F}$. If the model has a high coefficient of determination or if the estimated coefficients are statistically significant, then this suggests that the investor has selected useful factors.
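Here is a minimal sketch of one such time-series regression for a single asset, again on simulated data. The factor returns, the "true" loadings, and the noise level are hypothetical. A column of ones is added so that an intercept is estimated alongside the loadings, and the coefficient of determination is computed by hand.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 1000, 3

F = rng.normal(scale=0.01, size=(T, K))     # known factor returns (simulated)
beta_true = np.array([1.1, 0.4, -0.3])      # hypothetical loadings
r_n = F @ beta_true + 0.005 * rng.normal(size=T)

# Add a column of ones so the regression also estimates an intercept.
X = np.column_stack([np.ones(T), F])
coef, *_ = np.linalg.lstsq(X, r_n, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

# Coefficient of determination for the fitted time-series regression.
resid = r_n - X @ coef
r2 = 1.0 - resid.var() / r_n.var()
print("loadings:", beta_hat, "R^2:", round(r2, 3))
```

Repeating this for every asset gives the $N$ regressions described above. A low $R^2$ across many assets would be a hint that the chosen factors explain little of the variation in returns.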

Statistical analysis

In the final approach, we assume both the factors and the loadings are unknown. To estimate both quantities jointly, we use standard methods in multivariate statistics such as factor analysis or PCA. Formally, factor analysis is the cross-sectional model defined in Equation 11, while probabilistic PCA is identical to factor analysis except the idiosyncratic returns have a common variance, i.e.

$$
\text{cov}[\boldsymbol{\varepsilon}_{t}] = \boldsymbol{\Sigma}_{\varepsilon}, \quad \boldsymbol{\Sigma}_{\varepsilon} = \begin{bmatrix} \sigma^2 & \\ & \ddots & \\ & & \sigma^2 \end{bmatrix}. \tag{17}
$$

Inference for factor analysis typically requires assuming that $\mathbf{f}_t$, along with the noise $\boldsymbol{\varepsilon}_t$, is multivariate normally distributed, which induces a multivariate normal distribution on the returns once the factors are marginalized out:

$$
\begin{aligned}
\mathbf{f}_t &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_f), \\
&\Downarrow \\
\mathbf{r}_t &\sim \mathcal{N}\left( \mathbf{0}, \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon} \right).
\end{aligned} \tag{18}
$$

This assumption is not too unreasonable if we assume $\mathbf{r}_t$ is a vector of log returns rather than raw returns. We can then write down the log likelihood,

$$
\mathcal{L} = -\frac{1}{2} \sum_{t=1}^T \mathbf{r}_t^{\top} (\mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon})^{-1} \mathbf{r}_t - \frac{T}{2} \ln \det(\mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon}) + \text{const}, \tag{19}
$$

and use maximum likelihood estimation or expectation–maximization (EM) to infer the parameters $\mathbf{B}$ and $\boldsymbol{\Sigma}_{\varepsilon}$. We can then estimate the factors from the inferred parameters.
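To show what an optimizer or EM routine would be evaluating, here is a small sketch of the Gaussian log likelihood, dropping the additive constant. The function name `factor_loglik` and the simulated data are my own for illustration. This only evaluates the objective and does not fit anything.

```python
import numpy as np

def factor_loglik(R, B, Sigma_f, sigma2):
    """Gaussian factor-model log likelihood, up to an additive constant.

    R: (T, N) returns; B: (N, K) loadings; Sigma_f: (K, K); sigma2: (N,).
    """
    T = R.shape[0]
    Sigma_r = B @ Sigma_f @ B.T + np.diag(sigma2)
    _, logdet = np.linalg.slogdet(Sigma_r)
    # Sum over t of the quadratic forms r_t^T Sigma_r^{-1} r_t.
    quad = np.sum(R * np.linalg.solve(Sigma_r, R.T).T)
    return -0.5 * quad - 0.5 * T * logdet

rng = np.random.default_rng(4)
N, K, T = 5, 2, 2000
B = rng.normal(size=(N, K))
sigma2 = np.full(N, 0.5)
R = rng.normal(size=(T, K)) @ B.T + rng.normal(size=(T, N)) * np.sqrt(sigma2)

ll_true = factor_loglik(R, B, np.eye(K), sigma2)
ll_wrong = factor_loglik(R, np.zeros((N, K)), np.eye(K), sigma2)
print("generating loadings beat zero loadings:", ll_true > ll_wrong)
```

Using `slogdet` and `solve` avoids forming an explicit inverse or determinant, which tends to be more stable numerically when $N$ is large.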

Alternatively, we can use PCA. In the PCA-based approach, we first compute the sample covariance matrix of the returns, $\hat{\boldsymbol{\Sigma}}_r$. Then we take the $K$ eigenvectors corresponding to the $K$ largest eigenvalues of $\hat{\boldsymbol{\Sigma}}_r$ as the loadings, and we take the projections of the returns onto those eigenvectors as the factors.
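A minimal sketch of the PCA-based approach on simulated returns is below. In this sketch, the top-$K$ eigenvectors of the sample covariance matrix are stored in `loadings`, and the projections of the centered returns onto them are stored in `factors`. These variable names and the simulated data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, T = 50, 3, 2000
B_true = rng.normal(size=(N, K))
R = rng.normal(size=(T, K)) @ B_true.T + 0.3 * rng.normal(size=(T, N))

# Sample covariance of returns and its eigendecomposition.
Sigma_hat = np.cov(R, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma_hat)   # eigenvalues in ascending order

# Keep the K eigenvectors with the largest eigenvalues.
order = np.argsort(eigvals)[::-1][:K]
loadings = eigvecs[:, order]                   # (N, K), orthonormal columns
factors = (R - R.mean(axis=0)) @ loadings      # (T, K) factor realizations

explained = eigvals[order].sum() / eigvals.sum()
print(f"variance explained by {K} components: {explained:.2%}")
```

By construction, the resulting factors are uncorrelated in sample, and their variances are the corresponding eigenvalues.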

See my previous post on factor analysis for details on fitting factor analysis using EM. Alternatively, see (Tipping & Bishop, 1999) for a discussion of probabilistic PCA.

Risk-factor modeling

So far, we have viewed factors as useful macroeconomic indicators that are correlated with or predictive of asset returns. However, if a factor predicts a return, it is natural to think of it as a risk-factor as well. What do I mean? Recall that in the mean–variance analysis framework, the objective is to maximize our portfolio’s expected return while minimizing its variance. See my post on mean–variance analysis if this claim does not make sense. Formally, if $\mathbf{w}$ is an $N$-vector of portfolio weights, then the unconstrained objective is:

$$
\mathbf{w}^{\star} = \arg\!\max_{\mathbf{w}} \left\{ \mathbf{w}^{\top} \mathbf{r}_t - \mathbf{w}^{\top} \boldsymbol{\Sigma}_r \mathbf{w} \right\}, \tag{20}
$$

where now $\mathbf{r}_t$ is an $N$-vector of returns for the assets in a portfolio at time $t$ and where $\boldsymbol{\Sigma}_r$ is the covariance of those asset returns at time $t$. We might add constraints such as the weights summing to unity, but the essence of the problem is to maximize the returns of the positions we take and to minimize the risk of those positions, as captured by their variances and covariances.
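For the unconstrained objective, setting the gradient $\mathbf{r}_t - 2 \boldsymbol{\Sigma}_r \mathbf{w}$ to zero gives the closed-form maximizer $\mathbf{w}^{\star} = \tfrac{1}{2} \boldsymbol{\Sigma}_r^{-1} \mathbf{r}_t$, provided $\boldsymbol{\Sigma}_r$ is positive definite. The sketch below checks this numerically on made-up inputs. The vector `mu` stands in for the return vector, and the covariance matrix is an arbitrary positive definite matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 5
A = rng.normal(size=(N, N))
Sigma_r = A @ A.T + N * np.eye(N)     # arbitrary symmetric positive definite matrix
mu = rng.normal(scale=0.05, size=N)   # stand-in for the return vector

def objective(w):
    """Return minus variance, as in the unconstrained objective."""
    return w @ mu - w @ Sigma_r @ w

# Closed-form maximizer from the first-order condition mu - 2 Sigma_r w = 0.
w_star = 0.5 * np.linalg.solve(Sigma_r, mu)

# Random perturbations should never improve the objective.
trials = [objective(w_star + 0.01 * rng.normal(size=N)) for _ in range(100)]
print("closed form is best:", objective(w_star) >= max(trials))
```

Once constraints are added, there is generally no such closed form, which is why the cost of evaluating the quadratic term matters.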

However, $\boldsymbol{\Sigma}_r$ is an $N \times N$ matrix, which can be quite large and quite sparse. Think about how many stocks there are, for example, and how the number of available stocks changes across time. To compute the optimal $\mathbf{w}^{\star}$, an optimizer may supply many values for $\mathbf{w}$ and then compute $\mathbf{w}^{\top} \boldsymbol{\Sigma}_r \mathbf{w}$ many times. In risk-factor modeling, we replace $\boldsymbol{\Sigma}_r$ with the decomposition in Equation 14:

$$
\boldsymbol{\Sigma}_r = \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon}. \tag{21}
$$

In fact, we should do this if we believe Equation 14 is true, since that equation follows from modeling $\mathbf{r}_t$ as a linear combination of factors. This allows us to rewrite the objective in Equation 20 as

$$
\mathbf{w}^{\star} = \arg\!\max_{\mathbf{w}} \left\{ \mathbf{w}^{\top} \mathbf{r}_t - \mathbf{w}^{\top} \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} \mathbf{w} - \mathbf{w}^{\top} \boldsymbol{\Sigma}_{\varepsilon} \mathbf{w} \right\}. \tag{22}
$$

Notice that $\boldsymbol{\Sigma}_{\varepsilon}$ is a diagonal matrix of idiosyncratic variances, so the right-most term in Equation 22 can be computed with an element-wise product and a dot product. And typically we assume that the factors are uncorrelated, meaning that $\boldsymbol{\Sigma}_f$ is a diagonal matrix as well. Since $K \ll N$, computing $\mathbf{w}^{\top} \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} \mathbf{w}$ is much faster than computing $\mathbf{w}^{\top} \boldsymbol{\Sigma}_r \mathbf{w}$. Furthermore, this decomposition elegantly handles the sparse matrix $\boldsymbol{\Sigma}_r$. Rather than estimating the covariances between all combinations of assets, we model asset correlations through their loadings onto common factors.
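The computational saving can be sketched as follows, assuming diagonal factor and idiosyncratic covariance matrices stored as vectors. The factor-model route first computes the portfolio's $K$ factor exposures and never forms the $N \times N$ matrix. The dense route is included only to confirm that the two agree. All inputs are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 2000, 10
B = rng.normal(size=(N, K))
f_var = rng.uniform(0.01, 0.05, size=K)     # diagonal of Sigma_f
eps_var = rng.uniform(0.01, 0.10, size=N)   # diagonal of Sigma_eps
w = rng.normal(size=N)
w /= np.abs(w).sum()                        # arbitrary normalized weights

# Factor-model route: O(NK) work, never forms the N x N matrix.
exposure = B.T @ w                          # portfolio factor exposures, (K,)
risk_factor = exposure @ (f_var * exposure) + w @ (eps_var * w)

# Dense route for comparison: O(N^2) memory and work.
Sigma_r = (B * f_var) @ B.T + np.diag(eps_var)
risk_dense = w @ Sigma_r @ w

print("portfolio variance:", risk_factor)
```

Inside an optimizer that evaluates the risk term many times, only the first route would be used.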

Conclusion

In finance, multi-factor modeling generalizes many early financial models into a common framework, where returns are linear functions of macroeconomic explanatory variables, called “factors”. In this framework, we can either assume we know the factors or the loadings but not both, in which case inferring the other amounts to a multivariate linear regression. Or we can use standard methods from multivariate statistics, such as factor analysis or PCA, to infer both the factors and loadings jointly. In a cross-sectional analysis, the natural extension of these ideas is to model risk (portfolio variances and covariances) through the low-rank approximation induced by the factor model.

   

Appendix

A1. Unconditional moments

The unconditional mean is zero, since $\mathbb{E}[\mathbf{f}_t] = \mathbf{0}$ by assumption:

$$
\begin{aligned}
\mathbb{E}[\mathbf{r}_t] &= \mathbb{E}[ \mathbf{B} \mathbf{f}_t + \boldsymbol{\varepsilon}_t] \\
&= \mathbf{B} \mathbb{E}[ \mathbf{f}_t ] + \mathbb{E}[ \boldsymbol{\varepsilon}_t] \\
&= \mathbf{B} \mathbf{0} \\
&= \mathbf{0}.
\end{aligned} \tag{A1.1}
$$

The unconditional variance is

$$
\begin{aligned}
\boldsymbol{\Sigma}_r &\triangleq \text{cov}[\mathbf{r}_t] \\
&= \mathbb{E}[(\mathbf{r}_t - \mathbb{E}[\mathbf{r}_t])(\mathbf{r}_t - \mathbb{E}[\mathbf{r}_t])^{\top}] \\
&= \mathbb{E}[\mathbf{r}_t \mathbf{r}_t^{\top}] \\
&= \mathbb{E}[(\mathbf{B} \mathbf{f}_t + \boldsymbol{\varepsilon}_t)(\mathbf{B} \mathbf{f}_t + \boldsymbol{\varepsilon}_t)^{\top}] \\
&= \mathbb{E} \left[ \mathbf{B}\mathbf{f}_t\mathbf{f}_t^{\top}\mathbf{B}^{\top} + \boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t^{\top} + \boldsymbol{\varepsilon}_t \mathbf{f}_t^{\top} \mathbf{B}^{\top} + \mathbf{B} \mathbf{f}_t \boldsymbol{\varepsilon}_t^{\top} \right] \\
&= \mathbf{B} \mathbb{E} \left[ \mathbf{f}_t\mathbf{f}_t^{\top} \right] \mathbf{B}^{\top} + \mathbb{E} \left[ \boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t^{\top} \right] + \mathbb{E} \left[ \boldsymbol{\varepsilon}_t \mathbf{f}_t^{\top} \right] \mathbf{B}^{\top} + \mathbf{B} \mathbb{E}\left[\mathbf{f}_t \boldsymbol{\varepsilon}_t^{\top} \right] \\
&= \mathbf{B} \boldsymbol{\Sigma}_f \mathbf{B}^{\top} + \boldsymbol{\Sigma}_{\varepsilon}.
\end{aligned} \tag{A1.2}
$$

The cross-terms are zero because $\boldsymbol{\varepsilon}_t$ and $\mathbf{f}_t$ are mean-zero and uncorrelated, so:

$$
\begin{aligned}
\mathbf{0} &= \text{cov}[\mathbf{f}_t, \boldsymbol{\varepsilon}_t] \\
&= \mathbb{E}[(\mathbf{f}_t - \mathbb{E}[\mathbf{f}_t])(\boldsymbol{\varepsilon}_t - \mathbb{E}[\boldsymbol{\varepsilon}_t])^{\top}] \\
&= \mathbb{E}[\mathbf{f}_t \boldsymbol{\varepsilon}_t^{\top}].
\end{aligned} \tag{A1.3}
$$

  1. Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
  2. Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3), 425–442.
  3. Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica: Journal of the Econometric Society, 768–783.
  4. Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. The Review of Economics and Statistics, 222–224.
  5. Treynor, J. L. (1961). Market value, time, and risk. Unpublished manuscript (August 8, 1961).
  6. Fama, E. F., & French, K. R. (2004). The capital asset pricing model: Theory and evidence. Journal of Economic Perspectives, 18(3), 25–46.
  7. Chamberlain, G., & Rothschild, M. (1982). Arbitrage, factor structure, and mean-variance analysis on large asset markets. National Bureau of Economic Research Cambridge, Mass., USA.
  8. Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  9. Ross, S. A. (1976). The arbitrage theory of capital asset pricing. In Handbook of the fundamentals of financial decision making: Part I (pp. 11–30). World Scientific.
  10. Rosenberg, B., & McKibben, W. (1973). The prediction of systematic and specific risk in common stocks. Journal of Financial and Quantitative Analysis, 8(2), 317–333.
  11. Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611–622.