Geometry of the Efficient Frontier

Some important financial ideas are encoded in the geometry of the efficient frontier, such as the tangency portfolio and the Sharpe ratio. The goal of this post is to re-derive these ideas geometrically, showing that they arise from the mean–variance analysis framework.

In modern portfolio theory, the efficient frontier is the locus of points $\{(\sigma_p, \mu_p)\}$ corresponding to optimal portfolios, where “optimal” means the lowest risk (standard deviation of the portfolio $\sigma_p$) for the highest reward (expected portfolio return $\mu_p$). When a portfolio contains only risky (random) assets, the portfolio variance $\sigma_p^2$ is a quadratic function of the reward $\mu_p$, so the relationship between risk $\sigma_p$ and reward $\mu_p$ is a hyperbola. This is typically diagrammed as the Markowitz bullet (Figure 1, dashed blue line) on a risk–return spectrum, with risk on the $x$-axis and reward on the $y$-axis. For any given risk $\sigma_p$ above the minimum, there are two portfolios on the hyperbola with different rewards. Clearly, all things being equal, higher reward should be preferred for the same level of risk. Thus, only the top half of the hyperbola is called the efficient frontier (Figure 1, solid blue line).

This nonlinear efficient frontier only applies to portfolios with all risky assets. If we include a single risk-free asset (the canonical example is a United States treasury bill), then the Markowitz bullet becomes piecewise-linear (Figure 1, red dashed line), and the top half is again the efficient frontier (Figure 1, red solid line). For a given level of risk, every portfolio on this linear efficient frontier has an expected return greater than or equal to that of any portfolio on the hyperbolic efficient frontier. The line crosses the $y$-axis at the rate of return of the risk-free asset, called the risk-free rate (Figure 1, red circle), and the line intersects the hyperbolic efficient frontier at a point called the tangency portfolio (Figure 1, white circle). The tangency portfolio gets its name because the linear efficient frontier coincides with the line tangent to the hyperbola at the point where the two frontiers intersect.

Finally, the slope of the linear efficient frontier is the Sharpe ratio, or the performance of a portfolio in excess of the risk-free rate after adjusting for risk. Put differently, every portfolio on the linear efficient frontier, including the tangency portfolio, has the same Sharpe ratio. Furthermore, this Sharpe ratio is the highest Sharpe possible, i.e. it is the highest expected excess return per unit risk of any portfolio.

Figure 1. The efficient frontier (EF) for risky-only assets (blue) and for a portfolio with risky and one risk-free asset (red). The inefficient frontiers are the dashed lines, since any portfolio above the axis of symmetry has a higher expected return for the same risk. The linear efficient frontier is a line between the risk-free rate (red dot) and the tangency portfolio (white dot).

The above three paragraphs make a lot of claims. In many resources discussing modern portfolio theory, mean–variance analysis, or related topics such as the capital asset pricing model, these claims are often made without proof. The reader is expected to know, understand, or simply accept that everything I’ve written above makes sense. The goal of this post is to re-derive these geometric properties for myself.

These questions have already been answered in (Merton, 1972), and this post is, essentially, my notes on that paper. I’ve also relied on these notes by Eric Zivot for some of the matrix algebra required.

Setup and notation

If this section does not make sense, please see my post on mean–variance analysis first.

Suppose a portfolio has $N$ risky assets. Let $R_n$, a random variable, be the return for the $n$-th asset. Let’s denote the first moment as $\mu_n \triangleq \mathbb{E}[R_n]$, and let’s denote the covariance between $R_n$ and $R_m$ as $\sigma_{nm} \triangleq \text{Cov}(R_n, R_m)$. Thus, the variance of $R_n$ is $\sigma_{nn} = \sigma_n^2 \triangleq \mathbb{V}[R_n]$. Finally, let $w_n$ denote the portfolio weight of the $n$-th asset. We can pack these symbols into vectors and a matrix as follows:

$$
\boldsymbol{\mu} \triangleq \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_N \end{bmatrix}, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & \dots & \sigma_{1N} \\ \vdots & \ddots & \vdots \\ \sigma_{N1} & \dots & \sigma_{NN} \end{bmatrix}, \quad \mathbf{w} \triangleq \begin{bmatrix} w_1 \\ \vdots \\ w_N \end{bmatrix}. \tag{1}
$$

We assume that the covariance matrix $\boldsymbol{\Sigma}$ is non-singular, so $\boldsymbol{\Sigma}^{-1}$ exists.

A portfolio’s return $R_p$, also a random variable, is simply an accounting identity,

$$
R_p \triangleq \sum_{n} w_n R_n, \tag{2}
$$

and we can derive its mean $\mu_p \triangleq \mathbb{E}[R_p]$ and variance $\sigma_p^2 \triangleq \mathbb{V}[R_p]$ from Equation 2:

$$
\begin{aligned} \mu_p &= \mathbf{w}^{\top} \boldsymbol{\mu}, \\ \sigma_p^2 &= \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w}. \end{aligned} \tag{3}
$$

The optimal portfolio weights are defined in the following quadratic programming problem:

$$
\begin{aligned} \min_{\mathbf{w}} &&& \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w}, \\ \text{subject to} &&& \mathbf{w}^{\top} \boldsymbol{\mu} = \mu_p, \\ \text{and} &&& \mathbf{w}^{\top} \mathbf{1} = 1. \end{aligned} \tag{4}
$$

In words, Equation 4 means: minimize the portfolio’s variance subject to the constraints that the portfolio’s expected return is $\mu_p$ and the weights sum to unity. Thus, for a given expected return $\mu_p$, we can solve the optimization problem for $\mathbf{w}$ and then calculate $\sigma_p^2$.

Figure 2. Fifteen portfolios on the efficient frontier, computed numerically using quadratic programming.

The Markowitz bullet is the locus of points $\{(\mu_p, \sigma_p)\}$ that solve Equation 4 for all $\mu_p$, and the efficient frontier is the top half of this bullet.

In Figure 2, I’ve drawn the Markowitz bullet for fifteen different $\mu_p$ values using synthetic expected returns $\boldsymbol{\mu}$ and covariances $\boldsymbol{\Sigma}$. For each $\mu_p$, I used SciPy’s minimize function to find the optimal weights $\mathbf{w}$ and then solved for $\sigma_p$. (See A1 for code.) Empirically, we can see that the Markowitz bullet is a hyperbola. Now, let’s prove it.
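The numerical procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code from A1: the three-asset `mu` and `Sigma` are made-up values, the helper name `min_variance_weights` is hypothetical, and the choice of the SLSQP method (with a tight `ftol`) is an assumption, since the text only says that SciPy's `minimize` was used.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up expected returns and covariance matrix for three risky assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])


def min_variance_weights(mu, Sigma, target):
    """Numerically solve Equation 4 for a single target expected return."""
    n = len(mu)
    constraints = [
        {"type": "eq", "fun": lambda w: w @ mu - target},  # w^T mu = mu_p
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},  # w^T 1 = 1
    ]
    w0 = np.full(n, 1.0 / n)  # start from equal weights
    result = minimize(
        lambda w: w @ Sigma @ w,  # portfolio variance
        w0,
        method="SLSQP",
        constraints=constraints,
        options={"ftol": 1e-12},
    )
    return result.x


# Fifteen target returns, one optimization per target.
targets = np.linspace(0.04, 0.14, 15)
weights = np.array([min_variance_weights(mu, Sigma, t) for t in targets])

# Standard deviation of each optimal portfolio: sqrt(w^T Sigma w), row by row.
sigmas = np.sqrt(np.einsum("ij,jk,ik->i", weights, Sigma, weights))
```

Plotting `sigmas` against `targets` should trace out the bullet shape. Note that no bounds are placed on the weights, so short positions are allowed, which matches Equation 4 as written.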

Efficient frontier with only risky assets

First, let’s derive the efficient frontier when our portfolio only contains risky assets, i.e. when each $R_n$ is a random variable. As we will see, in this case, the efficient frontier in mean-standard deviation space is a hyperbola because the portfolio variance $\sigma_p^2$ is a quadratic function of the portfolio mean $\mu_p$, i.e. the efficient frontier in mean-variance space is a parabola. We’ll prove this by solving for the optimal weights $\mathbf{w}$ using the method of Lagrange multipliers, and then expressing $\sigma_p^2$ in terms of these optimal weights.

Solving for optimal portfolio weights

Let’s write Equation 4 using a Lagrangian function:

$$
\mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}) = \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w} + \lambda_1 \left( \mathbf{w}^{\top} \boldsymbol{\mu} - \mu_p \right) + \lambda_2 \left( \mathbf{w}^{\top} \mathbf{1} - 1 \right), \tag{5}
$$

where $\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 & \lambda_2 \end{bmatrix}^{\top}$ are Lagrange multipliers. We want to take the derivative of $\mathcal{L}$ w.r.t. each $w_i$ and each $\lambda_i$, set each equation equal to zero, and solve. In other words, we want to solve

$$
\nabla_{w_1, \dots, w_N,\lambda_1,\lambda_2} \mathcal{L}(\mathbf{w}, \boldsymbol{\lambda}) = \mathbf{0}. \tag{6}
$$

This is a system of $N+2$ equations. The gradient of the Lagrangian function is

$$
\begin{aligned} \nabla_{\mathbf{w}} \mathcal{L} &= 2 \boldsymbol{\Sigma} \mathbf{w} + \lambda_1 \boldsymbol{\mu} + \lambda_2 \mathbf{1}, \\ \frac{\partial}{\partial \lambda_1} \mathcal{L} &= \mathbf{w}^{\top} \boldsymbol{\mu} - \mu_p, \\ \frac{\partial}{\partial \lambda_2} \mathcal{L} &= \mathbf{w}^{\top} \mathbf{1} - 1. \end{aligned} \tag{7}
$$

The first term in the top row of Equation 7 follows from the general identity

$$
\nabla_{\mathbf{x}} \; \mathbf{x}^{\top} \mathbf{B} \mathbf{x} = (\mathbf{B} + \mathbf{B}^{\top}) \mathbf{x}, \tag{8}
$$

for a vector $\mathbf{x}$ and square matrix $\mathbf{B}$. In our case, $\mathbf{B} = \boldsymbol{\Sigma}$ is symmetric, so the gradient reduces to $2 \boldsymbol{\Sigma} \mathbf{w}$. You can easily derive this result yourself by hand or see Equation 81 in (Petersen et al., 2008). The derivatives of the other terms are fairly straightforward.
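For readers who prefer a numerical sanity check over a derivation by hand, here is a small sketch that compares Equation 8 against central finite differences. The random matrix `B`, the vector `x`, and the step size `eps` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))  # deliberately non-symmetric
x = rng.normal(size=4)


def quad_form(v):
    """The scalar v^T B v."""
    return v @ B @ v


# Analytic gradient from Equation 8.
analytic = (B + B.T) @ x

# Central finite differences along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (quad_form(x + eps * e) - quad_form(x - eps * e)) / (2.0 * eps)
    for e in np.eye(4)
])

# For a symmetric matrix such as a covariance matrix, (S + S^T) x is just 2 S x.
S = B @ B.T
symmetric_gradient = (S + S.T) @ x
```

The two gradients should agree up to floating-point error, and in the symmetric case the result matches the $2 \boldsymbol{\Sigma} \mathbf{w}$ term in Equation 7.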

We can solve for $\mathbf{w}$ after setting the first line of Equation 7 to zero:

$$
\mathbf{w} = -\frac{1}{2} \lambda_1 \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} -\frac{1}{2} \lambda_2 \boldsymbol{\Sigma}^{-1} \mathbf{1}. \tag{9}
$$

This can be written more compactly as

$$
\begin{aligned} \mathbf{w} &= -\frac{1}{2} \boldsymbol{\Sigma}^{-1} \begin{bmatrix} \boldsymbol{\mu} & \mathbf{1} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \\ &= -\frac{1}{2} \boldsymbol{\Sigma}^{-1} \mathbf{U} \boldsymbol{\lambda}, \end{aligned} \tag{10}
$$

where $\mathbf{U}$ is an $N \times 2$ matrix, $\mathbf{U} \triangleq \begin{bmatrix} \boldsymbol{\mu} & \mathbf{1} \end{bmatrix}$.

We can solve for $\boldsymbol{\lambda}$ while adhering to the equation for $\mathbf{w}$ by plugging Equation 10 into the second and third lines of Equation 7 after setting these lines equal to zero:

$$
\begin{aligned} \mu_p = \boldsymbol{\mu}^{\top} \mathbf{w} &= -\frac{1}{2} \lambda_1 \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} -\frac{1}{2} \lambda_2 \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}, \\ 1 = \mathbf{1}^{\top} \mathbf{w} &= -\frac{1}{2} \lambda_1 \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} -\frac{1}{2} \lambda_2 \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}. \end{aligned} \tag{11}
$$

We can write this system of linear equations in matrix form as

$$
\begin{aligned} \begin{bmatrix} \mu_p \\ 1 \end{bmatrix} &= -\frac{1}{2} \begin{bmatrix} \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} \\ \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \\ &= -\frac{1}{2} \begin{bmatrix} \boldsymbol{\mu}^{\top} \\ \mathbf{1}^{\top} \end{bmatrix} \boldsymbol{\Sigma}^{-1} \begin{bmatrix} \boldsymbol{\mu} & \mathbf{1} \end{bmatrix} \boldsymbol{\lambda}. \end{aligned} \tag{12}
$$

We can simplify Equation 12 by writing it in terms of $\mathbf{U}$, $\boldsymbol{\lambda}$, $\mathbf{u} \triangleq \begin{bmatrix} \mu_p & 1 \end{bmatrix}^{\top}$, and $\mathbf{M} \triangleq \mathbf{U}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{U}$:

$$
\begin{aligned} \mathbf{u} &= -\frac{1}{2} \mathbf{U}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{U} \boldsymbol{\lambda} \\ &= -\frac{1}{2} \mathbf{M} \boldsymbol{\lambda}. \end{aligned} \tag{13}
$$

Now we can solve Equation 13 for the multipliers $\boldsymbol{\lambda}$ that appear in Equation 10:

$$
\boldsymbol{\lambda} = -2 \mathbf{M}^{-1} \mathbf{u}. \tag{14}
$$

And finally, we can solve explicitly for the optimal weights $\mathbf{w}^{\star}$ that give the efficient frontier by plugging Equation 14 into the second line of Equation 10:

$$
\mathbf{w}^{\star} = \boldsymbol{\Sigma}^{-1} \mathbf{U} \mathbf{M}^{-1} \mathbf{u}. \tag{15}
$$

To check that this is correct, I’ve re-created Figure 2 using both numerical minimization of Equation 4 and analytical computation of Equation 15 (Figure 3). (See A2 for code.)

Figure 3. Fifteen portfolios on the efficient frontier, computed numerically using quadratic programming (blue dots) and analytically using Equation 15 (red "x" marks).
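The closed-form weights in Equation 15 take only a few lines of NumPy. The sketch below is not the code from A2: the data are made-up, the helper name `frontier_weights` is hypothetical, and using `np.linalg.solve` instead of forming explicit inverses is a design choice made here for numerical stability.

```python
import numpy as np

# Made-up expected returns and covariance matrix for three risky assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])


def frontier_weights(mu, Sigma, target):
    """Closed-form minimum-variance weights from Equation 15."""
    ones = np.ones_like(mu)
    U = np.column_stack([mu, ones])        # N x 2 matrix [mu, 1]
    Sinv_U = np.linalg.solve(Sigma, U)     # Sigma^{-1} U
    M = U.T @ Sinv_U                       # 2 x 2 matrix U^T Sigma^{-1} U
    u = np.array([target, 1.0])            # [mu_p, 1]
    return Sinv_U @ np.linalg.solve(M, u)  # Sigma^{-1} U M^{-1} u


w_star = frontier_weights(mu, Sigma, target=0.09)
variance_star = w_star @ Sigma @ w_star
```

By construction, `w_star` satisfies both constraints in Equation 4, and any other weight vector satisfying the same constraints should have a variance at least as large as `variance_star`.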

Why the Markowitz bullet is a hyperbola

So why is the Markowitz bullet a hyperbola? We can now express the portfolio variance $\sigma_p^2$ as a function of its expected return $\mu_p$ (encoded in $\mathbf{u}$):

$$
\begin{aligned} \sigma_p^2 &= \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w} \\ &= (\boldsymbol{\Sigma}^{-1} \mathbf{U} \mathbf{M}^{-1} \mathbf{u})^{\top} \boldsymbol{\Sigma} (\boldsymbol{\Sigma}^{-1} \mathbf{U} \mathbf{M}^{-1} \mathbf{u}) \\ &= \mathbf{u}^{\top} \mathbf{M}^{-1} \mathbf{U}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{U} \mathbf{M}^{-1} \mathbf{u} \\ &= \mathbf{u}^{\top} \mathbf{M}^{-1} \mathbf{u} \\ &= \begin{bmatrix} \mu_p & 1 \end{bmatrix} \left[ \begin{bmatrix} \boldsymbol{\mu}^{\top} \\ \mathbf{1}^{\top} \end{bmatrix} \boldsymbol{\Sigma}^{-1} \begin{bmatrix} \boldsymbol{\mu} & \mathbf{1} \end{bmatrix} \right]^{-1} \begin{bmatrix} \mu_p \\ 1 \end{bmatrix}. \end{aligned} \tag{16}
$$

At this point, we need to invert $\mathbf{M}$, but it’s not as bad as it looks because $\mathbf{M}$ is a $2 \times 2$ matrix:

$$
\mathbf{M} = \begin{bmatrix} \boldsymbol{\mu}^{\top} \\ \mathbf{1}^{\top} \end{bmatrix} \boldsymbol{\Sigma}^{-1} \begin{bmatrix} \boldsymbol{\mu} & \mathbf{1} \end{bmatrix} = \begin{bmatrix} \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} \\ \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} \end{bmatrix}. \tag{17}
$$

Recall that the inverse of a $2 \times 2$ matrix $\mathbf{A}$ has the following solution:

$$
\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, \qquad \det(\mathbf{A}) = ad - bc. \tag{18}
$$

This is particularly convenient for us since $\mathbf{M}$ is symmetric. Therefore, the inverse of $\mathbf{M}$ is just

$$
\mathbf{M}^{-1} = \frac{1}{\det(\mathbf{M})} \begin{bmatrix} \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} & -\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \\ -\boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1} & \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \end{bmatrix}. \tag{19}
$$

Putting it all together, we can see that the variance of the portfolio $\sigma_p^2$ is a quadratic function of its expected return $\mu_p$:

$$
\sigma_p^2 = \frac{1}{\det(\mathbf{M})} \left\{ (\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}) \mu_p^2 - 2 (\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}) \mu_p + \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \right\}. \tag{20}
$$

This is just a vectorized formulation of Equation 12 in (Merton, 1972). To quote Merton on his Equation 12, “Thus, the frontier in mean-variance space is a parabola.” Later, we will find it easier to work with this quadratic equation if we introduce some notation:

$$
\begin{aligned} s_{11} &\triangleq \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}, \\ s_{1\mu} = s_{\mu 1} &\triangleq \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}, \\ s_{\mu\mu} &\triangleq \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}, \\ d &\triangleq \det(\mathbf{M}) = s_{\mu\mu}s_{11} - s_{1\mu}^2. \end{aligned} \tag{21}
$$

Putting it together, we can write $\sigma_p$ in terms of $\mu_p$ as

$$
\sigma_p = f(\mu_p) = \sqrt{\frac{s_{11} \mu_p^2 - 2 s_{1\mu} \mu_p + s_{\mu\mu}}{d}}. \tag{22}
$$

And this is just a vectorized formulation of Equation 15 in (Merton, 1972). To quote Merton again, “It is usual to present the frontier in the mean-standard deviation plane instead of the mean-variance plane… Figure II graphs [this] frontier which is a hyperbola…” To summarize, the efficient frontier in terms of $(\mu_p, \sigma_p^2)$ is a parabola, while the efficient frontier in terms of $(\mu_p, \sigma_p)$ is a hyperbola. And “Markowitz bullet” typically refers to the frontier in mean-standard deviation space, i.e. the bullet is a hyperbola. This is a subtle distinction which was pointed out to me by a reader (see the acknowledgements). See this mathematics StackExchange post for further discussion.

Anyway, Equation 22 is useful because now we can use basic properties of quadratic equations for quick computation, such as finding the vertex of the hyperbola (where the efficient frontier starts), taking derivatives, or solving for $\mu_p$.

Figure 4. The Markowitz bullet (blue line), visualized by drawing ten thousand portfolios that satisfy Equation 20. Also, fifteen portfolios on the efficient frontier, computed numerically using quadratic programming (blue dots) and analytically using Equation 15 (red "x" marks).

Furthermore, we can easily vectorize the computation required to draw the efficient frontier. Rather than computing the optimal $\mathbf{w}$ and then computing $\sigma_p^2$, we can simply compute the three scalar coefficients in Equation 22 (which do not depend on $\mathbf{w}$) and the normalization term $d$ to compute the correct (minimum) standard deviation $\sigma_p$ for any input expected return $\mu_p$. In Figure 4, I have drawn the Markowitz bullet using this vectorized computation over ten thousand $\mu_p$ values. (See A3 for code.)
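The vectorized computation described above might look roughly like the sketch below. It is not the code from A3: the data are made-up, and the function name `frontier_sigma` and the grid of target returns are arbitrary choices for illustration.

```python
import numpy as np

# Made-up expected returns and covariance matrix for three risky assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])


def frontier_sigma(mu, Sigma, targets):
    """Minimum standard deviation for each target return (Equation 22)."""
    ones = np.ones_like(mu)
    Sinv_mu = np.linalg.solve(Sigma, mu)
    Sinv_1 = np.linalg.solve(Sigma, ones)
    # Scalar coefficients from Equation 21; none of them depend on w.
    s_11 = ones @ Sinv_1
    s_1mu = ones @ Sinv_mu
    s_mumu = mu @ Sinv_mu
    d = s_mumu * s_11 - s_1mu**2
    targets = np.asarray(targets, dtype=float)
    return np.sqrt((s_11 * targets**2 - 2.0 * s_1mu * targets + s_mumu) / d)


# Ten thousand target returns evaluated in one vectorized call.
targets = np.linspace(0.0, 0.2, 10_000)
sigmas = frontier_sigma(mu, Sigma, targets)
```

Minimizing the quadratic under the square root puts the vertex of the bullet at $\mu_p = s_{1\mu} / s_{11}$ with $\sigma_p = 1 / \sqrt{s_{11}}$, which gives a convenient check on the output.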

Relationship between reward and weights

Finally, note that Equation 15 has an important implication. While the relationship between risk and reward is quadratic, the relationship between optimal weights and expected returns is linear, since $\mathbf{w}^{\star}$ depends on $\mu_p$ only through $\mathbf{u}$. Thus, the Markowitz bullet is just a line in weight-space. To visualize this, I’ve plotted the Markowitz bullet for $N=2$ assets. When $N=2$, the two optimal weights are fully specified by $w_1$, since $w_2 = 1 - w_1$. Thus, we can plot the Markowitz bullet in $3$-dimensional space, with $\sigma_p$, $\mu_p$, and $w_1$ as the axes (Figure 5).

Figure 5. The Markowitz bullet in $3$-dimensional space defined by expected return $\mu_p$, return standard deviation $\sigma_p$, and optimal weight $w_1$. The bullet is a hyperbola lying on a hyperplane defined by $w_1$.

Intuitively, this makes sense. All this geometry is representing is that, if we want bigger expected returns for our portfolio, we should put more weight on assets with bigger expected returns. However, our risk grows nonlinearly.
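The linearity is easy to see in code if we split Equation 15 into a part that multiplies $\mu_p$ and a part that does not. The sketch below uses made-up three-asset data; the names `slope` and `intercept` are introduced here for illustration and do not appear in the derivation above.

```python
import numpy as np

# Made-up expected returns and covariance matrix for three risky assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])

ones = np.ones_like(mu)
U = np.column_stack([mu, ones])       # N x 2 matrix [mu, 1]
Sinv_U = np.linalg.solve(Sigma, U)    # Sigma^{-1} U
M = U.T @ Sinv_U                      # U^T Sigma^{-1} U
A = Sinv_U @ np.linalg.inv(M)         # N x 2 matrix Sigma^{-1} U M^{-1}

# Equation 15 reads w* = A @ [mu_p, 1] = slope * mu_p + intercept.
slope, intercept = A[:, 0], A[:, 1]


def frontier_weights(target):
    """Optimal weights as an affine function of the target return."""
    return slope * target + intercept


w_low = frontier_weights(0.06)
w_high = frontier_weights(0.12)
w_mid = frontier_weights(0.09)  # midpoint target return
```

Because the map is affine, `w_mid` is exactly the average of `w_low` and `w_high`: moving along the frontier just slides the weights along a straight line.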

Efficient frontier with a risk-less asset

Now that we have proven that the Markowitz bullet with risky assets is a hyperbola, let’s consider the efficient frontier when we include a risk-free asset with return $r_f$ (lowercase because non-random). As I mentioned, this is often called the risk-free rate and the canonical example of a risk-free asset is a United States treasury bill. We’ll prove that, in this case, the Markowitz bullet is a piecewise linear function, and that the slope of the top half of this frontier (the efficient part) is the Sharpe ratio. Where the hyperbolic and linear functions intersect is called the tangency portfolio (Figure 1).

Note that we will assume the risk-free rate $r_f$ is lower than the $y$-coordinate of the vertex of the hyperbolic efficient frontier. In other words, we assume that the minimum-variance portfolio of risky assets has a higher expected return than the risk-free rate. See Section IV of (Merton, 1972) for a discussion of when this does not hold.

To compute this new efficient frontier, let’s repeat our process from the previous section, but this time, let’s include a risk-free asset. Let $w_f$ denote the weight of $r_f$ in a portfolio with $N+1$ assets. Since the portfolio weights sum to unity, we have

$$
\mathbf{w}^{\top} \mathbf{1} + w_f = 1. \tag{23}
$$

The expected return on a portfolio with both risky and risk-free assets can be written as

$$
\begin{aligned} \mu_p &= \mathbf{w}^{\top} \boldsymbol{\mu} + w_f r_f \\ &= \mathbf{w}^{\top} \boldsymbol{\mu} + r_f (1 - \mathbf{w}^{\top} \mathbf{1}) \\ &= r_f + \mathbf{w}^{\top} (\boldsymbol{\mu} - r_f \mathbf{1}). \end{aligned} \tag{24}
$$

Since $r_f$ is risk-free, we want to minimize our portfolio’s variance, which is still $\mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w}$, while targeting a given expected excess return, which is just the expected return less the risk-free rate,

$$
\mu_p - r_f = \mathbf{w}^{\top} (\boldsymbol{\mu} - r_f \mathbf{1}). \tag{25}
$$

We target the excess return because $r_f$ is fixed. To simplify things, let’s use the following notation:

$$
\begin{aligned} \tilde{\boldsymbol{\mu}} &\triangleq \boldsymbol{\mu} - r_f \mathbf{1}, \\ \tilde{\mu}_p &\triangleq \mu_p - r_f. \end{aligned} \tag{26}
$$

Now the new optimization problem is

$$
\begin{aligned} \min_{\mathbf{w}} &&& \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w}, \\ \text{subject to} &&& \mathbf{w}^{\top} \tilde{\boldsymbol{\mu}} = \tilde{\mu}_p. \end{aligned} \tag{27}
$$

Notice that $\mathbf{w}^{\top} \mathbf{1} = 1$ is no longer a constraint. While the full set of portfolio weights (including $w_f$) must sum to unity, the risky weights $\mathbf{w}$ need not. This is because we can allocate whatever proportion of our portfolio we would like to the risk-free asset. The portfolio variance is unchanged since $r_f$ is risk-free. Again, we can solve this using Lagrange multipliers. The Lagrangian is

$$
\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w} + \lambda (\mathbf{w}^{\top} \tilde{\boldsymbol{\mu}} - \tilde{\mu}_p). \tag{28}
$$

Again, we can find the first-order conditions by computing the gradient, setting it equal to zero, and solving for $\mathbf{w}$ in terms of $\lambda$. The first-order conditions are

$$
\begin{aligned} \nabla_{\mathbf{w}} \mathcal{L} &= 2 \boldsymbol{\Sigma} \mathbf{w} + \lambda \tilde{\boldsymbol{\mu}} = \mathbf{0}, \\ \frac{\partial}{\partial \lambda} \mathcal{L} &= \mathbf{w}^{\top} \tilde{\boldsymbol{\mu}} - \tilde{\mu}_p = 0. \end{aligned} \tag{29}
$$

As before, let’s solve the first first-order condition for $\mathbf{w}$:

$$
\mathbf{w} = -\frac{\lambda}{2} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}. \tag{30}
$$

We can then solve for $\lambda$ while adhering to the definition of $\mathbf{w}$ as before:

$$
\begin{aligned} \tilde{\mu}_p &= \tilde{\boldsymbol{\mu}}^{\top} \mathbf{w} \\ &= -\frac{\lambda}{2} \tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}, \\ &\Downarrow \\ \lambda &= \frac{-2 \tilde{\mu}_p}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}. \end{aligned} \tag{31}
$$

Finally, we can plug $\lambda$ back into Equation 30 to solve for $\mathbf{w}$ without $\lambda$:

$$
\mathbf{w} = \tilde{\mu}_p \left( \frac{\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right). \tag{32}
$$

(This is Merton’s Equation 36.) As before, we can now express $\sigma_p^2$ in terms of these optimal weights:

$$
\begin{aligned} \sigma_p^2 &= \mathbf{w}^{\top} \boldsymbol{\Sigma} \mathbf{w} \\ &= \tilde{\mu}_p^2 \left( \frac{\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right)^{\top} \boldsymbol{\Sigma} \left( \frac{\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right) \\ &= \frac{(\mu_p - r_f)^2}{(\boldsymbol{\mu} - r_f \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}. \end{aligned} \tag{33}
$$

This time, rather than leaving this as a function of $\mu_p$, let’s rewrite it as a function of $\sigma_p$, so that we can easily plot it on the standard risk–return axes:

$$
\begin{aligned} | \mu_p - r_f | &= \sigma_p \sqrt{(\boldsymbol{\mu} - r_f \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})} \\ &\Downarrow \\ \mu_p &= r_f \pm \sigma_p \sqrt{(\boldsymbol{\mu} - r_f \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}. \end{aligned} \tag{34}
$$

Note that this is Merton’s Equation 35. Clearly, the Markowitz bullet is now a piecewise linear function, and again, the efficient frontier is only the top half of this bullet (Figure 6).

Figure 6. The Markowitz bullet is a piecewise linear function when a portfolio can contain one risk-less asset. The efficient frontier is the top half of this bullet, a line. The vertex is the risk-free rate.
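To make the linear frontier concrete, the sketch below computes the risky weights from Equation 32 for one target return and checks that the resulting point lies on the line in Equation 34. The data, the risk-free rate `r_f = 0.02`, and the helper name `risky_weights` are all assumptions made for illustration.

```python
import numpy as np

# Made-up expected returns, covariance matrix, and risk-free rate.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])
r_f = 0.02

excess = mu - r_f                             # mu tilde (Equation 26)
Sinv_excess = np.linalg.solve(Sigma, excess)  # Sigma^{-1} mu tilde
q = excess @ Sinv_excess                      # mu tilde^T Sigma^{-1} mu tilde
slope = np.sqrt(q)                            # slope of the upper line in Equation 34


def risky_weights(target):
    """Weights on the risky assets for a target expected return (Equation 32)."""
    return (target - r_f) * Sinv_excess / q


target = 0.10
w = risky_weights(target)
w_f = 1.0 - w.sum()                # remainder goes to the risk-free asset (Equation 23)
sigma_p = np.sqrt(w @ Sigma @ w)   # only the risky assets contribute variance
mu_on_line = r_f + slope * sigma_p # upper branch of Equation 34
```

Here `mu_on_line` should reproduce `target`, and `w_f` can be negative, which corresponds to borrowing at the risk-free rate to buy more of the risky assets.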

Notice that the slope of this new frontier with a risk-less asset is the Sharpe ratio:

$$
\frac{\mu_p - r_f}{\sigma_p} = \sqrt{(\boldsymbol{\mu} - r_f \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}. \tag{35}
$$

Thus, we have a geometric interpretation of the Sharpe ratio: it is reward (expected excess return) per unit risk (standard deviation) on the risk–return spectrum. A higher Sharpe is a steeper slope, meaning more reward for the same risk. The right-hand side of Equation 35 is just the Sharpe ratio of any portfolio along the linear efficient frontier.
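As a rough numerical check that this slope really is the largest attainable Sharpe ratio, the sketch below compares it against the Sharpe ratios of randomly drawn risky-only portfolios. The data and `r_f` are made-up, and drawing long-only weights from a Dirichlet distribution is just one convenient way to generate weights that sum to one.

```python
import numpy as np

# Made-up expected returns, covariance matrix, and risk-free rate.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])
r_f = 0.02

# Right-hand side of Equation 35.
excess = mu - r_f
max_sharpe = np.sqrt(excess @ np.linalg.solve(Sigma, excess))


def sharpe(w):
    """Sharpe ratio of a portfolio of risky assets whose weights sum to one."""
    return (w @ mu - r_f) / np.sqrt(w @ Sigma @ w)


# One thousand random long-only portfolios; each row sums to one.
rng = np.random.default_rng(0)
random_weights = rng.dirichlet(np.ones(len(mu)), size=1000)
sharpes = np.array([sharpe(w) for w in random_weights])
```

None of the entries of `sharpes` should exceed `max_sharpe`; in the risk–return plot, every such portfolio sits on or below the linear efficient frontier.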

Sharpe-maximizing portfolio

So all portfolios on the linear efficient frontier have the same Sharpe (Equation 35). A natural question to ask at this point is: since portfolios on the hyperbolic efficient frontier have varying Sharpe ratios, which one has maximum Sharpe?

To find this portfolio, we just need to compute

$$
\mu_p^{\textsf{max}} = \arg\!\max_{x} \left\{ \frac{x - r_f}{\sigma_p^{\textsf{min}}} \right\}, \tag{36}
$$

where $\sigma_p^{\textsf{min}}$ is the minimum standard deviation attainable at expected return $x$ (Equation 22). We can drop the determinant $d$ since it does not depend on $x$, giving us the following optimization problem:

$$
\mu_p^{\textsf{max}} = \arg\!\max_{x} \left\{ \frac{x - r_f}{\sqrt{s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu}}} \right\}. \tag{37}
$$

Again, we take the derivative, set it equal to zero, and solve for $x$. The first-order condition is:

$$
\frac{\partial}{\partial x} \left[ (x - r_f) (s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu})^{-1/2} \right] = 0. \tag{38}
$$

Using the product rule, we get:

$$
(s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu})^{-1/2} - \frac{1}{2} \frac{(2 s_{11} x - 2 s_{1\mu}) (x - r_f)}{(s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu})^{3/2}} = 0. \tag{39}
$$

We can eliminate the fractional powers by multiplying both sides of Equation 39 by the denominator of its rightmost term, $(s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu})^{3/2}$, and then simplifying:

$$
\begin{aligned} 0 &= (s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu}) - (s_{11} x - s_{1\mu})(x-r_f) \\ 0 &= s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu} - (s_{11} x^2 - s_{1\mu} x - s_{11} r_f x + s_{1\mu} r_f) \\ 0 &= \cancel{s_{11} x^2} - 2 s_{1\mu} x + s_{\mu\mu} - \cancel{s_{11} x^2} + s_{1\mu} x + s_{11} r_f x - s_{1\mu} r_f \\ -s_{1\mu} x + s_{11} r_f x &= -s_{\mu\mu} + s_{1\mu} r_f \\ x &= \frac{-s_{\mu\mu} + s_{1\mu} r_f}{-s_{1\mu} + s_{11} r_f}. \end{aligned} \tag{40}
$$

Thus, we have shown that the expected return of the Sharpe-maximizing portfolio on the hyperbolic efficient frontier is:

$$
\mu_p^{\textsf{max}} = \frac{s_{\mu\mu} - r_f s_{1\mu}}{s_{1\mu} - r_f s_{11}}. \tag{41}
$$

Let’s plug this into Equation 15, since that equation gives the optimal weights for any portfolio on the hyperbolic efficient frontier. The only place that $\mu_p^{\textsf{max}}$ appears is in $\mathbf{u}$. Let’s compute just $\mathbf{M}^{-1} \mathbf{u}$ first, since it is tedious. To be clear, we want to compute

$$
\mathbf{M}^{-1} \mathbf{u} = \frac{1}{d} \begin{bmatrix} s_{11} & -s_{1\mu} \\ -s_{1\mu} & s_{\mu\mu} \end{bmatrix} \begin{bmatrix} \frac{s_{\mu\mu} - r_f s_{1\mu}}{s_{1\mu} - r_f s_{11}} \\ 1 \end{bmatrix}. \tag{42}
$$

Let’s compute each component in the resultant $2$-vector separately. The first component is

$$
\begin{aligned} & \frac{1}{d} \left[ s_{11} \left( \frac{s_{\mu\mu} - r_f s_{1\mu}}{s_{1\mu} - r_f s_{11}} \right) - s_{1\mu} \right] \\ &= \frac{1}{d} \left[ \frac{s_{11} s_{\mu\mu} - r_f s_{11} s_{1\mu} - s_{1\mu}^2 + r_f s_{11} s_{1\mu}}{s_{1\mu} - r_f s_{11}} \right] \\ &= \frac{1}{d} \left[ \frac{s_{11} s_{\mu\mu} - s_{1\mu}^2}{s_{1\mu} - r_f s_{11}} \right] \\ &= \frac{1}{s_{1\mu} - r_f s_{11}}. \end{aligned} \tag{43}
$$

The second component is

$$
\begin{aligned} & \frac{1}{d} \left[ -s_{1\mu} \left( \frac{s_{\mu\mu} - r_f s_{1\mu}}{s_{1\mu} - r_f s_{11}} \right) + s_{\mu\mu} \right] \\ &= \frac{1}{d} \left[ \frac{-s_{1\mu} s_{\mu\mu} + r_f s_{1\mu}^2 + s_{\mu\mu} s_{1\mu} - r_f s_{11} s_{\mu\mu}}{s_{1\mu} - r_f s_{11}} \right] \\ &= \frac{1}{d} \left[ \frac{r_f s_{1\mu}^2 - r_f s_{11} s_{\mu\mu}}{s_{1\mu} - r_f s_{11}} \right] \\ &= \frac{1}{d} \left[ \frac{-r_f (s_{11} s_{\mu\mu} - s_{1\mu}^2)}{s_{1\mu} - r_f s_{11}} \right] \\ &= \frac{-r_f}{s_{1\mu} - r_f s_{11}}. \end{aligned} \tag{44}
$$

Now putting these two components into a vector and left-multiplying it by $\boldsymbol{\Sigma}^{-1} \mathbf{U}$, we get

$$
\begin{aligned} \mathbf{w} &= \boldsymbol{\Sigma}^{-1} \mathbf{U} \mathbf{M}^{-1} \mathbf{u} \\ &= \begin{bmatrix} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} & \boldsymbol{\Sigma}^{-1} \mathbf{1} \end{bmatrix} \begin{bmatrix} \frac{1}{s_{1\mu} - r_f s_{11}} \\ \frac{-r_f}{s_{1\mu} - r_f s_{11}} \end{bmatrix} \\ &= \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}}{s_{1\mu} - r_f s_{11}} - \frac{r_f \boldsymbol{\Sigma}^{-1} \mathbf{1}}{s_{1\mu} - r_f s_{11}} \\ &= \frac{\boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} - r_f \mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}} \\ &= \frac{\boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}. \end{aligned} \tag{45}
$$

Thus, the portfolio weights which maximize the Sharpe ratio on the hyperbolic efficient frontier are

$$
\mathbf{w}^{\textsf{max}} \triangleq \frac{\boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}. \tag{46}
$$
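The algebra above is easy to get wrong, so here is a sketch that checks Equations 41 and 46 numerically on made-up data. It assumes a risk-free rate `r_f = 0.02` that lies below the vertex of the hyperbola, as required earlier; the helper name `hyperbola_sharpe` and the search grid are hypothetical choices for illustration.

```python
import numpy as np

# Made-up expected returns, covariance matrix, and risk-free rate.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([
    [0.040, 0.006, 0.002],
    [0.006, 0.090, 0.012],
    [0.002, 0.012, 0.160],
])
r_f = 0.02

ones = np.ones_like(mu)
Sinv_mu = np.linalg.solve(Sigma, mu)
Sinv_1 = np.linalg.solve(Sigma, ones)
s_11, s_1mu, s_mumu = ones @ Sinv_1, ones @ Sinv_mu, mu @ Sinv_mu  # Equation 21
d = s_mumu * s_11 - s_1mu**2


def hyperbola_sharpe(x):
    """Sharpe ratio of the frontier portfolio with expected return x (Equations 22 and 36)."""
    sigma = np.sqrt((s_11 * x**2 - 2.0 * s_1mu * x + s_mumu) / d)
    return (x - r_f) / sigma


# Closed-form maximizer (Equation 41) and its weights (Equation 46).
mu_max = (s_mumu - r_f * s_1mu) / (s_1mu - r_f * s_11)
excess = mu - r_f
Sinv_excess = np.linalg.solve(Sigma, excess)
w_max = Sinv_excess / (ones @ Sinv_excess)

# Brute-force comparison over a grid of expected returns on the hyperbola.
grid = np.linspace(0.0, 0.5, 5001)
grid_sharpes = hyperbola_sharpe(grid)
```

If the derivation is right, `w_max` sums to one, its expected return `w_max @ mu` equals `mu_max`, and no entry of `grid_sharpes` exceeds `hyperbola_sharpe(mu_max)`.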

Tangency portfolio

Now let’s derive the weights of the portfolio that sits at the intersection of the hyperbolic and linear efficient frontiers. This is called the tangency portfolio, and we’ll see why in the next section. The point for this section is that the tangency portfolio has the same weights as in Equation 46, i.e. the tangency portfolio maximizes the Sharpe ratio.

By definition, the tangency portfolio lies on the linear efficient frontier but is fully invested in risky assets, with no stake in the risk-free asset ($w_f = 0$). It must therefore sit at the intersection of the hyperbolic and linear efficient frontiers. Let $\boldsymbol{\omega}$ denote this $(N+1)$-vector of weights:

$$
\boldsymbol{\omega} = \begin{bmatrix} \mathbf{w} \\ w_f \end{bmatrix}. \tag{47}
$$

So while $\boldsymbol{\omega}^{\top} \mathbf{1} = 1$, we know that $w_f = 0$. Thus, we can use the result from Equation 32 to write:

$$
1 = \mathbf{1}^{\top} \boldsymbol{\omega} = \mathbf{1}^{\top} \mathbf{w} = \tilde{\mu}_p \left( \frac{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right), \tag{48}
$$

which implies

$$
\tilde{\mu}_p = \frac{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}. \tag{49}
$$

Equation 49 is a closed-form solution for the expected excess return of the tangency portfolio. And we can plug this excess return back into our equation for the portfolio weights (again Equation 32) to get

$$
\begin{aligned}
\mathbf{w}^{\textsf{tp}} &= \left( \frac{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right) \left( \frac{\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \right) \\
&= \frac{\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}}} \\
&= \frac{\boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}{\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})} \\
&= \mathbf{w}^{\textsf{max}}.
\end{aligned} \tag{50}
$$

Here, I’ve written $\mathbf{w}^{\textsf{tp}}$ to emphasize that this derivation only holds if we assume $w_f = 0$, i.e. that these weights are those of the tangency portfolio. And we see that these weights $\mathbf{w}^{\textsf{tp}}$ are equal to the weights $\mathbf{w}^{\textsf{max}}$ for the Sharpe-maximizing portfolio.
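The two-step argument above can be traced numerically. The sketch below uses made-up inputs `mu`, `Sigma`, and `rf`, and assumes Equation 32 has the form in which it is used in Equations 48 and 50, namely $\mathbf{w} = \tilde{\mu}_p \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}} / (\tilde{\boldsymbol{\mu}}^{\top} \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\mu}})$. It computes the excess return from Equation 49, plugs it back in, and compares the result with Equation 46.

```python
import numpy as np

# Made-up example inputs (assumptions for illustration only).
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
rf = 0.02

ones = np.ones_like(mu)
mu_tilde = mu - rf * ones                # excess returns
z = np.linalg.solve(Sigma, mu_tilde)     # Sigma^{-1} mu_tilde

# Equation 49: expected excess return of the tangency portfolio.
mu_p_tilde = (mu_tilde @ z) / (ones @ z)

# Equation 32 (form assumed from its use in Equations 48 and 50).
w_tp = mu_p_tilde * z / (mu_tilde @ z)

# Equation 46: Sharpe-maximizing weights.
w_max = z / (ones @ z)

print("w_tp :", w_tp)
print("w_max:", w_max)
print("weights sum to:", w_tp.sum())
```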

Tangent line at tangency portfolio

Finally, let’s see why the tangency portfolio has the name it does. We will compute the tangent line at the tangency portfolio and show that its slope is equal to the slope of the linear efficient frontier (Figure 7). This will prove that the two lines (the linear EF and the tangent line at the tangency portfolio) are collinear, since we already know that the tangency portfolio is on the linear efficient frontier.

Figure 7. The tangent line at the tangency portfolio $(\mu_p^{\textsf{max}}, \sigma_p^{\textsf{min}})$ has a slope equal to the slope of the linear efficient frontier (EF). This point is the intersection of the hyperbolic and linear efficient frontiers.

Let’s see that the slope of the tangent line at $(\mu_p^{\textsf{max}}, \sigma_p^{\textsf{min}})$ is indeed Equation 35. To find the slope of the tangent line of a function

$$
y = f(x) + b \tag{51}
$$

at a point $(x_1, y_1)$, we need to compute the derivative at $x_1$, i.e. compute $f^{\prime}(x_1)$.

The derivative of Equation 23 is

$$
f^{\prime}(\mu) = \frac{s_{11} \mu - s_{1\mu}}{\sqrt{d (s_{11} \mu^2 - 2 s_{1\mu} \mu + s_{\mu\mu})}}. \tag{52}
$$

To find the slope of the tangent line at the tangency portfolio, we simply plug in $\mu_p^{\textsf{max}}$. For simplicity, since the derivations are tedious, let’s call this optimal value $x$ and write $r$ for $r_f$. Since $x = \boldsymbol{\mu}^{\top} \mathbf{w}^{\textsf{max}}$, Equation 46 gives $x = (s_{\mu\mu} - r s_{1\mu}) / (s_{1\mu} - r s_{11})$. Then we have

$$
f^{\prime}(x) = \frac{s_{11} x - s_{1\mu}}{\sqrt{d (s_{11} x^2 - 2 s_{1\mu} x + s_{\mu\mu})}}. \tag{53}
$$

The numerator is

$$
\begin{aligned}
s_{11} \left( \frac{s_{\mu\mu} - rs_{1\mu}}{s_{1\mu} - rs_{11}} \right) - s_{1\mu} \left( \frac{s_{1\mu} - rs_{11}}{s_{1\mu} - rs_{11}} \right) &= \frac{s_{11}s_{\mu\mu} - rs_{1\mu}s_{11} - s_{1\mu}^2 + rs_{1\mu}s_{11}}{s_{1\mu} - rs_{11}} \\
&= \frac{s_{11}s_{\mu\mu} - s_{1\mu}^2}{s_{1\mu} - rs_{11}} \\
&= \frac{d}{s_{1\mu} - rs_{11}}.
\end{aligned} \tag{54}
$$

Now let’s compute the quadratic term in the denominator of Equation 53. This is

$$
\begin{aligned}
& s_{11}x^2 - 2s_{1\mu}x + s_{\mu\mu} \\
&= s_{11} \left( \frac{s_{\mu\mu} - rs_{1\mu}}{s_{1\mu} - rs_{11}} \right)^2 - 2s_{1\mu} \left( \frac{s_{\mu\mu} - rs_{1\mu}}{s_{1\mu} - rs_{11}} \right) + s_{\mu\mu} \\
&= \frac{s_{11} (s_{\mu\mu} - rs_{1\mu})^2 - 2s_{1\mu}(s_{\mu\mu} - rs_{1\mu})(s_{1\mu}-r s_{11}) + s_{\mu\mu}(s_{1\mu} - rs_{11})^2}{(s_{1\mu} - rs_{11})^2}.
\end{aligned} \tag{55}
$$

We can see here that, once we take the square root in Equation 53, the denominator in Equation 55 will cancel with the denominator in Equation 54 (taking $s_{1\mu} - r s_{11} > 0$). So we only need to simplify the numerator in Equation 55. This is

$$
\begin{aligned}
& s_{11}(s_{\mu\mu}^2 - 2 s_{1\mu}s_{\mu\mu}r + s_{1\mu}^2 r^2) \\
&\quad - 2s_{1\mu}(s_{1\mu}s_{\mu\mu} - s_{1\mu}^2 r - s_{\mu\mu}s_{11}r + s_{1\mu}s_{11}r^2) \\
&\quad + s_{\mu\mu}(s_{1\mu}^2 - 2s_{1\mu}s_{11}r + s_{11}^2 r^2).
\end{aligned} \tag{56}
$$

Let’s simplify by distributing, canceling, and combining like terms:

$$
\begin{aligned}
& s_{11}s_{\mu\mu}^2 - \cancel{2s_{1\mu}s_{\mu\mu}s_{11}r} + \underline{s_{1\mu}^2s_{11}r^2} \\
&\quad - \boxed{2s_{1\mu}^2s_{\mu\mu}} + 2s_{1\mu}^3r + \cancel{2s_{1\mu}s_{\mu\mu}s_{11}r} - \underline{2s_{1\mu}^2s_{11}r^2} \\
&\quad + \boxed{s_{1\mu}^2s_{\mu\mu}} - 2s_{1\mu}s_{\mu\mu}s_{11}r + s_{\mu\mu}s_{11}^2r^2.
\end{aligned} \tag{57}
$$

Simplifying further, we get

$$
\begin{aligned}
& s_{11}s_{\mu\mu}^2 - s_{1\mu}^2s_{11}r^2 - s_{1\mu}^2s_{\mu\mu} + 2s_{1\mu}^3r - 2s_{1\mu}s_{\mu\mu}s_{11}r + s_{\mu\mu}s_{11}^2r^2 \\
&= (s_{\mu\mu}s_{11}^2 - s_{1\mu}^2s_{11}) r^2 - 2(s_{1\mu}s_{\mu\mu}s_{11} - s_{1\mu}^3)r + (s_{11}s_{\mu\mu}^2 - s_{1\mu}^2s_{\mu\mu}) \\
&= s_{11} (s_{\mu\mu}s_{11} - s_{1\mu}^2) r^2 - 2 s_{1\mu}(s_{\mu\mu}s_{11} - s_{1\mu}^2)r + s_{\mu\mu} (s_{11}s_{\mu\mu} - s_{1\mu}^2) \\
&= d (s_{11} r^2 - 2 s_{1\mu} r + s_{\mu\mu}).
\end{aligned} \tag{58}
$$
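Because this algebra is easy to get wrong by hand, a numerical spot check can be reassuring. The sketch below plugs made-up scalar values for $s_{11}$, $s_{1\mu}$, $s_{\mu\mu}$ and several values of $r$ into both sides of the identity. The numbers are assumptions chosen so that $d > 0$ and $s_{1\mu} - r s_{11} > 0$; they do not come from any data set.

```python
import numpy as np

# Made-up scalar values (assumptions), chosen so that d > 0.
s11, s1m, smm = 2.0, 0.3, 0.1
d = s11 * smm - s1m**2

# Several risk-free rates, all with s1m - r * s11 > 0.
r = np.linspace(-0.05, 0.10, 7)

# Expected return of the tangency portfolio.
x = (smm - r * s1m) / (s1m - r * s11)

# Left side: the quadratic term evaluated at x (Equation 55).
lhs = s11 * x**2 - 2 * s1m * x + smm

# Right side: Equation 58 divided by the common denominator.
rhs = d * (s11 * r**2 - 2 * s1m * r + smm) / (s1m - r * s11) ** 2

# The derivative at x should then reduce to 1 / sqrt(s11 r^2 - 2 s1m r + smm).
deriv = (s11 * x - s1m) / np.sqrt(d * lhs)
deriv_closed = 1.0 / np.sqrt(s11 * r**2 - 2 * s1m * r + smm)

print(np.max(np.abs(lhs - rhs)), np.max(np.abs(deriv - deriv_closed)))
```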

We can see that things should start canceling, giving us

$$
\begin{aligned}
\frac{\partial \sigma}{\partial \mu} \bigg|_{\mu = x} &= \left( \frac{d}{s_{1\mu} - rs_{11}} \right) \bigg/ \sqrt{d \left( \frac{d (s_{11} r^2 - 2 s_{1\mu} r + s_{\mu\mu})}{(s_{1\mu} - rs_{11})^2} \right)} \\
&= \frac{1}{\sqrt{s_{11} r^2 - 2s_{1\mu}r + s_{\mu\mu}}}.
\end{aligned} \tag{59}
$$

Thus, we have derived the slope of the tangent line at the tangency portfolio, with risk on the $x$-axis and reward on the $y$-axis, since it is the reciprocal of Equation 59:

$$
\begin{aligned}
\frac{\partial \mu}{\partial \sigma} &= \sqrt{s_{11} r^2 - 2s_{1\mu}r + s_{\mu\mu}} \\
&= \sqrt{(\mathbf{1}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}) r_f^2 - 2 (\boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{1}) r_f + \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}} \\
&= \sqrt{(\boldsymbol{\mu} - r_f \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - r_f \mathbf{1})}.
\end{aligned} \tag{60}
$$

And this is the slope of the linear efficient frontier (Equation 35).
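The same conclusion can be checked numerically with a finite difference. The sketch below uses made-up inputs `mu`, `Sigma`, and `rf`, and assumes the hyperbola has the form $\sigma(\mu) = \sqrt{(s_{11}\mu^2 - 2 s_{1\mu}\mu + s_{\mu\mu})/d}$, which is the form implied by the derivative in Equation 52. It compares the tangent slope at the tangency portfolio with the closed form in Equation 60, and also checks that the tangent line passes through the risk-free rate on the $y$-axis.

```python
import numpy as np

# Made-up example inputs (assumptions for illustration only).
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
rf = 0.02
ones = np.ones_like(mu)

Sinv_mu = np.linalg.solve(Sigma, mu)
Sinv_1 = np.linalg.solve(Sigma, ones)
s11, s1m, smm = ones @ Sinv_1, ones @ Sinv_mu, mu @ Sinv_mu
d = s11 * smm - s1m**2

def sigma(m):
    """Standard deviation on the hyperbola for expected return m (assumed form)."""
    return np.sqrt((s11 * m**2 - 2 * s1m * m + smm) / d)

# Expected return of the tangency portfolio.
x = (smm - rf * s1m) / (s1m - rf * s11)

# Central finite difference for d(sigma)/d(mu) at x, then invert.
h = 1e-6
dsigma_dmu = (sigma(x + h) - sigma(x - h)) / (2 * h)
tangent_slope = 1.0 / dsigma_dmu

# Equation 60: slope of the linear efficient frontier.
excess = mu - rf * ones
ef_slope = np.sqrt(excess @ np.linalg.solve(Sigma, excess))

# Where the tangent line crosses the y-axis (sigma = 0).
intercept = x - tangent_slope * sigma(x)

print(tangent_slope, ef_slope, intercept)
```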

Conclusion

To summarize, we have proven the geometric facts implicit in Figure 1. When only considering risky assets, the Markowitz bullet is a hyperbola because the portfolio variance is a quadratic function of the portfolio’s expected return. When also considering a risk-free asset whose rate of return is less than the expected return of any portfolio on the hyperbolic frontier, the Markowitz bullet is a piecewise linear function with a vertex at the risk-free rate.

Put differently, any portfolio on the linear efficient frontier has the same Sharpe ratio, and this Sharpe ratio is optimal! While I didn’t discuss this here, this idea is closely related to the mutual fund separation theorems in (Merton, 1972). The tangency portfolio sits at the intersection of these two efficient frontiers and has the maximum Sharpe ratio out of all risky portfolios. We call it the “tangency portfolio” because the tangent line at this point is collinear with the linear efficient frontier. Thus, holding the tangency portfolio, or any combination of it and the risk-free asset with a positive weight on the tangency portfolio, gives the same Sharpe ratio. However, the tangency portfolio has a higher expected return than the risk-free asset.

In a future post, I’ll discuss the capital asset pricing model (CAPM). As I understand it now, the main argument of the CAPM is that the tangency portfolio must be the market portfolio, or a portfolio that holds assets in proportion to the market. Thus, the CAPM argues that the market is “efficient” in the sense that it has maximum return per unit risk.

   

Acknowledgments

Thanks to Christopher Jordan-Squire and Đồng Khau Tú for pointing out mistakes in this post. In particular, Christopher observed that the efficient frontier in mean-standard deviation space is a hyperbola, not a parabola.

   

Appendix

A1. Solving for $\mathbf{w}$ numerically

import numpy as np
from scipy.optimize import minimize


def get_ef_port_numerically(rets, covm, targ):
    """Solve for the efficient frontier weights for a given expected return
    vector `rets`, covariance matrix `covm`, and expected portfolio return
    `targ`.
    """
    def objective(weights):
        # Portfolio variance minus a term that is constant on the feasible
        # set (where rets @ weights == targ), so this minimizes the variance.
        return weights.T @ covm @ weights - targ * rets.T @ weights

    # Weights sum to one, and the portfolio hits the target expected return.
    norm_constraint = lambda weights: 1 - weights.sum()
    targ_constraint = lambda weights: np.dot(rets, weights) - targ

    resp = minimize(objective,
                    x0=np.random.dirichlet([1]*len(rets)),
                    method='SLSQP',
                    # One (lower, upper) bound per asset.
                    bounds=[(-2, 2)]*len(rets),
                    constraints=[
                        {'type': 'eq', 'fun': norm_constraint},
                        {'type': 'eq', 'fun': targ_constraint}
                    ])
    weights = resp.x

    return weights

A2. Solving for $\mathbf{w}$ analytically

import numpy as np


def get_ef_port_analytically(rets, covm, targ):
    """Solve for the efficient frontier weights for a given expected return
    vector `rets`, covariance matrix `covm`, and expected portfolio return
    `targ`.
    """
    N = rets.shape[0]
    # Constraint values u = [targ, 1]^T and constraint matrix U = [rets, 1].
    u = np.array([targ, 1])[:, None]
    U = np.vstack([rets, np.ones_like(rets)]).T

    # w = covm^{-1} U M^{-1} u, where M = U^T covm^{-1} U.
    covm_inv = np.linalg.solve(covm, np.eye(N))
    M        = U.T @ covm_inv @ U
    M_inv    = np.linalg.solve(M, np.eye(2))
    weights  = covm_inv @ U @ M_inv @ u

    # Note: `weights` is a column vector with shape (N, 1).
    return weights

A3. Solving for $\sigma_p^2$ directly from $\mu_p$

import numpy as np


def get_sigma_from_mu(rets, covm, means):
    """Solve for portfolio variances `vars_` for every value in a vector
    `means`, given expected return vector `rets` and covariance matrix `covm`.
    """
    N        = len(rets)
    ones     = np.ones_like(rets)
    covm_inv = np.linalg.solve(covm, np.eye(N))

    a = ones.T @ covm_inv @ ones  # s_11
    b = ones.T @ covm_inv @ rets  # s_1mu
    c = rets.T @ covm_inv @ rets  # s_mumu
    d = a*c - b**2                # d = s_11 * s_mumu - s_1mu^2

    # Minimum variance for each target mean; take np.sqrt to get sigma_p.
    vars_ = (a*means**2 - 2*b*means + c) / d
    return vars_
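To tie A2 and A3 together, the following self-contained sketch checks that the closed-form weights satisfy both constraints and that their variance matches the quadratic $(s_{11}\mu_p^2 - 2 s_{1\mu}\mu_p + s_{\mu\mu})/d$. The inputs `mu` and `Sigma` are made-up illustrative values, and the variable names are my own rather than those of the functions above.

```python
import numpy as np

# Made-up example inputs (assumptions for illustration only).
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

ones = np.ones_like(mu)
U = np.column_stack([mu, ones])    # constraint matrix [mu, 1]
Sinv_U = np.linalg.solve(Sigma, U) # Sigma^{-1} U
M = U.T @ Sinv_U                   # [[s_mumu, s_1mu], [s_1mu, s_11]]
c, b, a = M[0, 0], M[0, 1], M[1, 1]
d = a * c - b**2

max_err = 0.0
for targ in np.linspace(0.04, 0.14, 6):
    u = np.array([targ, 1.0])
    w = Sinv_U @ np.linalg.solve(M, u)  # w = Sigma^{-1} U M^{-1} u

    var_from_weights = w @ Sigma @ w
    var_closed_form = (a * targ**2 - 2 * b * targ + c) / d

    # Track the largest violation of either constraint or the variance formula.
    max_err = max(max_err,
                  abs(w.sum() - 1.0),
                  abs(w @ mu - targ),
                  abs(var_from_weights - var_closed_form))

print("largest discrepancy:", max_err)
```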
  1. Merton, R. C. (1972). An analytic derivation of the efficient portfolio frontier. Journal of Financial and Quantitative Analysis, 7(4), 1851–1872.
  2. Petersen, K. B., Pedersen, M. S., & others. (2008). The matrix cookbook. Technical University of Denmark, 7(15), 510.