Residual Sum of Squares in Terms of Pearson's Correlation

I re-derive a relationship between the residual sum of squares in simple linear regression and Pearson's correlation coefficient.

In simple linear regression, or ordinary least squares (OLS) with a single explanatory variable, the model is

$$
y_n = \alpha + \beta x_n + \varepsilon_n. \tag{1}
$$

In a previous post on simple linear regression, I showed that the normal equations for $\hat{\alpha}$ and $\hat{\beta}$ can be written in terms of Pearson's correlation between the response and explanatory variables, $\rho_{xy}$:

$$
\begin{aligned}
\hat{\alpha} &\triangleq \bar{y} - \hat{\beta} \bar{x},
\\
\hat{\beta} &\triangleq \frac{S_{xy}}{S_x^2} = \rho_{xy} \frac{S_y}{S_x},
\end{aligned} \tag{2}
$$
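As a quick numerical sanity check (a sketch of my own, not part of the derivation), we can confirm that the two expressions for $\hat{\beta}$ in Equation 2 agree on simulated data. The variable names (`Sx2`, `Sy2`, `Sxy`, `rho`) simply mirror the notation above:

```python
import numpy as np

# Simulate data from the model y = alpha + beta * x + noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

xbar, ybar = x.mean(), y.mean()
Sx2 = np.sum((x - xbar) ** 2)           # un-normalized variance of x
Sy2 = np.sum((y - ybar) ** 2)           # un-normalized variance of y
Sxy = np.sum((x - xbar) * (y - ybar))   # un-normalized covariance

# The two forms of the slope estimate from Equation 2.
beta_hat = Sxy / Sx2
rho = Sxy / np.sqrt(Sx2 * Sy2)                # Pearson's correlation
beta_via_rho = rho * np.sqrt(Sy2) / np.sqrt(Sx2)  # rho * S_y / S_x

alpha_hat = ybar - beta_hat * xbar

assert np.isclose(beta_hat, beta_via_rho)
```

Both forms are algebraically identical, so the check passes up to floating-point error.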

where $S_x^2$ and $S_y^2$ are the un-normalized variances, and $S_{xy}$ is the un-normalized covariance:

$$
S_x^2 \triangleq \sum_{n=1}^N (x_n - \bar{x})^2,
\qquad
S_y^2 \triangleq \sum_{n=1}^N (y_n - \bar{y})^2,
\qquad
S_{xy} \triangleq \sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y}). \tag{3}
$$

In this post, I want to show how the residual sum of squares (RSS),

$$
\textsf{RSS} \triangleq \sum_{n=1}^N (y_n - \hat{y}_n)^2, \tag{4}
$$

can be written in terms of Pearson's correlation as well. First, we simply expand RSS using the normal equations in Equation 2:

$$
\begin{aligned}
\textsf{RSS} &= \sum_{n=1}^N \left[ y_n - (\hat{\alpha} + \hat{\beta} x_n) \right]^2
\\
&= \sum_{n=1}^N \left[ y_n - (\bar{y} - \hat{\beta} \bar{x} + \hat{\beta} x_n) \right]^2
\\
&= \sum_{n=1}^N \left[ (y_n - \bar{y}) - \hat{\beta} (x_n - \bar{x}) \right]^2
\\
&= \sum_{n=1}^N (y_n - \bar{y})^2 + \hat{\beta}^2 \sum_{n=1}^N (x_n - \bar{x})^2 - 2 \hat{\beta} \sum_{n=1}^N (y_n - \bar{y})(x_n - \bar{x}).
\end{aligned} \tag{5}
$$

We can write the last line of Equation 5 in terms of correlation by applying the definitions in Equations 2 and 3:

$$
\begin{aligned}
\textsf{RSS} &= S_y^2 + \hat{\beta}^2 S_x^2 - 2 \hat{\beta} S_{xy}
\\
&= S_y^2 + \left( \frac{S_{xy}}{S_x^2} \right)^2 S_x^2 - 2 \left( \frac{S_{xy}}{S_x^2} \right) S_{xy}
\\
&= S_y^2 - \frac{S_{xy}^2}{S_x^2}
\\
&= S_y^2 \left( 1 - \frac{S_{xy}^2}{S_x^2 S_y^2} \right)
\\
&= S_y^2 \left( 1 - \rho_{xy}^2 \right).
\end{aligned} \tag{6}
$$
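The final identity is easy to verify numerically. This sketch (my own check, with variable names mirroring the notation above) computes RSS directly from the fitted residuals as in Equation 4 and compares it against the closed form $S_y^2 (1 - \rho_{xy}^2)$ from Equation 6:

```python
import numpy as np

# Simulate data with a known linear relationship plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 - 0.5 * x + rng.normal(scale=0.3, size=200)

xbar, ybar = x.mean(), y.mean()
Sx2 = np.sum((x - xbar) ** 2)
Sy2 = np.sum((y - ybar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

# OLS estimates from the normal equations (Equation 2).
beta_hat = Sxy / Sx2
alpha_hat = ybar - beta_hat * xbar

# RSS computed directly from the residuals (Equation 4).
rss_direct = np.sum((y - (alpha_hat + beta_hat * x)) ** 2)

# RSS from the closed form in terms of Pearson's correlation (Equation 6).
rho = Sxy / np.sqrt(Sx2 * Sy2)
rss_closed = Sy2 * (1.0 - rho ** 2)

assert np.isclose(rss_direct, rss_closed)
```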

And we're done. As a sanity check on Equation 6: when $\rho_{xy} = \pm 1$, the fit is perfect and $\textsf{RSS} = 0$; when $\rho_{xy} = 0$, the regression explains nothing and $\textsf{RSS} = S_y^2$.