In simple linear regression, or ordinary least squares (OLS) with a single explanatory variable, the model is
$$y_n = \alpha + \beta x_n + \varepsilon_n. \tag{1}$$
In a previous post on simple linear regression, I showed that the normal equations for $\hat{\alpha}$ and $\hat{\beta}$ can be written in terms of Pearson’s correlation between the response and explanatory variables, $\rho_{xy}$:
$$\begin{aligned}
\hat{\alpha} &\triangleq \bar{y} - \hat{\beta} \bar{x}, \\
\hat{\beta} &\triangleq \frac{S_{xy}}{S_x^2} = \rho_{xy} \frac{S_y}{S_x},
\end{aligned} \tag{2}$$
where $S_x^2$ and $S_y^2$ are the un-normalized variances, and $S_{xy}$ is the un-normalized covariance:
$$S_x^2 \triangleq \sum_{n=1}^N (x_n - \bar{x})^2, \qquad S_y^2 \triangleq \sum_{n=1}^N (y_n - \bar{y})^2, \qquad S_{xy} \triangleq \sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y}). \tag{3}$$
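Before moving on, here is a quick numerical check of Equation 2. This is a minimal sketch in Python with NumPy, on synthetic data and variable names of my own choosing; it confirms that the two expressions for $\hat{\beta}$ agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

# Un-normalized variances and covariance (Equation 3).
S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Equation 2: the slope via the covariance and via the correlation.
rho_xy = S_xy / np.sqrt(S_xx * S_yy)
assert np.isclose(S_xy / S_xx, rho_xy * np.sqrt(S_yy / S_xx))
```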
In this post, I want to show how the residual sum of squares (RSS),
$$\text{RSS} \triangleq \sum_{n=1}^N (y_n - \hat{y}_n)^2, \tag{4}$$
can be written in terms of Pearson’s correlation as well. First, we simply expand RSS using the normal equations in Equation 2:
$$\begin{aligned}
\text{RSS} &= \sum_{n=1}^N \left[ y_n - (\hat{\alpha} + \hat{\beta} x_n) \right]^2 \\
&= \sum_{n=1}^N \left[ y_n - (\bar{y} - \hat{\beta} \bar{x} + \hat{\beta} x_n) \right]^2 \\
&= \sum_{n=1}^N \left[ (y_n - \bar{y}) - \hat{\beta} (x_n - \bar{x}) \right]^2 \\
&= \sum_{n=1}^N (y_n - \bar{y})^2 + \hat{\beta}^2 \sum_{n=1}^N (x_n - \bar{x})^2 - 2 \hat{\beta} \sum_{n=1}^N (y_n - \bar{y})(x_n - \bar{x}).
\end{aligned} \tag{5}$$
We can write the last line of Equation 5 in terms of correlation by applying the definitions in Equations 2 and 3:
$$\begin{aligned}
\text{RSS} &= S_y^2 + \hat{\beta}^2 S_x^2 - 2 \hat{\beta} S_{xy} \\
&= S_y^2 + \left( \frac{S_{xy}}{S_x^2} \right)^2 S_x^2 - 2 \left( \frac{S_{xy}}{S_x^2} \right) S_{xy} \\
&= S_y^2 - \frac{S_{xy}^2}{S_x^2} \\
&= S_y^2 \left( 1 - \frac{S_{xy}^2}{S_x^2 S_y^2} \right) \\
&= S_y^2 \left( 1 - \rho_{xy}^2 \right).
\end{aligned} \tag{6}$$
And we’re done.
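As a final sanity check, here is a short numerical verification of Equation 6, again a sketch in Python with NumPy on synthetic data: fit the line via the normal equations, compute the RSS directly, and compare it to $S_y^2 (1 - \rho_{xy}^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

# Un-normalized statistics (Equation 3) and Pearson's correlation.
S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
rho_xy = S_xy / np.sqrt(S_xx * S_yy)

# Fit via the normal equations (Equation 2), then compute RSS (Equation 4).
beta_hat = S_xy / S_xx
alpha_hat = y.mean() - beta_hat * x.mean()
rss = np.sum((y - (alpha_hat + beta_hat * x)) ** 2)

# Equation 6: RSS = S_y^2 (1 - rho_xy^2).
assert np.isclose(rss, S_yy * (1 - rho_xy**2))
```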