CFA® charterholders know how to interpret regression output to judge whether their estimates and forecasts are reliable enough to support their investment recommendations, decisions or actions.

This RapidDigest only includes what is covered in the 2021 CFA® Curriculum Readings (Readings 4 and 5).

Trademark Disclaimer: CFA Institute does not endorse, promote, or warrant the accuracy or quality of knowell.rapidquizzer.com. CFA® and Chartered Financial Analyst® are registered trademarks owned by CFA Institute.

Regression Analysis

Refer to Linear Regression (Assumptions) for the sample data underlying this regression analysis.

Microsoft Excel-generated regression output of the Application to Real-World Data

Top Panel: Simple Linear Regression | Bottom Panel: Multiple Linear Regression

Simple and Multiple Linear Regression

Simple Linear Regression


 

Y = inflation | X = M3 money supply

Model Specification

    \[Inflation_i=-12.42269+(0.1219418)X_i\]

Slope Coefficient
  • A positive coefficient is expected, consistent with the monetary policy transmission mechanism from money supply to inflation covered in CFA® Exam Level I Economics.
  • The sign also reflects the quantity theory of money, M x V = P x Y: with V (velocity of money) constant and Y (real output) unaffected because of money neutrality, changes in M (money supply) pass through to P (prices or inflation), which is the relationship our regression models.
Predicted 2018 inflation

    \[Inflation_{Israel}=-12.42269+(0.1219418)(115.8)=1.70\]

    \[Inflation_{Russia}=-12.42269+(0.1219418)(125.7)=2.91\]

Prediction Error

    \[\epsilon_i=Y - \widehat{Y}\]

 

    \[\epsilon_{Israel}=0.80 - 1.70= -0.90\]

    \[\epsilon_{Russia}=2.90 - 2.91 = -0.01\]

Observed Values
  • Israel below the regression line.
  • Russia marginally below (practically on) the regression line.
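The prediction and prediction-error arithmetic above can be reproduced with a minimal Python sketch. It assumes the unrounded Excel coefficients quoted in the confidence-interval calculations (intercept -12.42268596, slope 0.121941844); the function and variable names are illustrative only and do not come from the curriculum.

```python
# Fitted simple regression: inflation = B0 + B1 * M3 (coefficients from the Excel output)
B0 = -12.42268596
B1 = 0.121941844

def predict_inflation(m3):
    """Predicted inflation for a given M3 money supply value."""
    return B0 + B1 * m3

# (M3 money supply, observed 2018 inflation) as quoted in the text
observations = {"Israel": (115.8, 0.80), "Russia": (125.7, 2.90)}

results = {}
for country, (m3, actual) in observations.items():
    predicted = predict_inflation(m3)
    error = actual - predicted  # prediction error = Y - Y_hat
    results[country] = (predicted, error)
    position = "below" if error < 0 else "on or above"
    print(f"{country}: predicted={predicted:.2f}, error={error:.2f}, observed {position} the line")
```

A negative prediction error means the observed value sits below the fitted line; an error close to zero means the observation is practically on it.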
Model Hypothesis
  • Null Hypotheses: that Y (inflation) has no relationship with X (M3 money supply)

    \[H_{0,intercept}: b_0 = 0\]

    \[H_{0,slope}: b_1 = 0\]

  • Alternative Hypotheses: that Y (inflation) has a relationship, positive or negative, with X (M3 money supply)

    \[H_{a,intercept}: b_0 \ne 0\]

    \[H_{a,slope}: b_1 \ne 0\]

  • Microsoft Excel regression output reports a two-tailed hypothesis test.
  • For a one-tailed test, set up the alternative hypothesis as the "hoped for" or suspected condition (e.g. Ha: slope > 0) when the direction is strongly believed.
t-test statistics on regression coefficients (intercept and slope)

    \[t_{{b_0}_{statistic}} = \frac{\widehat{b}_0 - b_0}{s_{\widehat{b}_0}}\]

 

    \[t_{{b_0}_{statistic}} = \frac{-12.43 - 0}{3.38}=-3.67\]

    \[t_{{b_1}_{statistic}} = \frac{\widehat{b}_1 - b_1}{s_{\widehat{b}_1}}\]

    \[t_{{b_1}_{statistic}} = \frac{0.122 - 0}{0.029}=4.21\]

t-test decision

 

at 5% significance level

    \[t_{\alpha/2,\,n-2} = t_{0.05/2,\,8-2} = t_{0.025,\,6} = 2.447\]

  • reject (statistically significant): | t-statistic |  > critical t value

reject b0 = 0: | -3.67 | > 2.447 

reject b1 = 0: | 4.21 | > 2.447 
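The t-test decision above can be sketched in a few lines of Python. This is a minimal illustration, assuming the coefficient estimates and standard errors from the Excel output and the tabulated critical value of 2.447 quoted in the text; the helper name `t_statistic` is hypothetical.

```python
# (estimate, standard error) for each coefficient, from the Excel output
coefficients = {
    "intercept": (-12.42268596, 3.382329853),
    "slope": (0.121941844, 0.028943608),
}
T_CRITICAL = 2.447  # two-tailed, 5% significance, 8 - 2 = 6 degrees of freedom

def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

t_stats = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > T_CRITICAL else "fail to reject H0"
    print(f"{name}: t = {t:.2f} -> {decision}")
```

Comparing the absolute value of t with the critical value is what makes the test two-tailed.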

95% Confidence Interval (5% significance level)

    \[t_c = t_{\alpha/2,\,n-2} = t_{0.05/2,\,8-2} = t_{0.025,\,6} = 2.447\]

  • specifies the range of values expected to contain the true parameter value at the given confidence level (1 - significance level)

    \[\widehat{b}_0 \pm t_c s_{\widehat{b}_0}\]


-12.42268596 ± (2.447)(3.382329853) = -20.69894895 to -4.146422975

    \[\widehat{b}_1 \pm t_c s_{\widehat{b}_1}\]


0.121941844 ± (2.447)(0.028943608) = 0.051119385 to 0.192764302

  • reject (statistically significant) if the hypothesized 0 is outside the interval
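As a rough check of the intervals above, the sketch below rebuilds them from the estimates and standard errors and tests whether the hypothesized value 0 falls outside. It assumes the rounded critical value 2.447, so the bounds can differ from the Excel figures in the fourth decimal place; the function names are illustrative.

```python
T_CRITICAL = 2.447  # t(0.025, 6 degrees of freedom), as quoted in the text

def confidence_interval(estimate, std_error, t_critical=T_CRITICAL):
    """Return (lower, upper) bounds: estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

def excludes_zero(interval):
    """True when the hypothesized value 0 lies outside the interval."""
    lower, upper = interval
    return lower > 0 or upper < 0

intercept_ci = confidence_interval(-12.42268596, 3.382329853)
slope_ci = confidence_interval(0.121941844, 0.028943608)
print(intercept_ci, excludes_zero(intercept_ci))
print(slope_ci, excludes_zero(slope_ci))
```

Both intervals exclude 0, which is the same decision the t-test reaches.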
p-value test
  • lowest significance level at which the null hypothesis can be rejected
  • reject (statistically significant): p-value < significance level (0.05)

reject b0 = 0: 0.010419245 < 0.05

reject b1 = 0: 0.005603983 < 0.05

Coefficient of Determination (R2)
  • measures the proportion of Y variation explained by X

    \[R^2=\frac{RegressionSS}{TotalSS}= \frac{\sum^n_{i=1}(\widehat{Y}_i - \overline{Y})^2}{\sum^n_{i=1}(Y_i - \overline{Y})^2}\]

    \[R^2 = \frac{3.384648295}{4.52875}=0.747369207\]

Correlation Coefficient (R)
  • measures degree of linear relationship between X and Y

    \[R= \sqrt{R^2}\]


    \[0.86450518 = \sqrt{0.747369207}\]
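The two ratios above are quick to verify. The sketch below is a minimal example using the sums of squares from the Excel ANOVA table; it also applies the standard convention that, with a single independent variable, R takes the sign of the slope coefficient. Variable names are illustrative.

```python
import math

REGRESSION_SS = 3.384648295  # explained variation
TOTAL_SS = 4.52875           # total variation in Y

r_squared = REGRESSION_SS / TOTAL_SS
slope = 0.121941844
# With one independent variable, R is the square root of R^2 carrying the slope's sign
correlation = math.copysign(math.sqrt(r_squared), slope)
print(f"R^2 = {r_squared:.6f}, R = {correlation:.6f}")
```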

ANOVA F-test statistic (one-tailed only)
  • F-statistic = (t-statistic of slope coefficient)^2

    \[17.75007386 = (4.213083652)^2\]

  • redundant to t-test in simple linear regression

    \[F_{statistic}=\frac{RSS/1}{SSE/(n-2)}=\frac{MeanRegressionSumOfSquares}{MeanSquaredError}\]

    \[F_{statistic}=\frac{MSR}{MSE}=\frac{\biggl[\sum^n_{i=1}(\widehat{Y}_i - \overline{Y})^2\biggr]/1}{\biggl[\sum^n_{i=1}(Y_i - \widehat{Y}_i)^2\biggr]/(n-2)}\]


    \[F_{statistic} = \frac{ExplainedVariation/1}{UnexplainedVariation/(n-2)}\]

    \[17.75007386 = \frac{3.384648295/1}{1.144101705/(8-2)}\]

F-test decision at 5% significance level

 

    \[F_{\alpha,\,k,\,n-2} = F_{0.05,\,1,\,8-2} = F_{0.05,\,1,\,6} = 5.99\]

  • reject (statistically significant): F-statistic > critical F value

reject b1 = 0: 17.75 > 5.99 
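To see why the F-test is redundant in simple linear regression, the sketch below computes F from the ANOVA sums of squares and compares it with the squared t-statistic of the slope. It is a minimal illustration that assumes the Excel figures and the tabulated critical value 5.99 quoted in the text.

```python
n, k = 8, 1                  # observations, independent variables
regression_ss = 3.384648295  # explained sum of squares (RSS in the text)
residual_ss = 1.144101705    # unexplained sum of squares (SSE in the text)

msr = regression_ss / k          # mean regression sum of squares
mse = residual_ss / (n - k - 1)  # mean squared error
f_statistic = msr / mse

t_slope = 4.213083652
F_CRITICAL = 5.99  # F(0.05; 1, 6), as quoted in the text

print(f"F = {f_statistic:.4f}, t^2 = {t_slope ** 2:.4f}")
print("reject H0" if f_statistic > F_CRITICAL else "fail to reject H0")
```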

Significance F at 5% alpha
  • lowest significance level at which to reject the null hypothesis that the regression model is not statistically significant
  • identical to p-value of slope coefficient = 0.005603983
  • reject if p-value < alpha: 0.005603983 < 0.05
Standard Error of Regression
  • standard deviation of residuals

    \[\sigma_{\epsilon}=\sqrt{\frac{ResidualSumOfSquares}{n-2}}\]


    \[\sigma_{\epsilon}=\sqrt{\frac{1.144101705}{8-2}} = 0.436673353\]

  • assessed relative to units of Y (smaller is better)
Model Conclusion
  • subject to breaches of regression assumptions and their correction as discussed in Linear Regression (Heteroskedasticity and Serial Correlation)
  • reliable regression model (significant at any alpha down to about 0.56%, the Significance F)
  • Caveat (not in CFA® Curriculum Reading): Sample size should be at least 10 for every independent variable.
Manual Regression

 

    \[Slope = b_1 = \frac{Covar(X,Y)}{Var(X)}=\frac{\biggl[\sum^n_{i=1}(X_i - \overline{X})(Y_i - \overline{Y})\biggr]/(n-1)}{\biggl[\sum^n_{i=1}(X_i - \overline{X})^2\biggr]/(n-1)}\]


    \[b_1 = \frac{27.76/(8-1)}{227.62/(8-1)} = \frac{3.97}{32.52}= 0.121942\]

    \[Intercept = b_0 = \overline{Y} - b_1\overline{X}\]


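    \[b_0= 1.8 - (0.121942)(116.7) \approx -12.43\]

The means shown are rounded; with unrounded means the intercept is -12.42269, matching the Excel output.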
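The manual slope and intercept formulas translate directly into code. The sketch below is a minimal example that uses the rounded sums and means quoted above, so its results agree with the Excel output only approximately; the function name `ols_simple` is hypothetical.

```python
def ols_simple(sum_xy_dev, sum_xx_dev, x_mean, y_mean, n):
    """Slope = Cov(X, Y) / Var(X); intercept = mean(Y) - slope * mean(X)."""
    covariance = sum_xy_dev / (n - 1)
    variance_x = sum_xx_dev / (n - 1)
    slope = covariance / variance_x
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Rounded inputs quoted in the text
slope, intercept = ols_simple(27.76, 227.62, x_mean=116.7, y_mean=1.8, n=8)
print(f"slope = {slope:.4f}, intercept = {intercept:.2f}")
```

Because (n - 1) appears in both the covariance and the variance, it cancels, and the slope is simply the ratio of the two sums of deviations.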

Manual ANOVA
Manual Standard Errors (optional)

    \[RegressionModel_{StandardError}=\sigma_{\epsilon}=\sqrt{\frac{ResidualSumOfSquares}{n-2}}\]


    \[\sigma_{\epsilon}=\sqrt{\frac{1.144101705}{8-2}} = 0.436673353\]


    \[Slope_{StandardError}=s_{\widehat{b}_1}=\frac{\sigma_{\epsilon}}{\sqrt{\sum^n_{i=1}(X_i-\overline{X})^2}}\]


    \[s_{\widehat{b}_1}=\frac{0.436673353}{\sqrt{227.62}}=0.028943608\]


    \[Intercept_{StandardError}=s_{\widehat{b}_0}=\sigma_{\epsilon}\sqrt{\frac{1}{n}+\frac{\overline{X}^2}{\sum^n_{i=1}(X_i-\overline{X})^2}}\]


    \[s_{\widehat{b}_0}=0.436673353 \sqrt{\frac{1}{8}+\frac{116.7^2}{227.62}}=3.382329853\]
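The three standard errors above chain together: the standard error of the regression feeds the standard errors of the slope and the intercept. A minimal sketch, assuming the rounded sum of squared X deviations (227.62) and rounded mean of X (116.7) quoted in the text, so the intercept's standard error matches Excel only to about two decimal places:

```python
import math

n = 8
residual_ss = 1.144101705  # sum of squared residuals
sum_xx_dev = 227.62        # sum of squared deviations of X (rounded)
x_mean = 116.7             # mean of X (rounded)

se_regression = math.sqrt(residual_ss / (n - 2))
se_slope = se_regression / math.sqrt(sum_xx_dev)
se_intercept = se_regression * math.sqrt(1 / n + x_mean ** 2 / sum_xx_dev)

print(f"{se_regression:.6f} {se_slope:.6f} {se_intercept:.4f}")
```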

Multiple Linear Regression


Y = inflation | X1 = M3 money supply | X2 = GDP per hour worked

Model Specification

    \[Inflation_i=-5.98+(0.12)X_{1i} + (-0.06)X_{2i}\]

Slope Coefficients
  • Positive coefficient for M3 money supply (see Simple Linear Regression above)
  • Negative coefficient for GDP per hour worked because, as a measure of productivity, it reduces costs and therefore prices (inflation) as you learned in CFA® Exam Level I Economics.
  • Each partial regression coefficient value is the average change in Y (inflation) for a unit change in that independent variable, holding all other X's constant.  
Predicted 2018 inflation

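    \[Inflation_{Israel}=-5.98+(0.12)(115.8)  -0.06 (103.4)=1.65\]

    \[Inflation_{Russia}=-5.98+(0.12)(125.7) -0.06 (104.9)=2.76\]

The coefficients shown are rounded to two decimal places, so evaluating the expressions as written gives slightly different results (about 1.71 and 2.81); the predictions of 1.65 and 2.76 presumably rely on the unrounded Excel coefficients.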

Prediction Error

    \[\epsilon_i=Y - \widehat{Y}\]

 

    \[\epsilon_{Israel}=0.80 - 1.65= -0.85\]

    \[\epsilon_{Russia}=2.90 - 2.76 = 0.14\]

Observed Values
  • Israel below the fitted regression plane.
  • Russia above the fitted regression plane.
Model Hypothesis (F-test)
  • Null Hypothesis: that Y (inflation) has no relationship with X1 (M3 money supply) and X2 (GDP per hour worked) jointly (i.e. all slope coefficients = 0)

    \[H_{0,slopes}: b_1 = b_2 = 0\]

  • Alternative Hypothesis: that Y (inflation) has a relationship with X1 (M3 money supply) and/or X2 (GDP per hour worked) (i.e. at least one slope coefficient ≠ 0)

    \[H_{a,slopes}: b_1 \ne 0 \text{ and/or } b_2 \ne 0\]

F-test statistic (only has one-tail)

    \[8.174483694 = \frac{3.468101511/2}{1.060648489/(8-3)}\]

  • reformulate the model if F-test fails
F-test decision

 

at 5% significance level

    \[F_{\alpha,\,k,\,n-(k+1)} = F_{0.05,\,2,\,8-(2+1)} = F_{0.05,\,2,\,5} = 5.79\]

  • reject (statistically significant): F-statistic > critical F value

reject b1 = b2 = 0: 8.17 > 5.79 
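With k independent variables the same F calculation generalizes by dividing the regression sum of squares by k and the residual sum of squares by n - (k + 1). A minimal sketch, assuming the sums of squares from the multiple regression ANOVA output and the tabulated critical value 5.79; the function name is illustrative.

```python
def f_statistic(regression_ss, residual_ss, n, k):
    """F = (regression SS / k) / (residual SS / (n - k - 1))."""
    msr = regression_ss / k
    mse = residual_ss / (n - k - 1)
    return msr / mse

f_multiple = f_statistic(3.468101511, 1.060648489, n=8, k=2)
F_CRITICAL = 5.79  # F(0.05; 2, 5), as quoted in the text
print(f"F = {f_multiple:.4f}")
print("reject H0: b1 = b2 = 0" if f_multiple > F_CRITICAL else "fail to reject H0")
```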

Significance F at 5% alpha
  • reject if p-value < alpha

0.026545006 p-value < 0.05

Model Hypotheses (t-test)
  • Null Hypotheses: that each coefficient, taken individually, is zero (i.e. Y (inflation) has no relationship with X1 (M3 money supply) or with X2 (GDP per hour worked), and the intercept is zero)

    \[H_{0,intercept}: b_0 = 0\]

    \[H_{0,X_1slope}: b_1 = 0\]

    \[H_{0,X_2slope}: b_2= 0\]

  • Alternative Hypotheses: that each coefficient, taken individually, is not zero (i.e. Y (inflation) has a relationship with X1 (M3 money supply) or with X2 (GDP per hour worked), and the intercept is not zero)

    \[H_{a,intercept}: b_0 \ne 0\]

    \[H_{a,X_1slope}: b_1 \ne 0\]

    \[H_{a,X_2slope}: b_2 \ne 0\]

t-test decision

 

at 5% significance level

    \[t_{\alpha/2,\,n-(k+1)} = t_{0.05/2,\,8-(2+1)} = t_{0.025,\,5} = 2.571\]

  • reject (statistically significant): | t-statistic |  > critical t value
  • refer to Simple Linear Regression for the t-statistic formula
  • t-statistics still significant for M3 money supply but not for GDP per hour worked and no longer for the intercept
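Coefficient of Determination (R2)
  • never decreases when an independent variable is added; it increases whenever the new X explains any of the remaining variation in Y, even by chance
Adjusted R2
  • adjusts R2 downward for the degrees of freedom used up by the k independent variables, so it is always below R2 whenever R2 is less than 1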

    \[\overline{R}^2=1 - \biggl(\frac{n-1}{n-k-1}\biggr)(1-R^2)\]


    \[0.672115289=1 - \biggl(\frac{8-1}{8-2-1}\biggr)(1-0.765796635)\]

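  • adding a new independent variable raises R2 (here from 0.747369207 to 0.765796635)
  • but increasing k shrinks the denominator n - k - 1, which enlarges the penalty factor (n - 1)/(n - k - 1), so adjusted R2 rises only if the new variable adds enough explanatory power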
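The trade-off described above can be illustrated by applying the adjusted R2 formula to both models. A minimal sketch using the R2 values reported in this analysis; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - ((n - 1) / (n - k - 1)) * (1 - R^2)."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)

simple_adj = adjusted_r_squared(0.747369207, n=8, k=1)    # M3 money supply only
multiple_adj = adjusted_r_squared(0.765796635, n=8, k=2)  # M3 and GDP per hour worked
print(f"simple: {simple_adj:.6f}, multiple: {multiple_adj:.6f}")
```

Here R2 rises when GDP per hour worked is added, yet adjusted R2 falls (roughly 0.705 to 0.672), which suggests the extra variable does not earn its degree of freedom in this sample.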
Standard Error of Regression
  • 0.460575399 slightly increased from 0.436673353 of simple linear regression but still relatively small compared with units of Y
 Model Conclusion
  • subject to breaches of regression assumptions and their correction as discussed in Linear Regression (Heteroskedasticity, Serial Correlation and Multicollinearity)
  • reliable regression model (significant at any alpha down to about 2.65%, the Significance F)
  • Caveat (not in CFA® Curriculum Reading): Sample size should be at least 10 for every independent variable.
Key to Learning

Understand the Why

The regression model and its estimators must be statistically significant to rely on them for estimation and forecasting in business and investment.

Real-World Practical Application

https://blog.thinknewfound.com/2016/07/alphas-measurement-problem/

Legal Notice: no copyright (public domain) or copyright exception (free use) under fair dealing/fair use laws (i.e. educational use, critique, not substantial quote) and proper attribution with link to source


 


RapidInsight: First Thing First

  • In multiple linear regression, the first thing to look at is the Significance F (the p-value of the F-test), without having to calculate the F-statistic. The F-test evaluates the overall reliability of the regression model.
  • In simple linear regression, the F-statistic is the square of the t-statistic of the slope coefficient and the Significance F equals the slope's p-value, so the F-test is redundant to the t-test. Therefore, the first thing to look at is the p-value of the slope coefficient, without the need to calculate the t-statistic or specify the significance level. The reported slope p-value is 0, so the slope is statistically significant and the model is reliable at any conventional confidence level (90%, 95% or 99%).
  • Although the regression model is reliable, the intercept (alpha) has a p-value of 0.1115836, so it is significant only at confidence levels below about 89% (i.e. 1 - 0.1115836). The null hypothesis of zero alpha for value stocks therefore cannot be rejected at the 90%, 95% or 99% confidence level.

     

More from https://blog.thinknewfound.com/2016/07/alphas-measurement-problem/



RapidInsight: Know Thy Number

  • The article mentions two regression assumptions: uncorrelated error term (independent from month to month) and zero-mean error-term. See Linear Regression (Assumptions).
  • It further mentions alpha (intercept) is a constant that can be regarded as a random variable. While the intercept is a constant in the regression formula (Yi = α + βXi), it is still an estimate with its own standard error (i.e. 0.0011569 or 11 bps as the article put it), not the standard error of the residuals (i.e. 0.03773 or 377 bps).

RapidQuiz