H2 Maths Notes (JC 1-2): 6.6) Correlation and Linear Regression
Download printable cheat-sheet (CC-BY 4.0)
07 Oct 2025, 00:00 Z
Before you revise: Revisit scatter diagram basics and variance formulas so the transition to algebraic PMCC and regression is smooth. Keep a GC or spreadsheet handy to compute \( r \) and regression coefficients quickly.
Core Definitions
- Product-moment correlation coefficient \( r \) measures linear association between \( x \) and \( y \); \( -1 \leq r \leq 1 \).
- \( r > 0 \) indicates positive association, \( r < 0 \) indicates negative association, and \( |r| \approx 1 \) signals strong linearity.
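- Least-squares regression line of \( y \) on \( x \) minimises \( \sum (y_i - \hat{y}_i)^2 \) and has equation \( y - \bar{y} = b(x - \bar{x}) \), where the gradient is \( b = r \frac{S_y}{S_x} \). Note that the GC's LinReg(ax+b) output labels the gradient \( a \) and the intercept \( b \) instead.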
- \( S_x \) and \( S_y \) are sample standard deviations of \( x \) and \( y \) respectively.
Computing \( r \)
For paired data \( (x_i, y_i) \), \( r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) S_x S_y} \).
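For hand calculation it can help to expand the centred sums. The raw-sum form below follows algebraically from the definition, assuming \( S_x \) and \( S_y \) use the divisor \( n - 1 \), so that \( (n - 1) S_x S_y = \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2} \):

\[
r = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sqrt{\left( \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2 \right) \left( \sum y_i^2 - \frac{1}{n} \left( \sum y_i \right)^2 \right)}}
\]

The factor \( n - 1 \) cancels, so \( r \) is the same whichever divisor is used for the standard deviations, provided both use the same one.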
Example -- Physics vs Maths scores
Data (Physics, Maths) for 6 students: (68, 72), (74, 75), (65, 69), (80, 83), (70, 73), (78, 81).
- Enter into GC lists and run LinReg(ax+b).
- Output: \( r = 0.984 \) (rounded), \( \bar{x} = 72.5 \), \( \bar{y} = 75.5 \).
- Strong positive linear association.
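As a cross-check on the GC, the same calculation can be sketched in plain Python. This is a minimal illustration rather than a prescribed method; the function name `pmcc` and the list names are assumptions made for this example.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product-moment correlation coefficient from centred sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centred sums of products and squares
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # (n - 1) * S_x * S_y equals sqrt(s_xx * s_yy), so n - 1 cancels
    return s_xy / sqrt(s_xx * s_yy)

physics = [68, 74, 65, 80, 70, 78]  # x values from the example
maths = [72, 75, 69, 83, 73, 81]    # y values from the example

x_bar = sum(physics) / len(physics)
y_bar = sum(maths) / len(maths)
r = pmcc(physics, maths)
print(x_bar, y_bar, round(r, 3))
```

Working from the centred sums avoids computing the standard deviations separately, which reduces rounding slips.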
Regression Line and Prediction
Using the same data, the GC returns the regression line \( y = 0.913x + 9.34 \) (coefficients to 3 s.f.).
- To predict the Maths score when Physics = 76: substitute \( x = 76 \), obtaining \( y = 0.913 \times 76 + 9.34 \approx 78.7 \), i.e. 79 to the nearest whole number.
- Only predict within the range of observed \( x \) (interpolation). Extrapolation is unreliable.
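The least-squares coefficients and the interpolation rule can be sketched as follows. The helper names `regression_y_on_x` and `predict` are hypothetical, and refusing to extrapolate by raising an error is one possible design choice, not a requirement of the syllabus.

```python
def regression_y_on_x(xs, ys):
    """Return (gradient, intercept) of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # The line passes through (x_bar, y_bar)
    intercept = y_bar - gradient * x_bar
    return gradient, intercept

def predict(x, xs, gradient, intercept):
    """Predict y at x, refusing to extrapolate beyond the observed x range."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the observed range (extrapolation)")
    return gradient * x + intercept

physics = [68, 74, 65, 80, 70, 78]
maths = [72, 75, 69, 83, 73, 81]

gradient, intercept = regression_y_on_x(physics, maths)
predicted = predict(76, physics, gradient, intercept)
print(round(gradient, 3), round(intercept, 2), round(predicted, 1))
```

Calling `predict(90, physics, gradient, intercept)` raises a `ValueError`, because 90 lies outside the observed Physics range of 65 to 80.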
Residuals and Coefficient of Determination
- Residual: \( e_i = y_i - \hat{y}_i \); plot residuals to check linear model adequacy.
- Coefficient of determination \( r^2 \) gives the proportion of variance in \( y \) explained by \( x \).
Example -- Interpretation
If \( r = 0.82 \), then \( r^2 = 0.6724 \). State: “About 67% of the variation in Maths marks is explained by Physics marks via the fitted linear model.”
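To see where the "proportion explained" reading comes from, the sketch below computes the residuals for the Physics and Maths data and evaluates \( r^2 \) as one minus the ratio of the residual sum of squares to the total sum of squares. Variable names are assumptions for illustration.

```python
physics = [68, 74, 65, 80, 70, 78]
maths = [72, 75, 69, 83, 73, 81]

n = len(physics)
x_bar = sum(physics) / n
y_bar = sum(maths) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(physics, maths))
s_xx = sum((x - x_bar) ** 2 for x in physics)

gradient = s_xy / s_xx
intercept = y_bar - gradient * x_bar

# Residual e_i = observed y_i minus fitted value on the line
residuals = [y - (gradient * x + intercept) for x, y in zip(physics, maths)]

ss_res = sum(e ** 2 for e in residuals)        # variation left unexplained
ss_tot = sum((y - y_bar) ** 2 for y in maths)  # total variation in y
r_squared = 1 - ss_res / ss_tot

print([round(e, 2) for e in residuals])
print(round(r_squared, 3))
```

For a least-squares line with an intercept, the residuals sum to zero (up to floating-point error), which is a quick sanity check on the fit.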
Hypothesis Test for \( r \)
- Test \( H_0: \rho = 0 \) vs \( H_1: \rho \neq 0 \) using t-statistic \( T = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \) with \( n - 2 \) degrees of freedom.
- Critical region depends on significance level; or compute p-value via calculator.
Example -- Significance of correlation
With \( n = 12 \) data points and \( r = 0.58 \), test at 5% whether \( \rho \neq 0 \).
- \( T = \frac{0.58 \sqrt{10}}{\sqrt{1 - 0.58^2}} \approx 2.25 \).
- Critical value \( t_{0.975,\,10} = 2.228 \).
- Since \( 2.25 > 2.228 \), reject \( H_0 \); there is sufficient evidence at the 5% level of a non-zero linear correlation, although only marginally.
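Because the statistic lies close to the critical value, it is worth evaluating it without intermediate rounding. A minimal sketch, assuming the two-tailed 5% critical value 2.228 for 10 degrees of freedom is read from tables; the function name is hypothetical.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t-statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.58
n = 12
critical_value = 2.228  # t_{0.975, 10} from statistical tables (two-tailed 5%)

t_stat = correlation_t_statistic(r, n)
reject_h0 = abs(t_stat) > critical_value  # two-tailed comparison

print(round(t_stat, 3), reject_h0)
```

Comparing `abs(t_stat)` rather than `t_stat` keeps the check valid for negative correlations in a two-tailed test.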
Calculator Workflows
- TI: LinReg(ax+b) returns \( a \) (gradient), \( b \) (intercept), \( r \), and \( r^2 \) when diagnostics are on.
- Casio: REG mode \(\rightarrow\) LR \(\rightarrow\) AX+B provides coefficients and \( r \).
- Use residual lists to draw scatter plots of \( x \) vs residuals; randomness indicates a good linear fit.
- Always state variables: “Let x = Physics score, y = Maths score.”
Exam Watch Points
- Label axes and highlight whether you are predicting \( y \) from \( x \) or vice versa. The regression line of \( x \) on \( y \) is different.
- Interpret \( r \) in words (“strong/weak, positive/negative”) and link back to context.
- Check units: the predicted value carries the units of \( y \), and the gradient is in units of \( y \) per unit of \( x \).
- Do not claim causation; correlation only addresses association.
- Mention interpolation vs extrapolation explicitly when commenting on prediction reliability.
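The first watch point, that the line of \( x \) on \( y \) differs from the line of \( y \) on \( x \), can be checked numerically. The sketch below uses the Physics and Maths data from the earlier example and illustrative variable names; it also shows that the product of the two gradients equals \( r^2 \).

```python
physics = [68, 74, 65, 80, 70, 78]  # x
maths = [72, 75, 69, 83, 73, 81]    # y

n = len(physics)
x_bar = sum(physics) / n
y_bar = sum(maths) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(physics, maths))
s_xx = sum((x - x_bar) ** 2 for x in physics)
s_yy = sum((y - y_bar) ** 2 for y in maths)

# y on x: minimises vertical distances, y - y_bar = b (x - x_bar)
b_y_on_x = s_xy / s_xx
# x on y: minimises horizontal distances, x - x_bar = d (y - y_bar)
d_x_on_y = s_xy / s_yy

# Slope of the x-on-y line when drawn on the usual (x, y) axes
slope_x_on_y_in_xy_plane = 1 / d_x_on_y
r_squared = b_y_on_x * d_x_on_y  # product of the two gradients

print(round(b_y_on_x, 4), round(slope_x_on_y_in_xy_plane, 4), round(r_squared, 4))
```

The two lines coincide only when \( r = \pm 1 \); both always pass through \( (\bar{x}, \bar{y}) \).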
Quick Revision Checklist
- [ ] Compute \( r \) and regression coefficients quickly with calculator support.
- [ ] Write regression equations in the form \( y = ax + b \) and perform predictions.
- [ ] Interpret \( r^2 \) and residual plots qualitatively.
- [ ] Conduct and explain hypothesis tests for correlation with t-statistics.
Next steps: Practise past-year linear regression questions that combine with hypothesis tests or sampling intervals from Topics 6.4 and 6.5.