IP EMaths Notes (Upper Sec, Year 3-4): 15) Statistics and Data Handling
08 Nov 2025
Statistics questions reward tidy tables and well-labelled graphs. Document your calculator steps so you can replicate them under exam conditions.
Quick reference
- Mean of grouped data: \( \bar{x} = \frac{\sum fx}{\sum f} \).
- Median for grouped data: locate \( n/2 \) on the cumulative frequency column and interpolate within that class.
- Quartiles and percentiles: use cumulative frequency or ordered lists depending on the dataset size.
- Variance/standard deviation (for raw data): use the calculator's statistics mode, and record the key inputs and outputs in case you need to justify the button presses.
- Correlation language: describe the direction (“positive”, “negative”, “no correlation”) and, where appropriate, the strength; do not claim that one variable causes the other.
- Box-and-whisker charts compare spread and medians; comment on both when writing conclusions.
Building organised tables
- List raw data in ascending order; tally if the dataset is long.
- Build frequency and cumulative frequency columns side by side.
- For grouped data, add a midpoint column and a \( fx \) column.
- For two-variable data (x, y), include \( x^2 \), \( y^2 \), and \( xy \) columns if you need regression or product-moment correlation coefficient (PMCC) later.
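The \( x^2 \), \( y^2 \), and \( xy \) columns feed directly into the PMCC. A minimal Python sketch, assuming the usual column-totals form of the formula; the function name `pmcc_from_columns` and the five data pairs are made up for illustration and do not come from these notes.

```python
from math import sqrt

def pmcc_from_columns(xs, ys):
    """PMCC from the totals of the x, y, x^2, y^2 and xy columns."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)                # total of the x^2 column
    sum_y2 = sum(y * y for y in ys)                # total of the y^2 column
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # total of the xy column
    s_xy = n * sum_xy - sum_x * sum_y
    s_xx = n * sum_x2 - sum_x ** 2
    s_yy = n * sum_y2 - sum_y ** 2
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical illustrative data (not taken from the notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc_from_columns(xs, ys)
print(round(r, 3))
```

Writing the totals out column by column mirrors what you would show in a hand-drawn table, so each intermediate sum can be checked against the calculator.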
Example table layout
Class (minutes) | Frequency | Cumulative frequency | Midpoint | \( fx \) |
0-20 | 6 | 6 | 10 | 60 |
20-40 | 12 | 18 | 30 | 360 |
40-60 | 10 | 28 | 50 | 500 |
60-80 | 7 | 35 | 70 | 490 |
80-100 | 5 | 40 | 90 | 450 |
Total | 40 | | | 1860 |
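The columns of this table can be rebuilt programmatically as a check on hand working. A short Python sketch, using the class boundaries and frequencies from the table; the variable names are illustrative only.

```python
from itertools import accumulate

# Class boundaries and frequencies copied from the example table
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [6, 12, 10, 7, 5]

cum_freqs = list(accumulate(freqs))                 # running total of frequencies
midpoints = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints
fx = [f * m for f, m in zip(freqs, midpoints)]      # frequency x midpoint

total_f = sum(freqs)
total_fx = sum(fx)
mean = total_fx / total_f                           # estimated mean of grouped data

for (lo, hi), f, cf, m, p in zip(classes, freqs, cum_freqs, midpoints, fx):
    print(f"{lo}-{hi} | {f} | {cf} | {m:g} | {p:g}")
print(f"Total | {total_f} | | | {total_fx:g}")
print(f"Estimated mean = {mean}")
```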
Measures of central tendency
- Mean: for grouped data, divide the total of the \( fx \) column by \( \sum f \). For raw data, enter all values in the calculator’s statistics mode and note the mode used (e.g. “1-VAR”) if working is required.
- Median: for raw data, pick the middle value(s). For grouped data, locate \( n/2 \) within the cumulative frequencies and interpolate. For the example above, \( n = 40 \), so \( n/2 = 20 \) falls in the 40-60 class: median ≈ 44 minutes (linear interpolation).
- Mode: from grouped data, quote the modal class. If the mode is needed more precisely, use the mode estimation formula (optional at IP level).
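The grouped-median interpolation can be written out as a formula. The symbols are this sketch's own notation rather than the notes': \( L \) is the lower boundary of the median class, \( F \) the cumulative frequency before that class, \( f \) its frequency, and \( w \) its width.

\[ \text{median} \approx L + \frac{n/2 - F}{f} \times w = 40 + \frac{20 - 18}{10} \times 20 = 44 \text{ minutes} \]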
Measures of spread
- Range: highest minus lowest.
- Interquartile range (IQR): \( Q_3 - Q_1 \); less sensitive to outliers than the range.
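- Variance / standard deviation: for ungrouped data, use the calculator output (write down \( \bar{x} \) and \( \sigma \), then square \( \sigma \) if the variance is required). For grouped data, enter the midpoints as the values and the frequencies as their weights.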
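For grouped data, the midpoint-and-frequency approach can be sketched in Python as below. This assumes the population form of the variance (dividing by \( \sum f \)), which is an assumption about the convention in use, and takes its midpoints and frequencies from the example table.

```python
from math import sqrt

# Midpoints and frequencies from the example table
midpoints = [10, 30, 50, 70, 90]
freqs = [6, 12, 10, 7, 5]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
# Population variance (assumed convention): mean of squares minus square of mean
variance = sum_fx2 / n - mean ** 2
std_dev = sqrt(variance)

print(f"mean = {mean}, variance = {variance}, sd = {std_dev:.1f}")
```

Because every value in a class is replaced by its midpoint, the result is an estimate, not the exact spread of the raw data.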
Worked example - grouped data (mean, median, quartiles)
Using the table above:
- Mean = \( \frac{1860}{40} = 46.5 \) minutes.
- Median: cumulative frequencies 6, 18, 28, 35, 40. The 20th observation sits in the 40-60 class; interpolating gives 44 minutes to the nearest whole minute.
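- Quartiles: \( Q_1 \) is the 10th observation, which sits in the 20-40 class: \( 20 + \frac{10 - 6}{12} \times 20 \approx 27 \) minutes. \( Q_3 \) is the 30th observation, which sits in the 60-80 class: \( 60 + \frac{30 - 28}{7} \times 20 \approx 66 \) minutes. Quote IQR ≈ 39 minutes.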
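Linear interpolation on the cumulative frequencies follows the same steps for the median and both quartiles, so it can be sketched as one helper. This is a minimal Python sketch; `grouped_quantile` is a hypothetical name, and it assumes values are spread evenly within each class.

```python
def grouped_quantile(classes, freqs, position):
    """Estimate the value at a cumulative position by linear interpolation.

    Assumes observations are spread evenly within each class.
    """
    cum_before = 0
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and position <= cum_before + f:
            # Fraction of the way through this class, scaled by the class width
            return lo + (position - cum_before) / f * (hi - lo)
        cum_before += f
    raise ValueError("position exceeds the total frequency")

# Classes and frequencies from the example table
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [6, 12, 10, 7, 5]
n = sum(freqs)

q1 = grouped_quantile(classes, freqs, n / 4)        # 10th observation
median = grouped_quantile(classes, freqs, n / 2)    # 20th observation
q3 = grouped_quantile(classes, freqs, 3 * n / 4)    # 30th observation
iqr = q3 - q1

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```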
Worked example - comparing box plots
Two classes recorded the time spent on revision (minutes per day). Box plots show:
- Class A: median 48, IQR 22.
- Class B: median 42, IQR 40.
Interpretation:
- Class A has a higher median, so its typical student spends more time revising.
- Class B has a much wider IQR, so revision times vary more; some students revise significantly more or less than the typical value.
Always comment on both location (median) and spread (IQR, range) when comparing box plots.
Worked example - scatter plot and correlation
A set of ten students recorded their hours of supervised revision (x) and mock exam scores (y). The calculator outputs r = 0.86.
- State: There is a strong positive correlation between hours and mock scores.
- Interpretation: More supervised revision tends to align with higher mock scores; however, correlation does not imply that supervision alone causes the improvement. Other factors (student motivation, resources) may contribute.
- Use the line of best fit to make predictions only within the observed data range (interpolation). Extrapolation outside the range is risky.
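The interpolation-only rule can be made concrete with a small Python sketch. It fits a least-squares line, which is an assumption here (a line of best fit drawn by eye is a different technique), and refuses to predict outside the observed x-range. The function names and the five data pairs are hypothetical, since the notes do not list the ten students' values.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (assumed fitting method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

def predict_within_range(xs, ys, x_new):
    """Predict y at x_new, but only inside the observed x-range."""
    if not min(xs) <= x_new <= max(xs):
        raise ValueError("x_new is outside the observed range (extrapolation)")
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept

# Hypothetical illustrative data (not the ten students from the example)
hours = [2, 4, 6, 8, 10]
scores = [50, 58, 61, 70, 76]

print(predict_within_range(hours, scores, 5))   # inside the range 2 to 10
```

Calling `predict_within_range(hours, scores, 12)` raises a `ValueError`, which matches the advice to treat predictions beyond the data as unreliable.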
Try this
- The heights (cm) of 50 students are grouped in classes of width 5 cm. Outline the steps you would use to estimate the mean height and the median height.
- A box plot for Class C shows median 55, lower quartile 40, upper quartile 75, minimum 25, maximum 90. Summarise what this tells you about the distribution of time spent on co-curricular activities per week.
- Calculator output for paired data gives \( \bar{x} = 6.2 \), \( \bar{y} = 72 \), \( \sigma_x = 1.4 \), \( \sigma_y = 12 \), and PMCC r = 0.15. Comment on the strength of the linear relationship.