Cumulative Frequency and Box Plots

Probability & Statistics

📚 The Skill

Cumulative Frequency

Cumulative frequency is a running total of frequencies. It tells you how many values are less than or equal to a given value.

| Class | Frequency | Cumulative Frequency |
|-------|-----------|----------------------|
| 0-10  | 5         | 5                    |
| 10-20 | 12        | 5 + 12 = 17          |
| 20-30 | 18        | 17 + 18 = 35         |
| 30-40 | 10        | 35 + 10 = 45         |
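The running total can be sketched in Python with `itertools.accumulate`. This is a minimal illustration using the table above, not a required method.

```python
from itertools import accumulate

# Class labels and frequencies from the table above
classes = ["0-10", "10-20", "20-30", "30-40"]
frequencies = [5, 12, 18, 10]

# accumulate() yields the running total: 5, 5 + 12, 5 + 12 + 18, ...
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:>6}  {f:>3}  {cf:>3}")
```

The last cumulative frequency always equals the total number of values, which makes a quick check on the table.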

Cumulative Frequency Curve

[Figure: cumulative frequency curve]

How to draw:

  1. Start at the lower boundary of the first class, where the cumulative frequency is 0
  2. Plot each cumulative frequency against the upper class boundary of its class
  3. Join the points with a smooth curve (S-shape)
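Here is a minimal sketch that lists the points to plot for the example table. It assumes each class is stored as a (lower, upper) pair of boundaries, which is a representation chosen here for illustration.

```python
from itertools import accumulate

# Assumed representation: each class as (lower boundary, upper boundary)
boundaries = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 18, 10]

def cf_points(boundaries, frequencies):
    """Return the (x, cumulative frequency) points to plot."""
    # The curve starts at the lower boundary of the first class with CF = 0
    points = [(boundaries[0][0], 0)]
    # Each running total is plotted at the UPPER boundary of its class
    for (_, upper), cf in zip(boundaries, accumulate(frequencies)):
        points.append((upper, cf))
    return points

print(cf_points(boundaries, frequencies))
# [(0, 0), (10, 5), (20, 17), (30, 35), (40, 45)]
```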

Reading Quartiles from the Curve

For $n$ data values, read across from these cumulative frequencies to the curve, then down to the horizontal axis:

  • Lower Quartile (LQ/Q₁): Read across from $\frac{n}{4}$
  • Median (Q₂): Read across from $\frac{n}{2}$
  • Upper Quartile (UQ/Q₃): Read across from $\frac{3n}{4}$
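The sketch below shows the quartile positions and imitates "read across, then down". It is an approximation: it assumes the plotted points are joined with straight lines, so its readings will differ slightly from a hand-drawn smooth curve. The function names are hypothetical.

```python
def quartile_positions(n):
    """Cumulative frequencies to read across from for LQ, median and UQ."""
    return n / 4, n / 2, 3 * n / 4

def read_x(points, target_cf):
    """Approximate reading from the curve by straight-line interpolation.

    Assumption: neighbouring (x, CF) points are joined by straight segments,
    so this only approximates a reading from a smooth hand-drawn curve.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target_cf <= c1 and c1 > c0:
            return x0 + (target_cf - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target cumulative frequency is outside the curve")

# Points for the example table: (upper class boundary, cumulative frequency)
points = [(0, 0), (10, 5), (20, 17), (30, 35), (40, 45)]
n = points[-1][1]  # the final cumulative frequency is the total, 45

lq_cf, median_cf, uq_cf = quartile_positions(n)
print(lq_cf, median_cf, uq_cf)               # 11.25 22.5 33.75
print(round(read_x(points, median_cf), 1))   # 23.1
```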

Box Plots (Box and Whisker Diagrams)

[Figure: box plot with labels]

A box plot shows five key values:

  1. Minimum — lowest value
  2. Lower Quartile (LQ) — 25th percentile
  3. Median — 50th percentile (middle value)
  4. Upper Quartile (UQ) — 75th percentile
  5. Maximum — highest value
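For raw (ungrouped) data, the five values can be found as in this sketch. The quartiles come from Python's `statistics.quantiles`, whose default "exclusive" method places them at positions $\frac{n+1}{4}$, $\frac{n+1}{2}$ and $\frac{3(n+1)}{4}$. This is an assumption: textbooks and exam boards may use slightly different rules for small data sets, and the scores are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, LQ, median, UQ, maximum).

    Quartiles use statistics.quantiles with its default 'exclusive'
    method; other quartile conventions can differ for small data sets.
    """
    values = sorted(data)
    lq, median, uq = statistics.quantiles(values, n=4)
    return values[0], lq, median, uq, values[-1]

# Hypothetical scores: 11 values, so the quartiles are the 3rd, 6th and 9th
scores = [3, 7, 8, 12, 13, 14, 18, 21, 23, 27, 30]
print(five_number_summary(scores))  # (3, 8.0, 14.0, 23.0, 30)
```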

Interquartile Range (IQR)

$$\text{IQR} = \text{UQ} - \text{LQ}$$

The IQR measures the spread of the middle 50% of the data. A smaller IQR means more consistent data.
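Because the IQR only uses the middle 50%, this sketch shows that changing the top value leaves it unchanged while the range jumps. The data are hypothetical, and quartiles again use `statistics.quantiles` with its default method (an assumed convention).

```python
import statistics

def iqr_and_range(data):
    """Return (IQR, range); quartiles from statistics.quantiles (default method)."""
    lq, _, uq = statistics.quantiles(data, n=4)
    return uq - lq, max(data) - min(data)

scores = [3, 7, 8, 12, 13, 14, 18, 21, 23, 27, 30]
with_extreme = [3, 7, 8, 12, 13, 14, 18, 21, 23, 27, 90]  # top value replaced by 90

print(iqr_and_range(scores))        # (15.0, 27)
print(iqr_and_range(with_extreme))  # (15.0, 87)
```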

Comparing Distributions

[Figure: comparing box plots]

When comparing box plots, comment on:

  • Median: Which is higher? (average/typical value)
  • IQR: Which is smaller? (consistency)
  • Range: Which is larger? (overall spread)

Example comparison: "Class B has a higher median (57 vs 40), so it performed better on average. Class B also has a smaller IQR (20 vs 30), showing more consistent results."

🚩 The Traps

Common misconceptions and how to avoid them.

⚠️

Comparing box plots without using values: "The Vague Comparison"

The Mistake in Action

Compare the two distributions shown in the box plots.

Wrong: "Class A has a bigger spread. Class B did better."

Why It Happens

Students make general statements without referencing specific values from the diagrams. This loses marks for interpretation.

The Fix

Always quote values when comparing box plots:

Good comparison: "Class B has a higher median (57%) than Class A (40%), showing Class B performed better on average.

Class A has a larger IQR (30 compared to 20), meaning Class A's results were less consistent.

Class A has a larger range (70 compared to 55), showing more variation overall."

Use: median, IQR, range, minimum, maximum — with actual numbers.

⚠️

Plotting cumulative frequency at class midpoints: "The Midpoint Mistake"

The Mistake in Action

For the class 20-30, plot the cumulative frequency at x = 25.

Wrong: Point plotted at (25, CF)

Why It Happens

Students are used to plotting at midpoints for frequency polygons (and using midpoints to estimate the mean), so they apply the same rule here.

The Fix

For cumulative frequency, plot at the upper class boundary.

The class 20-30 has upper boundary 30. Plot the cumulative frequency at x = 30, not 25.

Why? Cumulative frequency tells you how many values are less than or equal to a given value. Only by the end of the class 20-30 have you counted every value up to 30, so the point belongs at x = 30.

⚠️

Reading quartiles at the wrong cumulative frequency values: "The Quartile Confusion"

The Mistake in Action

There are 80 data values. Find the median from the cumulative frequency curve.

Wrong: Read across from CF = 50 (half of 100)

Why It Happens

Students sometimes read from the wrong value, or confuse the cumulative frequency with a percentage scale.

The Fix

For $n$ data values, read quartiles at these cumulative frequencies:

  • LQ: $\frac{n}{4}$ = $\frac{80}{4}$ = 20
  • Median: $\frac{n}{2}$ = $\frac{80}{2}$ = 40
  • UQ: $\frac{3n}{4}$ = $\frac{3 \times 80}{4}$ = 60

For 80 values, read the median from CF = 40, not 50.

Check: since $\frac{n}{4} < \frac{n}{2} < \frac{3n}{4}$ (20 < 40 < 60), your readings should satisfy LQ < Median < UQ ✓


🔍 The Deep Dive

Apply your knowledge with these exam-style problems.

Level 1: Fully Worked

Complete solutions with commentary on each step.

Question

Complete the cumulative frequency table:

| Score | Frequency |
|-------|-----------|
| 0-10  | 3         |
| 10-20 | 7         |
| 20-30 | 15        |
| 30-40 | 12        |
| 40-50 | 8         |

Solution

Add a running total:

| Score | Frequency | Cumulative Frequency |
|-------|-----------|----------------------|
| 0-10  | 3         | 3                    |
| 10-20 | 7         | 3 + 7 = 10           |
| 20-30 | 15        | 10 + 15 = 25         |
| 30-40 | 12        | 25 + 12 = 37         |
| 40-50 | 8         | 37 + 8 = 45          |

The final cumulative frequency (45) equals the total number of data values.

Check: 3 + 7 + 15 + 12 + 8 = 45 ✓

Question

A cumulative frequency curve shows 60 data values. Find the median and interquartile range.

Solution

Step 1: Find the positions.

  • LQ position: $\frac{60}{4}$ = 15
  • Median position: $\frac{60}{2}$ = 30
  • UQ position: $\frac{3 \times 60}{4}$ = 45

Step 2: Read from the curve. Draw horizontal lines from CF = 15, 30, and 45 to the curve, then down to the x-axis.

[Illustrative readings from a typical curve; the curve itself is not shown]

  • LQ (CF = 15): approximately 24
  • Median (CF = 30): approximately 35
  • UQ (CF = 45): approximately 48

Step 3: Calculate the IQR.

$$\text{IQR} = \text{UQ} - \text{LQ} = 48 - 24 = 24$$

Answer: Median ≈ 35, IQR ≈ 24

Question

Two classes took the same test. Compare their results.

  • Class A: Min = 25, LQ = 40, Median = 52, UQ = 65, Max = 85
  • Class B: Min = 35, LQ = 48, Median = 58, UQ = 68, Max = 80

Solution

Comparing medians (average): Class B has a higher median (58) than Class A (52), so Class B performed better on average.

Comparing IQRs (consistency):

  • Class A: IQR = 65 − 40 = 25
  • Class B: IQR = 68 − 48 = 20

Class B has a smaller IQR, so their results were more consistent.

Comparing ranges (spread):

  • Class A: Range = 85 − 25 = 60
  • Class B: Range = 80 − 35 = 45

Class A has a larger range, showing more variation in results.

Summary: Class B performed better overall (higher median) with more consistent results (smaller IQR), while Class A had more extreme results (larger range).
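The same comparison can be sketched in code, quoting both values for each statistic. The dictionary layout and function name are illustrative assumptions, and the five-number summaries are the ones given in the question.

```python
# Five-number summaries from the question: (min, LQ, median, UQ, max)
class_a = (25, 40, 52, 65, 85)
class_b = (35, 48, 58, 68, 80)

def describe(summary):
    """Return the median, IQR and range for a five-number summary."""
    minimum, lq, median, uq, maximum = summary
    return {"median": median, "IQR": uq - lq, "range": maximum - minimum}

a, b = describe(class_a), describe(class_b)

# Always quote both values, as examiners expect
for stat in ("median", "IQR", "range"):
    larger = "A" if a[stat] > b[stat] else "B" if b[stat] > a[stat] else "neither"
    print(f"{stat}: Class A = {a[stat]}, Class B = {b[stat]} (larger: Class {larger})")
```

The code only finds the numbers. The interpretation (higher median means better on average, smaller IQR means more consistent) still has to be written in words.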

Level 2: Scaffolded

Fill in the key steps.

Question

Draw a box plot for the following data: Minimum = 12, LQ = 18, Median = 25, UQ = 34, Maximum = 42
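As a rough check on a drawing, this sketch prints a one-line text box plot on a scale of one character per unit. It assumes whole-number values and an axis start that is a multiple of 5. On paper you would draw a proper box with whiskers against a labelled scale. The function name is hypothetical.

```python
def ascii_box_plot(minimum, lq, median, uq, maximum, start, end):
    """Sketch a box plot on one line of text, one character per unit.

    Assumes whole-number values and a start that is a multiple of 5,
    so the axis labels line up with the plot.
    """
    if not start <= minimum <= lq <= median <= uq <= maximum <= end:
        raise ValueError("values must be in order and inside the axis range")
    row = [" "] * (end - start + 1)
    for x in range(minimum, maximum + 1):
        row[x - start] = "-"   # whiskers run from the minimum to the maximum
    for x in range(lq, uq + 1):
        row[x - start] = "="   # the box runs from LQ to UQ
    for x in (minimum, lq, median, uq, maximum):
        row[x - start] = "|"   # vertical marks at the five key values
    axis = "".join(str(v).ljust(5) for v in range(start, end + 1, 5))
    return "".join(row) + "\n" + axis

print(ascii_box_plot(12, 18, 25, 34, 42, start=10, end=45))
```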

Level 3: Solo

Try it yourself!

Question

A cumulative frequency curve shows test scores for 80 students. The curve shows that 60 students scored 65 or less. What percentage of students scored more than 65?

Solution

Students scoring 65 or less: 60

Students scoring more than 65: 80 − 60 = 20

$$\text{Percentage} = \frac{20}{80} \times 100 = 25\%$$

Answer: 25% of students scored more than 65.

👀 Examiner's View

Mark allocation:

  • Drawing a cumulative frequency curve: 2-3 marks
  • Reading values from the curve: 1-2 marks
  • Box plots: 2-3 marks
  • Comparisons: 2-3 marks

Common errors examiners see:

  • Plotting at midpoints instead of upper boundaries
  • Miscalculating quartile positions
  • Not using a smooth curve for CF
  • Comparisons without using data/numbers

What gains marks:

  • Accurate plotting at upper class boundaries
  • Smooth S-shaped curve
  • Using $\frac{n}{4}$, $\frac{n}{2}$, $\frac{3n}{4}$ for quartiles
  • Comparisons that reference actual values

📝 AQA Notes

AQA often asks for comparisons between two box plots. Always give numbers!