Repeat 500+ Times: The Distribution
The full catalog of gaps chance alone makes — the re-randomization distribution.
Why It Centers Near Zero
Every shuffle was built under no effect — so the typical gap is about zero.
- High and low scores split evenly across T and C
- Positive and negative gaps are equally likely
It centers at zero — not at our observed 4. That's the whole point.
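A minimal sketch of building the distribution, assuming made-up scores: the two lists below are invented for illustration (chosen so the observed gap is 4) and are not the lesson's data. The helper name `shuffled_gap` is also hypothetical. It re-deals the pooled scores many times and checks where the shuffled gaps center.

```python
import random
from statistics import mean

def shuffled_gap(scores, n_treatment, rng):
    """Re-deal the pooled scores into T and C at random; return mean(T) - mean(C)."""
    dealt = scores[:]            # copy, so the original order is untouched
    rng.shuffle(dealt)           # random re-deal under the no-effect model
    return mean(dealt[:n_treatment]) - mean(dealt[n_treatment:])

# Hypothetical scores, invented for illustration (not the lesson's data).
treatment = [78, 85, 90, 72, 88, 81, 94, 76]   # mean 83
control = [74, 80, 75, 83, 77, 78, 86, 79]     # mean 79
pooled = treatment + control

rng = random.Random(0)           # fixed seed so the run is repeatable
gaps = [shuffled_gap(pooled, len(treatment), rng) for _ in range(1000)]

observed_gap = mean(treatment) - mean(control)
print(f"observed gap: {observed_gap}")
print(f"center of shuffled gaps: {mean(gaps):.2f}")
```

The center of the shuffled gaps should land close to zero, while the observed gap stays at 4.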
This Is the Chance Reference
The distribution answers one question: how big a gap does luck alone make?
- Gaps near the center are routine for chance
- Gaps far in the tails are rare for chance
It's a ruler. Next we measure our 4 against it.
Describe One Shuffle; Why Centered at Zero?
On your own, write two things:
- The steps of one shuffle — pool, re-deal, recompute
- Why the distribution centers at zero, not at 4
Explain both before advancing. This is the engine.
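One way to check your description against code: a rough sketch of a single shuffle, assuming a tiny invented dataset. The scores, labels, and the function name `one_shuffle` are all hypothetical. Note that the scores stay put and only the T/C labels are re-dealt.

```python
import random

# Hypothetical mini-dataset, invented for illustration.
scores = [78, 85, 90, 72, 74, 80, 75, 83]            # the values never change
labels = ["T", "T", "T", "T", "C", "C", "C", "C"]    # original assignment

def one_shuffle(scores, labels, rng):
    """One re-randomization: pool, re-deal the labels, recompute the gap."""
    # 1. Pool: treat all scores as one group and set the original labels aside.
    new_labels = labels[:]
    # 2. Re-deal: hand the same number of T and C labels back out at random.
    rng.shuffle(new_labels)
    # 3. Recompute: mean of the new T group minus mean of the new C group.
    t = [s for s, lab in zip(scores, new_labels) if lab == "T"]
    c = [s for s, lab in zip(scores, new_labels) if lab == "C"]
    gap = sum(t) / len(t) - sum(c) / len(c)
    return new_labels, gap

rng = random.Random(0)
new_labels, gap = one_shuffle(scores, labels, rng)
print(new_labels, round(gap, 2))
```

Running `one_shuffle` again gives a different deal and a different gap; collecting many of those gaps builds the distribution.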
We Have Chance's Gaps — Now Judge Ours
We know what luck makes: gaps centered at zero, mostly small.
- Our real result still sits off to the side: 4 points
- Drop the 4 onto the distribution and ask where it lands
Ordinary gap, or way out in the tail? That decides it.
The Decision Rule: Locate, Then Read the Tail
The shaded fraction = how often chance alone makes a gap this big.
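The tail fraction is just a count. A small sketch, assuming a toy list of ten shuffled gaps invented for illustration (a real run would have 500 or more); the function name `tail_fraction` is hypothetical.

```python
def tail_fraction(shuffled_gaps, observed_gap):
    """Fraction of shuffled gaps at least as large as the observed gap (one-sided)."""
    hits = sum(1 for g in shuffled_gaps if g >= observed_gap)
    return hits / len(shuffled_gaps)

# Toy list of shuffled gaps, invented for illustration.
toy_gaps = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 4]

print(tail_fraction(toy_gaps, 4))  # 0.1: one gap in ten reached 4 or more
print(tail_fraction(toy_gaps, 1))  # 0.4: four gaps in ten reached 1 or more
```

This counts only the upper tail ("reached 4 or more"), matching how the lesson phrases it.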
The Study-Method Case: 4 Is in the Tail
Dropping 4 onto the distribution: only about 2% of shuffles reached 4 or more.
- Chance makes a gap this big only ~1 time in 50
- That's surprising — we doubt the no-effect model
We call the 4-point difference statistically significant.
A Contrasting Case That Is Not Significant
A different experiment: a 1-point gap, with 40% of shuffles reaching 1 or more.
- Chance makes a gap this big about 2 times in 5
- No reason to doubt the no-effect model
We do not call 1 point significant. Same test, opposite verdict.
Cause Rides on Random Assignment
Significance alone says chance is an unlikely explanation for the gap. It does not say the treatment caused it.
- Random assignment balanced the groups beforehand
- Only then can a significant gap be pinned on the treatment
Significant + randomly assigned = caused. Significant alone ≠ caused.
Not-Significant Does Not Prove No Effect
"Not significant" means: the data is consistent with no effect — not that no effect is proven.
- A small or noisy experiment can miss a real effect
- The honest conclusion: "we did not detect an effect"
Absence of evidence is not evidence of absence.
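A hedged simulation sketch of how a small experiment can miss a real effect. Every number here is an assumption chosen for illustration: a true effect of 4 points, noise with standard deviation 10, group sizes of 5 and 50, 200 shuffles per experiment, and a 5% cutoff. The function names are hypothetical.

```python
import random

def avg(xs):
    return sum(xs) / len(xs)

def tail_fraction_for(treatment, control, rng, n_shuffles=200):
    """Re-randomization test: fraction of shuffled gaps >= the observed gap."""
    observed = avg(treatment) - avg(control)
    pooled = treatment + control          # new list, so inputs are not modified
    n_t = len(treatment)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if avg(pooled[:n_t]) - avg(pooled[n_t:]) >= observed:
            hits += 1
    return hits / n_shuffles

def detection_rate(n_per_group, rng, true_effect=4.0, noise_sd=10.0,
                   n_experiments=100, cutoff=0.05):
    """Share of simulated experiments (with a REAL effect) that come out significant."""
    detected = 0
    for _ in range(n_experiments):
        control = [rng.gauss(75, noise_sd) for _ in range(n_per_group)]
        treatment = [rng.gauss(75 + true_effect, noise_sd) for _ in range(n_per_group)]
        if tail_fraction_for(treatment, control, rng) <= cutoff:
            detected += 1
    return detected / n_experiments

rng = random.Random(1)
small = detection_rate(5, rng)    # 5 people per group
large = detection_rate(50, rng)   # 50 people per group
print(f"small experiment detects the effect {small:.0%} of the time")
print(f"large experiment detects the effect {large:.0%} of the time")
```

The effect is real in every simulated experiment, yet under these assumptions the small experiments should report "not significant" most of the time.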
The Threshold Is a Judgment Call
- How small must the tail be? No law of nature sets the line
- 5% is a common convention — a choice, not a fact
- 2% is clearly surprising; 40% clearly isn't; borderline needs judgment
Understand what the tail means — don't just apply a cutoff.
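A short sketch of why borderline cases need judgment, assuming two candidate cutoffs (5% and 10%) and a hypothetical borderline tail of 6%. The 2% and 40% tails come from the two cases above; the 6% case and the function name `verdicts` are invented for illustration.

```python
cutoffs = [0.05, 0.10]   # two conventions someone might choose

# Tail fractions: the first two are from the lesson, the third is hypothetical.
tails = {
    "study method (2%)": 0.02,
    "1-point case (40%)": 0.40,
    "borderline, hypothetical (6%)": 0.06,
}

def verdicts(tail, cutoffs):
    """For each cutoff, is the tail small enough to be called significant?"""
    return [tail <= c for c in cutoffs]

for name, tail in tails.items():
    print(name, verdicts(tail, cutoffs))
```

The 2% and 40% cases get the same verdict under either cutoff. Only the 6% case flips, which is why the cutoff alone cannot settle it.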
Decide and Justify, Including Not-Significant
Both experiments were randomized. For each, decide whether the gap is significant and whether you can claim cause:
- Exercise: 5-point gap, 3% of shuffles reached 5+
- Supplement: 2-point gap, 30% reached 2+
Justify with the tail fraction. One is "did not detect."
Five Errors About Significance and Chance
- Shuffling the values: only the labels move
- Expecting the center on 4: it centers at zero
- "Not significant" = no effect: only consistent with it
- Cause from significance alone: needs random assignment
- Treating the threshold as sacred: it's a judgment
Key Takeaways From Lesson Two
✓ Shuffle labels (not values) → distribution centered at zero
✓ Tail fraction = how often chance makes a gap this big
✓ Small tail → significant; here 2%, so reject no effect
✓ Not significant = consistent with no effect, not proof
✓ Cause needs random assignment, not just significance
Coming Up Next: Evaluating Reports
This was the unit's last analytic tool. Next, you turn it outward.
In Lesson B.6, you'll read a real data-based report — an article, an ad, a study — and judge with reasons how much of it to believe.