An Outlier Doesn't Move All Statistics
Recompute all four statistics (mean, SD, median, IQR) for the salary data, with and without the owner's $500k:
- Mean and SD — jump dramatically (non-resistant)
- Median and IQR — barely move (resistant)
"The outlier messes up the data" is too crude: it distorts the mean and SD but leaves the median and IQR almost unchanged.
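A minimal Python sketch of the recomputation, using the standard-library `statistics` module. The salaries are hypothetical, chosen only so that adding the owner's $500k reproduces the $90,000 mean and $50,000 median used in this lesson. `summarize` is an illustrative helper name, and the IQR uses `statistics.quantiles` with its default `'exclusive'` method (other quartile conventions give slightly different values).

```python
import statistics


def summarize(values):
    """Return mean, sample SD, median, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


# Hypothetical salaries for the nine workers (illustrative, not real data)
workers = [35_000, 36_000, 38_000, 41_000, 50_000, 50_000, 50_000, 50_000, 50_000]
with_owner = workers + [500_000]  # add the owner's $500k

before = summarize(workers)
after = summarize(with_owner)
for name in ("mean", "sd", "median", "iqr"):
    print(f"{name:>6}: {before[name]:>10,.0f} -> {after[name]:>10,.0f}")
```

With these numbers the mean roughly doubles and the SD grows about twentyfold, while the median stays at $50,000 and the IQR shifts by only $500.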
The Standard Deviation Misleads Too
The owner's $500k also inflates the SD.
- Every value now sits far from the inflated mean (the owner far above, each worker well below) → SD balloons
- A spread claim based on SD would also mislead
Among the nine workers, salaries are actually quite consistent.
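To see why a single value can balloon the SD, this sketch uses hypothetical salaries (nine workers plus the owner's $500k) and measures how much of the total squared deviation from the mean comes from the owner alone. The SD is built from those squared deviations, so whichever value dominates them dominates the SD.

```python
import statistics

# Hypothetical salaries: nine workers plus the owner's $500k (illustrative only)
workers = [35_000, 36_000, 38_000, 41_000, 50_000, 50_000, 50_000, 50_000, 50_000]
salaries = workers + [500_000]

mean = statistics.mean(salaries)
squared_devs = [(x - mean) ** 2 for x in salaries]
total = sum(squared_devs)
owner_share = squared_devs[-1] / total  # the owner is the last entry

print(f"mean: {mean:,.0f}")
print(f"owner's share of squared deviations: {owner_share:.0%}")
print(f"SD with owner: {statistics.stdev(salaries):,.0f}")
print(f"SD of workers only: {statistics.stdev(workers):,.0f}")
```

Here the owner contributes roughly 90% of the squared deviations, so the SD mostly measures one person's distance from the mean rather than the spread among the workers.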
Misleading Versus Honest: Two Conclusions
Same salary data, two stories:
- Misleading (mean): "The average salary is $90,000."
- Honest (median): "A typical employee earns about $50,000."
The outlier's danger is the false story a non-resistant statistic tells.
Your Turn: Correct the Interpretation
Clinic waits: most 10–20 minutes, one emergency at 180 minutes. A report says "average wait is 35 minutes."
Explain why that misleads, then write the corrected interpretation.
Use a resistant statistic, then advance.
Answer: The single 180-minute emergency inflated the mean, so "35 minutes" describes almost no one's visit. Corrected: a typical patient waits about 15 minutes (the median).
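A quick check of the answer with hypothetical wait times: seven routine visits plus the one 180-minute emergency, chosen so the mean lands near the report's 35 minutes. These values are invented for illustration.

```python
import statistics

# Hypothetical waits in minutes: seven routine visits plus one 180-minute emergency
waits = [10, 12, 14, 15, 16, 18, 20, 180]

mean_wait = statistics.mean(waits)      # non-resistant: dragged up by the 180
median_wait = statistics.median(waits)  # resistant: tracks the typical patient
print(f"mean wait:   {mean_wait:.1f} min")
print(f"median wait: {median_wait:.1f} min")
```

The mean comes out near 35.6 minutes, a wait that no patient in this list actually had, while the median stays at 15.5 minutes.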
So Should You Just Delete It?
An outlier distorts the story — so delete it?
- That instinct is wrong; resisting it is the key judgment
- Some outliers are genuine, valuable information
Don't delete first. Investigate first — find out where it came from.
Outliers Are Not Automatically Errors
An outlier is just an unusual value — and unusual values come in two kinds:
- A genuine error (typo, broken sensor)
- A real, rare event that actually happened
You can't tell which without investigating. Erasing a real value falsifies data.
Two Cases, Two Different Calls
- 7-inch adult height → almost certainly a typo → error
- 108°F summer day → a real, rare heat wave → keep it
Same flag, opposite decisions — because the source differs.
The Four-Step Outlier Decision Protocol
When you find an outlier:
- Investigate — where did the value come from?
- Decide — error or legitimate rare value?
- Document — record the choice and why
- Report — state its effect on your conclusion
This replaces impulsive deletion with a defensible, transparent choice.
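One way to make the Document step concrete is to keep a small record per outlier. This is a minimal sketch, not a standard tool: `OutlierDecision` and its fields are hypothetical names, and the source descriptions in the usage lines are invented placeholders for the two cases on the "Two Cases, Two Different Calls" slide.

```python
from dataclasses import dataclass


@dataclass
class OutlierDecision:
    """One documented pass through the four-step protocol (illustrative sketch)."""
    value: float
    source: str         # Investigate: where the value came from
    is_error: bool      # Decide: True = error, False = legitimate rare value
    justification: str  # Document: why you decided this way
    effect: str         # Report: how the decision changes the conclusion

    def __post_init__(self):
        # A decision with no written reason has not been documented.
        if not self.justification.strip():
            raise ValueError("record a justification before deciding")

    @property
    def action(self) -> str:
        return "remove (documented error)" if self.is_error else "keep and flag"

    def report_line(self) -> str:
        return (f"{self.value}: {self.action}. Source: {self.source}. "
                f"Why: {self.justification}. Effect: {self.effect}.")


# Hypothetical records for the height and heat-wave cases; source text is invented.
height = OutlierDecision(
    value=7, source="hand-entered height field",
    is_error=True, justification="no adult is 7 inches tall; likely a typo",
    effect="excluded from the height summaries")
heat = OutlierDecision(
    value=108, source="daily-high temperature record",
    is_error=False, justification="a real, rare heat wave",
    effect="kept; typical highs summarized with median and IQR, 108°F flagged")

for decision in (height, heat):
    print(decision.report_line())
```

Refusing an empty justification is a deliberate choice in this sketch: a keep-or-remove call with no recorded reason is exactly the impulsive deletion the protocol is meant to replace.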
How to Report a Kept Outlier
If you keep a legitimate outlier, report it carefully:
- Use resistant statistics (median, IQR) for the typical case
- Report the outlier separately as a flagged observation
Example: "Typical highs are around 90°F, with one record day at 108°F."
The typical case stays honest; the rare event stays visible.
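A sketch of that reporting pattern, assuming hypothetical daily highs in °F and that the 108° day has already been investigated and kept. The function name `report_kept_outlier` is illustrative. The outlier stays in the data; because the median and IQR are resistant, it barely affects the typical-case summary, and it is named separately so it stays visible.

```python
import statistics


def report_kept_outlier(values, flagged, unit="°F"):
    """Summarize the typical case with resistant statistics; name kept outliers separately."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    flags = ", ".join(f"{v}{unit}" for v in flagged)
    return f"Typical high: median {med}{unit} (IQR {q3 - q1}{unit}); flagged: {flags}"


# Hypothetical daily highs; the 108 day was investigated and kept as a real heat wave.
highs = [86, 88, 89, 90, 90, 91, 92, 93, 108]
summary = report_kept_outlier(highs, flagged=[108])
print(summary)
```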
Your Turn: Keep or Remove?
Reaction times: mostly 200–400 ms, one entry reads 9000 ms (nine seconds).
Decide whether to keep it or investigate, and justify your choice in one sentence.
Name what you'd check. Decide, then advance.
Answer: Investigate — 9000 ms is implausibly long; check whether the subject was distracted or the timer glitched before keeping or removing.
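A sketch of the "investigate first" stance for the reaction-time data, with hypothetical values. The 2,000 ms cutoff is an assumption made for illustration, not a standard threshold; the point is that the suspicious entry is set aside for checking while the raw list stays intact.

```python
import statistics

# Hypothetical reaction times in ms: mostly 200-400, one entry of 9000
times_ms = [250, 280, 300, 310, 320, 340, 360, 390, 9000]

# Assumed cutoff for this sketch only; choose one that fits your task.
PLAUSIBLE_MAX_MS = 2_000

# Set suspicious entries aside by index; the raw list is not modified.
to_investigate = [(i, t) for i, t in enumerate(times_ms) if t > PLAUSIBLE_MAX_MS]

print("check before deciding:", to_investigate)
print("provisional median:", statistics.median(times_ms), "ms")
```

The median (320 ms here) is resistant, so it can be reported provisionally while you check whether the subject was distracted or the timer glitched.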
Full Task: Run the Whole Protocol
Given a data set with one outlier in context:
- Investigate its likely source
- Decide error or legitimate
- Document with a justification
- Report its effect on your conclusion
Do the whole protocol unaided.
Answer: A documented keep-or-remove decision, with resistant statistics and a flag if kept.
Key Takeaways From This Lesson
✓ An outlier hits mean and SD, not median and IQR
✓ The wrong statistic tells a misleading story
✓ Investigate, decide, document, report
✓ Outliers can be errors or real values
✓ Keep a real outlier, but flag it separately
Next: fitting data to a normal model.