3 Bias and checking assumptions
3.1 Learning outcomes
After viewing the tutorial materials and following along with the demonstration, you will be able to:
- Screen for outliers using the boxplot rule
- Check the assumptions of normality and homogeneity of variances, using plots and tests
3.2 Datasets
This demonstration uses the dataset ReducedESS11, which you can download here. A summary of the variables is given in the previous tutorial.
3.3 Videos
3.4 Guide to diagnostic plots
3.4.1 Boxplot

Note 3.1: How to interpret a boxplot
- The blue box spans the middle 50% of the data, from Q1 to Q3; its length is the interquartile range (IQR = Q3 − Q1)
- Q1 (25th percentile) separates the lowest 25% of the sorted values from the rest
- The median (Q2/50th percentile) splits the sorted dataset in half
- Q3 (75th percentile) separates the highest 25% of the sorted values from the rest
- Potential outliers are identified by the ‘boxplot rule’: a value is flagged if it is smaller than Q1 − 1.5 × IQR or larger than Q3 + 1.5 × IQR
- The vertical lines at the ends of the whiskers show the minimum and maximum values once any flagged outliers are excluded
- If there are no visible whiskers and no outliers, then the minimum and maximum are respectively equal to Q1 and Q3
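The boxplot rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the demonstration: the function name `boxplot_rule` and the `scores` values are made up for this example, and the quartiles are computed with the "inclusive" method from Python's standard library. Statistical packages use slightly different quartile definitions, so the fences they report may differ a little.

```python
from statistics import quantiles


def boxplot_rule(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    # Quartile definitions vary between packages; "inclusive" is one
    # common choice and is an assumption of this sketch.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # A value is flagged if it falls outside either fence.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


# Made-up scores for illustration only (not taken from ReducedESS11).
scores = [2, 4, 4, 5, 5, 6, 6, 7, 8, 25]
lower, upper, outliers = boxplot_rule(scores)
print(lower, upper, outliers)  # 0.5 10.5 [25]
```

Here Q1 = 4.25 and Q3 = 6.75, so the IQR is 2.5 and the fences sit at 0.5 and 10.5. Only the value 25 is flagged, and the upper whisker would end at 8, the largest value that is not flagged.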
3.4.2 QQ and density/distribution plots
In the figure below you can see the impact of different types of deviation from normality on how data appear in QQ and density plots. Note the correspondence between the sample quantiles and the theoretical quantiles in the case of the (near) normal distribution, and how this correspondence varies in the skewed or non-mesokurtic distributions underneath.
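To make the idea of "correspondence" between sample and theoretical quantiles concrete, the sketch below builds the points of a QQ plot by hand. It is an illustration under stated assumptions rather than the method used in the demonstration: the helper names `qq_points` and `pearson_r` are hypothetical, the samples are simulated, and the plotting positions (i − 0.5)/n are just one of several conventions in use.

```python
from math import exp, sqrt
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted sample value with a theoretical normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Pearson correlation between the two coordinates of each pair."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Simulated data for illustration only.
near_normal = NormalDist(0, 1).samples(200, seed=1)
right_skewed = [exp(v) for v in near_normal]  # log-normal, positively skewed

r_normal = pearson_r(qq_points(near_normal))
r_skewed = pearson_r(qq_points(right_skewed))
print(round(r_normal, 3), round(r_skewed, 3))
```

For the near-normal sample the points lie close to a straight line, so the correlation between theoretical and sample quantiles is close to 1. For the skewed sample the points in the upper tail bend away from the line, which lowers the correlation. This summary number is only a rough companion to the plot; the shape of the departure (curved, S-shaped, and so on) is what indicates skew or kurtosis.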
