Boxplots are used to represent quantitative data. Compared to histograms, boxplots are effective for identifying outliers and comparing distributions across subgroups.
A boxplot displays the five-number summary:
- minimum (excluding outliers)
- first quartile (i.e. the 25th percentile)
- median
- third quartile (i.e. the 75th percentile)
- maximum (excluding outliers)
Outliers
Outliers (data points outside the fences) are drawn as dots in a boxplot. The fences are calculated using the interquartile range (IQR = Q3 - Q1):
Lower Fence: Q1 - 1.5 * IQR
Upper Fence: Q3 + 1.5 * IQR
Any data point below the lower fence or above the upper fence is considered an outlier.
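The fence rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `boxplot_stats` and the sample data are made up for this example, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is only one of several common quartile conventions. Plotting libraries may use a different convention and give slightly different fences.

```python
import statistics

def boxplot_stats(values):
    """Five-number summary (whiskers exclude outliers) plus the outliers.

    Hypothetical helper for illustration only.
    """
    data = sorted(values)
    # Assumption: "inclusive" quartiles (linear interpolation between points).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points inside the fences determine where the whiskers end.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": inside[-1],
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Made-up sample: nine evenly spaced values and one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
stats = boxplot_stats(sample)
print(stats)
```

For this sample the value 50 lies above the upper fence, so it is reported as an outlier and the upper whisker stops at 9 rather than 50.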
Intuition
Box plots can be counterintuitive. Keep in mind that longer box or whisker segments do not indicate more data: each of the four segments represents roughly the same share of the data, about 25% (outliers aside). As a result, a shorter segment means the values in it are packed more densely.
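A small numeric check can make this concrete. The sketch below uses a made-up, right-skewed sample with no outliers (so the whiskers reach the minimum and maximum) and, as an assumption, the "inclusive" quartile convention from `statistics.quantiles`. It counts how many values fall in each of the four segments and divides by the segment length to get a rough density.

```python
import statistics

# Made-up right-skewed sample of 16 values, chosen to have no outliers.
data = [1, 2, 3, 4, 5, 6, 7, 8, 10, 14, 18, 22, 26, 32, 40, 50]

# Assumption: "inclusive" quartiles; other conventions shift the edges slightly.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
edges = [min(data), q1, median, q3, max(data)]
names = ["lower whisker", "lower box", "upper box", "upper whisker"]

segments = []
for i, name in enumerate(names):
    lo, hi = edges[i], edges[i + 1]
    last = i == len(names) - 1
    # Half-open interval [lo, hi); the last segment also includes its right edge.
    count = sum(lo <= x < hi or (last and x == hi) for x in data)
    length = hi - lo
    segments.append((name, length, count, count / length))

for name, length, count, density in segments:
    print(f"{name:14s} length={length:6.2f} count={count} density={density:.2f}")
```

Each segment holds four of the sixteen values, yet the upper whisker is several times longer than the lower one, so its density is much lower. With tied values or sample sizes not divisible by four, the counts are only approximately equal.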
Critics argue that box plots are not intuitive and can hide important details, and they encourage the use of alternatives such as strip plots [1].