Boxplots are used to represent quantitative data. Compared to histograms, boxplots are more effective for identifying outliers and comparing distributions across subgroups.

comparative boxplot.jpeg

A boxplot displays the five-number summary:

  • minimum (excluding outliers)
  • first quartile (i.e. the 25th percentile)
  • median
  • third quartile (i.e. the 75th percentile)
  • maximum (excluding outliers)
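
As a rough illustration, the summary can be computed with NumPy. This is a minimal sketch: the `data` array is made up for the example, and `np.percentile` uses linear interpolation by default, so other tools may report slightly different quartiles. The minimum and maximum here are the raw extremes; the whisker ends drawn in a boxplot additionally exclude outliers.

```python
import numpy as np

# Made-up sample, used only for illustration.
data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 30])

# Quartiles via NumPy's default (linear interpolation) method.
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Raw extremes; a boxplot's whiskers would stop short of any outliers.
raw_min, raw_max = data.min(), data.max()

print(f"min={raw_min}, Q1={q1}, median={median}, Q3={q3}, max={raw_max}")
```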

Outliers

Outliers (data points outside the fences) are drawn as dots in a boxplot. The fences are calculated using the interquartile range (IQR = Q3 - Q1):

  • Lower fence: Q1 - 1.5 * IQR
  • Upper fence: Q3 + 1.5 * IQR

Any data point below the lower fence or above the upper fence is considered an outlier.

boxplot_IQR.png
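
The fence rule can be sketched in a few lines of Python. The helper name `iqr_outliers`, its `k` parameter, and the sample values are assumptions made for this example, not part of any library.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Hypothetical helper for illustration; k=1.5 is the multiplier
    used in the fence formulas above.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points strictly outside the fences are flagged as outliers.
    mask = (values < lower_fence) | (values > upper_fence)
    return lower_fence, upper_fence, values[mask]

# Made-up sample, used only for illustration.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
lower, upper, outliers = iqr_outliers(sample)
print(lower, upper, outliers)
```

For this sample the only value outside the fences is 30, so it would be drawn as a dot and the upper whisker would end at 12.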

Intuition

Box plots can be counterintuitive. Longer box or whisker segments do not indicate more data: each of the four segments (the two whiskers and the two halves of the box) covers roughly a quarter of the observations. A shorter segment therefore means the values inside it are packed more densely.
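
A small numerical check can make this concrete. The sketch below assumes a right-skewed sample drawn from an exponential distribution (an arbitrary choice for the example) and, for simplicity, splits the data at the raw minimum and maximum rather than at the whisker ends, so outliers are counted in the outer segments.

```python
import numpy as np

# Arbitrary right-skewed sample, used only for illustration.
rng = np.random.default_rng(0)
values = rng.exponential(scale=1.0, size=10_000)

# Segment boundaries: min, Q1, median, Q3, max.
edges = np.percentile(values, [0, 25, 50, 75, 100])

counts, _ = np.histogram(values, bins=edges)  # points per segment
widths = np.diff(edges)                       # length of each segment
density = counts / widths                     # points per unit length

names = ["lower whisker", "lower box", "upper box", "upper whisker"]
for name, c, w, d in zip(names, counts, widths, density):
    print(f"{name:13s} count={int(c):5d} width={w:6.3f} density={d:9.1f}")
```

Each segment holds about 2,500 points, yet the upper segment is far longer than the lower one, so its density is much lower.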

box-plot-vs-histogram-w-callouts.webp

Critics argue that box plots are unintuitive and can hide important details, and they recommend alternatives such as strip plots [1].

Footnotes

  1. I’ve Stopped Using Box Plots. Should You? | Nightingale