Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

The Chemical Statistician


Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on the 5-number summary, which I mentioned previously in this series’ post on descriptive statistics. I will define the 5-number summary and calculate it in two different ways that are commonly used in R. (The different methods arise because statisticians have no universally agreed way to calculate quantiles.) I will show that the fivenum() function uses a simpler and more interpretable method to calculate the 5-number summary than the summary() function does. This post expands on a recent comment that I made to correct an error in the post on box plots.

> y = seq(1, 11, by = 2)
> y
[1]  1  3  5  7  9 11
> fivenum(y)
[1]  1  3  6  9 11
> summary(y)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
…
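To make the difference between the two methods concrete, here is a minimal sketch that recomputes both sets of quartiles by hand for the same vector y. It rests on two assumptions taken from the R documentation rather than from this post: that fivenum() returns Tukey's hinges (the medians of the lower and upper halves of the sorted data), and that summary() relies on the default method of quantile(), which is type 7. The variable names half, lower_hinge, upper_hinge, tukey and type7 are only illustrative.

```r
# Same data as in the session above
y <- seq(1, 11, by = 2)
y_sorted <- sort(y)
n <- length(y_sorted)

# Tukey's hinges: the median of each half of the sorted data.
# Assumption: for odd n, the overall median is counted in both halves.
half <- ceiling(n / 2)
lower_hinge <- median(y_sorted[1:half])
upper_hinge <- median(y_sorted[(n - half + 1):n])
tukey <- c(min(y), lower_hinge, median(y), upper_hinge, max(y))

# Quartiles by linear interpolation between order statistics
# (type 7 is the default of quantile())
type7 <- unname(quantile(y, probs = c(0, 0.25, 0.5, 0.75, 1), type = 7))

print(tukey)
print(type7)
```

For this vector the hinges are 3 and 9, which are actual data points, while the type-7 quartiles are the interpolated values 3.5 and 8.5. The minimum, median and maximum agree under both methods.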

