Before I run any statistical analysis on my data, I always like to get a good look at it. Getting familiar with the data before conducting inferential tests is a good way to catch surprises early.
Data exploration can take many forms; perhaps the most common is to compute the mean (or median) of each group you are investigating and plot them.
This is, of course, fine. However, of late I have become interested in looking at the distributions of my results rather than just an estimate of their central tendency. Looking at just the mean (a point estimate) “throws” a lot of information away. Did all participants in each group perform equally? Were there any participants who performed substantially differently from the group (so-called “outliers”)?
Perhaps one of the most under-used plots in psychological research is the humble Boxplot (a.k.a. box & whisker plot). Below is a Boxplot generated with the statistical package R (your good old friend SPSS also produces these beauties!).
Boxplots are simple to interpret. The central bold line is the median of the distribution. The horizontal lines above and below the median (the lines that make up the “box”) represent the upper and lower quartiles of your distribution of scores, respectively. The upper quartile is the point below which 75% of your observed data points fall, and the lower quartile is the point below which 25% fall, so the box gives a good representation of the “spread” of scores around your median. The box then sprouts whiskers, which reach to the largest observed score (upper whisker) and smallest observed score (lower whisker) that are not classed as outliers. By default, R draws each whisker out to the most extreme score lying within 1.5 times the interquartile range (the height of the box) of the box edge.
The circles in the plot represent outliers; these are scores that are considered “extreme” in relation to the rest of your sample, which is why they sit beyond the ends of the whiskers. (Wikipedia has a very nice article on outliers.)
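To make the anatomy of the plot concrete, here is a minimal sketch in Python (standard library only) of the numbers a Boxplot draws. The function name `boxplot_stats`, the made-up scores, and the 1.5 × interquartile-range rule for flagging outliers are my assumptions for illustration; that rule is a common default, but packages differ slightly in how they compute quartiles, so do not expect identical values from R or SPSS.

```python
import statistics

def boxplot_stats(scores, whisker_range=1.5):
    """Return the values a boxplot draws for one group of scores.

    Hypothetical helper: quartiles use Python's "inclusive" method,
    which may differ slightly from the convention your package uses.
    """
    data = sorted(scores)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box

    # Scores beyond these fences are flagged as outliers (assumed rule).
    lower_fence = q1 - whisker_range * iqr
    upper_fence = q3 + whisker_range * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme scores that are not outliers.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }

# Made-up scores from eleven imaginary participants.
scores = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 15]
stats = boxplot_stats(scores)
print(stats)
```

For these scores the box runs from 4 to 6 around a median of 5, the whiskers stop at 2 and 7, and the score of 15 is flagged as an outlier rather than stretching the upper whisker.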
There are several useful properties of Boxplots which make them a suitable candidate for inclusion in your next published report:
1) We can use Boxplots to visualise the spread of scores in our sample.
The figure below shows two samples with the same mean (a score of 5) but different standard deviations (SD); the plot on the left has an SD of 1, and the one on the right has an SD of 3. The difference is easy to see in a Boxplot: the box and whiskers on the right are much taller.
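If you want to convince yourself of this without drawing anything, the sketch below simulates two such samples and compares the height of their boxes (the interquartile range). The sample size of 1000, the seed, and the helper name `box_height` are arbitrary choices of mine, not part of the original figure.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def box_height(scores):
    """Interquartile range: the distance between the box's two edges."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1

# Two simulated samples with the same mean but different SDs.
narrow = [rng.gauss(5, 1) for _ in range(1000)]
wide = [rng.gauss(5, 3) for _ in range(1000)]

for label, sample in (("SD = 1", narrow), ("SD = 3", wide)):
    print(
        label,
        "| mean:", round(statistics.mean(sample), 2),
        "| box height:", round(box_height(sample), 2),
    )
```

The two means should come out close to 5, while the box for the SD = 3 sample should be roughly three times as tall as the box for the SD = 1 sample.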
2) We can use a Boxplot to show differences in central tendency.
We usually use graphs to show our reader how groups differ in their estimates of central tendency. Well, Boxplots are good for this, too! Below are two plots with the same SD, but the plot on the left has a mean of 5, and the one on the right has a mean of 8. (Strictly, the bold line marks the median rather than the mean, but in roughly symmetrical data the two sit close together.)
3) We can use Boxplots to gauge “normality” in our data.
Inferential statistics often require that your data be “normally” distributed. Boxplots give the researcher a quick first check on this. If the box and whiskers are relatively “symmetrical” around their median, your sample data is at least consistent with a normal distribution; symmetry alone cannot prove normality, but clear asymmetry is a warning sign. The plot below (left) is from a normal distribution, as reflected in its nice symmetry; the plot on the right, however, is not from a normal distribution (this data is positively skewed; that is, most observations sit towards the lower end of the plot, with a long tail of higher scores).
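The “symmetry around the median” idea can be put into a single number by comparing the two halves of the box. The sketch below uses the quartile (Bowley) skewness coefficient, which is 0 when the median sits exactly in the middle of the box and positive when the upper half of the box is longer. The simulated samples, the seed, and the function name `quartile_skew` are my own assumptions for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def quartile_skew(scores):
    """Quartile (Bowley) skewness: compares the two halves of the box."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return ((q3 - median) - (median - q1)) / (q3 - q1)

normal = [rng.gauss(5, 1) for _ in range(5000)]        # symmetrical
skewed = [rng.expovariate(1.0) for _ in range(5000)]   # positively skewed
uniform = [rng.uniform(0, 10) for _ in range(5000)]    # symmetrical, not normal

for label, sample in (("normal", normal), ("skewed", skewed), ("uniform", uniform)):
    print(label, round(quartile_skew(sample), 2))
```

The normal sample should give a value near 0 and the skewed sample a clearly positive value. The uniform sample should also land near 0 even though it is not normal, which is worth keeping in mind: a symmetrical box is only a first check, not a test of normality.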
The versatility and wealth of descriptive power make the Boxplot one of the most under-appreciated methods of displaying data in all of psychology. Put this right by starting to use Boxplots more in your research! Boxplots FTW!!