Statistics is a tool that can be used for 2 purposes: 1) to describe a sample and 2) to infer things about a population based on a sample. Error bars presented on graphs can be used for both purposes.
Error bars for description:
Say we want to assess how well a teacher is performing based on student feedback. A brief questionnaire is administered to all 20 students in the class. One of the items asks how responsive the teacher is on a scale from 0 (not at all) to 5 (very much). It would be virtually unheard of for all the scores on this item to be identical. Even if the general perception of the class is that the teacher is highly responsive, not everyone will provide the same score; too many factors enter into such a perception. As a result, we will not obtain a single score from all 20 students but rather a distribution of scores. Some students will respond 3, others 4 and perhaps others 5. We say that the distribution has some amount of spread or variability. In some distributions, all values will cluster very closely to the mean, whereas in other distributions, the values will be more spread out. This variability indicates the amount of uncertainty (i.e., error) there is in a measurement.
Error bars on graphs allow us to visualize this uncertainty around some central point, usually the mean. The more clustered the data points around the central point the less uncertainty there is in the data. There are 2 commonly used ways to express variability within a distribution: standard deviation and range.
In the figure below, each thick vertical green bar represents the mean level of teacher responsiveness rating for 3 different classes and the thin-capped lines represent the standard deviation of ratings in each class.
The standard deviation (SD) is the square root of the average of the squared deviations from the mean. It expresses the amount of spread about (below and above) the mean. In a normal distribution, about 68% of all data points will be found within 1 SD of the mean, about 95% within 2 SDs of the mean, and about 99.7% within 3 SDs of the mean.
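To make the definition concrete, the mean and standard deviation can be computed directly. This is a minimal sketch; the ratings below are invented for the example:

```python
import statistics

# Hypothetical responsiveness ratings (0-5 scale) from a class of 20 students
ratings = [3, 4, 4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4, 4, 3, 5, 4, 4, 5]

mean = statistics.mean(ratings)   # central point of the distribution
sd = statistics.pstdev(ratings)   # square root of the average squared deviation

print(f"mean = {mean:.2f}, SD = {sd:.2f}")  # → mean = 4.10, SD = 0.70
```

Note that `pstdev` treats the class as the whole group being described; for a sample drawn from a larger population, `statistics.stdev` (which divides by n − 1) would be used instead.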
In the bar graph above, notice that the error bar for class 3 is considerably larger than for class 1. This indicates greater uncertainty in the mean reported for class 3 than for class 1, suggesting that the mean of xx for class 1 is a better estimate of teacher responsiveness than the mean of xx reported for class 3.
The range is quite simply the difference between the highest and lowest values; it is the interval within which all the data points can be found. Given a lowest score of 3 and a highest score of 5 on the teacher's evaluation question, the range is 5 − 3 = 2. The figure below shows the mean ratings provided by respondents for 5 items on a questionnaire. The orange dots represent the mean rating and the error bars represent the range from lowest to highest rating.
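The range takes one line to compute. A sketch using Python built-ins, with ratings invented for the example:

```python
# Hypothetical ratings on the responsiveness item (0-5 scale)
ratings = [3, 4, 4, 5, 4, 3, 5, 4, 4, 5]

rating_range = max(ratings) - min(ratings)  # highest minus lowest value
print(rating_range)  # → 2
```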
Error bars for inference:
The other purpose of statistics is to make inferences regarding a target population based upon a sample. To learn about, say, the relationship between stress and pain among people with chronic pain, it would not be feasible to study the entire population of chronic pain patients. Instead we draw a (typically random) sample from among the population and study the sample. If we are to learn something about the relationship between stress and pain among people with chronic pain our concern cannot be with the sample per se. It must be with the extent to which our sample is representative of the population of chronic pain patients. But how can we know this? In other words, how can we know exactly how closely our sample reflects the population of interest on some variables of interest?
One way to approach this is to consider what would happen if we conducted numerous replications of a study, each time randomly drawing a sample from the population and calculating a mean value on some variable. The less the mean varies from sample to sample, the more confidence we can have in the estimate derived from any one sample.
Standard Error
The standard error (SE) is a way of quantifying how much a sample statistic, such as the mean, would vary if a study were repeated many times. It depends on 2 factors: the sample size and the standard deviation (SD) of the sample.
SE = SD / √n

where SE is the standard error, SD is the standard deviation of the sample, and n is the sample size.
The effect of sample size on SE is intuitive. The larger a sample, the more representative it should be; as a sample grows, it includes more and more of the total population. Thus, smaller samples yield less precise estimates of the population, whereas larger samples yield more precise estimates. The effect of SD is also straightforward. Returning to the teacher evaluation example, if there is very little spread in the sample's rating scores, our best assumption is that there is also very little spread in the scores of the entire population. Certainly it is possible that, by chance alone, we selected a sample producing extreme values. But our sample is all that we have, and on average this assumption is more likely than not to be valid. Thus, SE is directly related to how dispersed the data in our sample are and inversely related to the sample size (i.e., larger sample sizes result in smaller SEs).
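The formula makes the sample-size effect easy to see. A minimal sketch, with the SD value invented for the example:

```python
import math

def standard_error(sd, n):
    """SE = SD / sqrt(n): the precision of the sample mean."""
    return sd / math.sqrt(n)

sd = 0.7  # hypothetical SD of responsiveness ratings

# Larger samples shrink the SE, sharpening the estimate of the population mean
print(standard_error(sd, 20))  # n = 20
print(standard_error(sd, 80))  # n = 80: quadrupling n halves the SE
```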
Confidence Intervals
There is one other piece of the puzzle to know about: the Central Limit Theorem. It states that given samples of sufficient size (around 30), the sample means will be approximately normally distributed, whatever the shape of the underlying population. That is, if we were to conduct many repetitions of a study with the same sample size, calculate a mean on some variable for each repetition, and then plot the frequency of each of these means, we would wind up with a curve that is very close to normal (the so-called “bell curve”). We call this curve the sampling distribution of the means. Why is this important? Because if the sampling distribution of the means follows a normal distribution, we know that about 95% of sample means will fall within 2 standard deviations of its center. Given that the SE is simply the standard deviation of the sampling distribution, we can be 95% confident that the true (population) mean will be within 2 SE of our sample mean.
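This relationship between the SE and the sampling distribution can be checked by simulation. A sketch, where the uniform "population" of ratings, the sample size, and the number of repetitions are all arbitrary choices:

```python
import random
import statistics

random.seed(42)

# Arbitrary non-normal "population": ratings drawn uniformly from 0-5
def draw_sample(n):
    return [random.randint(0, 5) for _ in range(n)]

n = 30         # sample size (the "around 30" of the Central Limit Theorem)
reps = 5000    # number of repeated studies

sample_means = [statistics.mean(draw_sample(n)) for _ in range(reps)]

# The SD of the sampling distribution should approximate SE = SD / sqrt(n)
population_sd = statistics.pstdev([0, 1, 2, 3, 4, 5])  # ≈ 1.71
print(statistics.stdev(sample_means))   # empirical SE from the simulation
print(population_sd / n ** 0.5)         # theoretical SE ≈ 0.31
```

With enough repetitions the two printed values converge, which is exactly the claim the theorem makes about repeated studies.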
For larger samples (the "sufficient size" of around 30 noted above), the distribution comes very close to normal, and a 95% confidence interval can be calculated by simply multiplying the SE by 2, because in a normal distribution 95% of all values fall within about 2 standard deviations of the mean.

For smaller samples, however, a critical value of t must be determined using n − 1 degrees of freedom:

CI = mean ± t × SE
The critical value of t can be obtained from the t distribution tables presented in almost any statistics textbook, or by searching the web for a t distribution table.
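Putting the pieces together, a 95% confidence interval for the mean can be computed as follows. A minimal sketch: the two-tailed 95% critical values are copied from a standard t table, and the ratings are invented for the example:

```python
import statistics

# Two-tailed 95% critical values of t for small samples (keyed by df = n - 1),
# copied from a standard t distribution table
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776,
             5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def confidence_interval_95(data):
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / n ** 0.5   # SE = SD / sqrt(n)
    t = T_CRIT_95.get(n - 1, 2.0)            # t ≈ 2 for larger samples
    return mean - t * se, mean + t * se

# Hypothetical responsiveness ratings from a small class of 5 students
low, high = confidence_interval_95([3, 4, 4, 5, 4])
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Because n = 5 here, the function uses the tabled critical value for 4 degrees of freedom (2.776) rather than 2, producing a suitably wider interval.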
Error bars can also depict standard errors and 95% confidence intervals. The mean of the sample, along with SE or CI error bars, indicates the interval within which the population mean is likely to lie.
The following graph is adapted from Bingel, Schoell, Herken, Buchel and May (2007), in which mean pain intensity ratings (on a scale from 0 to 100) are plotted across 8 days.

The length of an error bar therefore provides a quick visual indication of how much uncertainty there is in the data. The wider the error interval, the greater the uncertainty and the less precise is the parameter estimate.




