Sampling strategies

One factor that often receives less attention than it deserves is sampling. When attempting to understand some phenomenon, there is typically a population of interest. A population is the group of individuals that we are ultimately attempting to learn something about. All the people living in Pennsylvania, all patients with diabetes mellitus type II, all freshman college students at private universities in the United States: these are all populations.

Say we want to know whether a novel intervention can improve compliance with blood glucose readings in type II diabetics. Including every single type II diabetic in such a study would be an impossible task. So researchers instead work with samples. A sample is simply a subset of the population. We collect data from samples and make inferences about the target population based on the data collected from the sample.

The challenge of sampling is that it can introduce error. If the sample does not adequately represent the population given the research question, then the results may not generalize to the target population. If the sample is too small, chance alone may account for a substantial part of the findings.
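The effect of sample size on chance variation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 100,000 with a 30% trait prevalence is made up, and spread_of_estimates is a hypothetical helper name, not something from a particular study or library.

```python
import random
import statistics

def spread_of_estimates(population, n, repeats=200, seed=0):
    """Standard deviation of the sample proportion across repeated samples of size n."""
    rng = random.Random(seed)
    # Each repeat draws a fresh simple random sample and records the proportion with the trait.
    estimates = [sum(rng.sample(population, n)) / n for _ in range(repeats)]
    return statistics.pstdev(estimates)

# Hypothetical population: 100,000 people, 30% of whom have the trait (coded 1).
population = [1] * 30_000 + [0] * 70_000

spread_small = spread_of_estimates(population, n=10)
spread_large = spread_of_estimates(population, n=1000)
print(spread_small, spread_large)
```

The estimates from samples of 10 scatter much more widely around the true 30% than the estimates from samples of 1,000, which is the contribution of chance that a larger sample helps to overcome.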

So when conducting a study, there is a target population, which is the ultimate set of individuals to which the results of the study are meant to generalize. The accessible population is the subset of the target population that the researchers can actually reach.

The intended study sample is the group of individuals that the researchers have selected to be included in a study, and the actual study sample is the group of individuals that actually do participate in the study.

Sampling Strategies

Sampling strategies are commonly divided into two types: probability-based sampling and nonprobability sampling.

With probability-based sampling, individuals are chosen according to some random process, and each member of the population has a known, nonzero chance of being included in the sample. Statistical methods that are typically used to enable generalization from the sample to the target population require that the samples have been chosen at random.

There are several approaches to achieving a random sample: the simple random sample, the systematic sample, the stratified random sample, and the cluster sample.

A simple random sample is collected by first enumerating all the individuals in the population and then randomly selecting a subset of this list.
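A minimal sketch of a simple random sample, assuming the enumerated population is a list of patient IDs from 1 to 1,000. The IDs, the sample size of 50, and the function name simple_random_sample are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

# Hypothetical enumeration of the population: patient IDs 1 through 1000.
frame = range(1, 1001)
sample = simple_random_sample(frame, 50, seed=42)
print(sorted(sample)[:5])
```

Fixing the seed makes the draw reproducible, which is useful when the selection needs to be documented or audited.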

As with the simple random sample, a systematic sample requires that an enumeration of all individuals in the target population be constructed. But instead of using a purely random approach to sampling from the population, a first member is selected at random and thereafter every kth member is selected, where k is the population size divided by the sample size. For example, drawing a sample of 50 from a list of 1,000 gives k = 20.
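The systematic approach might be sketched as follows. Two choices here are assumptions rather than part of the description above: k is rounded down when the population size is not an exact multiple of the sample size, and the random starting member is drawn from the first k entries so that the selection never runs off the end of the list. The name systematic_sample is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k members, then take every kth member."""
    frame = list(frame)
    k = len(frame) // n  # sampling interval, rounded down (assumption)
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random position in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical enumeration: patient IDs 1 through 1000, sample of 50, so k = 20.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, seed=7)
print(sample[:3])
```

Because only the starting point is random, any periodic pattern in the ordering of the list that happens to line up with k can bias the result, so the order of the enumeration matters more here than in a simple random sample.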

A stratified random sample requires that a population be divided into homogeneous subgroups prior to sampling. All members are assigned to a subgroup, and each member is assigned to one and only one subgroup. Simple random sampling or systematic sampling is then applied within each subgroup.
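A sketch of stratified random sampling, assuming proportional allocation (the same sampling fraction in every subgroup) and simple random sampling within each subgroup. The age-based strata, the 10% fraction, and the function name stratified_sample are made up for illustration; other allocation schemes are equally valid.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, seed=None):
    """Split members into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        # Each member lands in exactly one stratum.
        strata[stratum_of(member)].append(member)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        n = max(1, round(len(group) * fraction))  # at least one member per stratum
        sample.extend(rng.sample(group, n))
    return sample

# Hypothetical population: 300 (id, age_group) pairs, 100 under 40 and 200 aged 40 or over.
members = [(i, "under_40" if i % 3 == 0 else "40_plus") for i in range(300)]
sample = stratified_sample(members, stratum_of=lambda m: m[1], fraction=0.1, seed=1)
print(len(sample))
```

With proportional allocation, each subgroup appears in the sample in the same proportion as in the population, which a simple random sample only achieves on average.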

A cluster sample takes advantage of naturally existing groups. Typically a sample of groups is chosen randomly from all groups, and individual members are then selected within each chosen group via a simple random or systematic sample.
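A two-stage cluster sample could look roughly like this. The six clinics, their patient lists, and the function name cluster_sample are hypothetical, and the second stage uses a simple random sample within each chosen group.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: randomly choose members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical naturally existing groups: six clinics with 40 patients each.
clinics = {
    f"clinic_{c}": [f"clinic_{c}_patient_{p}" for p in range(40)]
    for c in "ABCDEF"
}
result = cluster_sample(clinics, n_clusters=3, n_per_cluster=10, seed=3)
for name, patients in result.items():
    print(name, len(patients))
```

Only the chosen clinics need a full patient list, which is the practical appeal of this design when no enumeration of the whole population exists.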

If the goal of a study is to be able to draw inferences about a population based on data collected from the sample, it is necessary that the sample has been selected randomly. Statistical analyses rely on this assumption.

But in clinical research a random sample of the whole target population is almost never possible. Convenience sampling, preferably with a consecutive design in which every eligible individual who presents during a set period is enrolled, is a practical approach that is often suitable.

In deciding on a sampling strategy, the ultimate question is whether the results obtained from the sample would be close to those that would be obtained if data could be collected from the entire population.
