Cluster sampling is a technique for estimating statistics about a population by studying only some of its naturally occurring groups. It follows a specific procedure for obtaining the sample, and though it can gauge some information well, it is generally considered less accurate than simple random sampling, in which every possible sample of a given size has exactly the same chance of being selected. Despite lacking the assurance that comes from simple random samples, cluster sampling is used frequently in business and other applications.
The basic procedure for creating a cluster sample is to divide the full population into meaningful groups, called clusters. For instance, McDonald’s® might want to know which item on its menu is ordered most often. It could treat each McDonald’s store as a cluster, then pick some of these clusters, typically at random, and collect data from everyone in each chosen cluster. The company could record each customer’s order or survey customers as they eat, but it would track or survey only people at the chosen stores, and it would try to include every customer at those stores.
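The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store names and orders are made up, the function name one_stage_cluster_sample is hypothetical, and clusters are assumed to be chosen uniformly at random.

```python
import random
from collections import Counter


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep EVERY member of each one."""
    rng = random.Random(seed)
    # Sort the names so the draw is reproducible for a given seed.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Hypothetical data: each store (cluster) maps to the orders placed there.
stores = {
    "store_a": ["burger", "fries", "burger", "salad"],
    "store_b": ["fries", "fries", "burger"],
    "store_c": ["nuggets", "burger", "burger", "fries"],
    "store_d": ["salad", "fries", "nuggets"],
}

chosen, orders = one_stage_cluster_sample(stores, n_clusters=2, seed=0)

# Tally the orders from the chosen stores only.
most_popular, count = Counter(orders).most_common(1)[0]
print(chosen, most_popular, count)
```

Only the two selected stores contribute orders, and each contributes all of its orders, which is what distinguishes this from drawing individual customers across all stores.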
This type of sampling is very popular on big election nights. Voter precincts form a natural division, and by choosing some of the precincts and conducting surveys or exit polls at the chosen ones, pollsters often get a good sense of which ballot measures or candidates appear to be winning. The results are extrapolated to the entire population, and they’re often fairly representative of it.
When people study statistics, they often find it challenging to remember the features of cluster sampling as opposed to the features of stratified sampling. The two have some similarities and key differences that are worth understanding.
In a stratified sample, the population is also divided into groups, though the number of groups tends to be smaller. A population could be divided by gender, age, income, or region of residence, and comparing the results for each group may be part of the reason the stratified sample is performed. The key difference between the stratified and cluster methods lies in what happens after the groups are created: in a stratified sample, some members of every group, or stratum, are selected, whereas in a cluster sample, the whole population of some of the clusters is used.
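The contrast can be made concrete with a small side-by-side sketch. The group names, members, and both function names here are hypothetical, and random selection is assumed at each step.

```python
import random


def stratified_sample(groups, per_group, seed=None):
    """Take per_group randomly chosen members from EVERY group (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_group)
            for name, members in groups.items()}


def cluster_sample(groups, n_groups, seed=None):
    """Take ALL members from n_groups randomly chosen groups (clusters)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)
    return {name: list(groups[name]) for name in chosen}


# Hypothetical population split into four groups of four people each.
groups = {
    "north": ["n1", "n2", "n3", "n4"],
    "south": ["s1", "s2", "s3", "s4"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2", "w3", "w4"],
}

strat = stratified_sample(groups, per_group=2, seed=1)
clus = cluster_sample(groups, n_groups=2, seed=1)

print(strat)  # every group appears, with two members each
print(clus)   # two groups appear, each with all four members
```

Both calls return eight people here, but the stratified sample touches all four groups while the cluster sample covers only two of them in full.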
How well this method works tends to depend on what is being evaluated and how diverse a population the chosen clusters represent. Say a statistician took voting precincts in a predominantly Republican state, selected some of them as clusters, and used them to predict a national election. These results would likely be skewed and not representative of the complete US population. On the other hand, cluster sampling with exit polling in a Republican or Democratic state could say a lot about voting trends in that individual state.
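A toy calculation illustrates the point. All numbers below are invented for illustration: six hypothetical precincts in two hypothetical states, with vote counts for a generic "party X". The helper names share and cluster_estimate are also assumptions, not an established API.

```python
import random

# Hypothetical precincts: (state, votes for party X, total votes cast).
precincts = [
    ("state_1", 70, 100), ("state_1", 65, 100), ("state_1", 72, 100),
    ("state_2", 40, 100), ("state_2", 35, 100), ("state_2", 45, 100),
]


def share(rows):
    """Party X's share of all votes cast in the given precincts."""
    return sum(votes for _, votes, _ in rows) / sum(total for _, _, total in rows)


def cluster_estimate(frame, n_clusters, seed=None):
    """Estimate the share from n_clusters randomly chosen precincts in frame."""
    rng = random.Random(seed)
    return share(rng.sample(frame, n_clusters))


# Clusters drawn only from state_1, which leans heavily toward party X.
state_1_only = [p for p in precincts if p[0] == "state_1"]
estimate = cluster_estimate(state_1_only, n_clusters=2, seed=0)

overall_share = share(precincts)       # share across both states
state_1_share = share(state_1_only)    # share within state_1 alone

print(estimate, overall_share, state_1_share)
```

With these made-up figures, any two state_1 precincts overstate party X's overall share, yet the same estimate lands close to the state_1 share, mirroring the national versus single-state contrast described above.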