Overview

The representativeness of scientific research and public opinion polls is one of the defining methodological problems of our time. Whether a single study or a series of studies provides us with general information about the world depends greatly on the quantity and quality of the data collected.

If the methods and data of a study are rigorous and convincing within the study's own context, we call it internally valid. If the methods and data allow us to generalize its results to a wider population, we call it externally valid.

Key components of external validity are samples, populations and representativeness. Properly drawn random samples of adequate size allow inferences from a thousand persons to an entire country. By contrast, badly drawn samples are often biased in unknown and irreparable ways. The quality of the sampling process and the size of the sample are key to the power of modern opinion polls, which are able to predict the outcome of democratic elections within a margin of just a few percentage points.
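To give a rough feel for why a thousand respondents can be enough, here is a minimal sketch of the textbook margin of error for an estimated proportion. It assumes simple random sampling from a large population and uses the normal approximation; the function name `margin_of_error` and the chosen sample sizes are illustrative assumptions, not part of the tutorial's later code.

```r
# Illustrative sketch: margin of error of a sample proportion.
# Assumes simple random sampling and the normal approximation.
# `margin_of_error` is a hypothetical helper defined only for this example.
margin_of_error <- function(n, p = 0.5, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # critical value, about 1.96 for 95%
  z * sqrt(p * (1 - p) / n)             # p = 0.5 is the worst case
}

sample_sizes <- c(100, 1000, 10000)
moe <- margin_of_error(sample_sizes)

# Margin of error in percentage points for each sample size
data.frame(n = sample_sizes, moe_points = round(100 * moe, 1))
```

Under these assumptions, a sample of 1,000 gives a margin of error of roughly three percentage points at the 95% level, regardless of whether the population is a city or a country. The formula says nothing about bias: it only applies when the sample is actually drawn at random.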

However, these results are only helpful when the methods are rigorous. Reliance on flawed data in decision-making can cause significant and widespread damage. Nowhere is this more acute than in public policy and the application of the law. It is critical for lawyers to understand when results can be generalized and when they cannot.

This tutorial examines the following themes:

  • The difference between descriptive and inferential statistics
  • Samples, populations and representativeness as key concepts
  • The reference class problem as a PR catastrophe for the Munich Security Conference
  • Fixing the reference class problem with random sampling or post-stratification
  • Biased opinion polling and the downfall of the Literary Digest
  • Visual intuition for random samples
  • Numerical intuition for random samples
  • The bootstrap procedure for measuring uncertainty at different sample sizes

This tutorial is rather heavy on theory in the first half, but the second half includes a lot of interesting R code you can run yourself. You’ll learn to understand the process, verify the results and make some cool diagrams. The visuals in particular are far more exciting when you create them yourself!