Overview

Data visualization is an essential part of a data scientist’s toolbox, not just for communicating the results of an analysis but also as a sanity check for detecting obvious problems with the data.

The original Datasaurus was created by Alberto Cairo in 2016 as a humorous demonstration of how a silly image can be presented as a serious dataset. Justin Matejka and George Fitzmaurice later published a technique that could produce “serious” datasets from a wide range of base images, resulting in the Datasaurus Dozen.

This tutorial discusses the Datasaurus with its dozen companion examples and explores how they illustrate the limitations of raw data inspection and pure summary statistics.

Info

[New Tutorial] The Importance of Data Visualization (Datasaurus Edition)