New Dataset available!

Tilko Swalve and I are excited to announce the publication of a new dataset!

We collected, cleaned, sorted and published more than 36,000 criminal law judgments of the German Federal Court of Justice, the Bundesgerichtshof (BGH), as Open Access.

These data are extremely rare. Extensive collections of older judgments issued by German courts are usually only available commercially — if at all. They are especially valuable for the rule of law, because criminal law deeply impacts the rights of citizens. The German Federal Court of Justice has been setting down criminal law guidelines for all of Germany since its creation in 1950.

Unfortunately, the German Federal Court of Justice only started publishing its judgments in 2000 and has not done so retroactively for older decisions. Many important precedents and landmark decisions have been denied to the public until now. They were only available to purchasers of expensive commercial subscription services.

We are changing this. Now.

About the Dataset

The dataset Entscheidungen des Bundesgerichtshofs in Strafsachen aus dem 20. Jahrhundert (BGH-Strafsachen-20Jhd) is an as-complete-as-possible collection of judgments in criminal matters issued by the German Federal Court of Justice in the period between 1 October 1950 (the founding of the court) and 1 January 2000, the date that the court started publishing decisions online.

We obtained judgments from five of the court's “senates” (panels) for the years 1950 to 1999. A sixth senate existed from 1954 to 1956, but we have no data for this sixth senate.

We offer the dataset in machine-readable formats TXT and CSV, but also include the original PDF files for traditional legal work.
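
To give an idea of how the CSV variant could be used, here is a minimal sketch in Python that reads a CSV file and counts judgments per senate. The column names (`doc_id`, `senate`, `date`) and the sample rows are assumptions made up for illustration; the actual variable names are documented in the Codebook.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read a CSV file object into a list of dicts, one dict per judgment."""
    return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)


# Tiny in-memory stand-in for the real CSV file (made-up values,
# hypothetical column names).
sample = io.StringIO(
    "doc_id,senate,date\n"
    "a,1,1955-03-01\n"
    "b,1,NA\n"
    "c,5,1999-12-30\n"
)

rows = load_rows(sample)
senate_counts = count_by(rows, "senate")
print(senate_counts)
```

For the real file, the `io.StringIO` object would be replaced by `open(path, newline="", encoding="utf-8")`, where the path and the encoding are again assumptions to be checked against the Codebook.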

Please note that the texts in the dataset are only available in German and the dataset accordingly is documented only in German. This is because NLP practitioners working on data in a certain language should be able to speak the language well enough to be able to read the documentation.

Features

  • 31 variables
  • Data model compatible with the Corpus der Entscheidungen des Bundesgerichtshofs (CE-BGH)
  • Public Domain (CC-Zero 1.0)
  • Open and platform independent file formats (PDF, TXT, CSV)
  • Extensive Codebook
  • Compilation Report explains construction and validation of the dataset in detail
  • Large number of diagrams for all purposes (see the ‘ANALYSIS’ archive)
  • Diagrams are available as PDF (for printing) and PNG (for web display); tables are available as CSV for easy readability by humans and machines
  • Secure cryptographic signatures
  • Publication of full source code (Open Source)
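
How the signatures are created and verified is described in the dataset's own documentation, not in this post. As a simpler, related integrity check, the sketch below compares the SHA-256 digest of a downloaded file with an expected value. This is a plain hash comparison, not signature verification, and the function names are hypothetical.

```python
import hashlib
import io


def sha256_of(handle, chunk_size=65536):
    """Return the hex SHA-256 digest of a binary file object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(handle, expected_hex):
    """True if the file's digest equals the expected hex string."""
    return sha256_of(handle) == expected_hex.strip().lower()


# In-memory stand-in for a downloaded archive (made-up content).
payload = b"example archive bytes"
expected = hashlib.sha256(payload).hexdigest()
print(matches(io.BytesIO(payload), expected))
```

For a file on disk, the `io.BytesIO` object would be replaced by `open(path, "rb")`, and the expected value would come from whatever hash list accompanies the download.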

Content of the Dataset

By Year

We could not extract a plausible date for 2,419 judgments. They are not included in this diagram.

By Senate (Panel)

All judgments have panel metadata. This diagram shows the full dataset.

By President

“NA” means that no plausible date could be extracted and the judgment therefore could not be mapped to the tenure of a President. This is the case for 2,419 judgments.
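
The mapping described here can be sketched as follows: a judgment date is parsed, looked up in a table of tenures, and falls back to “NA” when no plausible date is available. The tenure table below uses placeholder names and dates, not the real Presidents of the court, and the ISO date format is an assumption.

```python
from datetime import date

# Placeholder tenures for illustration only. These are NOT the real
# Presidents of the court or their actual terms of office.
TENURES = [
    ("President A", date(1950, 10, 1), date(1960, 12, 31)),
    ("President B", date(1961, 1, 1), date(1999, 12, 31)),
]


def parse_date(text):
    """Parse an ISO date string; return None if it is missing or malformed."""
    try:
        return date.fromisoformat(text)
    except (TypeError, ValueError):
        return None


def president_for(text):
    """Return the President in office on the given date, or "NA"."""
    day = parse_date(text)
    if day is None:
        return "NA"
    for name, start, end in TENURES:
        if start <= day <= end:
            return name
    # A date outside every known tenure is treated as implausible.
    return "NA"


print(president_for("1955-03-01"))
print(president_for("NA"))
```

Treating out-of-range dates the same as missing ones is a design choice of this sketch; the Compilation Report is the place to check how the dataset itself defines a plausible date.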

Workflow


The data pipeline offers the following features:

  • Clean filenames
  • Correct rotation, standardize in portrait orientation
  • Optical character recognition (OCR)
  • Automated cleaning of OCR errors related to German legal terminology
  • Extraction of additional variables
  • Production of ready-to-use ZIP archives
  • Comprehensive documentation
  • Automated unit tests and statistical reporting
  • Cryptographic signatures
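
To illustrate what automated cleaning of OCR errors can look like, here is a minimal sketch in Python using a small table of regular-expression rules. The rules are invented examples of typical digit-for-letter misreadings in legal abbreviations; the corrections actually applied by the pipeline are defined in its published source code, not here.

```python
import re

# Illustrative rules only. The real pipeline's corrections are not listed
# in this post; these patterns are assumptions chosen for the example.
RULES = [
    (re.compile(r"\bStG8\b"), "StGB"),  # digit 8 misread for letter B
    (re.compile(r"\bStP0\b"), "StPO"),  # digit 0 misread for letter O
    (re.compile(r"[ \t]+"), " "),       # collapse runs of spaces and tabs
]


def clean_ocr(text):
    """Apply each rule in order and trim surrounding whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


print(clean_ocr("§ 211  StG8 und § 261 StP0 "))
```

Keeping the rules in a plain list makes each correction easy to review and to cover with a unit test, which fits a pipeline that reports on its own validation.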

Copyright

The dataset is released into the public domain under a Creative Commons Zero 1.0 Universal Public Domain Waiver.