Overview

Following a long period of consolidating my previous work, I am happy to announce a new dataset release!

The Corpus der Entscheidungen des Bundesfinanzhofs (CE-BFH) is a complete collection of the decisions published by the German Federal Court of Finance (Bundesfinanzhof, or BFH for short). The dataset is built from the official database of the Bundesfinanzhof, whose contents it transforms into a machine-readable format.

Please make sure to read the accompanying Codebook! The Codebook is essentially the user manual for the dataset. It contains important information on how to use the dataset correctly and should be the first port of call for new and experienced users alike. It will also help you decide which variant is right for you. As a rule, I recommend the CSV variants for quantitative applications and the PDF variant for traditional research.

For practitioners, there is an additional variant that contains only the decisions published in the official collection (BFHE), also known as V-Entscheidungen.
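
As a minimal sketch of getting started with one of the CSV variants recommended above, the Python snippet below uses only the standard library to load a file and count its rows (decisions) and columns (variables). It assumes a UTF-8 encoded file with a header row; the function name `summarize_csv` is hypothetical, and the actual file and variable names are documented in the Codebook.

```python
import csv

# Full decision texts can exceed the csv module's default field size limit
# (131072 characters), so raise it. 2**31 - 1 also fits a 32-bit C long (Windows).
csv.field_size_limit(2**31 - 1)


def summarize_csv(path):
    """Return (number_of_rows, number_of_columns) for a CSV file with a header row."""
    # newline="" lets the csv module handle line breaks inside quoted fields,
    # which occur when full texts are stored in a single column.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = sum(1 for _ in reader)
        columns = len(reader.fieldnames or [])
    return rows, columns
```

For example, calling `summarize_csv("CE-BFH_Datensatz.csv")` (a hypothetical file name) gives a quick sanity check that can be compared against the figures under Key Facts and Features, keeping in mind that individual variants may not contain every variable.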

The dataset will likely be updated once or twice per year. I always post recent news about new and updated datasets to Mastodon at @seanfobbe@fediscience.org.

Note
Please note that the dataset documentation and variables are in German. One reason is that information gets lost when translating from German to English and back again. Another is that anyone doing NLP on German-language documents should either speak enough German to read the documentation or probably doesn't care about the details anyway, since everything gets fed into an LLM without much consideration for data quality.

Workflow Diagram

Key Facts

  • Reference date: 15 October 2023
  • Scope: 10,310 decisions of the Federal Court of Finance (Bundesfinanzhof) of the Federal Republic of Germany
  • Formats: CSV, PDF, TXT and HTML

Features

  • 34 variables
  • Regular updates
  • Public Domain (CC-Zero 1.0)
  • Open and platform independent file formats (PDF, TXT, CSV, HTML)
  • Extensive Codebook
  • Compilation Report explains the construction and validation of the dataset in detail
  • Large number of diagrams for all purposes (see the ‘ANALYSIS’ archive)
  • Diagrams are available as PDF (for printing) and PNG (for web display); tables are available as CSV, which is easy to read for both humans and machines
  • Secure cryptographic signatures
  • Publication of full source code (Open Source)
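
Verifying the cryptographic signatures themselves is typically done with a tool such as GnuPG and is not shown here. As a hedged sketch, assuming a release also publishes hash values for its files (check the release notes or Codebook for the algorithms actually used), the Python functions below stream a file through `hashlib` and compare the digest with a published value. The function names and the `sha3_512` default are assumptions for illustration.

```python
import hashlib


def file_digest(path, algorithm="sha3_512", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in 1 MiB chunks to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected_hex, algorithm="sha3_512"):
    """Return True if the file's digest matches the published hex digest."""
    # Normalize the published value (surrounding whitespace, upper-case hex).
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use constant, which matters for large archives such as the full-text variants.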

No Copyright: Public Domain

The texts and metadata are in the public domain (§ 5 para 1 UrhG) as they are court decisions and official works. § 5 para 1 UrhG also applies to official databases by analogy, as the German Federal Court of Justice has held (BGH, Beschluss vom 28.09.2006 - I ZR 261/03, “Sächsischer Ausschreibungsdienst”).

All of my own contributions are released into the public domain under the CC0 1.0 Universal Public Domain Dedication.

Disclaimer

This dataset is a private academic initiative and is in no way associated with agencies, courts or other public authorities of the Federal Republic of Germany.