Thank you!
Thank you all for your interest in my work! My Open Access repository of legal datasets over at Zenodo passed the milestone of 100,000 downloads sometime during November. I am ecstatic that something appears to be changing in the legal world and that interest in Legal Data Science is increasing.
That being said, as you can see from the diagram below, attention has been distributed rather asymmetrically and skews towards a few especially popular datasets. There are plenty of promising applications waiting to be built on some of the lesser-known datasets, so I encourage you to browse around a bit and see whether anything catches your eye.
Some top recommendations:
- The corpus of German parliamentary materials (CDRS-BT) is vast (more than 800 million tokens) and contains draft laws, in addition to other materials, dating back to 1949
- The set of German legal stopwords (SW-DE-RS) is an auxiliary dataset that can save you a huge amount of work when cleaning your own German legal data
- The Collection of International Treaties and Legal Documents (CITLD) is an excellent companion to traditional international law research; I use it myself almost daily
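For readers new to stopword filtering, here is a minimal Python sketch of how a list like SW-DE-RS might be applied when cleaning text. It assumes the list has been read as plain text with one word per line; the actual file format on Zenodo may differ. The helper names `load_stopwords` and `remove_stopwords` and the sample entries are hypothetical placeholders for illustration, not taken from the dataset itself.

```python
import re


def load_stopwords(lines):
    """Build a lowercase stopword set from an iterable of lines.

    Assumption: one stopword per line, blank lines ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text, stopwords):
    """Lowercase and tokenize the text, then drop every token found in the stopword set."""
    # \w+ on a str pattern is Unicode-aware in Python 3, so umlauts and ß stay inside tokens.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


# Hypothetical sample entries; in practice, read the lines from the downloaded SW-DE-RS file.
sample_lines = ["Absatz", "gemäß", "der", "die", "und"]
stopwords = load_stopwords(sample_lines)

print(remove_stopwords("Die Klage ist gemäß Absatz 2 zulässig und begründet.", stopwords))
# → ['klage', 'ist', '2', 'zulässig', 'begründet']
```

Using a `set` keeps each membership check fast, which matters once you run the filter over a corpus with hundreds of millions of tokens.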
Applications
If you happen to have built and published any interesting applications or research with these datasets, please write to me and let me know!