Overview
Effective immediately, I am switching to copyleft licensing for all future software publications. Copyleft means that redistribution of a work or its derivatives must occur under the same or a similar license. This includes new software, new versions of existing software, libraries, active projects, public development versions on GitHub and anything new or current.
I prefer the GNU General Public License v3 (GPLv3), but will decide on a case-by-case basis if another copyleft license is appropriate. I am considering the GNU Affero General Public License (AGPLv3) for certain projects, but haven’t decided yet.
I am committed to backwards-compatible licensing for all past projects and versions. All past and inactive software publications will remain under the licenses they were originally published with. Usually this is MIT No Attribution (MIT-0).
My academic writing will also move to copyleft licensing, usually Creative Commons Attribution-ShareAlike 4.0 International. This website is already licensed under CC BY-SA 4.0 and this will remain as is.
I will continue to release academic datasets under the least restrictive license possible. For example, if the raw data is in the public domain, the finished dataset will be in the public domain. With official legal and political data there are different principles that are important to me, primarily advancing the rule of law and legal scholarship. This requires the unlimited availability of official legal data.
Continue reading for some deeper reflections on digital infrastructure, free software, the death of corporate open source, the (pill)age of AI, the need to resurrect copyleft licensing and where we go from here as a global community reliant on technology.
Software as Infrastructure
Software is much more than a fungible product that can be substituted with an alternative at will. After deployment it tends to become a type of infrastructure, with the surrounding social dynamics adapting to the assumptions, affordances and limitations of software systems. This applies to individual workplaces, but also to entire markets, where user expectations are often defined by the leading product.
Consider the Excel Gene Auto-Formatting Saga. For many years the auto-formatting functionality of Excel caused a large number of data entry errors by reformatting gene names as dates and floating point numbers. For example, “SEPT2” which stands for “Septin 2”, is turned into “2006/09/02” (Ziemann, Eren and El-Osta 2016). The problem was first described by Zeeberg et al (2004), almost 20 years ago. Ziemann, Eren and El-Osta (2016) conducted a “programmatic scan of leading genomics journals” and estimated that “approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions”. The problem was so bad that the HUGO Gene Nomenclature Committee (HGNC) decided to rename several genes to avoid future data entry errors (Bruford et al 2020).
Microsoft added an option to disable auto-conversion in 2023 (The Verge 2023), but this is a manual option that needs to be changed by the user.
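For readers who maintain gene lists, a rough pre-flight check can flag symbols that a spreadsheet might reinterpret. The following is a minimal sketch, not a reproduction of Excel's actual conversion rules: the patterns, the name `risky_symbols` and the sample identifiers other than SEPT2 are assumptions made for illustration.

```python
import re

# Heuristic only: a month-like prefix followed by one or two digits (e.g. SEPT2).
# This approximates, and does not reproduce, Excel's real auto-format logic.
_MONTHS = "JAN|FEB|MAR|MARCH|APR|APRIL|MAY|JUN|JUNE|JUL|JULY|AUG|SEP|SEPT|OCT|NOV|DEC"
DATE_LIKE = re.compile(rf"^(?:{_MONTHS})-?\d{{1,2}}$", re.IGNORECASE)

# Identifiers shaped like digits-E-digits can be read as scientific notation.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def risky_symbols(symbols):
    """Return the symbols that match either heuristic pattern."""
    return [s for s in symbols if DATE_LIKE.match(s) or FLOAT_LIKE.match(s)]


if __name__ == "__main__":
    # Hypothetical sample list for illustration.
    print(risky_symbols(["SEPT2", "MARCH1", "TP53", "2310009E13"]))
```

A check like this only catches the problem before data entry. It does not repair values that have already been converted.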
The kind of market dominance that can force changes even to scientific nomenclature is anything but rare in the software world. The phenomenon of software-as-infrastructure combined with network effects forces society to bend to successful software products. This generates lucrative business opportunities for actors in control of digital infrastructure.
The standard corporate playbook calls for building a mediocre software product that covers an acceptable number of use cases, attempting to win as many users as fast as possible, locking them into a proprietary system and then extracting maximum profit when the costs of switching are unacceptably high to the individual user/organization. The median product is optimized for generating recurring revenue, not user satisfaction or engineering quality.1
The Microsoft Windows operating system is perhaps the most famous example of this approach, but it is by no means the only one. Microsoft has captured a number of other software markets through successful business tactics rather than product quality (word processors, spreadsheets, e-mail). More recently, Facebook, TikTok and every other VC-fueled attempt at building Software-as-a-Service (SaaS) platforms have pursued the same vision of creating profitable monopolies by capturing new product categories as they are created. The plan to “build a platform” has become nearly synonymous with “build a monopoly”.2
The private control of infrastructure accompanied by the destruction of competition through monopolistic practices was a tried and tested means of extracting economic rents (unearned revenue) even before the digital age. The United States boasts a colorful history of “captains of industry” or “robber barons” (your choice) making fantastic fortunes from establishing exclusive or quasi-exclusive control over railroads, public transport, communications, real estate, finance and other key social infrastructure. Similar economic patterns have played out all around the world in other times and places.
Several monopolies of the old industrial age were broken up by successful legal challenges brought by the US government in the early 20th century. Unfortunately, the zeal and success rate of anti-monopoly (“anti-trust” in the US) enforcement waned sharply with the rise of the Chicago School in the 1970s. Microsoft in particular managed to avert the US government’s anti-trust challenges to its business practices and was able to negotiate a settlement that kept the company intact in the 2000s. This failure of enforcement in the US is significant, because the US became the global hub of software development precisely during the time that competition law was at its weakest. Companies like Microsoft, Google, Facebook (now Meta) and Apple have built such dominance in certain digital markets that they can deny entry to most new competitors and outright buy all the others (e.g. acquisition of Instagram and WhatsApp by Facebook/Meta). Supposedly this benefits consumer welfare, but I am not convinced.
It was during this era of competition law failure, from the 1980s onwards, that civil society took a keen interest in software as infrastructure and developed surprisingly effective means to challenge corporate control of software and secure this new digital infrastructure for society at large. Socially, this effort was led by the Free Software Movement and the Open Source Movement. Legally, the GNU General Public License (GPL) created by Richard Stallman — the “de facto constitution for the Free Software movement” (Tai 2001) — was a milestone that subverted copyright law with the intent to ensure free distribution and re-distribution of software. Technologically, a wide range of high-quality community-supported projects were successful in challenging corporate dominance of the software world, foremost among them Linux. The Free Culture Movement pursued a similar goal for cultural works, developed the Creative Commons licenses and birthed many successful projects, most notably Wikipedia.
Today, many open software and open cultural works are so pervasive that they are fundamentally a part of global infrastructure. For example:
- The Linux kernel and its many distributions (e.g. Debian) power almost all servers in the world
- Almost all programming languages (e.g. C, C++, Python, R)
- The GNU Project and its free software packages (e.g. Emacs, GNOME, GnuPG)
- OpenSSL (and forks like LibreSSL) which secure most encrypted internet communication
- The Firefox web browser
- Wikipedia as the default global encyclopedia3
- Specialized blogs that rival professional news outlets in quality and often outperform them in speed (e.g. Krebs on Security)
- The open pre-publication of academic research on arXiv
Free Software versus Open Source
At this point I should mention the important distinctions between “Free Software”, “Open Source” and “Source Available”, distinctions that are often lost in public discourse. Roughly speaking, the key characteristics that distinguish these concepts and their associated social movements are:
- Source Available = Public Source Code
- Open Source = Public Source Code + Free Reuse/Redistribution
- Free Software = Public Source Code + Free Reuse/Redistribution + Copyleft
Open Source is a pragmatic concept and means that the source code for a program is publicly available, the program can be freely distributed and derived works can be freely created and distributed for any purpose. In particular, the Open Source Definition championed by the Open Source Initiative (OSI) forbids discrimination against persons, groups, fields of endeavor, products, other software and particular technologies.
Free Software is a primarily idealistic and philosophical approach that views software as a public good, based on the four software freedoms (Smith 2007). These are:
- Freedom to use the software for any purpose
- Freedom to change the software to suit your needs
- Freedom to share the software with your friends and neighbors
- Freedom to share the changes you make
Strictly speaking the definition of Free Software based on these four freedoms does not include copyleft provisions. The main difference between “Free Software” and “Open Source” movements is that the former stresses an idealistic, the latter a pragmatic approach to software distribution. In practice both are often mentioned side-by-side as “Free and Open Source Software” (FOSS) or “Free, Libre and Open Source Software” (FLOSS) to sidestep ideological debates and because in truth the formal definitions are almost identical.
Copyleft means that redistribution of a work or its derivatives must occur under the same or a similar license. The most famous of the copyleft licenses is the GNU General Public License (GPL). For example, the GPL requires that GPL software and its derivatives must be redistributed either under the GPL or AGPL — no exceptions. A similar principle is imbued in Creative Commons licenses with the share-alike quality.
That being said, the practical approach of the Free Software Movement to Free Software is strongly characterized by a copyleft approach to ensure robust enforcement of these freedoms. Copyleft provisions guarantee the software freedoms not just for the direct license recipient, but for any downstream party that might come into contact with the software. Therefore, even if the official definition of Free Software does not include copyleft, I believe it does so in actual practice and should do so in theory as well. When I speak of Free Software I always mentally include the GPL and copyleft.
Source Available is a deceptive marketing tactic that some companies use to promote their product as Open Source, when in fact it complies with neither the Open Source Definition nor the Four Software Freedoms (OSI 2019). Source Available makes no guarantees beyond the public availability of the source code and usually includes some strict obligations that discriminate against persons, groups or technologies, usually to ensure a commercial advantage.
The Forging of a Social Contract, Maybe
The evolution towards open software infrastructure seems smooth in hindsight, although living through the process was anything but painless. Starting in the 1970s, the early innocence of the digital age was followed by intense cultural conflict between civil society and commercial interests about who would control the digital infrastructure of this new era, including the terms under which it would be made available to society at large.
The acrimony between civil society and the corporate world over matters like free software, copyright, information sharing and the boundaries of permissible computer use (“hacking”) was intense. Microsoft’s Steve Ballmer famously called Linux “a cancer that attaches itself in an intellectual property sense to everything it touches” (The Register 2001). Before the streaming years, many DVDs came with explicit FBI infomercials threatening criminal prosecution for unauthorized copying. The Computer Fraud and Abuse Act (CFAA) was a cornerstone in handing down heavy prison sentences for any and all vaguely unauthorized uses of computers, including the violation of Terms of Service (TOS), making the violation of a simple contract a felony criminal offense in the United States.
One of the lowest points of the latter part of the era remains the harsh prosecution and subsequent suicide of Aaron Swartz, one of the leaders of the Free Culture Movement and a co-creator of the Creative Commons organization.
Nevertheless, following the tumultuous rise of the World Wide Web it appeared that a social contract had been forged between corporate interests and civil society in the digital space. Individual creators, a wide range of collectives with different levels of organization and a fair number of corporations would openly publish on the internet some of the finest intellectual works the world has ever seen. These were made available for all to read, share and use (within reasonable constraints set by Open Source and Free Culture licenses), in the expectation that many others in the profit and non-profit ecosystems would return this generosity on principle or at least with principled business acumen.
The economically minded might prefer to frame this development in terms of an “information economy”, “sharing economy”, “platform economy” or “creator economy”, but I think that there is more to this system of mutual expectations than can be explained by conceptualizing it as an exchange of goods and services, governed by supply and demand. Many works are shared without the expectation or even viability of economic returns, so a market-based view cannot adequately explain much behavior on the Web. The idea of a social contract is probably closer to the truth.4
After a long and bitter struggle it seemed that profit and non-profit interests were finally roughly aligned, with each supporting the other through somewhat different, but ultimately compatible approaches. The Free Software and Free Culture advocates did not vanish, but there was a general feeling, especially among software developers, that times had changed and the earlier acrimony was no longer warranted.
Even Microsoft (!) has changed its tune and put its software muscle to work. It is now 2024 and Windows comes with a special Windows Subsystem for Linux to make using Linux in Windows machines as painless as possible.
Many others believed the same. For example, The Register reported a corporate executive saying:
In an email to The Register, David Habusha, VP of product at WhiteSource, said that the copyleft license was created by the Free Software Foundation in 1985 “to ensure the evil corporations of that time would not be able to use open-source software and then restrict its redistribution.”
But times have changed, he argues. “It is no longer an ‘us’ vs. ’them’ scenario, meaning the open-source community vs. commercial corporations,” he said.
I counted myself among those who believed we had buried the hatchet and as-open-as-possible with permissive licensing was the default way forward in the future. Academics expand the boundary of human knowledge for the sake of humanity, including corporate humanity, right?
The Death of Corporate Open Source
People love telling you to enjoy something while it lasts. We may have reached this point with the corporate commitment to open source. Jeff Geerling recently surveyed the terrain and proclaimed that “Corporate Open Source is Dead” (Geerling 2024). I am inclined to believe him.
The list of high-profile casualties in recent years is impressive:
- MongoDB invented an entirely new license in 2018, the Server Side Public License, with draconian terms that were unacceptable to almost everyone (MongoDB 2018)
- Confluent switched licenses for some components from Apache 2.0 to the Confluent Community License in 2018 (Confluent 2018)
- Cockroach Labs switched the license for CockroachDB from Apache 2.0 to the Business Source License (BSL) in 2019 (Cockroach Labs 2019)
- HashiCorp switched their default license from Mozilla Public License v2.0 (MPL 2.0) to the Business Source License (BSL/BUSL) in 2021 (HashiCorp 2021)
- Elastic switched their default licenses from Apache 2.0 to Elastic License and Server Side Public License (SSPL) in 2021 (Elastic 2021)
- Red Hat took over and killed CentOS in 2021 (CentOS 2021), an Open Source project that was competing with Red Hat Enterprise Linux (RHEL), later stopping the publication of RHEL source code and making the inferior “CentOS Stream (…) the sole repository for public RHEL-related source code releases” in 2023 (Red Hat 2023)
- Redis switched their default licenses from three-clause BSD to the Redis Source Available License (RSALv2) and Server Side Public License (SSPLv1) in 2024 (Redis 2024)
The incident that hit hardest for me was Red Hat closing down CentOS and scaling back its Open Source commitments, but the last straw was when I learned about Red Hat’s long-running attempts to circumvent the GPL by contractually forcing customers to decide between a) exercising their GPL rights or b) remaining a customer of Red Hat (Software Freedom Conservancy 2023).
I was a long-time user of Fedora and switched to Debian for reasons of stability, but it’s safe to say that I am never going back. There are philosophical reasons, of course, but considering how Red Hat alienated the CentOS community I wonder if even Fedora has a viable long-term future as a community project.
There is nothing inherently wrong with people looking to earn money. There is also nothing inherently wrong with proprietary software, if it is necessary to keep the lights on. In practice it does seem that, according to the White House cyber policy director, companies like Microsoft need to be “dragged kicking and screaming” (The Register 2024) to do the right thing, such as providing basic security tooling.
It is, however, deceptive and unethical to present a project as Open Source when it is not. It is deceptive and unethical to solicit generous contributions from a community with no financial stake in the success of the software and later to revoke the reciprocal arrangement in the hopes of wringing out some extra dollars.
And this brings us to the age of generative AI.
The (Pill)Age of AI
It is forbidden to violate copyright; therefore all pirates are punished unless they scrape in large numbers and to the sound of AI trumpets.5
It would seem like the corporate world was inspired by Voltaire, but missed the irony.
The Age of AI is the death of corporate open source writ large, magnified to global scale. Fueled by insensible amounts of venture capital since the launch of ChatGPT in November 2022, many for-profit corporations and some non-profit initiatives on the generative AI hype train have been rampaging across the internet and damaging the information ecosystem and public trust in ways that will not be fully understood for years to come. Instead of swindling contributions from a few willing stakeholders, AI companies are pillaging every bit of content they can get their hands on. Technical security measures, copyright laws and data protection regulations be damned.
Mustafa Suleyman, the CEO of Microsoft AI, had this to say in 2024:
I think that with respect to content that’s already on the open web, the social contract of that content since the ‘90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been “freeware,” if you like, that’s been the understanding.
Suleyman is not a lawyer and presumably has not watched any DVDs with anti-piracy infomercials in some time, so he can perhaps be forgiven for being ignorant of the finer points of copyright law. Then again, for proper pirates “the Code is more what you’d call ‘guidelines’ than actual rules”.
What bothers me most is his invocation of the social contract. Even a commercial contract implies a reasonably balanced give-and-take. A social contract implies a similarly balanced exchange in society at large. People publish openly on the Web in the expectation that others will return the sentiment and general welfare increases. However, Microsoft and OpenAI only take, they do not give back. The same is true for almost all AI companies.
OpenAI hasn’t published an open model since GPT-2 and is now one of Wall Street’s wolves pretending to be a non-profit sheep. Microsoft remains the archetype of a closed source company. Google is keeping its cards close to its chest. Only Meta has been making waves with its “open” models and offering to provide source code and weights on request. However, this is a deceptive marketing strategy as the Llama license does not comply with the OSI Open Source Definition and whether access requests are honored or denied is opaque. Also, we know what Facebook/Meta is like as a company. This commitment to “open” is likely no more than a maneuver to temporarily undercut the competition. As soon as Meta is ahead it will certainly “do a Red Hat”. We’ve seen this before.
More recently the French company Mistral AI rose to prominence on its claim of being an Open Source company (Mistral AI 2023). Then a massive Microsoft investment in Mistral was disclosed, conveniently right after the EU AI Act was adopted and the French government had secured concessions on its behalf due to its standing as a good corporate citizen (The Verge 2024). Mistral continues to publish “open weight” models, but has begun excluding commercial use cases from its licenses (e.g. Codestral 22B).
Followers of my work might note that I publish a large number of legal and political data sets, so perhaps this is a case of pot calling kettle black? The difference is that I exclusively republish legal and political data created by public authorities, sourced from official public databases, with clear legal provenance and permission to re-use.
If laws are adopted and judgments are issued in the name of the people, then the people have a right to acquire, analyze, criticize and re-use judgments to become active participants in the rule of law.
The Breaking of the Social Contract
While OpenAI and a couple of others had first crack at the Web, it is now open season with thousands of AI companies going at it full force. This wholesale pillage has an ethical, a legal and an economic dimension.
- Ethical dimension — Is it acceptable to take a mind-numbing collection of intellectual works, repackage them as AI and give nothing in return? What does this do to the social contract of the Web?
- Legal dimension — Is it permissible under copyright and data protection laws to take legally protected intellectual works, compress them into a machine learning model for profit and give nothing in return?
- Economic dimension — What happens to the creator economy if you repackage a large number of intellectual works, do not provide compensation and use the result to compete with the original creators?
I’m not going to spend much time on the ethical and legal dimensions of this problem.
The ethical dimension is a matter of personal integrity and philosophical depth. Both qualities tend to be lacking in the executive suites of companies in the AI space, so I will save myself the effort of building an elaborate argument that will persuade exactly no one. Suffice to say that if you invoke a social contract to take all you can without giving back, you have understood neither the concept of “social”, nor “contract”, but you do know what “privatization” means.
The legal dimension is largely uncharted territory, so we will probably have to wait a few years for results from ongoing litigation and legislative processes to discover whether AI data set preparation, training or model usage count as copyright events and whether companies producing for-profit models can rely on established copyright exemptions such as “fair use”. I’m not a copyright lawyer, so I recommend you look up your favorite pundit for a legal opinion. I do believe this problem will have to be solved with laws, not litigation.
What I do want to focus on is the economics of the problem. The “creator economy” in the modern digital age provides much more freedom to creators than earlier economic eras did, particularly because now there is a much greater selection of viable intermediaries.
As an academic wanting to share your research in the pre-internet era you had a choice between submitting a book or journal article to one of a few publishers, writing a technical report in the rare case that an established organization invited you to participate or sending honest-to-goodness physical letters to your academic friends.
These days you can post random thoughts to thousands of strangers on social media (e.g. Mastodon, LinkedIn, Facebook,6 Reddit, Instagram, TikTok, YouTube), publish very lengthy posts (yes, I know that this essay is too long) on any subject under the sun to a managed blog in minutes, set up your self-compiled blog in however many weeks you need to get Hugo to work, share draft manuscripts instantly on arXiv or Zenodo, record a podcast, dance your research on TikTok (if you are young) or YouTube (if you are old) and do any number of more or less advisable things that will get your work noticed and draw the ire or admiration of your peers.
Myself, I have made good use of Mastodon and LinkedIn and depend greatly on my personal website to share my work. It’s probably fair to say that without social media and blogging I could not have published the research that interested me and gained the audience that I have. My experiences with traditional intermediaries have been mixed, to say it politely.
However, intermediaries remain. They simply have become more colorful, more exciting, more addictive and more algorithmic. This is certainly true of social media, but even academic publishing is becoming more and more like social media every year. Whether you are chasing likes on LinkedIn or citations on Google Scholar, the desire to conform to the expectations of the algorithms is intense. Nevertheless, it is anyone’s guess if today’s algorithms are more opaque than the backroom dealings and ego-driven editorial work of past centuries.
It is important to remember that intermediaries do not just mediate content between a creator and an audience, they also mediate the primary rewards: money and status. Financial rewards can be direct (e.g. subscriptions, shared advertising revenue from platforms) or indirect (e.g. subscriptions, jobs, consulting contracts, paid speaking gigs). Status rewards include short-term recognition (e.g. likes, comments, shares, citations) and long-term recognition (e.g. followers, visibility/mindshare, Board/advisor positions, academic h-index). Status rewards can often be converted into indirect financial rewards with some delay.
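A basic model of the creator economy has three parties: a creator, an intermediary and an audience. Content flows from the creator through the intermediary to the audience, and rewards flow back along the same path.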
The promise of the old creator economy was that if creators publish content through intermediaries, an audience would reward the intermediary and the intermediary would in turn share the rewards with the original creator. For example, Brian Krebs does great cybercrime journalism, publishes it for free on the Web, an audience looks for current research, Google directs readers to the website and profits from the ads served alongside, while Krebs profits from the additional traffic via ads on his website and by offering paid speaking engagements.
Under the new emerging AI economy this link to the original creator is broken. Instead of directing the user to the original content, AI companies intend to process the entire information request from the audience themselves in order to fully capture the revenue and status streams for themselves. Need current journalism? AI will provide. Need a recipe? AI will provide. Need financial advice? AI will provide. Need a solution to a complicated software problem? AI will provide.
There are no language models that can continue to function in the medium and long term without human creators publishing new content for them to repackage. However, in the new AI paradigm the content is identity-laundered and served by the intermediary to compete with the original human creator, starving them of the financial and social rewards they need to continue producing content. Artists are already regretting their generous sharing of images on the Web. Human translation work is dying. Journalists are being laid off by the thousands. More will follow.
Break the link to the creator, break the creator economy. It is economically unsustainable.
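The economic argument can be put as a toy calculation. This is a minimal sketch under stated assumptions, not an empirical model: the names `split_rewards` and `can_keep_publishing`, the single `pass_through` fraction and all numbers are hypothetical and chosen only to illustrate what happens when the intermediary stops passing anything back to the creator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardSplit:
    creator: float
    intermediary: float


def split_rewards(audience_value: float, pass_through: float) -> RewardSplit:
    """Split the value an audience generates between creator and intermediary.

    pass_through is the (hypothetical) fraction the intermediary forwards to
    the creator, for example through referral traffic.
    """
    if not 0.0 <= pass_through <= 1.0:
        raise ValueError("pass_through must be between 0 and 1")
    creator = audience_value * pass_through
    return RewardSplit(creator=creator, intermediary=audience_value - creator)


def can_keep_publishing(split: RewardSplit, production_cost: float) -> bool:
    """A creator can only continue if their share covers the cost of creating."""
    return split.creator >= production_cost


# Illustrative numbers only, not measurements.
search_referral = split_rewards(audience_value=100.0, pass_through=0.25)
ai_answer = split_rewards(audience_value=100.0, pass_through=0.0)
```

With these assumed values the referral model leaves the creator 25.0 of every 100.0 units of value, while the AI-answer model leaves them nothing, so `can_keep_publishing` fails for any positive production cost.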
However, as Donath and Schneier remind us (Donath/Schneier 2024), there is more at stake here: human connections. Earlier in this essay I wrote that calling the Web an “economy” does it injustice, because there is more at stake than just an exchange of goods and services. Much of what makes the Web so interesting was built with altruistic motives in mind and a simple economic model is insufficient to explain this. The theory of a social contract is much nearer the mark, because the Web is about building a global commons, whether for profit, non-profit or simply as a byproduct of wanting to connect with other human beings.
And breaking the link between humans breaks the entire social contract of the Web.
From Open Source to Free Software
It is now clear that the corporate world has broken the social contract of the Web and is doing its best to commodify and re-sell every last part of it. Then again, perhaps this has been happening for much longer and it just took me this long to figure it out.
So, where do we go from here? What do we do in the face of AI data pillage and corporate withdrawal from Open Source?
There is, of course, the option to stop sharing any valuable knowledge online. However, I don’t think this is right. The Web has become one of the greatest global commons ever created and this would be like burning down the Library of Alexandria because people have been selling cheap knock-off copies of the books near the entrance.
Technical countermeasures are always an option, but simple ones like robots.txt are routinely ignored by AI scrapers. I can’t say I’m surprised with so much money on the line. Authwalls are not feasible for sites that depend on casual traffic and even big media companies have had trouble generating enough revenue with subscriptions. Some services like Cloudflare offer machine-learning-based blocking of scrapers, but this can have unintended consequences for legitimate traffic. I’ve had to suspend updates to the Corpus of Decisions: International Court of Justice (CD-ICJ) because the Court is now blocking all automated access to its case law.
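To show why robots.txt is such a weak countermeasure, here is a minimal sketch of the check a well-behaved crawler performs, using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the URL and the helper `is_allowed` are made up for illustration. Nothing forces a scraper to run this check at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.org/post"))
    print(is_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.org/post"))
```

The file is purely advisory: compliance happens entirely on the crawler's side, which is why it fails once the crawler's operator has an incentive to ignore it.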
If history is any guide, this is a battle that has to be fought with legal and political countermeasures. It is also the time to make clear that the tacit social contract has been broken. A number of strategic litigants are already doing admirable work in the courts of law, but this fight needs to be taken into the court of public opinion and the political process as well.
This essay is a small contribution to the ongoing public debate. I think the much greater contribution to the court of public opinion (and a legal countermeasure at the same time) would be the widespread resurrection of viral copyleft licensing.
In the past 15 years or so, copyleft licenses have taken a heavy hit in terms of popularity. The proportion of GPL family licenses used in Open Source projects has dipped significantly compared to permissive licenses (e.g. BSD, MIT and Apache) (Wikipedia 2024). Many have argued that, now that software is routinely published in the open, licensing questions have become irrelevant and practicality is the order of the day. According to Matt Asay, a developer relations manager at MongoDB:7
It’s time for the open source Rambos to stop fighting and agree that developers care more about software’s access and ease of use than the purity of its license. (Infoworld 2023)
The opposite is true. In every instance where a company abandoned its OSI-approved Open Source license, this has led to significant backlash from the relevant community and the forking of the project. With the death of corporate open source and the ravages of AI, it seems to me that now is a good time to dust off the old Rambo spirit of Free Software and return to a strong and principled stance on software and content licensing. All the healthiest projects I know (Linux, Debian, Emacs, R, Wikipedia) were built on copyleft licensing, which is a good sign.
I do not mind if people use my published work products in commercial settings (and my chosen licenses explicitly allow this) but a robust reminder is in order that they are intended as part of a global commons to advance the knowledge of humanity and not as exploitable data sludge for greedy executives.
We are still in the opening stages of conflict between creators and AI developers, with claims being staked and the scales of justice being weighted with competing interests. Because of the economics of the problem (and hopefully the ethics, too, but I’m not holding my breath), I assume that one of two things will happen:
- Current copyright laws will be interpreted to cover model preparation, training and deployment
- Sui generis rules will be adopted that cover the licensing of intellectual works for model preparation, training and deployment (comparable to the development of database protection rights)
The first outcome is perfectly covered by adopting current-generation copyleft licenses like GPLv3 or CC BY-SA 4.0. So this is the strategy I will adopt for now.
The second outcome will require updating standard licenses to include the to-be-developed rules. I am confident that standard-setting organizations will get behind this. I have much faith that Creative Commons will get this done and much less faith in the Free Software Foundation, but we’ll cross this bridge when we get there.
I have been GPL-curious for a long time, but have held off until I had time to think about the details and long-term implications. Now it is time to make a point. Instead of pragmatic and open distribution there is a need to clarify to the public and to the corporate world that a global commons needs Free Software based on robust and clear expectations that the commons will remain the commons.
Let us return to copyleft licensing to ensure that the social contract of the Web is enforceable.
Bring back the free software and open source Rambos.
Conclusion
Effective immediately, I am switching to copyleft licensing for all future software publications. This includes new software, new versions of existing software, libraries, active projects, public development versions on GitHub and anything new or current.
I prefer the GNU General Public License v3 (GPLv3), but will decide on a case-by-case basis if another copyleft license is appropriate. I am considering the GNU Affero General Public License (AGPLv3) for certain projects, but haven’t decided yet.
I am committed to backwards-compatible licensing for all past projects. All past and inactive software publications will remain under the licenses they were originally published with. Usually this is MIT No Attribution (MIT-0).
My academic writing will also move to copyleft licensing, usually CC BY-SA 4.0. This website is already licensed under CC BY-SA 4.0 and this will remain as is.
I will continue to release academic datasets under the least restrictive license possible. For example, if the raw data is in the public domain, the finished dataset will be in the public domain. With official legal and political data there are different principles that are important to me, primarily advancing the rule of law and legal scholarship. This requires the unlimited availability of official legal data.
1. User satisfaction and good engineering do contribute to sales, but they are just two among many factors that decide whether a product is successful in the marketplace. There are outliers in both directions, of course. There exist products with exceptional engineering and user experience offered at a fair price and there are products that are an embarrassment to their developers on top of not meeting any real user needs, despite being sold for hefty sums to top management.
2. Some might argue that most digital markets end up as oligopolies, due to the failure of capturing the complete market. I do wonder if this is not in fact a mischaracterization of the relevant market in many cases. While Facebook and Instagram are technically both social media networks, they serve very different user demographics and in any case belong to the same parent company.
3. Wikipedia has ascended from “don’t trust anything on the internet, anyone can edit Wikipedia!” to “our AI product uses Wikipedia as the primary source of truth and we trust it completely”.
4. You are probably not surprised that a lawyer turns to social contract theory instead of economics, but we’ll get there.
5. This is my take on a famous quote by Voltaire: “It is forbidden to kill; therefore all murderers are punished unless they kill in large numbers and to the sound of trumpets.”
6. Is anyone under 60 still on Facebook?
7. MongoDB specially invented a new software license, the Server Side Public License (SSPL), to secure its commercial interests. The SSPL caused much consternation and shaking of heads in the Open Source community, since using any SSPL components in a service requires publishing the source code for the entire service and all its infrastructure. These terms are so draconian that they are entirely unacceptable to anyone who relies on proprietary components and probably many other users, too. The SSPL was submitted by MongoDB for approval by the Open Source Initiative (OSI), but was withdrawn when it appeared that it would be rejected (OSI 2019). The Board of the OSI later clearly stated that the SSPL could not be considered an Open Source license and called it a “fauxpen source license” (OSI 2019).