Info

Recommended Reading

Climbing towards NLU (Bender and Koller 2020)
On the Dangers of Stochastic Parrots (Bender et al 2021)
Situating Search (Shah and Bender 2022)
Mystery AI Hype Theater, Episode 10 - Don’t Be A Lawyer, ChatGPT (DAIR)

Fake Filings in Federal Court

’tis the season for AI unreason once again.

This post was prompted by the recent-ish (but inevitable) news item in which an attorney used ChatGPT to generate a legal brief (including fake citations and fake case law) for another attorney to file in US Federal Court on behalf of his client (Mata v. Avianca, Inc.). And, of course, it went very, very wrong.

Yes, we are at the stage of the hype cycle where the output of Large Language Models (LLMs) is prompting humans, not the other way around.

The full case history is available on CourtListener. The Guardian offers a brief review. I just want to highlight a few notable nuggets:

The apology is particularly interesting, esp. as a cautionary tale
The attorney did not just submit fake citations, but also fake excerpts and notarized them

In the end the attorneys and their firm were fined USD 5,000. Judge P. Kevin Castel had the following to say:

In researching and drafting court submissions, good lawyers appropriately obtain assistance from junior lawyers, law students, contract lawyers, legal encyclopedias and databases such as Westlaw and LexisNexis. Technological advances are commonplace and there is nothing inherently improper about using a reliable artificial intelligence tool for assistance. But existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings. [The attorneys and law firm] abandoned their responsibilities when they submitted non-existent judicial opinions with fake quotes and citations created by the artificial intelligence tool ChatGPT, then continued to stand by the fake opinions after judicial orders called their existence into question.

Mata v Avianca, Opinion and Order on Sanctions, 22 June 2023

In other words: check your sources and don’t lie to the judge.

The “ChatGPT Lawyer” is likely to become part of modern tech history, much as the “Zoom Cat Lawyer” did, although the latter is far more wholesome and one of my favorite pieces of legal internet trivia.

Side Note on Citations

The problem of fake legal citations is fixable at the technology level, but simply wasn’t a concern for the ChatGPT developers. Since LLMs operate on tokens (whole words or word fragments) rather than on meaningful units like citations, one would only need to pre-process the training data to concatenate legal citations into single tokens. A citation string like “1 BvR 141/16” would be converted into “1_BvR_141/16” or similar.
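A minimal sketch of that pre-processing step, assuming citations follow the German Federal Constitutional Court docket pattern from the example (chamber number, a register code beginning with “Bv”, then number/year). The regular expression and the helper name `join_citations` are my own illustrations, not part of any real training pipeline, and real citation formats are far more varied. Note also that a subword tokenizer would still split the joined string into fragments unless it were added to the tokenizer's vocabulary as a single entry.

```python
import re

# Assumed pattern: "<chamber> Bv<register letter> <number>/<two-digit year>",
# e.g. "1 BvR 141/16". This covers only one narrow citation shape.
CITATION_RE = re.compile(r"\b(\d)\s+(Bv[A-Z])\s+(\d+/\d{2})\b")


def join_citations(text: str) -> str:
    """Replace the spaces inside each matched citation with underscores."""
    return CITATION_RE.sub(r"\1_\2_\3", text)


example = "See 1 BvR 141/16 and compare 2 BvE 1/19."
joined = join_citations(example)
print(joined)          # See 1_BvR_141/16 and compare 2_BvE_1/19.
print(joined.split())  # each citation is now one whitespace-delimited unit
```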

As far as we know, LLM hallucinations are not due to the creation of entirely new tokens, but due to the senseless or incorrect arrangement of known tokens from the training data. So a specialized legal LLM would not necessarily suffer from incorrect citations, but the fake quotes are more difficult to fix.

One would, of course, need to invest a fair amount of money to gather all relevant citation patterns. Presumably many legal publishers have already done so years ago. Even for new entrants into the market, given the volume of announced investments, this seems like a feasible effort.

Hype and Hallucination

Following the recent hype around LLMs, it is unsurprising that all sorts of actors are under pressure to invest in and deliver LLMs for their customers and stakeholders. OpenAI led the charge with ChatGPT, GPT-3 and GPT-4; Microsoft immediately partnered with OpenAI to bring Bing out of its twilight exile; and Google was forced to prematurely announce Bard in a desperate bid to maintain control of the search market. A number of other organizations have announced billion-dollar investments into generative AI.

The legal profession has been anything but unaffected by the hype, with many firms trying to apply one of the GPT variants to legal problems, some companies offering custom legal models of unknown quality or developing entirely new offerings. Even the famously tech-averse, risk-averse and conservative German judiciary has announced the development of a custom language model for judicial use.

But is this a reasonable response? People who stand to earn a lot of money from generative AI certainly think so. Already there are claims that “The AI Revolution Will Be More Significant Than the Agricultural Revolution” (Sam Altman, CEO of OpenAI) or that “AI will save the world” (Marc Andreessen).

If I were hoping to sell a billion subscriptions that’s what I’d be telling my customers and investors as well. In his article Andreessen lambasts bootleggers, meaning “self-interested opportunists who stand to financially profit by the imposition of new restrictions, regulations, and laws”. He conveniently forgets to mention snake oil salesmen — people who profit immensely from selling the “next big thing”, regardless of whether it is useful or not.

On a technical level, LLMs currently rely on “Transformer” technology. Transformers “learn” about text by being fed sequences of words with the final word masked; the model then has to guess the masked word. Stephen Wolfram has written an excellent and more detailed explanation.

So the sentence “It is illegal to commit murder” is fed to the algorithm as “It is illegal to commit [???]”. After being fed incredible amounts of data, the model might learn that “murder” is 30% probable, “theft” 20% probable and “crimes” 10% probable. With a larger context window than this short sentence, the guesses become better suited to the prompt.

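There is also a “temperature” parameter, which reshapes these probabilities before a word is drawn: a low temperature concentrates the choice on the top-ranked words, while a higher temperature flattens the distribution so that lower-ranked words are selected more often, because this makes texts “more interesting”. Essentially this means two layers of chance: the learned probabilities of the next likely words, and the random draw from the temperature-adjusted version of those probabilities.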
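A toy sketch of the mechanism, using the three probabilities from the “murder/theft/crimes” example and ignoring the remaining 40% of probability mass (so the three values are renormalised among themselves, an assumption for illustration). Temperature is applied as softmax(log p / T); real systems apply it to the model's raw scores (logits) over the whole vocabulary, which amounts to the same rescaling.

```python
import math
import random

# The three candidates from the example; the rest of the vocabulary is ignored.
NEXT_WORD = {"murder": 0.30, "theft": 0.20, "crimes": 0.10}


def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Rescale probabilities as softmax(log(p) / T), i.e. proportional to p ** (1 / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: math.exp(math.log(p) / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: s / total for word, s in scaled.items()}


def sample_next_word(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one word at random from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    words = list(adjusted)
    return rng.choices(words, weights=[adjusted[w] for w in words], k=1)[0]


for t in (0.5, 1.0, 2.0):
    dist = apply_temperature(NEXT_WORD, t)
    print(t, {w: round(p, 2) for w, p in dist.items()})

rng = random.Random(42)
print([sample_next_word(NEXT_WORD, 1.0, rng) for _ in range(5)])
```

With these numbers, “murder” receives about 64% of the probability at T = 0.5, 50% at T = 1 and about 42% at T = 2, while “crimes” gains correspondingly.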

On a more philosophical level, much has been made of the emergent abilities of LLMs, which at certain scales appear to suddenly exhibit abilities like natural language understanding, logical reasoning or even consciousness. But do they? Schaeffer, Miranda and Koyejo (2023 / pre-print) call this view into question, arguing that “emergent” abilities are simply a result of nonlinear/discontinuous metrics and completely disappear if you change the evaluation method.
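The core of that argument can be shown with a few lines of arithmetic. This is a simplified toy under my own assumptions (token errors are independent, answers are 10 tokens long): if a metric only counts an answer as correct when every token is right, exact-match accuracy is the per-token accuracy p raised to the power of the answer length, so a smooth improvement in p shows up as a sudden jump.

```python
def exact_match_probability(per_token_accuracy: float, answer_length: int) -> float:
    """Chance that every token of an answer is right, assuming independent errors."""
    return per_token_accuracy ** answer_length


# Per-token accuracy improves in small, even steps...
for p in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    # ...but the all-or-nothing metric on a 10-token answer stays near zero
    # for a long time and then shoots up, which looks like a sudden ability.
    print(f"per-token accuracy {p:.2f} -> exact match {exact_match_probability(p, 10):.3f}")
```

Going from 0.5 to 0.8 per-token accuracy moves exact match from about 0.001 to about 0.107, while going from 0.9 to 0.99 moves it from about 0.349 to about 0.904. Measured per token, nothing sudden happens at all.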

In brief, Transformers try to guess the next word based on the input prompt, given the probabilities they have learned from large amounts of text. This also means that Transformers create plausible text, but at the fundamental technical level do not understand it (Bender et al 2021), whatever “understanding” may mean in this context. Nor does the underlying technology provide them with any reasoning capabilities, except by accidental and plausible regurgitation of training data (Shah and Bender 2022: 222).

Where does that leave us? There are plenty of theoretical reasons to suppose that LLMs are unlikely to be some kind of super-intelligence, but instead are more akin to “stochastic parrots” (Bender et al 2021) or “spicy autocomplete”.¹

I am not an expert in LLMs so I cannot speak to the technical and mathematical quality of the critiques, but it seems rather likely that a huge string prediction engine trained on (allegedly) mostly internet text would turn out to be nothing more than a huge string prediction engine producing remixed internet text.

Large Language Models in the Legal Domain

The legal profession has never had access to one or more all-powerful, all-knowing AIs before (if they are that), nor to fancy string-prediction engines (my guess). And yet, the work is still getting done. So what would you use LLMs for?

Some tasks one could apply LLMs to include ideation, legal research, legal drafting, summarization and programming for lawyers. Naturally, there are many more potential applications. This list is certainly not comprehensive, but rather an opportunity to structure my own thinking on this topic.

Ideating

Ideating seems like an ideal use case for LLMs. You throw a prompt at your favorite chatbot, the chatbot replies with a couple of ideas, you pick your favorite and voilà, your fantastic new project is out of the brainstorming/-writing phase and on its way to becoming reality.

At this stage of the process, truth, safety and copyright aren’t an issue. Whether something actually works isn’t a concern during brainstorming, as long as it plausibly could and can make the shortlist for further investigation. Safety is also not (yet) engaged. Everything occurs under the purview of a human and can only become reality through human action. Finally, ideas aren’t copyrightable in general, no matter where the idea is taken from.

If academic norms apply to you, you might get in trouble for plagiarism, but this is a professional code of ethics that is enforced with professional censure, not legal sanctions (usually).

I assume this might be a viable use case, depending on your needs? I’ve tried ideating with some of the publicly available chatbots and in my (very limited!) experience it was rather…disappointing. The results were sort of acceptable, but if you’ve spent any amount of time ruminating on your problem the results weren’t any kind of game-changer.

This is probably one of the real limitations of ideating with LLMs: the suggestions are derivative of whatever exists in the training data. However, we need creative thinking most when we have new problems that need new solutions. Of course human creativity is also derivative of existing cultural patterns, but humans have access to their own lived experience to add a very recent and unique twist to their thoughts, something which is not represented well in textual data alone.

Maybe LLM ideating will be useful? For some people? I’m not sure, but if you find that the more pedestrian creativity hacks like taking a walk, making a cup of tea, talking to people or having some fun don’t work for you, I suppose it could become a tool in your arsenal.

Legal Research

Legal research is arguably the hardest problem for an LLM to tackle. The current transformer technology is, at its core, nothing more than a prediction of the next likeliest word given its input context window, with some additional randomness thrown in (“temperature”) to make the text a little spicier. You might get what you are looking for from your prompt, but maybe there is an 80% chance it’ll add some spice to your results to make them more fun, but less relevant to you. Recall the ChatGPT lawyer. If you present fake citations and fake direct quotes to an irate judge, spice will not make them play nice.

Legal research — and more generally, information retrieval (IR) — has two fundamental requirements (Shah and Bender 2022):

Truth
Relevance

If you go looking for something, your result must be both true and relevant to your search to be useful. If it is untrue, but relevant, you have not retrieved anything, you have created something. If it is true, but irrelevant, you may have learned something new about the world, but not solved your problem.

Truth

Truth in this sense can be either correspondence (corresponding to the “real” world, whatever that may be) or coherence (deriving from an accepted body of knowledge). A true correspondence fact might be the location of a physical item; a true coherence fact might be the authentic text of a judgment of the European Court of Human Rights, irrespective of whether its content corresponds to reality or not. The kind of truth you are seeking depends on your problem and your associated query.

In the legal domain, the sources of truth for LLMs are both very good and very bad at the same time.

The sources of truth are bad, since the training data most likely contains all or most of the text on the internet, and if you believe everything on the internet, either (a) you have never been online before or (b) you are terribly naive. This internet text also contains some of the most offensive and immoral content on the planet, which can only be imperfectly removed from the training data through automatic means and surfaces every once in a while through clever prompting or sheer accident.

The sources of truth are also very good at the same time, because (modern) law is one of the few domains that possesses one or more authorized corpora of knowledge (laws, regulations, judgments, other government documents) that are both hierarchical and “true” by fiat. You may feel incandescent rage at the holdings and reasoning of certain Supreme Court judgments, but it doesn’t change the fact that they form authentic interpretations of the law of the land and you must either adhere to them or oppose them as such — but you cannot ignore their power.

Now, what happens if you combine bad sources of truth and good sources of truth? It is really anyone’s guess what kind of fuzzy truth value an individual query will produce and I think that is part of the problem.

If the query produces novel text that is supposed to contain the answer to your question, every word, every sentence, is potentially suspect and will require you to verify its truth value manually. If the answer to the query is trivial and you can verify it immediately, you probably never needed to search for it with an LLM in the first place. If confirming the veracity of the result is difficult, you will need to rely on a second search engine or other outside knowledge to confirm the truth value, which is silly and time-consuming.

In most of the interesting cases it will also be hard to decide whether something is true or false in a binary sense, but our willingness to accept something provisionally as truth will depend on the quality of the methods underlying the result. This applies to most empirical research in the natural and social sciences. If you generate a plausible result with a rigorous randomized controlled trial I will be more inclined to believe it than if you probabilistically recombined some input text and claim that your AI god delivered it from heaven in response to your prompting prayer.

Since it is computationally infeasible or philosophically impossible to compile a complete list of all possible queries with all possible responses, it is also not possible to test either the whole universe of queries or even a representative sample, because there is no sampling frame. No matter how much testing one does, it always remains possible to end up in some niche of the query space where the training data is sparse and truly bizarre results occur.

And even if it were possible to appropriately sample and test the model, the complex end-to-end interrelations between everything in such a model mean that every time the training data changes substantially, one would have to rerun all tests to discover how likely a truthful response might be.

As the ChatGPT lawyer learned, even if the machine appears to be quoting from a judgment and swears to its correctness, you cannot trust it. I am therefore skeptical that this mode of operation will enhance legal research, simply because the cost-benefit ratio of checking the full output will be worse than performing the research some other way. Essentially, you are doing the research twice.

If the query produces links to a document or provides a selection of authorized excerpts of a document from a trusted database and ranks them for you, I could see LLMs serving a role to meet legal research needs. But this raises the question of relevance.

Relevance

Finding relevant legal documents is no fun. Or rather, it is no fun with the commercial and non-commercial legal databases I have had access to in my legal life until now. Let us name no names, because hell hath no fury like a database scorned.

If you know exactly what you are looking for, most legal databases have worked fine for me, with some notable exceptions. With a few databases, even a single typo in the docket number can invalidate your query (looking at you, official German federal court databases).

Given that it is 2023, this is awful, but it is how it is. If the databases are publicly indexable, it is often easier to run a Google search and then click the deeplink.

Now, this would appear to provide an excellent opportunity for an LLM to steal the spotlight by providing search that understands what you really want and to provide it to you. Again, if it provides you with a custom text to serve your answer, everything is suspect. But what if it only selects from a finite set of responses from an approved database (e.g. documents or parts of documents)?

Shah and Bender (2022) have commented much more eloquently and expertly on this problem than I could, so I refer you to their paper for a detailed critique. As I understand the paper, search is a complex activity which relies on much information that cannot easily be transmitted to or reasoned over by the executing machine and therefore the machine alone cannot fully replicate the user’s close cooperation, context-bound information needs and regular updating of relevance considerations in response to the machine’s output.

In terms of opacity, the search functionalities of commercial and non-commercial databases are already so opaque that I do not believe that an LLM would add an extra layer of inscrutability, certainly no more than any other algorithms available.

So. Retrieving approved information objects from a database ranked by relevance via LLM might work? I am told it would be computationally more expensive than current approaches, but I am not expert enough to comment on their efficiency. Time will tell, I think.
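To make the “select, don't generate” idea concrete, here is a minimal sketch. A crude word-overlap score stands in for whatever relevance model (an LLM or anything else) would do the ranking, and the document IDs and texts are hypothetical placeholders. Whatever the scorer gets wrong, the function can only return text that already exists verbatim in the approved corpus, together with an ID that can be checked.

```python
import re
from collections import Counter

# Hypothetical approved corpus; IDs and texts are placeholders.
APPROVED_DOCS = {
    "doc-001": "Judgment on compensation for flight delays under air passenger rights law.",
    "doc-002": "Judgment on the termination of a residential lease for personal use.",
    "doc-003": "Order on costs in a dispute over a cancelled flight.",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def overlap_score(query: str, text: str) -> int:
    """Stand-in relevance score: number of shared word occurrences."""
    q, d = Counter(tokenize(query)), Counter(tokenize(text))
    return sum(min(count, d[word]) for word, count in q.items())


def search(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank approved documents and return (id, verbatim text) pairs, never new text."""
    scored = [(overlap_score(query, text), doc_id, text) for doc_id, text in docs.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


for doc_id, text in search("flight delay compensation", APPROVED_DOCS):
    print(doc_id, text)
```

The toy scorer does not match “delay” to “delays”, which is exactly the kind of relevance gap a better ranker would try to close, but the truth problem of freely generated text does not arise.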

Personally, I use explicit computer-assisted search almost exclusively to locate pieces of information that I already know or suspect to exist. Of the three search scenarios mentioned in Shah and Bender (2022: 225), I usually only carry out computer search to accomplish specific tasks. For exploration and learning, I find more social, link-following search on Connected Papers or Wikipedia much more useful, followed by books, review papers, newspaper articles and information nuggets gleaned from social media. This mirrors the search patterns described by Shah and Bender (2022: 223–224).

I’m not sure that I would profit much from LLM search, but I have been wrong before.

Legal Drafting

Blank page syndrome is bad, writer’s block is worse. Everyone who has written more than a few paragraphs knows the dreaded nothingness of a white screen on which our lucid thoughts and witty banter should be surfacing with pleasing regularity.² And if you stare too long into the abyss of the editor, the editor stares back at you.

Writer’s block is a serious problem. I have been waiting for more than 12 years for Patrick Rothfuss to finish Doors of Stone. Clients do not have 12 years. Sometimes 12 hours are too long. Perhaps an LLM can speed up this process, at least for the busy lawyer?

It depends. Legal drafting knows roughly three archetypes:

Templating
Modular drafting (boilerplate)
Bespoke drafting

Templates

Templates are one of the foundations of the legal world. Templates save time and money, ensure consistent style and reduce legal risk through tried-and-court-tested language. The archetypical template is a complete and internally consistent document designed for a specific purpose (e.g. a non-disclosure agreement, but also model laws or instruments of ratification) and only contains some variables such as the names of the parties, addresses and fields for dates and signatures. You create it once and you’re done (the exception being updates due to changes in legal or factual circumstances). Variables can be filled automatically from a database or mindlessly by hand.
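For readers who have not seen one from the inside, a minimal sketch of a template with variables, using Python's standard `string.Template`. The clause text, field names and party details are hypothetical and not vetted legal language. The reviewed wording is fixed once; each use only fills in the blanks, which is why the marginal cost per document is close to zero and the same record always yields the same text.

```python
from string import Template

# Hypothetical, heavily shortened NDA opening; not vetted legal language.
NDA_TEMPLATE = Template(
    "This Non-Disclosure Agreement is made on $date between "
    "$party_a of $address_a and $party_b of $address_b."
)


def fill(template: Template, fields: dict[str, str]) -> str:
    # substitute() raises KeyError for any missing variable, so an incomplete
    # record fails loudly instead of yielding a half-filled document.
    return template.substitute(fields)


record = {
    "date": "1 July 2023",
    "party_a": "Example GmbH",
    "address_a": "Frankfurt am Main",
    "party_b": "Sample Ltd",
    "address_b": "London",
}
print(fill(NDA_TEMPLATE, record))
```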

The promise of LLMs includes the ability to call up custom templates for any kind of legal situation at a moment’s notice. Unfortunately, producing a perfect, ready-to-use and court-proof template is (currently) out of scope for LLMs. The stochastic nature of the text-generating process means that every word is possibly suspect, so the entire text needs to be double-checked by a legal professional with appropriate expertise.

Whereas standard templates have almost zero marginal labor cost, LLM-generated templates incur a non-trivial cost with each usage, making LLMs unsuitable for on-demand archetypical templates.

Modular Drafting

Full templates are quite inflexible and therefore only applicable to a limited class of problems that is easily automated and requires next to no judgment calls. Most legal problems are routine, but do require some measure of customized judgment. Also, lawyers need to keep busy to justify their significant salaries, so “templating and chill” will not satisfy internal and external clients.

Modular drafting is a form of semi-templating. The building blocks (modules) are themselves small templates (boilerplate) that are recombined according to data-driven rules at scale or by human judgment in an individual workflow. These building blocks could be single sentences, but also paragraphs, whole sections or even entire documents. Like full templates, these modules may be static or contain variables for fields such as dates, currency, names etc.

Modern contract generators and contract lifecycle management (CLM) systems exemplify this archetype. A database of pre-approved clauses is available, which are selected through automated rules (auto-templates), drag and drop, or by a user working their way through a guided questionnaire. The database manager can then provide suggested clauses (users who used clause A may also want clause B), prohibit combinations (it may be illegal to combine X with Y), keep the database under centralized version control, roll out updates quickly and perform statistical analysis on usage and gauge legal exposure across an entire organization.
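A minimal sketch of the rules side of such a system: a versioned clause library, prohibited combinations and “users who used A may also want B” suggestions. Real CLM products keep this in a database with far richer metadata; the clause IDs, wording and rules below are invented for illustration. The key property is that every sentence in the output comes from a reviewed module, and the combination rules are explicit and testable.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Clause:
    clause_id: str
    version: int  # bumped centrally whenever the approved wording changes
    text: str


# Hypothetical pre-approved clause library (wording is illustrative only).
LIBRARY = {
    c.clause_id: c
    for c in (
        Clause("law_de", 2, "This agreement is governed by German law."),
        Clause("law_ny", 1, "This agreement is governed by the laws of the State of New York."),
        Clause("venue_frankfurt", 1, "The courts of Frankfurt am Main have exclusive jurisdiction."),
        Clause("arbitration", 3, "All disputes shall be finally settled by arbitration."),
    )
}

# Combinations ruled out by the database manager, e.g. two governing laws.
PROHIBITED = {frozenset({"law_de", "law_ny"}), frozenset({"venue_frankfurt", "arbitration"})}

# "Users who used clause A may also want clause B."
SUGGESTIONS = {"law_de": ["venue_frankfurt"]}


def assemble(clause_ids: list[str]) -> str:
    """Join approved clauses in order, refusing prohibited combinations."""
    for a, b in combinations(clause_ids, 2):
        if frozenset({a, b}) in PROHIBITED:
            raise ValueError(f"clauses {a!r} and {b!r} may not be combined")
    return "\n".join(LIBRARY[cid].text for cid in clause_ids)


def suggest(clause_ids: list[str]) -> list[str]:
    """Suggest further clauses that are not yet selected."""
    return [s for cid in clause_ids for s in SUGGESTIONS.get(cid, []) if s not in clause_ids]


print(suggest(["law_de"]))
print(assemble(["law_de", "venue_frankfurt"]))
```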

As I see it, there are two ways to integrate LLMs into modular drafting:

Insofar as modular drafting with LLMs is performed in a manner similar to templating (e.g. generating building blocks with an LLM and then combining them by hand or with an LLM), the issues are the same. Each final document would need to be expert-checked, defeating the speed and cost benefits of modular drafting.
However, LLMs can also be used elsewhere in the pipeline, such as through suggesting pre-approved modules and extracting variables from source documents (feature extraction) to produce a first draft. It is my understanding that these steps are integrated into the workflow underlying FRAUKE, an electronic drafting assistant for judgments in air passenger rights cases currently being trialled in the district court (Amtsgericht) of Frankfurt (Germany). However, I have no further insight into the project and do not know if FRAUKE uses an LLM or a different machine learning approach.

The second option does have some merit. I am slightly concerned about the use of LLMs as feature extractors, since standard annotated NLP workflows appear to perform better and errors are more easily corrected with updated annotations (Honnibal 2023). However, LLM results might be acceptable, since some human review is required in any case.

LLMs might also come in handy where there are no rule-based methods for extracting the desired features and the setting is zero-shot or few-shot learning. Suggesting pre-approved modules is probably fine in terms of legal quality (although it might miss problematic combinations) and might also be acceptable in a normative sense, since both the modules and the final result are subject to human review.
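For comparison, a minimal rule-based extractor of the kind one might try first for a simple case. The input sentence, field names and regular expressions are hypothetical and deliberately narrow (two-letter airline codes, German-style dates, delays stated in hours); I do not know how FRAUKE actually extracts its features. Each value is either matched by an inspectable rule or left as `None` for a human to fill in, and a wrong result is fixed by editing a rule rather than by re-prompting.

```python
import re
from datetime import date

# Deliberately narrow, hypothetical patterns.
FLIGHT_RE = re.compile(r"\b([A-Z]{2})\s?(\d{1,4})\b")            # e.g. "LH 400" or "LH400"
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")         # e.g. "14.03.2023"
DELAY_RE = re.compile(r"\b(\d{1,2})\s*hours?\b", re.IGNORECASE)   # e.g. "4 hours"


def extract_features(text: str) -> dict[str, object]:
    """Return the extracted fields; None marks a field for manual completion."""
    flight = FLIGHT_RE.search(text)
    day = DATE_RE.search(text)
    delay = DELAY_RE.search(text)
    return {
        "flight": flight.group(1) + flight.group(2) if flight else None,
        "date": date(int(day.group(3)), int(day.group(2)), int(day.group(1))).isoformat() if day else None,
        "delay_hours": int(delay.group(1)) if delay else None,
    }


claim = "The claimant booked flight LH 400 on 14.03.2023, which arrived 4 hours late."
print(extract_features(claim))
```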

Bespoke Drafting

Bespoke drafting is the most time-consuming type of legal drafting, making the LLM shortcut an especially attractive option. However, it is also the most challenging variant, since the underlying problem is probably rare or unique and requires a custom solution. Examples include international treaties, acts of parliament, superior court judgments, high-value contracts and academic papers, but also less dramatic work products with more interpersonal character such as client correspondence, meeting records or memos.

Since a custom draft requires custom review and editing of the whole document in any case, using an LLM to produce a first draft or provide helpful suggestions might prove beneficial. Having an LLM produce an unchecked final draft is almost always a terrible idea, since its stochastic nature, disregard of truth and lack of legal subroutines will result in a document that is not fit for use. In the judiciary, a human decision (at least in Germany) is always required.

However, drafting is only part of the workload. The more complex types of text (e.g. judgments, normative documents) will need much time spent debating and negotiating their content with other humans, work that cannot easily (or at all) be delegated to an LLM. They may also be very research-intensive, so the actual bottleneck in document production may not even be the drafting step. In many cases drafting and research are also a combined activity and cannot be split, with one part being delegated. Complex correspondence or legal memos are no different. The actual time saved might be less than hoped for, particularly since LLM output can also be tricky to get right. Spending an hour on prompt engineering instead of drafting and research provides no benefit at all.

Simple correspondence might be a good use case for an LLM, since it requires little to no research and can be verified quickly. However, there is the very real danger of lawyers ‘fluffing’ up their texts with an LLM, which are then fed into the busy respondent’s LLM to compress them back to a readable size. The result is a loss of information, waste of time, money and energy, since LLMs solve problems they themselves created in the first place.

All of this applies to business, legal practice and public office. The norms governing the use of LLM assistance in academic work (as opposed to LLMs as an object of study) are still very much in flux, but since there are no reliable ways of detecting generated text, I assume the use of LLMs will be tolerated in the long term, since there is no way of barring them without unacceptable collateral social damage. I do feel that LLMs are at their weakest in academic settings, since they remix known content in a plausible manner but without being properly embedded in accepted bodies of knowledge or having access to empirical reality (no, the internet is NOT an acceptable substitute for empirical reality). Still, they might be a boon to speakers of English as a Second Language (ESL), who could gain access to more sophisticated phrasings, albeit likely at the price of misrepresenting their own work.

Mixed Drafting

While you do encounter these pure archetypes, legal drafting is just as often a combination of two or three of them. You might combine several modular paragraphs with some bespoke drafting to build a full template for a certain class of tasks and then add some further bespoke drafting to individualize each document to a certain set of circumstances. This happens in mass litigation, but also in contract generation, if you have the authority to add custom modifications (as in consumer or small business settings).

With mixed drafting, one encounters all of the dangers of LLMs at the same time.

Furthermore, one risks the imposition of bias through the helpful suggestion of specific language. LLM training data is what humans have written and published, which is, naturally, biased and tends to privilege men over women, majorities over minorities, white people over BIPOC, able-bodied over the disabled, neuro-typical over neuro-divergent and so on. The sensitive, experienced and skilled writer may spot these subtle influences and correct them, but the less sensitive, the less experienced and the less skilled an LLM-assisted legal drafter is, the more these subtle biases will creep into their writing.

Or maybe they will do so anyway? Legal writing is often unnecessarily obscure in an attempt by inexperienced writers to signal knowledge through formality (I have been guilty of this many times). This is not a new trend, but I assume LLMs would reinforce it.

LLMs reproduce averages and defaults. We know from decades of working with computers that very few people are willing and able to challenge the defaults of programs.

Summarization

The automatic summarization of texts seems useful at first glance. I have often been assured by business leaders that they would take an imperfect summary any day over having to read lengthy documents. However, as we know from academic papers, the devil is in the detail. Summarization is actually a form of information retrieval, but limited to a single document instead of a full database. The person reading the summary wants to know what is actually in the document (truth), but also what is most important (relevance).

Therefore, many of the considerations regarding information retrieval and research also apply to summarization. Relevance is human-made and can change in the blink of an eye due to outside circumstances, changes that are difficult to incorporate into an LLM quickly. Maybe this will change with smaller model sizes?

Using an LLM for summarization seems feasible to discover the rough subject of the document, but I would not rely on it to extract key holdings or arguments.
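One way to keep the truth problem at bay is extractive summarization, which only selects sentences from the document instead of writing new ones. The sketch below uses a simple word-frequency heuristic rather than an LLM (a swapped-in technique, chosen for transparency); the stopword list, scoring and example document are my own hypothetical choices. It illustrates the trade-off described above: every output sentence really is in the document, but “relevance” is only a statistical proxy, so the sentence containing the key holding can still be passed over.

```python
import re
from collections import Counter

# Tiny, illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "was", "for",
             "on", "that", "it", "by", "with", "as", "be"}


def words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text: str, n: int = 2) -> list[str]:
    """Pick the n sentences whose words are most frequent in the document, in original order."""
    sentences = split_sentences(text)
    freq = Counter(words(text))

    def score(sentence: str) -> float:
        tokens = words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


# Hypothetical mini "judgment".
document = (
    "The airline argued that the delay was caused by a strike. "
    "The court held that the strike was not an extraordinary circumstance. "
    "The passenger is therefore entitled to compensation. "
    "Costs follow the event."
)
for sentence in extractive_summary(document):
    print(sentence)
```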

Programming for Lawyers

Programming for lawyers is a large maybe. I’ve been assured by friends and colleagues that using ChatGPT or GitHub Copilot is great for learning and assisted coding and has helped them very much. On the flip side, I’ve also seen someone on Mastodon mention (third-hand account) how a student deleted important data by overlooking an “rm -rf *” statement (or similar) in the generated code. For the non-programmers: this command recursively force-deletes all files in the current and all sub-directories where it is executed.

Personally, I like puzzling out coding solutions myself and avoid even the basic assistance provided by IDEs like RStudio (I code in Emacs). So I am probably very much the wrong person to make sweeping claims about the usefulness of LLM-assisted coding.

But I do have one point: the challenge in writing code is not writing the code. It’s figuring out what you want the code to do. Especially if it’s statistical programming and the code is doing some complicated math.

If you want to generate a plot and format it nicely, that might work. But it might also do something wrong and unless you understand rather well what the code does, the resulting diagram might completely mislead you.

It’s even worse with statistical modeling. Consider a simple case: linear regression. Of course you can have it create the model for you based on a rough description, run the code and explain the results. But should you trust its output? I think not.

To be fair, there is enough linear regression content on the internet that it might do a fair job on toy problems. But as soon as you move off the well-trodden path (as you most likely will need to do for a new problem), all bets are off. There are also many more considerations to modeling, such as feature extraction, feature selection, data quality, measurement error, missing data, causal diagrams, researcher degrees of freedom and even the elephant in the room of whether frequentist statistics is appropriate to your research question at all.
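A classic illustration of why running regression code is not the same as modeling is Anscombe's quartet, a well-known teaching dataset from 1973. The sketch below fits ordinary least squares in plain Python to the first two of its four datasets. Both produce practically the same line (intercept about 3.00, slope about 0.50) and the code runs without complaint, but the residuals of the second dataset trace a clear arch, meaning a straight line is the wrong model. Nothing in the fitted numbers warns you; you have to know to look.

```python
# Anscombe's quartet, datasets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_by_x(x, y, intercept, slope):
    """Residuals sorted by x, which makes systematic patterns visible."""
    return [(xi, round(yi - (intercept + slope * xi), 2)) for xi, yi in sorted(zip(x, y))]


for name, y in (("I", Y1), ("II", Y2)):
    intercept, slope = ols(X, y)
    print(f"{name}: y = {intercept:.2f} + {slope:.3f} x")

# Dataset II: negative at both ends, positive in the middle, so the data are curved.
print(residuals_by_x(X, Y2, *ols(X, Y2)))
```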

Statistical programming already suffers from too much mindless ritual and too little careful thought. I doubt that LLMs will improve the situation.

Conclusion

LLMs and transformer technology are here to stay, although I have serious doubts as to whether they will improve the practice or business of law as much as their proponents claim. I do see some modest opportunities, but nowhere near the potential that evangelists preach. Mostly, I think, people will be disappointed.

If you have well-functioning data infrastructure, a competent data team and well-trained end-users there should be a fair number of possibilities one could explore. But who can afford all three?

The more complex a legal task, the more likely we are to look for some kind of magic to solve our problems. But the quality of Transformers appears to degrade with increasing task complexity (Dziri et al 2023 / pre-print). We also stand at the cusp of an era in which LLM-generated text will vastly outstrip human text, probably causing catastrophic failures in future LLMs due to training new models on auto-generated LLM text (Shumailov et al 2023 / pre-print). Down-sizing models also appears to be more difficult than first thought (Gudibande et al 2023 / pre-print). Securing models is unsurprisingly problematic, given their size, opacity and complexity (El-Mhamdi et al 2023 / pre-print). The hype around LLMs as some sort of novel intelligence also completely ignores the fact that they are more akin to data-washing, hiding the immense amount of “data labor” performed by low-paid gig workers or outright data theft from skilled authors and artists that goes into providing their fuel (Li et al 2023 / pre-print).

If you take into account these limitations in producing quality output, the sheer amount of work and therefore cost (infrastructure, data labor, end-user training) required to make LLMs perform well and their likely degradation in the future, this will probably limit their medium-term to long-term applications.

Naturally, technology as a whole always improves, but this does not mean that this specific technology will. Remember Blockchain, which started as the future of finance and is still as energy-hungry and inefficient as ever. Despite all the advances in AR/VR technology the Metaverse still remains useless and in solid “Ok, Boomer” territory.

Of course everything might turn out differently. Predictions are hard — especially if they concern the future.³

Acknowledgements

I am very grateful for the work that Emily Bender, Timnit Gebru, Margaret Mitchell and their colleagues at the Distributed AI Research Institute (DAIR) have been doing in this space. I do not know them personally, but their work has significantly informed and influenced my thinking on the subject of this essay.

This essay contains no LLM-generated text and especially no tiresome “did you notice that what you were reading was generated by AI” tropes.

¹ This is attributed to someone, but I can’t find a primordial citation.
² To be fair, I wrote this text in Emacs with the Wombat theme, so it is more of a very dark gray screen.
³ Attributing this aphorism opens up an interesting rabbit hole: https://quoteinvestigator.com/2013/10/20/no-predict/