When The Atlantic revealed last month that tens of thousands of books published in the past 20 years had been used without permission to train Meta’s AI language model, well-known authors were outraged, calling it a “smoking gun” for mega-corporate misbehavior. Now that the magazine has put out a searchable database of affected books, the outrage is redoubled: “I would never have consented for Meta to train AI on any of my books, let alone five of them,” wrote the novelist Lauren Groff. “Hyperventilating.” The original Atlantic story gestured at this sense of violation and affront: “The future promised by AI is written with stolen words,” it said.
I understand that the database in question, called “Books3,” appears to have been assembled from torrented ebooks ripped into text files, in which case any use of it could be a breach of copyright. Still, I was at first mystified by the Sturm und Drang response, and by the claim that generative AI is “powered by mass theft.” Perhaps I was just jealous of the famous writers who were being singled out as victims—Stephen King, Zadie Smith, Michael Pollan, and others who command huge speaking fees and lucrative secondary-rights deals. Maybe I’d better understand the writers’ angst, I thought, if my work, too, was being pirated and sourced for AI power.
Now I know that it is. Yesterday, when I put my name into The Atlantic’s database search, three of the 10 books I have authored or co-authored appeared. How exciting! I’d joined the ranks of the aggrieved. But then, despite some effort, I found myself disappointingly unaggrieved. What on earth was wrong with me?
Authors who are angry—authors who are effing furious—have pointed to the fact that their work was used without permission. That is also at the heart of a lawsuit filed in California by the comedian Sarah Silverman and two other authors, Richard Kadrey and Christopher Golden, which contends that Meta failed to seek their consent before extracting snippets of their text, called “tokens,” for use in teaching its AI. The company used their books in ways the authors didn’t anticipate and, upon consideration, in ways they don’t approve of. (Meta has filed a motion to dismiss the suit.)
Whether or not Meta’s behavior amounts to infringement is a matter for the courts to decide. Permission is a separate question. One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways. The philosopher Jacques Derrida liked to talk about “dissemination,” which I take to mean that, like a plant releasing its seed, an author separates from their published work. Their readers (or viewers, or listeners) not only can but must make sense of that work in different contexts. A retiree cracks a Haruki Murakami novel recommended by a grandchild. A high-school kid skims Shakespeare for a class. My mother’s tree trimmer reads my book on play at her suggestion. A lack of permission underlies all of these uses, as it underlies influence in general: When successful, art exceeds its creator’s plans.
But internet culture recasts permission as a moral right. Many authors are online, and they can tell you if and when you’re wrong about their work. Also online are swarms of fans who will evangelize their received ideas of what a book, a movie, or an album really means and snuff out the “wrong” accounts. The Books3 imbroglio reflects the same impulse to believe that some interpretations of a work are out of bounds.
Perhaps Meta is an unappealing reader. Perhaps chopping prose into tokens is not how I would like to be read. But then, who am I to say what my work is good for, how it might benefit someone—even a near-trillion-dollar company? To bemoan this one unexpected use for my writing is to undermine all of the other unexpected uses for it. Speaking as a writer, that makes me feel bad.
I also feel—am I allowed to say this?—a little bored by the idea that Meta has stolen my life. If the theft and aggregation of the works in Books3 is objectionable on moral or legal grounds, then it ought to be so irrespective of those works’ absorption into one particular technology company’s large language model. But that doesn’t seem to be the case. The Books3 database was itself uploaded in resistance to the corporate juggernauts. The person who first posted the repository has described it as the only way for open-source, grassroots AI projects to compete with huge commercial enterprises. He was trying to return some control of the future to ordinary people, including book authors. In the meantime, Meta contends that the next generation of its AI model—which may or may not still include Books3 in its training data—is “free for research and commercial use,” a statement that demands scrutiny but also complicates this saga. So does the fact that hours after The Atlantic published a search tool for Books3, one writer distributed a link that allows you to access the feature without subscribing to this magazine. In other words: a free way for people to be outraged about people getting writers’ work for free.
I’m not sure what I make of all this, as a citizen of the future no less than as a book author. Theft is an original sin of the internet. Sometimes we call it piracy (when software is uploaded to Usenet, or books to Books3); other times it’s seen as innovation (when Google processed and indexed the entire internet without permission) or even liberation. AI merely iterates this ambiguity. I’m having trouble drawing any novel or definitive conclusions about the Books3 story based on the day-old knowledge that some of my writing, along with trillions more chunks of words from, perhaps, Amazon reviews and Reddit grouses, have made their way into an AI training set.
Actually, what about those Amazon reviewers and Redditors? What about the Wikipedia authors who labored to write the pages for Bratz dolls and the Bosc pear, or the bloggers whose blogs were long abandoned, or the corporate-brochure copywriters, or, heck, even the search-engine-optimization landfill dumpers? All of their work likely has been or will be sucked into the giant language models too. The total volume of textual material accessible and accessed for training AI models makes books—even nearly 200,000 of them—seem a speck by comparison.
It is understandable, I suppose, to hold literary works in greater esteem than banana-bread-recipe introductions or Am I the Asshole subreddit posts or water-inlet-valve-replacement instructions. But it is also pretentious. We who write and publish magazines and books are professionals with a personal stake in the gravity of authorship. We are also few in number. Almost anyone can write, over years, millions of words on social media, in texts and emails, in reports and memos for their work. I love books and respect them, but, as a published author and professional writer, I may be in the category least at risk of losing my connection to the written word and its spoils. If an AI collage of Stephen King and Yelp can do better than me, what business do I have calling myself a writer in the first place?
I became an author because language offers a special medium for experimenting with ideas. Words and sentences are malleable. Texts arise from basements of subtext. What I say embraces what I don’t and makes room for what you read. Once bound and published, boxed and shipped, my books find their way to places I might never have anticipated. As vessels for ideas, I hope, but also as doorstops or insect-execution devices or as the last inch of a stack that holds up a laptop for an important Zoom. Or even—even!—as a litany of tokens, chunked apart to be reassembled by the alien mind of a weird machine. Why not? I am an author, sure, but I am also a man who put some words in order amid the uncountable others who have done the same. If authorship is nothing more than vanity, then let the machines put us out of our misery.