AI and Copyright
Techdirt: “Judge: Just Because AI Trains On Your Publication, Doesn’t Mean It Infringes On Your Copyright”
Part of the problem is that these lawsuits assume, incorrectly, that these AI services really are, as some people falsely call them, “plagiarism machines.” The assumption is that they’re just copying everything and then handing out snippets of it.
My five cents:
- Copyright is, literally, the right to copy. AI doesn't copy; it produces a summary of your content and draws a resultant, a vector, telling people what you meant (ideally) and what the circumstances were when you meant it. In theory, this is a basic reading and understanding process, similar to what students do: they don’t copy while they read and learn; they ingest existing data to produce solutions to new problems.
Imagine a professor suing their students for copyright infringement because they used the information the professor authored to answer other people’s questions (mutatis mutandis). I know a professor teaches willingly, but the technical process is the same. We’ll get to the "willing" part later.
- Money: AIs make money indirectly from your content by communicating your resultant to the public. In a way, students do the same when they get hired, don’t they? Again, it comes down to intention and responsibility: a student is meant to do that, and society accepts the idea of studying somebody else’s copyrighted material, while AIs look more like money-making tools backed by massive corporations.
- No-quote: AIs do not quote or link back to the content they learned from. For a content creator who depends on being linked to their content, this is a showstopper.
- When AIs show resultants, they generate an average over an enormous number of quotes and texts (words, tokens), combined in their most likely arrangement. They cannot pinpoint your exact content to quote. It’s like asking someone what color the handful of sand they are holding is, then asking them to pick out the exact grain that came from Florida.
- There appears to be no way of convincing an AI to quote you from its statistics unless there is manual intervention.
- Therefore, you either allow that AI to summarize/vectorize your content or you don’t.
- Copyright is protection against human theft, meant to prevent people from copying without attributing ownership. AIs cannot detect ownership unless they are specifically asked to mention particular, unique cases. When AIs scale up (which is the whole point of AIs), handling such cases one by one is no longer a workable way to resolve copyright infringement. You cannot ask a statistician to name individuals.
- Content creators need to separate copyrights for humans from copyrights for AIs. When AIs process your content, the result is statistics, which is very different from the direct gain a human gets from copying it (plagiarism). Content should be marked with Creative Commons for Machines and, separately, Creative Commons for people.
- As a side note: the harm has already been done; the AIs have already parsed your existing content. Judges will never side with you when the stakes for AI are so high. You might be considered a good citizen, creative and whatnot, but your hens have already been stolen and eaten by a wolf that is preparing to become a herding dog.
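
To make the handful-of-sand point from the no-quote bullet a bit more concrete, here is a minimal sketch. It assumes a toy bigram counter can stand in for an AI model, which is a big simplification: real systems are neural networks with learned weights, not count tables. Every name in it (`train_bigram_counts`, `most_likely_next`, the sample sentences) is made up for illustration. What it shows is only this: once the texts are merged into counts, the model can tell you the most likely next word, but nothing in it records which text a given count came from.

```python
from collections import Counter, defaultdict


def train_bigram_counts(documents):
    """Merge all documents into one table: word -> counts of the words that follow it."""
    counts = defaultdict(Counter)
    for text in documents:
        tokens = text.lower().split()
        # Pair each token with the one right after it and tally the pair.
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if the token was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Three hypothetical "authors"; their texts are illustrative, not real data.
documents = [
    "the sand is white",
    "the sand is warm",
    "the sand is white and fine",
]

model = train_bigram_counts(documents)

# The table can say what usually follows "is" ...
print(most_likely_next(model, "is"))
# ... but it only stores totals, with no trace of which author contributed which count.
print(dict(model["is"]))
```

Attribution in this toy setup would mean storing a source label next to every count, which is exactly the kind of manual, per-case bookkeeping that the bullets above argue does not survive scaling.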