Uhhrrr 26 minutes ago

From Vinge's "Rainbows End":

> In fact this business was the ultimate in deconstruction: First one and then the other would pull books off the racks and toss them into the shredder's maw. The maintenance labels made calm phrases of the horror: The raging maw was a "NaviCloud custom debinder." The fabric tunnel that stretched out behind it was a "camera tunnel...." The shredded fragments of books and magazines flew down the tunnel like leaves in a tornado, twisting and tumbling. The inside of the fabric was stitched with thousands of tiny cameras. The shreds were being photographed again and again, from every angle and orientation, till finally the torn leaves dropped into a bin just in front of Robert. Rescued data. BRRRRAP! The monster advanced another foot into the stacks, leaving another foot of empty shelves behind it.

dehrmann an hour ago

The important parts:

> Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use

> "All Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies"

It was always somewhat obvious that pirating a library would be copyright infringement. The interesting findings here are that scanning and digitizing a library for internal use is OK, and using it to train models is fair use.

  • 6gvONxR4sf7o an hour ago

    You skipped quotes about the other important side:

    > But Alsup drew a firm line when it came to piracy.

    > "Anthropic had no entitlement to use pirated copies for its central library," Alsup wrote. "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy."

    That is, he ruled that

    - buying, physically cutting up, physically digitizing books, and using them for training is fair use

    - pirating the books for their digital library is not fair use.

    • jonas21 3 minutes ago

      As they mentioned, the piracy part is obvious. It's the fair use part that will set an important precedent for being able to train on copyrighted works as long as you have legally acquired a copy.

    • throwawayffffas 41 minutes ago

      So all they have to do is go and buy a copy of each book they pirated. They will have ceased and desisted.

      • superfrank 30 minutes ago

        I'm trying to find the quote, but I'm pretty sure the judge specifically said that going and buying the book after the fact won't absolve them of liability. He said that for the books they pirated, they broke the law and should stand trial for that, and they cannot go back and un-break it by buying a copy now.

        Found it: https://www.nbcnews.com/tech/tech-news/federal-judge-rules-c...

        > “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft,” [Judge] Alsup wrote, “but it may affect the extent of statutory damages.”

      • dragonwriter 18 minutes ago

        > So all they have to do is go and buy a copy of each book they pirated.

        No, that doesn't undo the infringement. At most, that would mitigate actual damages, but actual damages aren't likely to be important, given that statutory damages are an alternative and are likely to dwarf actual damages. (It may also figure into how the court assigns statutory damages within the very large range available for those, but that range does not go down to $0.)

        > They will have ceased and desisted.

        "Cease and desist" is just to stop incurring additional liability. (A potential plaintiff may accept that as sufficient to not sue if a request is made and the potential defendant complies, because litigation is uncertain and expensive. But "cease and desist" doesn't undo wrongs or neutralize liability once a suit over them has already been filed.)

  • jpalawaga an hour ago

    I don't think that's new. Google set a precedent for that more than a decade ago. You're allowed to transform a book to digital.

bgwalter 3 hours ago

Here is how individuals are treated for massive copyright infringement:

https://investors.autodesk.com/news-releases/news-release-de...

  • JimDabell 2 hours ago

    > illegally copying and selling pirated software

    This is very different to what Anthropic did. Nobody was buying copies of books from Anthropic instead of the copyright holder.

    • rvnx 26 minutes ago

      At the very least, they should have purchased the originals once

      • arandomhuman 12 minutes ago

        Yeah, people have gone to jail for a few copies of content. Taking that large of a corpus and getting off without penalty would be a farce of the justice system.

  • farceSpherule an hour ago

    Peterson was copying and selling pirated software.

    Come up with a better comparison.

    • organsnyder an hour ago

      Anthropic is selling a service that incorporates these pirated works.

      • adolph 23 minutes ago

        That a service incorporating the authors' works exists is not at issue. The plaintiffs' claims are, as summarized by Alsup:

          First, Authors argue that using works to train Claude’s underlying LLMs 
          was like using works to train any person to read and write, so Authors 
          should be able to exclude Anthropic from this use (Opp. 16). 
        
          Second, to that last point, Authors further argue that the training was 
          intended to memorize their works’ creative elements — not just their 
          works’ non-protectable ones (Opp. 17).
        
          Third, Authors next argue that computers nonetheless should not be 
          allowed to do what people do. 
        
        https://media.npr.org/assets/artslife/arts/2025/order.pdf

        • codedokode 13 minutes ago

          Computers cannot learn and are not subject to laws. What happens is that a human takes a copyrighted work, makes an unauthorized digital copy, and loads it into a computer without authorization from the copyright owner.

  • nh23423fefe 2 hours ago

    What point are you making? 20 years ago, someone sold pirated copies of software (where's the transformation here?) and that's the same as using books in a training set? The judge already said reading isn't infringement.

    This is reaching at best.

  • chourobin 2 hours ago

    copyright is not the same as piracy

    • asadotzler 2 hours ago

      piracy isn't a thing, except on the high seas. what you're thinking about is copyright violation.

      • downrightmike 2 hours ago

        Yup, piracy sounds better than copyright violation.

        “Piracy” is mostly a rhetorical term in the context of copyright. Legally, it’s still called infringement or unauthorized copying. But industries and lobbying groups (e.g., RIAA, MPAA) have favored “piracy” for its emotional weight.

        • collingreen 2 hours ago

          Emotional weight or because it's intentionally misleading.

        • admissionsguy an hour ago

          Does piracy have negative connotations? I thought everyone thought pirates were cool

    • achierius 2 hours ago

      Can you explain why? What makes them categorically different or at the very least why is "piracy" quantitatively worse than 'just' copyright violation?

      • arrosenberg 2 hours ago

        Piracy is theft - you have taken something and deprived the original owner of it.

        Copyright infringement is unauthorized reproduction - you have made a copy of something, but you have not deprived the original owner of it. At most, you denied them revenue although generally less than the offended party claims, since not all instances of copying would have otherwise resulted in a sale.

        • fuzzfactor an hour ago

          I have about the same concept of piracy these days.

          Real piracy always involves booty.

          Naturally booty is wealth that has been hoarded.

          Has nothing to do with wealth that may or may not come in the future, regardless of whether any losses due to piracy have taken place already or not.

      • NoMoreNicksLeft an hour ago

        Asked unironically: "What's worse, hijacking ships at sea and holding their crews hostage for ransom on threat of death, or downloading a song off the internet?" ...

      • charcircuit 2 hours ago

        Saying that piracy isn't copyright violation is an RMS talking point. It's not worth trying to ask why because the answer will be RMS said so and will not be backed by the common usage of the word.

        • buzzerbetrayed an hour ago

          You legitimately have it completely backwards. The word "piracy" was coopted to put a more severe spin on copyright violation. As a result, it became "the common usage of the word". But that was by design. And it's worth pushing back on.

          • carlhjerpe an hour ago

            Sweden has a political party called "The Pirate Party"(1), and "The Pirate Bay" is Swedish so I think a couple of Swedes memeing before it was cool has a significant impact on making the name stick but also taking the seriousness out of it.

            1: https://piratpartiet.se/en/

          • charcircuit an hour ago

            I don't have it backwards. Language evolved, and piracy got a new definition. It's even in the dictionary. Trying to redefine words like this is futile and avoiding certain words or replacing them with others is a quirk that RMS has.

codedokode 31 minutes ago

If AI companies are allowed to use pirated material to create their products, does it mean that everyone can use pirated software to create products? Where is the line?

Also, please don't use the word "learning"; use "creating software using copyrighted materials".

Also, let's think together about how we can use technical measures to prevent AI companies from using our work if the law doesn't work.

  • rvnx 26 minutes ago

    ~1B USD in cash is the line where laws apply very differently

  • redcobra762 14 minutes ago

    It's abusive and wrong to try and prevent AI companies from using your works at all.

    The whole point of copyright is to ensure you're paid for your work. AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.

    If that LLM reproduces your work, then the AI company is violating copyright, but if the LLM doesn't reproduce your work, then you have not been harmed. Trying to claim harm when you haven't been due to some philosophical difference in opinion with the AI company is an abuse of the courts.

    • codedokode a minute ago

      It is not wrong at all. The author decides what to do with their work. AI companies are rich and can simply buy the rights or hire people to create works.

      I could agree with exceptions for non-commercial activity like scientific research, but AI companies are made for extracting profits and not for doing research.

      > AI companies shouldn't pirate, but if they pay for your work, they should be able to use it however they please, including training an LLM on it.

      It doesn't work this way. If you buy a movie it doesn't mean you can sell goods with movie characters.

      > then you have not been harmed.

      I am harmed because fewer people will buy the book if they can simply get an answer from an LLM. Maybe instead of books we should start making applications that protect the content and do not allow copying text or taking screenshots.

guywithahat an hour ago

If you own a book, it should be legal for your computer to take a picture of it. I honestly feel bad for some of these AI companies because the rules around copyright are changing just to target them. I don't owe copyright to every book I read because I may subconsciously incorporate their ideas into my future work.

  • Bjorkbat 31 minutes ago

    Something missed in arguments such as these is that in measuring fair use there's a consideration of impact on the potential market for a rightsholder's present and future works. In other words, can it be proven that what you are doing is meaningfully depriving the author of future income.

    Now, in theory, you learning from an author's works and competing with them in the same market could meaningfully deprive them of income, but it's a very difficult argument to prove.

    On the other hand, with AI companies it's an easier argument to make. If Anthropic trained on all of your books (which is somewhat likely if you're a fairly popular author) and you saw a substantial loss of income after the release of one of their better models (presumably because people are just using the LLM to write their own stories rather than buy your stuff), then it's a little bit easier to connect the dots. A company used your works to build a machine that competes with you, which arguably violates the fair use principle.

    Gets to the very principle of copyright, which is that you shouldn't have to compete against "yourself" because someone copied you.

    • parliament32 11 minutes ago

      > a consideration of impact on the potential market for a rightsholder's present and future works

      This is one of those mental gymnastics exercises that makes copyright law so obtuse and effectively unenforceable.

      As an alternative, imagine a scriptwriter buys a textbook on orbital mechanics, while writing Gravity (2013). A large number of people watch the finished film, and learn something about orbital mechanics, therefore not needing the textbook anymore, causing a loss of revenue for the textbook author. Should the author be entitled to a percentage of Gravity's profit?

      We'd be better off abolishing everything related to copyright and IP law altogether. These laws might've made sense back in the days of the printing press but they're just nonsensical nowadays.

  • raincole an hour ago

    Are we reading the same article? The article explicitly states that it's okay to cut up and scan the books you own to train a model from them.

    > I honestly feel bad for some of these AI companies because the rules around copyright are changing just to target them

    The ruling would be a huge win for AI companies if held. It's really weird that you reached the opposite conclusion.

  • atomicnumber3 37 minutes ago

    The core problem here is that copyright already doesn't actually follow any consistent logical reasoning. "Information wants to be free" and so on. So our own evaluation of whether anything is fair use or copyrighted or infringement thereof is always going to be exclusively dictated by whatever a judge's personal take on the pile of logical contradictions is. Remember, nominally, the sole purpose of copyright is not rooted in any notions of fairness or profitability or anything. It's specifically to incentivize innovation.

    So what is the right interpretation of the law with regards to how AI is using it? What better incentivizes innovation? Do we let AI companies scan everything because AI is innovative? Or do we think letting AI vacuum up creative works to then stochastically regurgitate tiny (or not so tiny) slices of them at a time will hurt innovation elsewhere?

    But obviously the real answer here is money. Copyright is powerful because monied interests want it to be. Now that copyright stands in the way of monied interests for perhaps the first time, we will see how dedicated we actually were to whatever justifications we've been seeing for DRM and copyright for the last several decades.

  • organsnyder an hour ago

    The difference here is that an LLM is a mechanical process. It may not be deterministic (at least, in a way that my brain understands determinism), but it's still a machine.

    What you're proposing is considering LLMs to be equal to humans when considering how original works are created. You could make the argument that LLM training data is no different from a human "training" themself over a lifetime of consuming content, but that's a philosophical argument that is at odds with our current legal understanding of copyright law.

    • kevinpet 39 minutes ago

      That's not a philosophical argument at odds with our current understanding of copyright law. That's exactly what this judge found copyright law currently is and it's quoted in the article being discussed.

  • rapind an hour ago

    Everything is different at scale. I'm not giving a specific opinion on copyright here, but it just doesn't make sense when we try to apply individual rights and rules to systems of massive scale.

    I really think we need to understand this as a society and also realize that moneyed interests will downplay this as much as possible. A lot of the problems we're having today are due to insufficient regulation differentiating between individuals and systems at scale.

  • zerotolerance 39 minutes ago

    "Judge says training Claude on books was fair use, but piracy wasn't."

marapuru 6 hours ago

Apparently it's a common business practice. Spotify (even though I can't find any proof) seems to have built their software and business on pirated music. There is some more in this article [0].

[0] https://torrentfreak.com/spotifys-beta-used-pirate-mp3-files...

Funky quote:

> Rumors that early versions of Spotify used ‘pirate’ MP3s have been floating around the Internet for years. People who had access to the service in the beginning later reported downloading tracks that contained ‘Scene’ labeling, tags, and formats, which are the tell-tale signs that content hadn’t been obtained officially.

  • techjamie 4 hours ago

    Crunchyroll was originally an anime piracy site that went legit and started actually licensing content later. They started in mid-2006, got VC funding in 2008, then made their first licensing deal in 2009.

    https://www.forbes.com/2009/08/04/online-anime-video-technol...

    https://venturebeat.com/business/crunchyroll-for-pirated-ani...

    • Cyph0n 2 hours ago

      Yep, they were huge too - virtually anyone who watched free anime back then would have known about them.

      My theory is that once they saw how much traffic they were getting, they realized how big of a market (subbed/dubbed) anime was.

    • haiku2077 3 hours ago

      Good Old Games started out with the founders selling pirated games on disc at local markets.

    • Shank an hour ago

      And now Crunchyroll is owned by (through a lot of companies, like Aniplex of America, Aniplex, A1 Pictures) Sony, who produces a large amount of anime!

  • dathinab 3 hours ago

    not just Spotify: pretty much any (most?) current tech giant was built by

    - riding a wave of change

    - not caring too much about legal constraints (or, as they would say now, "disrupting" the market, which very, very often means doing illegal shit that brings them far more money than any penalties they will ever face for it)

    - or caring about ethics too much

    - and in recent years (starting with Amazon) a lot of technically illegal financing: undercutting competitors' prices long term with money from elsewhere (e.g. investors) is an unfair competitive advantage that is (theoretically) clearly not allowed by antimonopoly laws. And before that you often still had other monopoly issues (e.g. see Wintel).

    So yes, systematically not complying with the law to gain an unfair competitive advantage, knowing that many of the laws are, in the larger picture, toothless when applied to huge companies, is the bread-and-butter work of US tech giants.

    • benced an hour ago

      As you point out, they mostly did this before they were large companies (where the public choice questions are less problematic). Seems like the breaking of these laws was good for everybody.

  • pembrook an hour ago

    It wasn’t just the content being pirated, but the early Spotify UI was actually a 1:1 copy of Limewire.

  • pjc50 6 hours ago

    "recording obtained unofficially" and "doesn't have rights to the recording" are separate things. So they could well have got a license to stream a publisher's music but that didn't come with an actual copy of some/all of the music.

  • KoolKat23 6 hours ago

    There's plenty of startups gone legitimate.

    Society underestimates the chasm that exists between an idea and raising sufficient capital to act on those ideas.

    Plenty of people have ideas.

    We only really see those that successfully cross it.

    Small things: EULA breaches, consumer licenses being used commercially, for example.

    • hinterlands 3 hours ago

      The problem is that these "small things" are not necessarily small if you're an individual.

      If you're an individual pirating software or media, then from the rights owners' perspective, the most rational thing to do is to make an example of you. It doesn't happen every day, but it does happen and it can destroy lives.

      If you're a corporation doing the same, the calculation is different. If you're small but growing, future revenues are worth more than the money that can be extracted out of you right now, so you might get a legal nastygram with an offer of a reasonable payment to bring you into compliance. And if you're already big enough to be scary, litigation might be just too expensive to the other side even if you answer the letter with "lol, get lost".

      Even in the worst case - if Anthropic loses and the company is fined or even shuttered (unlikely) - the people who participated in it are not going to be personally liable and they've in all likelihood already profited immensely.

    • dathinab 2 hours ago

      but it's not some small things

      but systematic, widespread big things, and often many of them, giving US giants an unfair competitive advantage

      and don't think that if you are an EU company you can do the same in the US, nope nope

      but naturally the US insists that US companies can do that in the EU and complains every time a US company is fined for not complying with EU law

    • jowea 27 minutes ago

      Uber

    • Barrin92 an hour ago

      >Society underestimates the chasm that exists between an idea and raising sufficient capital to act on those ideas.

      The AI sector, famously known for its inability to raise funding. Anthropic has in the last four years raised 17 billion dollars

    • pyman 6 hours ago

      There's no credible evidence Spotify built their company and business on pirated music.

      This is a narrative that gets passed around in certain circles to justify stealing content.

      • YPPH 6 hours ago

        "Stealing" isn't an apt term here. Stealing a thing permanently deprives the owner of the thing. What you're describing is copyright infringement, not stealing.

        In this context, stealing is often used as a pejorative term to make piracy sound worse than it is. Except for mass distribution, piracy is often regarded as a civil wrong, and not a crime.

        • KoolKat23 6 hours ago

          Best/most succinct explanation I've seen to date.

        • pyman 6 hours ago

          Pirating a book and selling it on claude.ai is stealing, both legally and morally.

          Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work.

          Try doing that yourself and you'd get a knock on the door real quick.

          • Paradigma11 8 minutes ago

            There are tests that determine if a work infringes on the copyright of another. That is well-established law. Just use that test and show that this work is infringing on that work. If you can't, it doesn't.

          • KoolKat23 5 hours ago

            Properly remixing the content so that it can be considered distinct would be fair use. You can't copyright a style, concept or idea.

            Also mostly this would be a civil lawsuit for "damages".

            • pyman 5 hours ago

              It might be legal in the US, but not in the rest of the world.

              The trial is scheduled for December 2025. That’s when a jury will decide how much Anthropic owes for copying and storing over seven million pirated books

          • ungreased0675 2 hours ago

            There seems to be an unwritten rule for VC-backed tech companies, that if a law is broken at massive scale and very quickly, it’s ok. It’s the fait accompli strategy many of the large tech companies used to get where they are.

            Don’t have legal access to training data? Simply steal it, but move fast enough to keep ahead of the law. By the time lawsuits hit the company is worth billions and the product is embedded in everyday life.

      • lmm 3 hours ago

        > There's no credible evidence Spotify built their company and business on pirated music.

        That's a statement carefully crafted to be impossible to disprove. Of course they shipped pirated music (I've seen the files). Of course anyone paying attention knew. Nothing in the music industry was "clean" in those days. But, sure, no credible evidence because any evidence anyone shows you you'll decide is not credible. It's not in anyone's interests to say anything and none of it matters.

  • Workaccount2 2 hours ago

    The common meme that megacorps are shamelessly criminal organizations that get away with doing anything they can to maximize profits, while true in some regard, totally pales in comparison to the illegal things small businesses and start-ups do.

  • NoMoreNicksLeft an hour ago

    This isn't as meaningful as it sounds. Nintendo was apparently using scene roms for one of the official emulators on Wii (I think?). Spotify might have received legally-obtained mp3s from the record companies that were originally pulled from Napster or whatever, because the people who work for record companies are lazy hypocrites.

  • reaperducer an hour ago

    > Apparently it's a common business practice.

    It's not a common business practice. That's why it's considered newsworthy.

    People on the internet have forgotten that the news doesn't report everyday, normal, common things, or it would be nothing but a listing of people mowing their lawns or applying for business loans. The reason something is in the news is because it is unusual or remarkable.

    "I saw it online, so it must happen all the time" is a dopey lack of logic that infects society.

  • lysace an hour ago

    You are missing the point. Spotify had permission from the copyright holders and/or their national proxies to use those songs in a limited beta in Sweden. They didn't have access to clean audio data directly from the record companies, so in many cases they used pirated rips instead.

    What you really should be asking is whether they infringed on the copyrights of the rippers. /s

  • motbus3 6 hours ago

    They had a second company (whose name I don't remember) that allowed users to back up and share their music. When they were exposed, they buried that as deep as they could.

    • pyman 6 hours ago

      No. There's no credible evidence Spotify had any secret second company that allowed users to back up and share music without authorisation

  • pyman 6 hours ago

    It was the opposite. Their mission was to combat music piracy by offering a better, legal alternative.

    Daniel Ek said: "my mission is to make music accessible and legal to everyone, while ensuring artists and rights holders got paid"

    Also, the Swedish government has zero tolerance for piracy.

    • eviks 2 hours ago

      Mission is just words, they can mean the opposite of deeds, but they can't be the opposite, they live in different realms.

    • pyman 5 hours ago

      I know this might come as a shock to those living in San Francisco, but things are different in other parts of the world, like Uruguay, Sweden and the rest of Europe. From what I’ve read, the European Commission actually cares about enforcing the law.

codedokode 22 minutes ago

> "Like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different," he wrote.

But this analogy seems wrong. First, an LLM is not a human and cannot "learn" or "train"; only a human can do that. And LLM developers are not aspiring to become writers and do not learn anything; they just want to profit by making software using copyrighted material. Also, people do not read millions of books to become writers.

pyman 7 hours ago

These are the people shaping the future of AI? What happened to all the ethical values they love to preach about?

We've held China accountable for counterfeiting products for decades and regulated their exports. So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?

  • ffsm8 3 hours ago

    > We've held China accountable for counterfeiting products for decades and regulated their exports

    We have? Are we from different multi-verses?

    The one I've lived in to date has not done anything against Chinese counterfeits beyond occasionally seizing counterfeit goods during import. But that's merely occasionally enforcing local counterfeit law, a far cry from punishing the entity producing it.

    As a matter of fact, the companies started outsourcing everything to China, making further IP theft and quasi-copies even easier

    • Workaccount2 2 hours ago

      I was gonna say, the enforcement is so weak that it's not even really worth it to pursue consumer hardware here in the US. Make a product that is a hit, patent it, and still 1 month later IYTUOP will be selling an identical copy for 1/3rd the price on Amazon.

      • delfinom 2 hours ago

        Patent enforcement requires the patent holder to go after violators. The sad thing is, there are grounds to sue Amazon for facilitating it, just nobody has had the money to do it. And no big company ever will because of the threat of being locked out of AWS.

        It's quite the mafia operation over at Amazon.

  • wmf 2 hours ago

    The unethical ones didn't buy any books.

  • benjiro 3 hours ago

    One rule for you, one rule for me ...

    You never noticed the hypocritical behavior all over society?

    * Oh, you drunk drive: big fine, lots of trouble.
    * Oh, you drunk drive and are a senator, cop, mayor, ... Well, let's look the other way.

    * You have anger management issues and slam somebody to the ground. Jail time.
    * You as a cop have anger management issues and slam somebody to the ground. Well, paid time off while we investigate and maybe a reprimand. Qualified immunity, boy!

    * You commit tax fraud for 10k: felony record, maybe jail time.
    * You as an exec of a company commit tax fraud for 100 million. After 10 years of lawyering around, maybe you get something, maybe, ... oh, here is a fine of 5 million.

    I am sorry, but the idea of everybody being equal under the law has always been an illusion.

    We are holding China accountable for counterfeiting products because it hurts OUR companies and their income. But when it's "us vs us", well, then it becomes a bit more messy, and in general those with the biggest backing (as in $$$, economic value, and lawyers) tend to win.

    Wait, if somebody steals my book, I can sue that person in court and get a payout (lawyers will cost me more, but that is not the point). If some AI company steals my book, well, the chance you win is close to 1%, simply because lots of well-paid lawyers will make winning hard to impossible.

    Our society has always been based upon power, wealth and influence. The more you have of it, the more you get away with (or get reduced penalties for) things that get others fined or jailed.

  • seydor 3 hours ago

    break things and move fast

  • carlosjobim an hour ago

    Why is it unethical of them to use the information in all these books? They are clearly not reselling the books in any way, shape, or form. The information itself in a book can never be copyrighted. You can also publish and sell material where you quote other books within it.

  • lofaszvanitt 6 hours ago

    This is the underlying caste system coming to life right before your eyes :D.

    • stephenitis 5 hours ago

      I think caste system is the wrong analogy here.

      Comment is more about the pseudo ethical high ground

      • MangoToupe 3 hours ago

        Companies being above the law does create a stratified system in this country for those who can benefit from said companies and those who cannot. Call it what you like.

  • bmitc 2 hours ago

    Silicon Valley has always been the antithesis of ethics. Its foundations are much more right wing and libertarian, along the extremist lines.

  • DrillShopper 2 hours ago

    > So why should Anthropic be allowed to export their products and services after engaging in the same illegal activity?

    Rules don't apply to corporations making money for VCs.

    So it goes.

ramon156 7 hours ago

Pirate and pay the fine is probably a hell of a lot cheaper than individually buying all these books. I'm not saying this is justified, but what would you have done in their situation?

Saying "they have the money" is not an argument. It's about the amount of effort that is needed to individually buy, scan, and process millions of pages. If that's done for you, why re-do it all?

  • pyman 6 hours ago

    The problem with this thinking is that hundreds of thousands of teachers who spent years writing great, useful books and sharing knowledge and wisdom probably won't sue a billion dollar company for stealing their work. What they'll likely do is stop writing altogether.

    I'm against Anthropic stealing teachers' work and discouraging them from ever writing again. Some teachers are already saying this (though probably not in California).

    • NoMoreNicksLeft an hour ago

      Stealing? In what way?

      Training a generative model on a book is the mechanical equivalent of having a human read the book and learn from it. Is it stealing if a person reads the book and learns from it?

    • lofaszvanitt 6 hours ago

      They won't be needed anymore, once singularity is reached. This might be their thought process. This also exemplifies that the loathed caste system found in India is indeed in place in western societies.

      There is no equality, and seemingly there are worker bees who can be exploited, and there are privileged ones, and of course there are the queens.

      • SketchySeaBeast 3 hours ago

        > They won't be needed anymore, once singularity is reached.

        And it just so happens that that belief says they can burn whatever they want down because something in the future might happen that absolves them of those crimes.

      • pyman 6 hours ago

        :D

        Note: My definition of singularity isn't the one they use in San Francisco. It's the moment founders who stole the life's work of thousands of teachers finally go to prison, and their datacentres get seized.

        • lofaszvanitt 6 hours ago

          You can bet that this is never gonna happen...

          • covercash 3 hours ago

            When the rich and powerful face zero consequences for breaking laws and ignoring the social contracts that keep our society functioning, you wind up with extreme overcorrections. See Luigi.

            • achierius 2 hours ago

              How extreme is that, really? Not to justify murder: that is clearly bad. But "killing one man" is evidently something we, as a society, consider an "acceptable side-effect" when a corporation does it -- hell, you can kill thousands and get away scot-free if you're big enough.

              Luigi was peanuts in comparison.

              “THERE were two “Reigns of Terror,” if we would but remember it and consider it; the one wrought murder in hot passion, the other in heartless cold blood; the one lasted mere months, the other had lasted a thousand years; the one inflicted death upon ten thousand persons, the other upon a hundred millions; but our shudders are all for the “horrors” of the minor Terror, the momentary Terror, so to speak; whereas, what is the horror of swift death by the axe, compared with lifelong death from hunger, cold, insult, cruelty, and heart-break? What is swift death by lightning compared with death by slow fire at the stake? A city cemetery could contain the coffins filled by that brief Terror which we have all been so diligently taught to shiver at and mourn over; but all France could hardly contain the coffins filled by that older and real Terror—that unspeakably bitter and awful Terror which none of us has been taught to see in its vastness or pity as it deserves.”

              - Mark Twain

    • glimshe 6 hours ago

      That will be sad, although there will still be plenty of great people who will write books anyway.

      When it comes to a lot of these teachers, I'll say, copyright works hand in hand with college and school course book mandates. I've seen plenty of teachers making crazy money off students' backs due to these mandates.

      A lot of the content taught in undergrad and school hasn't changed in decades or even centuries. I think we have all the books we'll ever need in certain subjects already, but copyright keeps enriching people who write new versions of these.

    • CuriouslyC 6 hours ago

      If you care so little about writing that AI puts you off it, TBH you're probably not a great writer anyhow.

      Writers that have an authentic human voice and help people think about things in a new way will be fine for a while yet.

      • 4b11b4 4 hours ago

        Yeah, people will still want to write. They might need new ways to monetize it... that being said, even if people still want to write they may not consider it a viable path. Again, have to consider other monetization.

  • TimorousBestie 6 hours ago

    150K per work is the maximum fine for willful infringement (which this is).

    105B+ is more than Anthropic is worth on paper.

    Of course they’re not going to be charged to the fullest extent of the law, they’re not a teenager running Napster in the early 2000s.

    • voxic11 3 hours ago

      Even if they don't qualify for willful infringement damages (let's say they have a good faith belief their infringement was covered by fair use), the standard statutory damages for copyright infringement are $750-$30,000 per work.
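
      As a rough illustration of the scale these per-work figures imply, here is a minimal sketch. It assumes the roughly 7 million books mentioned elsewhere in this thread and treats every one of them as a separately eligible work, which is a simplification (statutory damages are only awarded for works actually in suit). The function name is hypothetical.

```python
def statutory_damages_range(num_works, low=750, high=30_000, willful_max=150_000):
    """Return (minimum, maximum, willful maximum) total damages in dollars.

    The per-work defaults are the figures quoted in this thread:
    $750 to $30,000 per work, and up to $150,000 per work if willful.
    """
    return num_works * low, num_works * high, num_works * willful_max


# Assumption: 7 million works, the count cited elsewhere in the thread.
low_total, high_total, willful_total = statutory_damages_range(7_000_000)
print(f"standard range: ${low_total:,} to ${high_total:,}")
print(f"willful ceiling: ${willful_total:,}")
```

      Under these assumptions, even the statutory minimum adds up to billions of dollars.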

  • glimshe 6 hours ago

    Isn't "pirating" a felony with jail time, though? That's what I remember from the FBI warning I had to see at the beginning of every DVD I bought (but not "pirated" ones).

    • voxic11 3 hours ago

      Yes, criminal copyright infringement (willful copyright infringement done for commercial gain or at a large scale) is a felony.

    • pyman 6 hours ago

      Absolutely.

      Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work. Try doing that yourself and you'd get a knock on the door real quick.

      • dmix 3 hours ago

        A court just ruled on Anthropic and said an LLM response wasn't a form of counterfeiting (ie, essentially selling pirate books on the black market). Although tbf that is the most radical interpretation still being put forward by the lawyers of publishers like NYTimes, despite the obvious flaws.

  • suyjuris 6 hours ago

    Just downloading them is of course cheaper, but it is worth pointing out that, as the article states, they did also buy legitimate copies of millions of books. (This includes all the books involved in the lawsuit.) Based on the judgement itself, Anthropic appears to train only on the books legitimately acquired. Used books are quite cheap, after all, and can be bought in bulk.

    • asadotzler 2 hours ago

      Buying a book is not a license to re-sell that content for your own profit. I can't buy a copy of your book, make a million Xeroxes of it and sell those. The license you get when you buy a book is for a single use, not a license to do whatever you want with the contents of that book.

      • suyjuris 36 minutes ago

        Yes, of course! In this case, the judge identified three separate instances of copying: (1) downloading books without authorisation to add to their internal library, (2) scanning legitimately purchased books to add to their internal library, and (3) taking data from their internal library for the purposes of training LLMs. The purchasing part is only relevant for (2) — there the judge ruled that this is fair use. This makes a lot of sense to me, since no additional copies were created (they destroyed the physical books after scanning), so this is just a single use, as you say. The judge also ruled that (3) is fair use, but for a different reason. (They declined to decide whether (1) is fair use at this point, deferring to a later trial.)

      • thedevilslawyer an hour ago

        What are you on about - the judge has literally said this was not resale, and is transformative and fair use.

  • darkoob12 5 hours ago

    This is not about paying for a single copy. It would still be wrong even if they have bought every single one of those books. It is a form of plagiarism. The model will use someone else's idea without proper attribution.

    • jeroenhd 3 hours ago

      Legally speaking, we don't know that yet. Early signs are pointing at judges allowing this kind of crap because it's almost impossible for most authors to point out what part of the generated slop was originally theirs.

  • blibble an hour ago

    > Pirate and pay the fine is probably hell of a lot cheaper than individually buying all these books.

    $500,000 per infringement...

  • maeln 6 hours ago

    If you wanted to be legit with 0 chance of going to court, you would contact publishers and ask to pay for a license to get access to their catalog for training, and negotiate from that point.

    This is what every company using media is doing (think Spotify, Netflix, but also journals, ad agencies, ...). I don't know why people on HN are giving AI companies a pass for this kind of behavior.

    • pyman 5 hours ago

      100%

      It's the new narrative in certain circles, especially in San Francisco: it's us vs China. We're doing all this to beat them, no matter the cost. While teachers are left scratching their heads with four kids to feed.

      • edgineer 3 hours ago

        The paradigm is that teachers will teach life skills like public speaking and entrepreneurship. Book smarts that can be more effectively taught by AI will be, once schools catch up.

    • ohashi 3 hours ago

      Because they are mostly software developers who think it's different because it impacts them.

  • tmaly 3 hours ago

    At minimum they should have to buy the book they are deriving weights from.

    • SirMaster an hour ago

      But should the purchase be like a personal license? Or like a commercial license that costs way more?

      Because for example if you buy a movie on disc, that's a personal license and you can watch it yourself at home. But you can't, like, play it at a large public venue that sells tickets to watch it. You need a different and more expensive license to make money off the usage of the content in a larger capacity like that.

  • kevingadd 6 hours ago

    Google did it the legal way with Google Books, didn't they?

    • pyman 6 hours ago

      No, Google did not sell the books through Google Books. Anthropic is selling the transformed version of the books on claude.ai.

      Pirating 7 million books, remixing their content, and using that to make money on Claude.ai is like counterfeiting 7 million branded products and selling them on your Shopify website. The original creators don't get payment, and someone's profiting off their work.

      • suyjuris 5 hours ago

        The judge appears to disagree with you on this. They found that training and selling an LLM are fair use, based on the fact that it is exceedingly transformative, and that the copyright holders are not entitled to any profits thereof due to copyright. (They also did get paid — Anthropic acquired millions of books legally, including all of the authors in this complaint. This would not retroactively absolve them of legal fault for past infringements, of course.)

        • pyman 5 hours ago

          The trial is scheduled for December 2025. That's when a jury will decide how much Anthropic owes for copying and storing over seven million pirated books.

          • suyjuris 4 hours ago

            Yes, that would be an interesting trial. But it is only about six books, and all claims regarding Claude have been dismissed already. So only the internal copies remain, and there the theory for them being infringing is somewhat convoluted: you have to argue that they are not just for purposes of training (which was ruled fair use), and award damages even though these other purposes never materialised (since by now, they have legal copies of those books). I can see it, but I would not count on there being a trial.

        • flaptrap 3 hours ago

          The fallacy in the 'fair use' logic is that a person acquires a book and learns from it, but a machine incorporates the text. Copyright does not allow one to create a derivative work without permission. Only when the result of the transformation resembles the original work could it be said that it is subject to copyright. Do not regard either of those legal issues as set in concrete yet.

          • mensetmanusman 3 hours ago

            Both a human and a machine learn from it. You can design an LLM that doesn’t spit back the entire text after annealing. It just learns the essence like a human.

            • badmintonbaseba 3 hours ago

              Morally maybe, but AFAIK machines "learning" and creating creative works on their own is not recognized legally, at least certainly not the same way as for people.

              • Workaccount2 2 hours ago

                >AFAIK machines "learning" and creating creative works on their own is not recognized legally

                Did you read the article? The judge literally just legally recognized it.

  • bmitc 2 hours ago

    > I'm not saying this is justified, but what would you have done in their situation?

    Individuals would have their lives ruined either from massive fines or jail time.

pyman 8 hours ago

Anthropic's cofounder, Ben Mann, downloaded millions of copies of books from Library Genesis in 2021, fully aware that the material was pirated.

Stealing is stealing. Let's stop with the double standards.

  • originalvichy 7 hours ago

    At least most pirates just consume for personal use. Profiting from piracy is a whole other level beyond just pirating a book.

    • pyman 7 hours ago

      Someone on Twitter said: "Oh well, P2P mp3 downloads, although illegal, made contributions to the music industry"

      That's not what's happening here. People weren't downloading music illegally and reselling it on Claude.ai. And while P2P networks led to some great tech, there's no solid proof they actually improved the music industry.

      • Imustaskforhelp 6 hours ago

        I really feel as if Youtube is the best sort of convenience for music videos where most people watch ads whereas some people can use an ad blocker.

        I use an adblocker and tbh I think so many people on HN are okay with ad blocking and not piracy when basically both just block the end user from earning money.

        I kind of believe that if you really like a piece of software, or really like something, just ask them what their favourite charity is and donate there, or join their Patreon or some other direct way to support them.

        • Workaccount2 2 hours ago

          If you are someone who can think clearly, it's extremely obvious that the conversation around copyright, LLMs, piracy, and ad-blocking is

          "What serves me personally the best for any given situation" for 95% of people.

    • mnky9800n 7 hours ago

      I feel like profit was always a central motive of pirates. At least from the historical documents known as, "The Pirates of the Caribbean".

    • KoolKat23 6 hours ago

      This isn't really profiting from piracy. They don't make money off the raw input data. It's no different to consuming for personal use.

      They make money off the model weights, which is fair use (as confirmed by recent case law).

      • j_w 6 hours ago

        This is absurd. Remove all of the content from the training data that was pirated and what is the quality of the end product now?

        • KoolKat23 6 hours ago

          That's the law.

          Please keep in mind, copyright is intended as a compromise between benefit to society and to the individual.

          A thought experiment, students pirating textbooks and applying that knowledge later on in their work?

          • j_w 5 hours ago

            When you say that's the law, as far as I'm aware a single ruling by a lower court has been issued which upholds that application. Hardly settled case law.

            • KoolKat23 5 hours ago

              True, until then best to act as if it is the case.

              In my opinion, it will be upheld.

              Looking at what is stored and the manner which it is stored. It makes sense that it's fair use.

        • pyman 6 hours ago

          With Claude, people are paying Anthropic to access answers that are generated from pirated books, without the authors' permission, credit, or compensation.

          • KoolKat23 6 hours ago

            There is no copyright on knowledge.

            If it outputs parts of the book verbatim then that's a different story.

            • SirMaster an hour ago

              >If it outputs parts of the book verbatim then that's a different story.

              But it does...

            • pyman 5 hours ago

              Let's not change the focus of the debate.

              Pirating 7 million books, remixing their content, and using that to power Claude.ai is like counterfeiting 7 million branded products and selling them on your personal website. The original creators don't get credit or payment, and someone’s profiting off their work.

              All this happens while authors, many of them teachers, are left scratching their heads with four kids to feed.

              • KoolKat23 5 hours ago

                That may be the case, but you'd have to have laws changed.

    • mrcwinn 3 hours ago

      > At least most pirates just consume for personal use.

      Easy for the pirate to say. Artists might argue their intent was to trade compensation for one's personal enjoyment of the work.

      • Workaccount2 2 hours ago

        The gut punch of being a photographer selling your work on display, someone walks by and lines up their phone to take a perfect picture of your photograph, and then exclaims to you "Your work is beautiful! I can't wait to print this out and put it on my wall!"

      • jobs_throwaway 2 hours ago

        All the evidence shows that piracy is good for artists' business. You make a good work, people are exposed to it through piracy, and they end up buying more of your stuff than they would otherwise. But keep crying about the artist's plight

        • SketchySeaBeast 2 hours ago

          The way you've presented this, the evidence is just "common sense", which isn't much evidence at all.

  • dathinab 2 hours ago

    stealing with the intent to gain an unfair market advantage, so that you can effectively kill any company acting ethically and legally, in a way which is very likely going to hurt many authors through the products you create, is far worse than just stealing for personal use

    that isn't "just" stealing, it's organized crime

  • 1970-01-01 2 hours ago

    Let's get actual definitions of 'theft' before we leap into double standards.

  • NoMoreNicksLeft an hour ago

    >Stealing is stealing.

    Yes, but copying isn't stealing, because the person you "take" from still has their copy.

    If you're allowed to call copying stealing, then I should be allowed to call hysterical copyright rabblerousing rape. Quit being a rapist, pyman.

  • x3n0ph3n3 7 hours ago

    Copyright infringement is not stealing.

    • impossiblefork 3 hours ago

      It's very similar to theft of service.

      There's so many texts, and they're so sparse that if I could copyright a work and never publish it, the restriction would be irrelevant. The probability that you would accidentally come upon something close enough that copyright was relevant is almost infinitesimal.

      Because of this copyright is an incredibly weak restriction, and that it is as weak as it is shows clearly that any use of a copyrighted work is due to the convenience that it is available.

      That is, it's about making use of the work somebody else has done, not about that restricting you somehow.

      Therefore copyright is much more legitimate than ordinary property. Ordinary property, especially ownership of land, can actually limit other people. But since copyright is so sparse, infringing on it is like going to a world with near-infinite space, picking the precise place where somebody has planted a field, and deciding to harvest from that particular field.

      Consequently I think copyright infringement might actually be worse than stealing.

      • jpalawaga an hour ago

        you've created a very obvious category mistake in your final summary by confusing intellectual property--which can be copied at no penalty to an owner (except nebulous 'alternate universe' theories)--with actual property, and a farmer and his land, with a crop that cannot be enjoyed twice.

        you're saying copying a book is worse than robbing a farmer of his food and/or livelihood, which cannot be replaced or duplicated. Meanwhile, someone who copies a book does not deprive the author of selling the book again (or of the tasty proceeds from a harvest).

        I can't say I agree, for obvious reasons.

        • impossiblefork 41 minutes ago

          With this special infinite-land-land though, what's special about the farmer's land is that he's expended energy to make it that way, just as the author has expended energy to find his text.

          Just as the farmer obtains his livelihood from the investment-of-energy-to-raise-crops-to-energy cycle the author has his livelihood by the investment-of-energy-to-finding-a-useful-work-to-energy cycle.

          So he is in fact robbed in a very similar way.

    • pyman 6 hours ago

      Pirating a book and selling it on claude.ai is stealing, both legally and morally.

      • BlackFly 5 hours ago

        Making a copy differs from taking an existing object in all aspects: literally, technically, legally and ethically. Piracy is making a copy you have no legal right to. Stealing is taking a physical object that you have no legal right to. While the "no legal right to" seems the same superficially, in practice the laws differ quite a bit because the literal, technical and ethical aspects differ.

      • thedevilslawyer 40 minutes ago

        Where can I download Harry Potter on claude.ai pls?

        • slater 40 minutes ago

          Why would you want to download a shitty book?

      • TiredOfLife 5 hours ago

        They are not selling it on claude.ai. If you can prove that they are you will be rich.

      • zb3 6 hours ago

        Who got robbed? Just because I'd pay for AI it doesn't mean I'd buy these books.

        • pyman 6 hours ago

          You should ask the teachers who spent years writing those books.

          • zb3 39 minutes ago

            I did not ask them to write those books, and I wouldn't buy those.

          • azangru 2 hours ago

            You keep saying the word "teachers"; but that word does not appear in the text of the article. Why focus on the teachers in particular?

            Also, there are various incentives for teachers to publish books. Money is just one of them (I wonder how much revenue books bring to the teachers). Prestige and academic recognition is another. There are probably others still. How realistic is the depiction of a deprived teacher whose livelihood depended on the books he published once every several years?

    • seydor 3 hours ago

      property infringement isn't either?

      • eviks an hour ago

        If you infringe by destroying property, then yes, it's not stealing

    • 1oooqooq 6 hours ago

      actually, the only time it's an (ethical) crime is when a corporation does it at scale for profit.

  • damnesian 8 hours ago

    oh well, the product has a cute name and will make someone a billionaire, let's just give it the green light. who cares about copyright in the age of AI?

  • Der_Einzige 3 hours ago

    Information wants to be free.

    • troyvit an hour ago

      Then why does Claude cost money?

hellohihello135 an hour ago

It’s easy to point fingers at others. Meanwhile the top comment in this thread links to stolen content from Business Insider.

koolala 30 minutes ago

Anyone read the 2006 sci-fi book Rainbow's End that has this? It was set in 2025.

  • solfox 22 minutes ago

    I was 100% thinking this. GREAT book. And they, too, shredded books to ingest them into the digital library! I don't recall if it was an attempt to bypass copyright though; in Rainbow's End, it was more technical, as it was easier to shred, scan the pieces, and reassemble them in software, rather than scanning each page.

1970-01-01 an hour ago

The buried lede here is that Anthropic will need to attempt to explain to a judge that it is impossible to de-train 7M books from their models.

trinsic2 2 hours ago

I'm not seeing how this is fair use in either case.

Someone correct me if I am wrong but aren't these works being digitized and transformed in a way to make a profit off of the information that is included in these works?

It would be one thing for an individual to make personal use of one or more books, but you've got to have some special blindness not to see that a for-profit company's use of this information to improve a for-profit model is clearly going against what copyright stands for.

  • jimbob21 2 hours ago

    They clearly were being digitized, but I think whether or not it is fair use is a more philosophical discussion, one that we're only banging our heads against for the first time.

    Simply, if the models can think then it is no different than a person reading many books and building something new from their learnings. Digitization is just memory. If the models cannot think then it is meaningless digital regurgitation and plagiarism, not to mention breach of copyright.

    The quotes "consistent with copyright's purpose in enabling creativity and fostering scientific progress." and "Like any reader aspiring to be a writer" say, from what I can tell, that the judge has legally ruled the model can think as a human does, and therefore has the legal protections afforded to "creatives."

    • palmotea 2 hours ago

      > Simply, if the models can think then it is no different than a person reading many books and building something new from their learnings.

      No, that's fallacious. Using anthropomorphic words to describe a machine does not give it the same kinds of rights and affordances we give real people.

      • pavon 33 minutes ago

        The judge did use some language that analogized the training with human learning. I don't read it as basing the legal judgement on anthropomorphizing the LLM though, but rather as reasoning that if it would be legal for a human to do the same thing, then it is legal for a human to use a computer to do so.

          First, Authors argue that using works to train Claude’s underlying LLMs was like using
          works to train any person to read and write, so Authors should be able to exclude Anthropic
          from this use (Opp. 16). But Authors cannot rightly exclude anyone from using their works for
          training or learning as such. Everyone reads texts, too, then writes new texts. They may need
          to pay for getting their hands on a text in the first instance. But to make anyone pay
          specifically for the use of a book each time they read it, each time they recall it from memory,
          each time they later draw upon it when writing new things in new ways would be unthinkable.
          For centuries, we have read and re-read books. We have admired, memorized, and internalized
          their sweeping themes, their substantive points, and their stylistic solutions to recurring writing
          problems.
        
          ...
        
          In short, the purpose and character of using copyrighted works to train LLMs to generate
          new text was quintessentially transformative. Like any reader aspiring to be a writer,
          Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but
          to turn a hard corner and create something different. If this training process reasonably
          required making copies within the LLM or otherwise, those copies were engaged in a
          transformative use.
        
        [1] https://authorsguild.org/app/uploads/2025/06/gov.uscourts.ca...

      • jimbob21 2 hours ago

        Actually, it does, at least for this case. The judge just said so.

        • NoOn3 40 minutes ago

          People have rights, machines don't. Otherwise, maybe give machines the right to vote, for example?...

  • wrs 2 hours ago

    Copyright is not on “information”, it’s on the tangible expression (i.e., the actual words). “Transformative use” is a defense in copyright infringement.

  • NoMoreNicksLeft 41 minutes ago

    Digitizing the books is the equivalent of a blind person doing something to the book to make it readable to them... the software can't read analog pages.

    Learning from the book is, well, learning from the book. Yes, they intended to make money off of that learning... but then I guess a medical student reading medical textbooks intends to profit off of what they learn from them. Guess that's not fair use either (well, it's really just use, as in the intended use for all books since they were first invented).

    Once a person has to believe that copyright has any moral weight at all, I guess all rational thought becomes impossible for them. Somehow, they're not capable of entertaining the idea that copyright policy was only ever supposed to be this pragmatic thing to incentivize creative works... and that whatever little value it has disappears entirely once the policy is twisted to consolidate control.

  • pavon an hour ago

    There is another case where companies slurped up all of the internet and profited off the information that makes a good comparison: search engines.

    Judges consider a four-factor test when examining fair use[1]. For search engines,

    1) The use is transformative, as a tool to find content is very different purpose than the content itself.

    2) Nature of the original work runs the full gamut, so search engines don't get points for only consuming factual data, but it was all publicly viewable by anyone as opposed to books which require payment.

    3) Search engines store significant portions of the work in the index, but only redistribute small portions.

    4) Search engines, as originally devised, don't compete with the original; in fact they can improve the potential market of the original by helping more people find it. This has changed over time though, and search engines are increasingly competing with the content they index, and intentionally trying to show the information that people want on the search page itself.

    So traditional search which was transformative, only republished small amounts of the originals, and didn't compete with the originals fell firmly on the side of fair use.

    Google News and Books on the other hand weren't so clear cut, as they were showing larger portions of the works and were competing with the originals. They had to make changes to those products as a result of lawsuits.

    So now let's look at LLMs:

    1) LLMs are absolutely transformative. Generating new text at a user's request is a very different purpose and character from the original works.

    2) Again runs the full gamut (setting aside the clearly infringing download of illegally distributed books, which is a separate issue).

    3) For training purposes, LLMs don't typically preserve entire works, so the model is in a better place legally than a search index, which has precedent that storing entire works privately can be fair use depending on the other factors. For inference, even though they are less likely to reproduce the originals in their outputs than search engines, there are failure cases where an LLM is over-trained on a work and a significant amount of the original can be reproduced.

    4) LLMs have tons of uses, some of which complement the original works and some of which compete directly with them. Because of this, it is likely that whether LLMs are fair use will depend on how they are being used, e.g. ignore the LLM altogether and consider solely the output and whether it would be infringing if a human created it.

    This case was solely about whether training on books is fair use, and did not consider any uses of the LLM. Because LLMs are a very transformative use, and because they don't store the originals verbatim, it weighs strongly as being fair use.

    I think the real problems that LLMs face will be in factors 3 and 4, which are very much context-specific. The judge himself said that the plaintiffs are free to file additional lawsuits if they believe the LLM outputs duplicate the original works.

    [1] https://fairuse.stanford.edu/overview/fair-use/four-factors/

  • kristofferR 2 hours ago

    What do you think fair use is? The whole point of the fair use clauses is that if you transform copyrighted works enough you don't have to pay the original copyright holder.

shrubble an hour ago

Let’s say my AI company is training an AI on woodworking books and at the end, it will describe in text and wireframe drawings (but not the original or identical photos) how to do a particular task.

If I didn’t license all the books I trained on, am I not depriving the publisher of revenue, given people will pay me for the AI instead of buying the book?

  • mrkstu an hour ago

    If you paid a human author to do the same you’d be breaking no law. Learning is the point of books existing in the first place.

  • mathiaspoint an hour ago

    The same argument applies to someone who learned from the book and wrote an article explaining the idea to someone else.

tliltocatl 7 hours ago

If the AI movement manages to undermine Imaginary Property, it will redeem its externalities threefold.

  • 57473m3n7Fur7h3 6 hours ago

    I don’t think that’s gonna happen. I think they will manage to get themselves out of trouble for it, while the rest of us will still face serious problems if we are caught torrenting even one singular little book.

    • tliltocatl 6 hours ago

      Even so, it would be hard to prove that this particular little book wasn't generated by Claude (oopsie, it happens to be a verbatim copy of a copyrighted work, that happens sometimes, those pesky LLMs).

      • pyman 5 hours ago

        You just need to audit their system. Shouldn't take more than a couple of hours.

    • 2OEH8eoCRo0 3 hours ago

      The Ocean Full of Bowling Balls

  • ttoinou 6 hours ago

    It would be great, but I think some are worried that new AI BigTech will find a way to continue enforcing IP on the rest of society while it won't exist for them

    • Imustaskforhelp 6 hours ago

      I think that we are worried because I think that's exactly what's going to happen/ is happening.

  • bayindirh 6 hours ago

    What are your feelings about how the small fish are stripped of their art, and their years of work become just a prompt? Mainly comic artists and small musicians who are doing things they like and putting them out for people, but not for much money?

    • tliltocatl 5 hours ago

      "But think about the children". The copyright system is doing too much damage to culture and society. Yes, it does provide a pond for some small fish, but the overall damage outweighs this. Like the fact that the first estate provided sustenance for arts and crafts to flourish doesn't make the ancien régime any less screwed up.

      • bayindirh 4 hours ago

        I think I have worded my question wrong. I asked not about how AI affects the financials of these smaller artists, but about their wellbeing in general.

        There are many small artists who do this not for money, but for fun, and have their renowned styles. Even their styles are ripped off by these generative AI companies and turned into a slot machine to earn money for themselves. These artists didn't consent to that, and this affects their (mental) well-being.

        With that context in mind, what do you think about these people, who are not in this for money, being stripped of their years of achievement and having their hard work exploited for money by generative AI companies?

        It's not about IP (with whatever expansion you prefer) or laws, but ethics in general.

        Substitute comics for any medium. Code, music, painting, illustration, literature, short movies, etc.

        • tliltocatl 3 hours ago

          I see your point, "AI art" sucks in general and this is ethically sketchy as hell, but AFAIK style copying has never been covered by copyright in the first place. Yeah, it sucks to be alienated from your works. That's one of the externalities I mentioned in the original comment. But there is simply no remedy there. That's how the reality is.

          • bayindirh 3 hours ago

            Thanks for your answer, and for taking the time to write it!

            Yes, style copying is generally considered legal, but as another commenter posted in a related thread "scale matters".

            Maybe this will be reconsidered in the near future, as the scale is on a very different level with generative AI. While there can be no technological solution to this (since it's a social problem to begin with), maybe public opinion about this issue will evolve over time.

            To be crystal clear: I'm not against the tech. I'm against abusing and exploiting people for solely monetary profit.

        • frozenseven 32 minutes ago

          (1) You can't copyright an art style. That's not a thing.

          (2) Once you make something publicly available, anyone can learn from it. No consent necessary.

          (3) Being upset does not grant you special privileges under the law.

          (4) If you don't like the idea of paying for AI art, free software is both plentiful and competitive with just about anything proprietary.

  • karel-3d 6 hours ago

    That would render GPL and friends redundant too... copyleft depends on copyright.

  • pxc 3 hours ago

    It's true that intellectual property is a flawed and harmful mechanism for supporting creative work, and it needs to change, but I don't think ensuring a positive outcome is as simple as this. Whether or not such a power struggle between corporate interests benefits the public rather than just some companies will be largely accidental.

    I do support intellectual property reform that would be considered radical by some, as I imagine you do. But my highest hopes for this situation are more modest: if AI companies are told that their data must be in the public domain to train against, we will finally have a powerful faction among capitalists with a strong incentive to push back against the copyright monopolists when it comes to the continuous renewal of copyright terms.

    If the "path of least resistance" for companies like Google, Microsoft, and Meta becomes enlarging the public domain, we might finally begin to address the stagnation of the public domain, and that could be a good thing.

    But I think even such a modest hope as that one is unlikely to be realized. :-\

  • Der_Einzige 3 hours ago

    Yup.

    My response to this whole thread is just “good”

    Aaron Swartz is a saint and a martyr.

  • LtWorf 3 hours ago

    It will undermine it only for the rich owners of AI companies, not for everyone.

russell_h 2 hours ago

The title is clearly meant to generate outrage, but what is wrong with cutting up a book that you own?

  • timewizard 2 hours ago

    It's destroying something with value for no sane reason. Wasteful and sociopathic.

    EDIT: Ah, the old, Hacker News pretend good faith question. Why bother answering you people? You're not interested in anything other than your existing point of view. The least Hacker thing about this place.

    • jobs_throwaway 2 hours ago

      poverty mindset. We can make more books, and now these copies contribute to a corpus of knowledge that far more people benefit from

      • justinrubek 2 hours ago

        Wasteful mindset. They don't need the books, they need the data. They should never have been printed if they were going to be destroyed.

      • timewizard 2 hours ago

        People who pay Anthropic you mean. There is no benefit. And only the owner can make more books.

        Fake altruistic mindset. Super sociopathic.

Kim_Bruning 6 hours ago

actual title:

"Anthropic cut up millions of used books to train Claude — and downloaded over 7 million pirated ones too, a judge said."

A not-so-subtle difference.

That said, in a sane world, they shouldn't have needed to cut up all those used books yet again when there's obviously already an existing file that does all the work.

dathinab 3 hours ago

as far as I understand, while training on books is clearly not fair use (as the result will likely hurt the livelihood of authors, especially not "best of the best" authors), as long as you buy the book it still should be legal, that is if you actually buy the book and not a "read only" eBook

but the 7_000_000 pirated books are a huge issue, and one that we have a lot of reason to believe isn't specific to Anthropic

  • asadotzler 2 hours ago

    Buying a copy of a book does not give you license to take the exact content of that book, repackage it as a web service, and sell it to millions of others. That's called theft.

carlosjobim 2 hours ago

If ingesting books into an AI makes Anthropic criminals, then Google et al are also criminals alike for making search indexes of the Internet. Anything published online is equally copyrighted.

  • kristofferR 2 hours ago

    Yeah, we can all agree that ingesting books is fair use and transformative, but you gotta own what you ingest, you can't just pirate it.

    I can read 100 books and write a book based on the inspiration I got from the 100 books without any issue. However, if I pirate the 100 books I've still committed copyright infringement despite my new book being fully legal/fair use.

    • carlosjobim an hour ago

      I disagree that it has anything to do with copyright. It is at most theft. If I steal a bunch of books from the library, I haven't committed any breach of copyright.

ruffrey 2 hours ago

Two of the top AI companies flouted ethics with regard to training data. In OpenAI's case, the whistleblower probably got whacked for exposing it.

Can anyone make a compelling argument that any of these AI companies have the public's best interest in mind (alignment/superalignment)?

sidewndr46 4 hours ago

So using the standard industry metrics for calculating the financial impact of piracy, this would equate to something like trillions of dollars in damages to the book publishing industry?

nickpsecurity 2 hours ago

Buying, scanning, and discarding was in my proposal to train under copyright restrictions.

You are often allowed to make a digital copy of a physical work you bought. There are tons of used, physical works that would be good for training LLMs. They'd also be good for training OCR which could do many things, including improve book scanning for training.

This could be reduced to a single act of book destruction per copyrighted work or made unnecessary if copyright law allowed us to share others' works digitally with their licensed customers. Ex: people who own a physical copy or a license to one. Obviously, the implementation could get complex but we wouldn't have to destroy books very often.

  • asadotzler 2 hours ago

    You are allowed to make a digital copy FOR YOUR OWN USE. You are not allowed to make a billion digital copies and sell those, that's called theft.

adolph 44 minutes ago

  Alsup detailed Anthropic's training process with books: The OpenAI rival
  spent "many millions of dollars" buying used print books, which the
  company or its vendors then stripped of their bindings, cut the pages,
  and scanned into digital files.

I've noticed an increase in used book prices in the recent past and now wonder if there is an LLM effect in the market.

motbus3 6 hours ago

It is shocking how courts have been ruling to the benefit of AI companies despite the obvious problem of allowing automatic plagiarism

  • kristofferR 2 hours ago

    Not really, plagiarism is not a legal concept.

k__ 2 hours ago

So, how should we as a society handle this?

Ensure the models are open source, so everyone can use them, as everyone's data is in there?

Close those companies and force them to delete the models, as they used copyrighted material?

greenavocado 5 hours ago

Should have listened to those NordVPN ads on YouTube

dandanua 6 hours ago

Meta did the same, and probably other big companies too. People who praise AGI are very short-sighted. It will ruin the world with our current morals and ethics. It's like a nuclear weapon in the hands of barbarians (shit, we have that too, actually).

Lionga 6 hours ago

Based on the fact people went to jail for downloading some music or movies, this guy will face a lifetime in prison for 7 million books that he then used for commercial profit right?

Right guys we don't have rules for thee but not for me in the land of the free?

2OEH8eoCRo0 3 hours ago

I've begun to wonder if this is why some large torrent sites haven't been taken down. They are essentially able to crowdsource all the work. There are some users who spend ungodly amounts of time and money on these sites that I suspect are rich industry benefactors.

1oooqooq 6 hours ago

Aaron Swartz rolling

  • pyman 6 hours ago

    He downloaded millions of academic articles and the government charged him with multiple felonies.

    The difference is, Aaron Swartz wasn't planning to build massive datacenters with expensive Nvidia servers all over the world.

    • mikewarot 6 hours ago

      >the government charged him with multiple felonies.

      This was the result of a cruel and zealous overreach by the prosecutor to try to advance her political career. It should never have gone that far.

      The failure of MIT to rally in support of Aaron will never be forgiven.

      • pyman 6 hours ago

        I agree

    • omnimus 6 hours ago

      It's even worse considering all he downloaded was in the public domain, so it was much less problematic as far as copyright goes.

      Lesson is simple. If you want to break a law make sure it is very profitable because then you can find investors and get away with it. If you play Robin Hood you will be met with a hammer.

outside1234 2 hours ago

So if you incorporate you can do whatever you want without criminal charges?

neo__ 8 hours ago

Hopefully they were all good books at least.

  • pyman 7 hours ago

    they pirated the best ones, according to the authors

NHQ 2 hours ago

The farce of treating a corporation as an individual precludes common sense legal procedure to investigate people who are responsible for criminal action taken by the company. It's obviously premeditated and in all ways an illicit act knowingly perpetrated by persons. The only discourse should be about upending this penthouse legalism.

  • NHQ 2 hours ago

    The irony is that actually litigating copyright law would lead to the repeal of said copyright law. And so it is in all cases of backwater laws that are used to "protect interests" of "corporations" yet criminalize petty individual cases.

    This of course cannot be allowed to happen, so the legal system is just a limbo, a bar which regular individuals must strain to pass under but that corporations regularly overstep.

booleandilemma 6 hours ago

So if I'm working on an LLM can I just steal millions of copyrighted books? Is that how our farcical justice system works?

  • famahar 2 hours ago

    Make sure you have a few billion dollars ready so you can pay a few million on the lawsuits. A volcano getting a cup of water poured into it.

aaron695 7 hours ago

Good, this is what Aaron Swartz was fighting for.

Against companies like Elsevier locking up the world's knowledge.

Authors are no different to scientists, many had government funding at one point, and it's the publishing companies that got most of the sales.

You can disagree and think Aaron Swartz was evil, but you can't have both.

You can take what Anthropic has shown you is possible and do this yourself now.

isohunt: freedom of information