• 3 Posts
  • 183 Comments
Joined 2 years ago
Cake day: August 27th, 2023

  • AI has always been able to train on copyrighted data because training on it is considered transformative use.

    If this changes, then given the huge amount of data needed for competitive generative AI, open-source AI cannot afford the data and dies. Strengthening copyright would force everyone out of the game except Meta, Google, and Microsoft.

    The system that open source AI grew out of is exactly what is being attacked.



  • Because if AI has to pay, you kill the open-source scene and hand a fat monopoly to the handful of companies that can afford the data. Not to mention that the data is owned by a few publishing houses, and none of the writers are getting a dime.

    Yes, it’s silly that students pay so much, but we should be arguing for weaker copyright so we can have both fair prices in education and a vibrant open-source scene.

    Most people argue for a strengthening of copyright, which only helps data brokers and the big AI players. If you want subscription services and censorship while still keeping all the drawbacks of AI, this is how you do it.







  • The outputs are still bound by copyright law. Tracing pixel by pixel over an artwork doesn’t make the result immune to copyright, and maliciously overtraining a generative AI to act like a database and output outright copies shouldn’t either.

    If you have a carbon copy of someone’s GitHub repo, it doesn’t matter that you generated it; it’s still a copy. Although code is a difficult example, since I’m not entirely sure where the line is for one repo being different from another when they accomplish the same task.

    I always imagined businesses just grabbed GPL software and told their employees to rewrite it so it looks different. Most things I dive into seem to stem from one or two algorithms from a paper, and the rest is fluff.



  • In such a scenario, it would be worth it. LLMs aren’t databases that just hold copy-pasted information. If we get to a point where one can spit out whole functional GitHub repos replicating complex software, it will be able to do so for most software, regardless of whether it was trained on similar data.

    All software would be a prompt away, including the closed-source kind. I don’t think you can get more open source than that. But that’s only if stringent laws aren’t put in place to ban open-source AI models, since Google will put that one prompt behind a paycheck’s worth of money if they can.