What’s the worst that can happen?

When a generative artificial intelligence (AI) system produces something similar to the data it was trained on, is it copyright infringement or a bug in the system? That's the question at the heart of the New York Times' latest lawsuit against ChatGPT maker OpenAI.

The New York Times alleges that OpenAI used content from the NYT website to train its models. OpenAI has not acknowledged training on any proprietary source, only on datasets containing Wikipedia and United States patent documents.

OpenAI says that training on copyrighted data is “fair use” and that the New York Times' lawsuit is “without merit.”

The stakes

The case could be settled out of court, or it could end in a jury verdict, a dismissal or any number of other outcomes. But beyond monetary damages or injunctions (which could be temporary, stayed pending appeal, or overturned on appeal), the outcome could affect American society as a whole, and even have an international impact.

First, if the courts rule in OpenAI's favor, finding that training AI systems on copyrighted material is fair use, the precedent could have a significant impact on the US legal system.

As Mike Cook, a senior lecturer at King's College London, recently wrote:

“If you use AI to answer emails or summarize work for you, you might see ChatGPT as an end that justifies the means. However, it should perhaps worry us if the only way to achieve that is to exempt specific corporate entities from rules that apply to everyone else.”

The New York Times argues that such an exemption would represent a clear threat to its business model.

OpenAI acknowledges that ChatGPT sometimes exhibits a “bug” in which it reproduces passages of text similar to copyrighted works. According to the NYT, this lets users bypass its paywall, costing the paper advertising revenue and undermining its core business.

If OpenAI is allowed to continue training on copyrighted material without restriction, the long-term implications could be dire for The New York Times and other journalism outlets whose work is used to train AI systems, the lawsuit says.

The same could be said for other industries that profit from copyrighted material, including film, television, music, literature and other print media.

OpenAI, on the other hand, said in documents submitted to the UK House of Lords Communications and Digital Committee that it would be “impossible to train today's leading AI models without using copyrighted materials.”

The AI firm added:

“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today's citizens.”

Black box

Complicating matters further is the fact that compliance may be technically difficult to achieve. OpenAI has taken steps to stop ChatGPT and its other products from reproducing copyrighted material, but there are no technological guarantees that they will not continue to do so.
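
To see why, consider what a safeguard of this kind has to do. The sketch below is purely illustrative and assumes nothing about OpenAI's actual mitigations: it flags generated text that shares a long word-for-word run with a protected article, and it exposes the gap any such filter leaves, since a light paraphrase of the same passage slips through.

```python
# A naive verbatim-output filter: an illustrative sketch, not OpenAI's
# actual safeguard. It flags output that shares n consecutive words with
# a protected source text.

def ngrams(text: str, n: int) -> set:
    """Return the set of n-word sequences appearing in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_verbatim(output: str, protected: str, n: int = 8) -> bool:
    """Flag the output if any run of n words also appears in the source."""
    return bool(ngrams(output, n) & ngrams(protected, n))

# Hypothetical example texts, invented for this sketch.
article = ("The committee voted on Tuesday to approve the measure, which "
           "supporters said would reshape the city's housing market.")
verbatim = ("Reports note the committee voted on Tuesday to approve the "
            "measure, which supporters said would reshape local housing.")
paraphrase = ("On Tuesday the panel approved the proposal, which backers "
              "claimed would transform housing across the city.")

print(looks_verbatim(verbatim, article))    # True: caught by the filter
print(looks_verbatim(paraphrase, article))  # False: same content, not caught
```

The second result is the problem in miniature: a filter can catch exact copying, but it cannot guarantee that restated content, which may still infringe, never gets through.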

AI models like ChatGPT are referred to as “black box” systems. This is because the developers who create them have no way of knowing exactly why the system produces the results it does.

Because of this opacity, and because of the way large language models like ChatGPT are trained, there is no known way to remove The New York Times' data, or any other copyright holder's, from a model after it has been trained.
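
A toy example makes the point concrete. The sketch below is an illustration of the underlying principle rather than a description of how GPT-class models are built: a one-parameter model is fit by gradient descent on examples from two hypothetical sources, and the finished weight is a single blended number that keeps no record of which source shaped it.

```python
# Toy illustration of why training data cannot simply be subtracted out
# later. The sources and numbers are invented for this sketch.

sources = {
    "source_a": [(1.0, 2.0), (2.0, 4.1)],   # (input, target) pairs
    "source_b": [(3.0, 5.8), (4.0, 8.3)],
}

w = 0.0     # the entire "model": it predicts y = w * x
lr = 0.01   # learning rate

for _ in range(500):
    for examples in sources.values():
        for x, y in examples:
            error = w * x - y
            w -= lr * error * x   # every example nudges the same weight

print(f"final weight: {w:.3f}")
# The trained model is just this one number. There is no "source_a" entry
# to delete; erasing its influence means retraining from scratch without it.
```

A model like ChatGPT works the same way, except with billions of weights, which is why excluding one publisher's data after the fact effectively means starting over.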

Related: OpenAI faces new copyright lawsuit a week after NYT lawsuit

Based on current technology and methods, OpenAI would most likely have to scrap ChatGPT and start training from scratch if it were barred from using copyrighted material entirely. Ultimately, that could prove too expensive to be worthwhile.

OpenAI hopes to head off the problem by partnering with news and media organizations, alongside a pledge to keep working to eliminate the regurgitation “bug.”

The worst case

For the AI field, the worst-case scenario is losing the ability to monetize models trained on copyrighted material. Although this would not necessarily affect adjacent efforts, such as AI systems that drive autonomous cars or run supercomputer simulations, it could make it illegal to bring products such as ChatGPT to market.

And, as far as copyright holders are concerned, the worst-case scenario is a court declaring that copyrighted content can be freely used to train AI systems.

This could, in theory, give AI companies free rein to redistribute slightly modified copyrighted material, leaving end users legally liable whenever the modifications fall short of the legal threshold for avoiding copyright infringement.
