Dutch foundation takes down illegally used AI training dataset

Citing copyright infringement, the Dutch organization BREIN has succeeded in taking down a large language dataset that was being used to train AI models.

In a statement released on Tuesday, BREIN explained that the dataset comprised 10,000 books, news articles, and Dutch-language subtitles for movies and TV series, all obtained without permission.

EU’s AI Act aims to regulate training data sources

According to BREIN director Bastiaan van Ramshorst, it was not immediately clear how widely the dataset had been used by AI firms. “It’s very difficult to know, but we are trying to be on time” to avoid future lawsuits, he said.

The European Union’s recently proposed AI Act will also require AI companies to disclose the datasets and data sources used to train their AI models. Related legal battles are still being fought in the United States. For example, Microsoft-backed OpenAI faces several legal disputes, including a recent one with The New York Times.

Microsoft is alleged to have copied the plaintiff’s registered works of journalism in addition to other copyrighted journalism. On the issue of potential infringement, the company’s CEO has been quoted as saying that the company has this data.

The allegations suggest that Microsoft used these copyrighted materials in AI products, including ChatGPT and Copilot, without obtaining licenses. The complaint specifically accuses Microsoft of removing significant information from these works, such as the author’s name, the title of the work, the ‘copyright’ watermark, and other restrictions.

In Denmark, anti-piracy measures have also produced substantial results in the fight against copyright infringement. Last year, the Danish Rights Alliance, a copyright protection group based in Denmark, demanded that the “Books3” dataset be taken offline and succeeded in having it removed.

Dataset provider complies with court order, removes content

The person who provided the Dutch dataset complied with BREIN’s order. As a result, the dataset was taken down from the website that had offered it for download. BREIN declined to disclose the person’s identity, citing Dutch privacy laws.

The removal of this dataset shows that copyright enforcement groups continue to fight for the protection of intellectual property rights in the digital world. To address the mass scraping of copyrighted materials, BREIN recommends that rights holders use the reservations provided under the Copyright Act (Article 15o.1).

