An advocacy group has revealed that image generators have been trained on images of Brazilian kids without the children's consent. Research by Human Rights Watch (HRW) shows that popular image generators like Stable Diffusion used images of kids “spanning their entire childhood” to train their models.
The HRW study reveals that these images came from at least 10 Brazilian states. It reported that the pictures pose a huge “privacy risk to kids” because the practice also enables the production of non-consensual images bearing their likeness.
Brazilian kids’ images used to train AI models
HRW researcher Hye Jung Han exposed the problem after analyzing a fraction (less than 0.0001%) of LAION-5B, a dataset built from Common Crawl snapshots of the public web. She revealed that the dataset did not have the actual photos, but contained “image text pairs” taken from nearly 6 billion pictures and captions posted since 2008.
Pictures of kids from across 10 Brazilian states were found, most of them family photos uploaded to parenting and personal blogs. According to the report, these are pictures that internet users would not easily stumble upon.
HRW removed links to the images in collaboration with LAION, the German nonprofit that created the dataset. Concerns remain that the dataset may still reference children’s images from around the world, since removing links alone does not entirely solve the problem.
“This is a larger and very concerning issue and as a volunteer organization, we will do our part to help,” LAION spokesperson Nate Tyler told Ars.
Children’s identities are easily traceable
HRW’s report further revealed that many of the Brazilian kids could be identified, as their names and locations appeared in the captions used to build the dataset. It also raised concerns that the kids may be targeted by bullies and that their images may be used to create explicit content.
“The photos reviewed span the entirety of childhood,” reads part of the report.
“They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home…”
Han, however, noted that “all publicly available versions of LAION-5B were taken down,” so there is now less risk of the Brazilian kids’ photos being used.
According to HRW, the dataset will not be made available again until LAION is certain that all flagged content has been removed. The decision was made after a Stanford University report also “found links in dataset pointing to illegal content on the public web,” including over 3,000 suspected instances of child sexual abuse content.
At least 85 girls in Brazil have also reported that classmates harassed them by using AI to generate sexually explicit deepfake content “based on photos taken from their social media content.”
Protecting children’s privacy
According to Ars, LAION-5B was introduced in 2022, reportedly to replicate OpenAI’s dataset, and was touted as the biggest “freely available image-text dataset.”
When HRW contacted LAION about the images, the organization responded that AI models trained on LAION-5B “could not produce kids’ data verbatim,” although it acknowledged the privacy and security risks.
The organization then started removing some images but also opined that parents and guardians were responsible for removing children’s personal photos from the internet. Han disagreed with their argument, saying:
“Children and their parents shouldn’t be made to shoulder responsibility for protecting kids against a technology that’s fundamentally impossible to protect against. It’s not their fault.”
HRW called for urgent intervention by Brazilian lawmakers to protect children’s rights from emerging technologies. According to HRW’s recommendations, new laws must be put in place to prohibit the scraping of children’s data into AI models.
Cryptopolitan reporting by Enacy Mapakame