AI Privacy

When privacy becomes training data

Researchers found millions of passports, credit cards, résumés, and faces in DataComp CommonPool, a massive AI training dataset scraped from the web. After auditing just 0.1% of the data, they estimate that the full dataset contains hundreds of millions of items likely to include personally identifiable information (PII), including sensitive job and health details. Despite face-blurring tools, researchers estimate 102 million faces were missed, and metadata/captions still […]