Anonymisation, Pseudonymisation and PETs
Recital 26 territory: anonymisation removes data from the GDPR entirely, while pseudonymisation is still personal information so GDPR obligations apply. The scraping problem and three privacy-enhancing technologies - differential privacy, homomorphic encryption and secure multi-party computation - shape AI data strategy.
Recital 26 territory. One takes data out of the GDPR entirely, the other does not. Know which is which cold.
| | Anonymisation | Pseudonymisation |
|---|---|---|
| GDPR status | GDPR does not apply → no longer personal information | Still personal information → GDPR obligations apply |
| Notes | Threshold varies by jurisdiction and is high under the GDPR → benefit for AI is that vast datasets can be processed outside the GDPR | Helpful for protection → but de-identification reduces data utility for AI |
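Why pseudonymised data is still personal information can be sketched in a few lines. This is a minimal illustration, not a compliance recipe: the key, the sample records and the helper names `pseudonymise` and `reidentify` are all hypothetical. A keyed hash replaces the direct identifier, but whoever holds the key can still link a token back to a person.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(candidate: str, token: str, secret_key: bytes) -> bool:
    """Anyone holding the key can test whether a token belongs to a known person."""
    return hmac.compare_digest(pseudonymise(candidate, secret_key), token)


# Hypothetical key; in practice it would be stored separately from the data.
key = b"hypothetical-key-held-separately"

records = [
    {"email": "alice@example.com", "age": 34},
    {"email": "bob@example.com", "age": 41},
]

# The email is dropped and replaced by a stable token.
pseudonymised = [
    {"subject_id": pseudonymise(r["email"], key), "age": r["age"]} for r in records
]

# Re-linking is possible with the key, so the data remains personal information.
print(reidentify("alice@example.com", pseudonymised[0]["subject_id"], key))
```

The token is stable, so records about the same person can still be joined for training or analytics. That is the protective value of pseudonymisation, and also the reason GDPR obligations continue to apply.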
Training datasets are gathered by scraping digital content (social media, articles, blogs) → much of it personal information, often collected without end-user knowledge, engagement or consent, at petabyte scale. User prompts and interests feed back into improving systems → a live conflict. Because current legislation was built without AI in mind, open questions remain about whether AI can truly rely on pseudonymous or anonymised data. At scale → privacy and security controls must be dynamic enough to change with the AI system, and the ideal outcome is making systems succeed without using personal information at all.
Privacy-enhancing technologies:
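- Differential privacy → adds calibrated noise to query results to protect individuals; the noise lowers utility, and each query spends part of a privacy budget, so inquiries must be limited.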
- Homomorphic encryption → allows computation on data while it stays encrypted; not practical at scale yet.
- Secure multi-party computation → targeted pockets of use; fine for simple arithmetic such as addition, compute-intensive for multiplication and division.
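The link between differential privacy and limited inquiries can be sketched with the Laplace mechanism. This is a minimal sketch under stated assumptions: the class `PrivateCounter`, its parameters and the sample ages are hypothetical, and it only handles counting queries (sensitivity 1). Each query spends part of a fixed privacy budget, and once the budget is gone no further answers are released.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise under a fixed privacy budget."""

    def __init__(self, values, total_epsilon, seed=None):
        self._values = list(values)
        self._remaining = total_epsilon  # privacy budget left to spend
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def noisy_count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise RuntimeError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for v in self._values if predicate(v))
        # A count changes by at most 1 when one person is added or removed,
        # so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


ages = [23, 35, 41, 29, 52, 38, 61, 19]  # hypothetical data
counter = PrivateCounter(ages, total_epsilon=1.0, seed=0)

print(counter.noisy_count(lambda a: a >= 40, epsilon=0.5))
print(counter.noisy_count(lambda a: a < 30, epsilon=0.5))
# A third query at epsilon=0.5 would raise RuntimeError: the budget is spent.
```

A smaller `epsilon` per query allows more queries but adds more noise to each answer, which is the utility trade-off in practice.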
Always weigh the trade-offs between privacy benefits and operational costs → performance, scalability, complexity.
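The cost asymmetry noted for secure multi-party computation can be illustrated with additive secret sharing, one common building block. This is a toy sketch, not a hardened protocol: the modulus `PRIME` and the helper names `share`, `add_shared` and `reconstruct` are assumptions chosen for illustration. Addition needs no communication between parties, whereas multiplication (not shown) requires an extra interactive protocol, which is where the compute and network cost comes from.

```python
import secrets

PRIME = 2**61 - 1  # hypothetical modulus; all arithmetic is done mod this prime


def share(value, n_parties):
    """Split `value` into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally; no messages are exchanged."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    """Combining all shares reveals the value; any smaller subset reveals nothing."""
    return sum(shares) % PRIME


# Two private inputs, each split across three parties.
a_shares = share(120, 3)
b_shares = share(45, 3)

# The sum is computed on shares without any party seeing 120 or 45.
sum_shares = add_shared(a_shares, b_shares)
print(reconstruct(sum_shares))  # 165
```

Multiplying two shared values cannot be done share by share like this; it needs additional rounds of interaction between the parties, which is the performance and complexity cost to weigh.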