Anonymisation, Pseudonymisation and PETs

Recital 26 territory: anonymisation removes data from the GDPR entirely, while pseudonymisation is still personal information so GDPR obligations apply. The scraping problem and three privacy-enhancing technologies - differential privacy, homomorphic encryption and secure multi-party computation - shape AI data strategy.

Recital 26 territory. One takes data out of the GDPR entirely, the other does not. Know which is which cold.

Anonymisation vs pseudonymisation
	Anonymisation	Pseudonymisation
GDPR status	GDPR does not apply → no longer personal information	Still personal information → GDPR obligations apply
Notes	Threshold varies by jurisdiction and is high under the GDPR → hard to reach when AI's benefit lies in processing vast datasets	Helpful for protection → but de-identification drops data utility for AI
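
A minimal sketch of why pseudonymised data stays personal information, assuming a keyed hash (HMAC-SHA256) as the pseudonymisation method. The key, field names and records are hypothetical and for illustration only; the source does not prescribe a technique.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the dataset.
SECRET_KEY = b"example-key-kept-separately"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (a pseudonym)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up records for illustration.
records = [
    {"email": "alice@example.com", "age": 34},
    {"email": "bob@example.com", "age": 51},
    {"email": "alice@example.com", "age": 34},
]

# Drop the direct identifier and keep only the pseudonym.
pseudonymised = [
    {"user": pseudonymise(r["email"]), "age": r["age"]} for r in records
]

# The same person always maps to the same pseudonym, so records stay linkable.
same_person = pseudonymised[0]["user"] == pseudonymised[2]["user"]
print(same_person)
```

Anyone holding the key can recompute the pseudonym for a known email address and re-link the records → the data can still be attributed to a person, which is why GDPR obligations continue to apply.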

The scraping problem

Training datasets are gathered by scraping digital content (social media, articles, blogs) → much of it personal information, often collected without end-user knowledge, engagement or consent, at petabyte scale. User prompts and interests feed back into improving systems → a live conflict. Because current legislation was built without AI in mind, there are open questions about whether AI can truly rely on pseudonymous or anonymised data. At scale → privacy and security controls must be dynamic enough to change with the AI system, and the ideal outcome is systems that succeed without using personal information at all.

Privacy-enhancing technologies:

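Differential privacy → adds calibrated noise to query results so no single individual's data can be inferred; the noise drops utility, and each inquiry spends privacy budget, so inquiries must be limited.
Homomorphic encryption → computes directly on encrypted data without decrypting it; too compute-intensive to use at scale yet.
Secure multi-party computation → parties jointly compute a result without revealing their inputs to each other; targeted pockets of use; fine for simple addition, compute-intensive for multiplication and division.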
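
To illustrate why secure multi-party computation handles addition cheaply, here is a minimal sketch using additive secret sharing, one common building block. The modulus, the three parties and their inputs are assumptions made for illustration, not something the source specifies.

```python
import secrets

PRIME = 2**61 - 1  # public modulus, chosen here for illustration


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random-looking shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares by summing them mod PRIME."""
    return sum(shares) % PRIME


# Three hypothetical parties, each holding a private count.
private_inputs = [120, 340, 95]
n = len(private_inputs)

# all_shares[i][j] is the share of party i's input that is sent to party j.
all_shares = [share(v, n) for v in private_inputs]

# Each party j adds up the shares it received: local additions only.
partial_sums = [
    sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
]

# Publishing the partial sums reveals the total but not any single input.
total = reconstruct(partial_sums)
print(total)
```

Addition needs only local sums of shares. Multiplying two shared values requires extra rounds of communication between the parties, which is where the compute and coordination cost comes from.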

Always weigh the trade-offs between privacy benefits and operational costs → performance, scalability, complexity.

Key terms - quick answers

What is “Anonymisation”?

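Processing data so that individuals are no longer identifiable by any means reasonably likely to be used (Recital 26), so the GDPR no longer applies; the threshold is high under the GDPR.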

What is “Pseudonymisation”?

Replacing direct identifiers so the data cannot be attributed to a person without additional information kept separately; it is still personal information and GDPR obligations still apply; drops AI data utility.

What is the “scraping problem”?

Training data gathered at petabyte scale from digital content, often personal and without end-user consent.

What is “Differential privacy”?

A PET that adds calibrated noise to query results to protect individuals in datasets; the noise drops utility, and inquiries must be limited because each one spends privacy budget.
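
A minimal sketch of the inquiry limit, assuming the Laplace mechanism for counting queries and a simple additive privacy budget. The class name, epsilon values and data are hypothetical; real deployments use audited libraries rather than hand-rolled noise.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise and tracks a privacy budget."""

    def __init__(self, data, total_epsilon, seed=None):
        self.data = list(data)
        self.remaining = total_epsilon  # budget left to spend on inquiries
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for row in self.data if predicate(row))
        # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / epsilon)


ages = [23, 35, 41, 52, 67, 29, 48]  # made-up data
counter = PrivateCounter(ages, total_epsilon=1.0)

noisy_over_40 = counter.count(lambda age: age > 40, epsilon=0.5)
noisy_under_30 = counter.count(lambda age: age < 30, epsilon=0.5)
print(noisy_over_40, noisy_under_30, counter.remaining)
# A third inquiry at epsilon=0.5 would raise RuntimeError: the budget is spent.
```

A smaller epsilon means more noise (stronger privacy, lower utility), and every inquiry draws down the budget → the trade-off between privacy and utility is explicit in the parameters.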