AIGP Study Guide
Module 5: Existing Laws & AI · BoK III.A

Sensitive and Special Categories of Data

Special categories of data need extra protection under the GDPR and Brazil's LGPD - eight types captured by the mnemonic "Really Private Records Take Guarding, Because Health's Sensitive." The bias-testing dilemma pits minimisation against the need for sensitive data, with four ways to square the circle including Bayesian Improved Surname Geocoding.

Personal information needing extra protection under the GDPR and Brazil's LGPD. Know the list, the lawful gateways, and the bias-testing dilemma.

Mnemonic · Really Private Records Take Guarding, Because Health's Sensitive

Racial/ethnic origin · Political views · Religious/philosophical beliefs · Trade union membership · Genetic data · Biometric data for identification · Health data · Sex life or sexual orientation.

When processing is allowed (GDPR Art. 9(2), main gateways):

  • Explicit consent → active communication of agreement.
  • Manifestly made public by the individual.
  • Legal obligations → employment, social security, social protection law.
  • Vital interests → consent impossible, e.g. emergencies.
  • Legal claims → establish, exercise, defend.
  • Substantial public interest → e.g. public health, balanced against rights.
  • Not-for-profits → for members or regular contacts.

The bias-testing dilemma

Minimisation pushes organisations to avoid sensitive data → but without it, adequate bias testing of AI systems becomes harder. As audit expectations grow, both as best practice and as legal requirements, companies face increasing pressure to collect sensitive data to evaluate their AI.
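
To make the dilemma concrete, here is a minimal sketch of a group-level bias check. It assumes the protected-group labels live in a separate, test-only store and are joined to model decisions only at evaluation time. The names `decisions`, `test_only_groups`, `selection_rates` and `rate_ratio` are hypothetical, the data is invented, and the ratio of selection rates is just one simple fairness measure among many. The point is that the check cannot run at all without some group label.

```python
from collections import defaultdict
from typing import Dict


def selection_rates(decisions: Dict[str, bool],
                    groups: Dict[str, str]) -> Dict[str, float]:
    """Share of positive decisions per group, for users present in both stores."""
    totals: Dict[str, int] = defaultdict(int)
    selected: Dict[str, int] = defaultdict(int)
    for user_id, outcome in decisions.items():
        group = groups.get(user_id)
        if group is None:
            continue  # no label held for this user, so they cannot be tested
        totals[group] += 1
        if outcome:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}


def rate_ratio(rates: Dict[str, float]) -> float:
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group received a positive decision")
    return min(rates.values()) / highest


# Invented example data: model decisions keyed by user id.
decisions = {"u1": True, "u2": True, "u3": False, "u4": False,
             "u5": True, "u6": False, "u7": False, "u8": False}
# Invented sensitive labels, held apart from the training data.
test_only_groups = {"u1": "A", "u2": "A", "u3": "A", "u4": "A",
                    "u5": "B", "u6": "B", "u7": "B", "u8": "B"}

rates = selection_rates(decisions, test_only_groups)
print(rates, rate_ratio(rates))
```

Keeping the labels in their own store mirrors the "collect directly" option: the model never trains on them, yet testers and overseers can still use them.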

Four ways to square the circle:

  1. Collect directly → design in collection, handling and protection from the start; keep sensitive data out of the model's training data but available for testing and oversight.
  2. Generate intentional proxies → derive demographic insight from less sensitive data; the most prominent method is Bayesian Improved Surname Geocoding.
  3. Buy data → brokers, public datasets, data the organisation can already access; raises parallel concerns about source, sharing and alignment with purpose limitation.
  4. Ask customers → consent is often a valid basis; explain why the data is needed; partial information from a select set of users may suffice for representative testing.
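
The proxy approach in option 2 can be sketched as a single Bayes update. This is a simplified illustration of the idea behind Bayesian Improved Surname Geocoding, not a reference implementation: it assumes surname and geography are independent given the group, the function name `bisg_posterior` is hypothetical, and every probability below is invented rather than taken from any real surname list or census table.

```python
from typing import Dict


def bisg_posterior(p_group_given_surname: Dict[str, float],
                   p_geo_given_group: Dict[str, float]) -> Dict[str, float]:
    """Combine a surname-based prior with a geography likelihood.

    P(group | surname, geo) is proportional to
    P(group | surname) * P(geo | group).
    """
    unnormalised = {
        group: prior * p_geo_given_group.get(group, 0.0)
        for group, prior in p_group_given_surname.items()
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("no group has non-zero probability")
    # Normalise so the posterior probabilities sum to 1.
    return {group: value / total for group, value in unnormalised.items()}


# Invented inputs: the surname alone suggests group A is more likely...
surname_prior = {"A": 0.6, "B": 0.4}
# ...but the person's area is three times as likely to be home to group B.
geo_likelihood = {"A": 0.01, "B": 0.03}

posterior = bisg_posterior(surname_prior, geo_likelihood)
print(posterior)
```

The output is a probability per group rather than a single label, so bias tests built on it work with weighted estimates instead of known demographics.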

Exam flash

Encryption and access controls for sensitive data buy you → legal compliance · fewer incidents and breaches · resource savings (fines, claims) · trust · audit and regulator readiness. And the proxy-method name to memorise → Bayesian Improved Surname Geocoding.

Key terms - quick answers

What are “Special categories of data”?
Sensitive personal data needing extra protection under the GDPR and Brazil's LGPD - eight types.
What is “LGPD”?
Brazil's General Data Protection Law, which also protects sensitive/special categories of data.
What is the “Bias-testing dilemma”?
Minimisation discourages holding sensitive data, but bias testing needs it - creating pressure to collect it for evaluation.
What is “Bayesian Improved Surname Geocoding”?
The prominent proxy method to infer demographics from less sensitive data (surname plus geography).