AIGP Study Guide
Module 6: Governing AI Development · BoK III.B

Governing the AI Data Life Cycle


Data governance does not stop at collection: it spans the full path from ingestion to decommissioning, with stewardship shared across privacy, risk, ML, legal and governance teams.

The data life cycle

Collection (gathering data about an individual) → Use → Disclosure (sharing or providing access) → Retention (saving until destruction) → Destruction (making personal data unrecoverable).
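Because the guide presents the life cycle as an ordered sequence, it can be modelled as a small state machine. This is a minimal sketch that assumes a strictly linear, forward-only progression (in practice data may be used and disclosed many times while it is retained), and the names `Stage`, `next_stage` and `can_transition` are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Data life cycle stages, in the order the guide lists them."""
    COLLECTION = 1   # gathering data about an individual
    USE = 2
    DISCLOSURE = 3   # sharing or providing access
    RETENTION = 4    # saving until destruction
    DESTRUCTION = 5  # making personal data unrecoverable


def next_stage(current: Stage) -> Stage:
    """Return the stage after `current` (simplified linear model)."""
    if current is Stage.DESTRUCTION:
        raise ValueError("DESTRUCTION is terminal: the data is unrecoverable")
    return Stage(current + 1)


def can_transition(src: Stage, dst: Stage) -> bool:
    """Allow only a move to the immediately following stage."""
    return src is not Stage.DESTRUCTION and dst == src + 1


# Walk one record through every stage, from collection to destruction.
stage = Stage.COLLECTION
path = [stage]
while stage is not Stage.DESTRUCTION:
    stage = next_stage(stage)
    path.append(stage)
print([s.name for s in path])
# ['COLLECTION', 'USE', 'DISCLOSURE', 'RETENTION', 'DESTRUCTION']
```

Treating Destruction as terminal mirrors its definition: once data is unrecoverable, no later stage can apply to it.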

Five oversight areas:

  • Training data governance → validate lawful basis, data minimisation, accuracy and diversity during training → assess bias risks → maintain reproducibility logs
  • Evaluation & testing → govern validation and test sets → fairness metrics, drift testing, edge case analysis → embed explainability obligations early
  • Deployment → policies for real-time data inputs, human-in-the-loop models and retraining triggers → enforce access controls and logging
  • Monitoring & drift detection → continuously audit inputs and outputs for data drift, concept drift and changes in quality or representativeness → flag anomalies for governance review
  • Decommissioning → secure deletion or archiving of datasets, training artifacts and output logs per regulatory retention policies → document rationales and impacts
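The monitoring area calls for continuously auditing inputs for data drift and flagging anomalies for governance review. The guide does not prescribe a metric. One common choice for a numeric input is the Population Stability Index (PSI), which compares a baseline distribution with a live one. This is a minimal sketch: the bin edges, sample values, the `DRIFT_THRESHOLD` of 0.2 (a widely used rule of thumb, not a regulatory figure) and the helpers `histogram`, `psi` and `flag_for_review` are all hypothetical assumptions.

```python
import math
from typing import Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Share of values in each bin [edges[i], edges[i+1]); out-of-range values are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = max(sum(counts), 1)
    return [c / total for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a live sample."""
    p = histogram(expected, edges)
    q = histogram(actual, edges)
    score = 0.0
    for pe, qa in zip(p, q):
        pe, qa = max(pe, eps), max(qa, eps)  # avoid log(0) on empty bins
        score += (qa - pe) * math.log(qa / pe)
    return score


DRIFT_THRESHOLD = 0.2  # hypothetical cut-off; tune per model and feature


def flag_for_review(expected, actual, edges) -> bool:
    """True when drift is large enough to escalate for governance review."""
    return psi(expected, actual, edges) > DRIFT_THRESHOLD


edges = [0, 25, 50, 75, 101]
baseline = [10, 30, 40, 60, 70, 80, 90, 20]   # e.g. training-time feature values
live_same = list(baseline)                     # unchanged live inputs
live_shift = [80, 85, 90, 95, 99, 76, 88, 92]  # live inputs crowded into the top bin
print(flag_for_review(baseline, live_same, edges))   # False
print(flag_for_review(baseline, live_shift, edges))  # True
```

PSI only detects shifts in the input distribution (data drift). Detecting concept drift, where the relationship between inputs and outcomes changes, generally also requires ground-truth labels or monitoring of model performance.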

Key terms - quick answers

What is “Data life cycle”?
Collection, Use, Disclosure, Retention, Destruction - the stages personal data passes through.
What is “Destruction”?
Making personal data unrecoverable at the end of the data life cycle.