AIGP Study Guide
Module 6: Governing AI Development · BoK III.B

Data Questions, Quality, Jurisdiction and Lineage

Without the right data, enough data and accurate data, the system won't perform - garbage in, garbage out. Anticipate jurisdictional requirements (data localisation laws, KYC), and keep data lineage and data provenance charted and documented.

If you don't have the right data, enough data or accurate data, the system won't perform. Start with four questions, then add the quality and traceability layer.

  • WHAT data is required?
  • HOW MUCH data is needed?
  • HOW is data collected?
  • WHERE is data stored?

Quality checks. "Garbage in, garbage out" → is the data accurate? Is it representative of the data used in practice? Is it free from bias? Statistical sampling helps identify gaps.
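As a rough illustration of the representativeness check, the sketch below compares the share of each category in a training sample against the share expected in practice and flags large differences. The function names, the region labels, the reference shares and the 5% tolerance are all assumptions made for the example, not values from the course material.

```python
from collections import Counter


def share_by_category(values):
    """Return each category's share of the total as a fraction."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def representation_gaps(training_values, reference_shares, tolerance=0.05):
    """Flag categories whose training share differs from the expected
    share by more than `tolerance` (an illustrative threshold)."""
    training_shares = share_by_category(training_values)
    gaps = {}
    for category, expected in reference_shares.items():
        observed = training_shares.get(category, 0.0)
        if abs(observed - expected) > tolerance:
            gaps[category] = {"observed": round(observed, 3), "expected": expected}
    return gaps


# Hypothetical training sample: one region label per record.
training_regions = ["EU"] * 60 + ["US"] * 35 + ["APAC"] * 5

# Hypothetical shares expected when the system is used in practice.
reference_shares = {"EU": 0.40, "US": 0.35, "APAC": 0.25}

gaps = representation_gaps(training_regions, reference_shares)
for category, detail in gaps.items():
    print(category, detail)
```

In this made-up sample, EU is over-represented and APAC under-represented, while US matches the expected share. A check like this only surfaces gaps; deciding whether a gap matters is still a governance judgement.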

Jurisdictional requirements. Anticipate → privacy requirements · data localisation laws · regulatory disclosures like KYC ("Know Your Customer"), the process by which financial institutions verify customers and check that funding sources are legitimate. Investigate compliance obligations now and build them into development.
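One way to build a localisation obligation into development is to check storage locations against a recorded policy before data is used. The sketch below is illustrative only: the policy table, the placeholder country names and the dataset records are hypothetical and do not describe any real law.

```python
# Hypothetical policy table: origin jurisdiction -> regions where storage is permitted.
# Real rules must come from legal review; these entries are placeholders.
ALLOWED_STORAGE = {
    "Country A": {"Country A"},
    "Country B": {"Country A", "Country B"},
}


def localisation_violations(datasets, allowed_storage):
    """Return (dataset name, reason) pairs for datasets stored outside
    the regions permitted for their origin, or with no rule recorded."""
    violations = []
    for dataset in datasets:
        permitted = allowed_storage.get(dataset["origin"])
        if permitted is None:
            violations.append((dataset["name"], "no rule recorded for origin"))
        elif dataset["stored_in"] not in permitted:
            reason = f"stored in {dataset['stored_in']}, not permitted for {dataset['origin']}"
            violations.append((dataset["name"], reason))
    return violations


# Hypothetical dataset inventory.
datasets = [
    {"name": "customers_a", "origin": "Country A", "stored_in": "Country B"},
    {"name": "customers_b", "origin": "Country B", "stored_in": "Country A"},
    {"name": "partners_c", "origin": "Country C", "stored_in": "Country C"},
]

violations = localisation_violations(datasets, ALLOWED_STORAGE)
for name, reason in violations:
    print(f"{name}: {reason}")
```

Treating a missing rule as a violation is a deliberate choice in this sketch: an origin nobody has investigated is flagged rather than silently allowed.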

Lineage vs provenance
Data lineage
  • Tracks the flow of data over time → origin, how it changed, destination, across the life cycle
  • Used for historical context and tracing issues to a root cause

Data provenance
  • Tracks and logs the history and origin → creation and collection through transformation, incl. sources, processes, actors, methods
  • Used to ensure integrity and quality and to identify applicable laws tied to the data's origins

Document both

Both data lineage and data provenance must be charted and documented → use datasheets or model inventory templates to record them.
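A minimal sketch of what a datasheet-style record might look like, keeping provenance (origin, actors, methods) and lineage (the steps the data passes through) side by side. The class names, field names and example values are assumptions for illustration; they do not follow any particular datasheet or model inventory template.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """History and origin of the data: source, method, actor."""
    source: str
    collection_method: str
    collected_by: str
    origin_jurisdiction: str  # assumed field, used to look up applicable laws


@dataclass
class LineageStep:
    """One change to the data on its way from origin to destination."""
    name: str
    description: str
    output_location: str


@dataclass
class DatasheetEntry:
    dataset_name: str
    provenance: ProvenanceRecord
    lineage: List[LineageStep] = field(default_factory=list)

    def add_step(self, name, description, output_location):
        self.lineage.append(LineageStep(name, description, output_location))

    def trace(self):
        """Return step names in order, for walking back to a root cause."""
        return [step.name for step in self.lineage]


# Hypothetical example values.
entry = DatasheetEntry(
    dataset_name="loan_applications",
    provenance=ProvenanceRecord(
        source="online application form",
        collection_method="user submission",
        collected_by="retail banking team",
        origin_jurisdiction="Country A",
    ),
)
entry.add_step("ingest", "raw export loaded", "raw_zone")
entry.add_step("deduplicate", "duplicate applications removed", "clean_zone")
entry.add_step("feature_build", "income bands derived", "training_zone")

print(entry.trace())
print(json.dumps(asdict(entry), indent=2))
```

Serialising the record to JSON is one simple way to keep it alongside the dataset so that both the origin and every transformation stay documented.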

From the experts

"You really need to look at the quality of the data that is going into your AI design and your overall system and model." - Julie McEwen, AIGP, CIPM, CIPT, FIP

Key terms - quick answers

What is “KYC”?
Know Your Customer - process by which financial institutions verify customers and check funding sources are legitimate.
What are “Data localisation laws”?
Jurisdictional requirements governing where data may be physically stored.
What is “Data lineage”?
Tracking the flow of data over time - origin, how it changed, destination - used to trace root causes.
What is “Data provenance”?
Tracking and logging the history and origin of data (sources, processes, actors, methods) to ensure integrity and identify applicable laws.