Module 8: AI Governance Vocabulary

Risks, security and harms

This cluster covers the attack surface and the falsehood family. In several places intent is the discriminator that separates the right answer from the trap: disinformation is deliberate and misinformation is not, while data poisoning is an attacker corrupting training data, unlike natural data drift.

Adversarial attack → deliberately crafted inputs designed to fool a model into errors or unsafe behaviour.
Data poisoning → corrupting the training data so the model learns wrong or malicious behaviour (an attacker is involved - unlike data drift).
Data leak → out-of-scope data (for example, test-set records) leaking into training and inflating measured performance, or a model exposing sensitive training data in its outputs.
Deepfakes → AI-generated or manipulated audio-visual content realistically depicting people doing or saying things they never did.
Bias → systematic error producing unfair outcomes, in computational, cognitive and societal flavours.
Red teaming → simulating adversarial attacks to expose vulnerabilities, flaws, bias and misinformation, with findings going to developers before release.
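To make the first entry concrete, here is a minimal Python sketch of an adversarial attack on a toy linear classifier. It applies a fast-gradient-sign-style step: for a linear score the gradient with respect to the input is simply the weight vector, so each feature is nudged by at most `epsilon` in the direction that pushes the score towards the other class. The weights, bias, input and `epsilon` are hypothetical values chosen for illustration, not taken from any real model.

```python
def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    """Return 1, 0 or -1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(weights, x, epsilon, current_label):
    """Fast-gradient-sign-style perturbation for a linear model.

    The gradient of a linear score with respect to x is `weights`, so each
    feature moves by exactly `epsilon` (or 0 where a weight is 0) in the
    direction that pushes the score towards the opposite class.
    """
    direction = -1 if current_label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


weights = [0.8, -0.5, 0.3]   # hypothetical trained weights
bias = -0.1                  # hypothetical bias
x = [0.4, 0.2, 0.1]          # a legitimate input

original = predict(weights, bias, x)
x_adv = fgsm_linear(weights, x, epsilon=0.15, current_label=original)
adversarial = predict(weights, bias, x_adv)
max_change = max(abs(a - b) for a, b in zip(x_adv, x))

print(original, adversarial, round(max_change, 2))  # prints: 1 0 0.15
```

A per-feature change of 0.15 flips the prediction here because the score moves by `epsilon` times the sum of the absolute weights (0.15 × 1.6 = 0.24, taking the score from 0.15 to -0.09). Attacks on deep models follow the same principle but need the model's gradients, or an approximation of them.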

Misinformation vs disinformation

Same falsehood, different intent: disinformation is spread deliberately, with intent to deceive; misinformation is spread without intent to deceive (wrong, but not malicious). Intent is the discriminator.

Drift vs poisoning, in practice

Accuracy slips because customer behaviour changed → data drift (no attacker). An attacker corrupts training records → data poisoning. The presence of an adversary is the tell.
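The drift-versus-poisoning distinction can be sketched with a toy one-dimensional threshold classifier. Both scenarios lower accuracy, but only poisoning changes the training data, and it does so through an attacker. All distributions, sample sizes and the poison records below are hypothetical, chosen only to make the effect visible.

```python
import random


def sample(n_per_class, mean0, mean1, rng):
    """Draw labelled 1-D points: class 0 around mean0, class 1 around mean1."""
    data = [(rng.gauss(mean0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(mean1, 1.0), 1) for _ in range(n_per_class)]
    return data


def train_threshold(records):
    """Fit a decision threshold halfway between the two class means."""
    zeros = [x for x, y in records if y == 0]
    ones = [x for x, y in records if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(threshold, records):
    """Fraction of records where 'x > threshold' matches the label."""
    hits = sum((1 if x > threshold else 0) == y for x, y in records)
    return hits / len(records)


rng = random.Random(0)
train = sample(500, 2.0, 6.0, rng)  # hypothetical historical data
test = sample(500, 2.0, 6.0, rng)   # same distribution as training

clean_threshold = train_threshold(train)
baseline = accuracy(clean_threshold, test)

# Data drift: the world changes (both classes shift upwards).
# No attacker; the training data and the model are untouched.
drifted_live = sample(500, 4.0, 8.0, rng)
drift_accuracy = accuracy(clean_threshold, drifted_live)

# Data poisoning: an attacker injects high-valued records falsely
# labelled as class 0, dragging the learned threshold upwards.
poison = [(rng.gauss(9.0, 0.5), 0) for _ in range(400)]
poisoned_threshold = train_threshold(train + poison)
poisoned_accuracy = accuracy(poisoned_threshold, test)

print(f"baseline {baseline:.2f}, drift {drift_accuracy:.2f}, "
      f"poisoned {poisoned_accuracy:.2f}")
```

The symptom (lower accuracy) does not identify the cause on its own; what separates the two cases is whether the training records were deliberately tampered with.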

Key terms - quick answers

What is “Adversarial attack”?

Deliberately crafted inputs designed to fool a model into errors or unsafe behaviour.

What is “Data poisoning”?

Corrupting the training data so the model learns wrong or malicious behaviour.

What is “Data leak”?

Out-of-scope data (such as test records) leaking into training to inflate measured performance, or a model exposing sensitive training data in its outputs.
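The first sense of data leak, out-of-scope data inflating performance, can be shown with a deliberately uninformative dataset. In this minimal sketch the labels are random coin flips, so no model should genuinely score much above 50%. A 1-nearest-neighbour classifier (a hypothetical stand-in for any model that can memorise its training data) nevertheless scores perfectly once the evaluation records leak into its training set.

```python
import random


def nearest_neighbour_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    closest = min(train, key=lambda record: abs(record[0] - x))
    return closest[1]


def accuracy(train, evaluation):
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in evaluation)
    return hits / len(evaluation)


rng = random.Random(42)
# Hypothetical data: labels are coin flips unrelated to the feature,
# so honest accuracy should hover around 50%.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = data[:200], data[200:]

honest = accuracy(train, held_out)

# Leak: the held-out records slip into the training set before evaluation,
# so every evaluation point finds itself as its own nearest neighbour.
leaky_train = train + held_out
leaked = accuracy(leaky_train, held_out)

print(f"honest {honest:.2f}, leaked {leaked:.2f}")
```

The leaked score looks like a flawless model, yet nothing real was learned. Keeping evaluation records strictly out of training, including out of any preprocessing fitted on the training data, is the guard.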

What are “Deepfakes”?

AI-generated/manipulated audio-visual content realistically depicting fabricated acts or speech.
