Architectures and the buzzwords that matter

Governance professionals need a working grasp of four architecture families: transformer models (process inputs in parallel), multimodal models/LMMs (subject of the WHO's 2024 ethics guidance), generative and specialised networks (CNN/RNN/GNN), and RAG for pulling in external information.

You will not build these, but you must hold a credible conversation about them to assess risk.

Transformer models (foundation models):

Deep learning models that learn context and meaning by tracking relationships in sequential data (e.g. words in a sentence)
Find patterns mathematically (self-attention) → can pretrain on unlabelled data, so less need for large labelled datasets
Process inputs in parallel → efficient training and inference
Enable modern NLP and multimodal models
Bonus uses → modelling protein sequences for drug development, analysing DNA sequences
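
To make "tracking relationships" and "process inputs in parallel" concrete, here is a minimal sketch of the self-attention step at the heart of a transformer. It is a simplification for illustration: real models apply learned query/key/value projections and run many attention heads, while this sketch uses the raw input vectors for all three roles. The tokens and the two-dimensional embeddings are made up.

```python
import math

def softmax(xs):
    # Turn raw scores into weights that are positive and sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs.

    Assumption for illustration: no learned projections, single head.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Score this token against every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # New representation = weighted mix of all token vectors.
        outputs.append([sum(wi * v[j] for wi, v in zip(w, vectors)) for j in range(d)])
    return outputs, weights

# Hypothetical toy embeddings, not taken from any real model.
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = self_attention(embeddings)
for tok, w in zip(tokens, weights):
    print(tok, [round(x, 2) for x in w])
```

Each row of weights depends only on the inputs, not on the previous row's result, which is why real implementations compute all rows at once as a single matrix multiplication instead of looping.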

Multimodal models (large multimodal models, LMMs):

Inputs and outputs across image, video, audio and text (unimodal = one modality)
NLP is a key component
Use cases → weather forecasting, medical diagnoses, code generation
WHO released AI ethics guidance for LMMs in 2024 → concerns about inaccurate or biased output affecting health decisions, poor training data, patient privacy
Tools → Gemini, ChatGPT, ImageBind (Meta), Inworld AI

Other architectures:

Generative architectures → create new text, images, audio or code from learned patterns (GPT, LLaMA, DALL-E 2)
Specialised networks → know the acronyms: CNN (convolutional, images), RNN (recurrent, sequences), GNN (graph)
RAG (retrieval-augmented generation) → lets a GenAI system pull in external information when answering, boosting LLM accuracy and relevance
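
The RAG idea can be sketched in a few lines: retrieve the most relevant external text, then place it in the prompt sent to the model. This is a toy illustration under stated assumptions: the word-overlap scoring stands in for the embedding-based search that production systems typically use, the function names are hypothetical, and no actual LLM is called.

```python
def tokenize(text):
    # Lowercase and strip basic punctuation so words can be compared.
    for ch in "?.,":
        text = text.replace(ch, "")
    return set(text.lower().split())

def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical knowledge base built from statements in these notes.
documents = [
    "The WHO released ethics guidance for large multimodal models in 2024.",
    "Convolutional neural networks are specialised for images.",
    "Recurrent neural networks are specialised for sequences.",
]
question = "When did the WHO release guidance for multimodal models?"

prompt = build_prompt(question, documents)
print(prompt)
```

For risk assessment, the point to notice is that answer quality now depends on the retrieval step and on the external documents as well as on the model itself.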

Key terms - quick answers

What are “Transformer models”?

Deep learning models that learn context by tracking relationships in sequential data and processing inputs in parallel.

What are “Multimodal models (LMMs)”?

Models handling inputs/outputs across image, video, audio and text, with NLP as a key component.

What is “CNN”?

Convolutional neural network specialised for images.

What is “RNN”?

Recurrent neural network specialised for sequences.
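
The CNN and RNN entries above differ in one core operation each, which a short sketch can show. It is heavily simplified: real networks learn their filter values and state-update weights during training and add non-linear activations, while the filter `[-1, 1]` and the `decay` value here are hand-picked for illustration.

```python
def conv1d(signal, kernel):
    """Slide a small filter across the input: the core CNN operation."""
    n = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(n))
            for i in range(len(signal) - n + 1)]

def recurrent_scan(sequence, decay=0.5):
    """Carry a running state from one step to the next: the core RNN idea."""
    state, states = 0.0, []
    for x in sequence:
        # New state blends the previous state with the current input.
        state = decay * state + x
        states.append(state)
    return states

# The filter responds where the signal rises (+1) or falls (-1).
edges = conv1d([0, 0, 1, 1, 0], [-1, 1])
print(edges)   # [0, 1, 0, -1]

# The first input still influences later steps through the state.
memory = recurrent_scan([1, 0, 0])
print(memory)  # [1.0, 0.5, 0.25]
```

The same filter is reused at every position, which suits local patterns in images, whereas the recurrent state must be computed step by step, in contrast to the parallel processing of transformers.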