Training — Promptpedia

01

The analogy

Think of training a doctor: first, years spent reading the whole library (pre-training); then supervised practice on cases corrected by tutors (supervised fine-tuning, or instruction tuning); and finally feedback from real patients that polishes their bedside manner (refinement from human preferences).

02

In detail

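Training a modern large language model typically proceeds in three phases: self-supervised pre-training on trillions of tokens, in which the model learns to predict the next token; supervised fine-tuning (SFT) on curated, high-quality examples of prompts and desired responses; and reinforcement learning from human feedback (RLHF), which aligns the model's behavior with human preferences. The costs are lopsided: training a large model can cost millions of dollars in compute, while using it (inference) typically costs cents or less per request.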
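Pre-training is called self-supervised because the labels come from the text itself: the target at every position is simply the token that follows. The sketch below makes that objective concrete under loud simplifying assumptions: whitespace splitting stands in for a real tokenizer, and a counting-based bigram model (one previous token of context) stands in for a neural network. The quantity it reports, the average cross-entropy of the true next token, is the same kind of loss that large models minimize.

```python
import math
from collections import Counter, defaultdict


def next_token_pairs(tokens):
    """Turn a token sequence into (context, target) training pairs.

    The target at each position is just the token that follows, so the
    labels come from the text itself (self-supervision). Here the context
    is only the previous token; real models condition on long contexts.
    """
    return list(zip(tokens[:-1], tokens[1:]))


def train_bigram(pairs, vocab, smoothing=1.0):
    """Estimate P(next | prev) by counting, with add-one (Laplace) smoothing."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    probs = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + smoothing * len(vocab)
        probs[prev] = {t: (counts[prev][t] + smoothing) / total for t in vocab}
    return probs


def cross_entropy(probs, pairs):
    """Average negative log-probability assigned to the true next token."""
    return -sum(math.log(probs[prev][nxt]) for prev, nxt in pairs) / len(pairs)


# Toy corpus with whitespace "tokenization" (an assumption for illustration).
text = "paris is the capital of france . rome is the capital of italy ."
tokens = text.split()
vocab = sorted(set(tokens))
pairs = next_token_pairs(tokens)
model = train_bigram(pairs, vocab)

print(pairs[:3])  # [('paris', 'is'), ('is', 'the'), ('the', 'capital')]
print("uniform guess loss:", round(math.log(len(vocab)), 3))
print("trained bigram loss:", round(cross_entropy(model, pairs), 3))
print(max(model["the"], key=model["the"].get))  # capital
```

The trained loss comes out below the uniform-guess baseline of log(vocabulary size). Pre-training at scale pursues the same minimization, but by gradient descent over a neural network's parameters rather than by counting.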

03

An example


The same base model that completes “Paris is the capital of…” learns, after fine-tuning, to answer politely, refuse harmful requests and stick to the format you ask for.
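Behaviors like these are largely shaped by the human-preference step. In RLHF, people compare pairs of responses and a reward model is trained so that the preferred response scores higher; one widely used objective is the pairwise loss -log(sigmoid(r_chosen - r_rejected)). The sketch below rests on heavy assumptions: the reward model is a linear function of three hypothetical hand-made features instead of a neural network reading the full text, and the comparisons are invented for illustration. The later reinforcement-learning step, which tunes the language model to maximize this reward (for example with PPO), is not shown.

```python
import math

# Hypothetical features describing a response:
# [is_polite, follows_requested_format, complies_with_harmful_request]
FEATURES = 3


def reward(weights, features):
    """Toy reward model: a weighted sum of hand-made features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    It is small when the human-preferred response gets the higher reward.
    """
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))


def train_reward_model(comparisons, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * FEATURES
    for _ in range(steps):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -sigmoid(-margin) = -1 / (1 + e^margin)
            grad_margin = -1.0 / (1.0 + math.exp(margin))
            for i in range(FEATURES):
                weights[i] -= lr * grad_margin * (chosen[i] - rejected[i])
    return weights


# Hypothetical human judgments: (preferred response, rejected response).
comparisons = [
    ([1, 1, 0], [0, 1, 0]),  # polite answer beats a rude one
    ([1, 0, 0], [1, 0, 1]),  # refusing a harmful request beats complying
    ([0, 1, 0], [0, 0, 0]),  # following the requested format beats ignoring it
]

weights = train_reward_model(comparisons)
print([round(w, 2) for w in weights])

# A polite refusal now outscores a response that complies with a harmful request.
print(reward(weights, [1, 1, 0]) > reward(weights, [0, 1, 1]))  # True
```

Only the difference between two rewards enters the loss, so the model learns a ranking rather than an absolute score, and human labelers only need to give relative judgments.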

04

Related terms: Parameters, Fine-tuning, Inference