PEFT Tuning (LoRA)

Classical parameter-efficient fine-tuning on HybrIE — the primary, stable training route. Train small LoRA adapters on top of frozen base models, version and register them with the runtime, and score every artifact against the zero-LoRA baseline.

What PEFT LoRA is

PEFT (parameter-efficient fine-tuning) trains a small set of LoRA weight matrices on top of a frozen base model. The base weights never change — an adapter only adds low-rank deltas with an explicit rank (r) and alpha scaling to the attention/MLP modules listed in target_modules. The result is a checkpoint that is orders of magnitude smaller than a full fine-tune, cheap to train, and cheap to swap at serving time.
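
To make the size claim concrete, the sketch below counts trainable parameters for one adapted weight matrix and applies a low-rank delta under the standard LoRA formulation, W' = W + (alpha / r) * B @ A. The alpha / r scaling is the convention used by common LoRA implementations; whether HybrIE applies exactly this scaling is an assumption, and the 4096 x 4096 shape is purely illustrative.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


def lora_scaling(alpha: float, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A (assumed convention: alpha / r)."""
    return alpha / rank


def apply_lora(w, a, b, alpha: float):
    """Return W + (alpha / r) * B @ A for small list-of-lists matrices.

    The base matrix `w` is left untouched; a new matrix is returned,
    mirroring the fact that base weights stay frozen.
    """
    rank = len(a)
    scale = lora_scaling(alpha, rank)
    d_out, d_in = len(w), len(w[0])
    return [
        [
            w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(rank))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]


if __name__ == "__main__":
    # Illustrative shape only: one 4096 x 4096 projection at rank 16.
    full = 4096 * 4096
    lora = lora_param_count(4096, 4096, rank=16)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

For this illustrative shape the adapter trains 131,072 parameters against 16,777,216 in the full matrix, a factor of 128 for that one module.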

Adapters can be trained against these base model families:

Family	base_model value
Qwen3 4B	`qwen3-4b`
Qwen3 0.6B	`qwen3-0.6b`
Mistral 7B	`mistral-7b`

PEFT LoRA vs Doc-to-LoRA. These are different things — never conflate them. PEFT LoRA is classical parameter-efficient fine-tuning: you train one adapter with an explicit rank, alpha, and target modules, and you tune those knobs. Doc-to-LoRA is context internalization: a trained hypernetwork generates adapters from documents on the fly — there is no rank or alpha for you to tune, and its training endpoint trains the hypernetwork itself, not an adapter.

Adapter artifacts are versioned and immutable — retraining the same adapter produces a new version rather than mutating the existing one, so a version you serve today behaves identically tomorrow.

Train an adapter (SFT)

POST /v1/train/sft

Supervised fine-tuning of one plain PEFT LoRA adapter. This is the standard supervised route and the SFT counterpart of RL training's peft-lora policy. Context is inlined in the prompt, as in standard SFT; /v1/train/d2l remains the hypernetwork path. The run writes a real PEFT adapter directory (adapter_model.safetensors + adapter_config.json) that can be passed directly to POST /v1/eval/adapter, to POST /v1/eval/rl with policy=peft-lora, or to /v1/train/rl as an init_checkpoint_dir warm start.

Parameter	Type	Description
`family` (required)	string	Base model family: `qwen3-4b`, `qwen3-0.6b`, or `mistral-7b`.
`lora_rank`	integer	LoRA rank (r) for the trained adapter; overrides the family preset.
`init_adapter_dir`	string	Warm-start from an exported PEFT adapter directory instead of a fresh initialization.
`num_examples`	integer	Number of training examples.
`epochs`	integer	Training epochs over the example set.
`lr`	float	Learning rate.
`l1_coef`	float	L1 regulariser coefficient applied to the adapter weights.
`eval_examples`	integer	Held-out eval size. When > 0, a held-out eval runs automatically once training finishes.
`seed`	integer	Seed for reproducible example generation.
`device`	string	Compute device: `auto`, `cuda`, `metal`, or `cpu`.
`model_dir`	string	Optional base-model directory override.
`checkpoint_dir`	string	Artifact destination for the trained adapter directory. Default `~/hybrie-mounts/d2l-artifacts/sft-<job_id>/`.

curl

curl -X POST http://localhost:8080/v1/train/sft \
  -H "Content-Type: application/json" \
  -d '{
    "family": "qwen3-4b",
    "lora_rank": 16,
    "epochs": 3,
    "eval_examples": 100
  }'

Response (202 Accepted):

json

{
  "job_id": "sft-1718102400",
  "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/"
}
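
The same request can be built from Python with the standard library alone. This is a minimal sketch, not an official client: `build_sft_request` and `submit` are hypothetical helper names, and `BASE_URL` simply reuses the host from the curl example. The request is constructed without being sent, so the body can be inspected first.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host from the curl example; adjust for your deployment
FAMILIES = {"qwen3-4b", "qwen3-0.6b", "mistral-7b"}


def build_sft_request(family: str, **options) -> urllib.request.Request:
    """Build, but do not send, a POST /v1/train/sft request."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    body = {"family": family, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/train/sft",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_sft_request("qwen3-4b", lora_rank=16, epochs=3, eval_examples=100)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # job = submit(req)  # uncomment with a runtime listening on BASE_URL
```

Calling `submit(req)` against a running runtime should return the `job_id` and `checkpoint_dir` fields shown in the response above.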

SFT runs appear in Training Jobs with kind sft-peft-lora. Progress points stream step / loss samples; for SFT jobs, the reward field carries the cross-entropy term. Runs can be cancelled via DELETE /v1/train/jobs/:id, and partial checkpoints are kept.
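
One use for the streamed loss samples is to stop a run that has stopped improving. The sketch below assumes each progress point is a dict with a `loss` key (the exact schema is not specified here), uses a hypothetical patience rule, and builds the DELETE request without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed: same host as the curl examples


def loss_plateaued(points, patience: int = 3, min_delta: float = 1e-3) -> bool:
    """True when the last `patience` points did not beat the earlier best loss.

    `points` is a list of dicts with a "loss" key (assumed shape).
    """
    losses = [p["loss"] for p in points]
    if len(losses) <= patience:
        return False  # not enough history to judge
    best_before = min(losses[:-patience])
    recent_best = min(losses[-patience:])
    return recent_best > best_before - min_delta


def build_cancel_request(job_id: str) -> urllib.request.Request:
    """Build, but do not send, DELETE /v1/train/jobs/:id."""
    if not job_id:
        raise ValueError("job_id must be non-empty")
    # Quote the id so unexpected characters cannot alter the path.
    path = "/v1/train/jobs/" + urllib.parse.quote(job_id, safe="")
    return urllib.request.Request(BASE_URL + path, method="DELETE")
```

Because partial checkpoints are kept on cancellation, stopping early this way still leaves an adapter directory to inspect.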

SFT → GRPO workflow. Train with SFT first, then optionally continue with reward-driven GRPO: pass the SFT checkpoint directory as init_checkpoint_dir to /v1/train/rl with policy=peft-lora.
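
As a sketch of that hand-off, the helper below turns the SFT response into a request body for /v1/train/rl. `rl_payload_from_sft` is a hypothetical name; the only fields it sets are ones documented on this page, and it assumes the GRPO run uses the same family the SFT adapter was trained on.

```python
def rl_payload_from_sft(sft_response: dict, family: str, **options) -> dict:
    """Build a /v1/train/rl body that warm-starts from an SFT checkpoint."""
    checkpoint_dir = sft_response.get("checkpoint_dir")
    if not checkpoint_dir:
        raise ValueError("SFT response has no checkpoint_dir to warm-start from")
    return {
        **options,  # e.g. prompts, group_size, lr, kl_beta
        # Fixed keys come last so options cannot override them.
        "family": family,
        "policy": "peft-lora",
        "init_checkpoint_dir": checkpoint_dir,
    }


if __name__ == "__main__":
    sft_response = {
        "job_id": "sft-1718102400",
        "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/sft-1718102400/",
    }
    print(rl_payload_from_sft(sft_response, "qwen3-4b", prompts=64))
```

In practice the SFT job should be allowed to finish before the GRPO run is submitted, so that the checkpoint directory holds the final adapter.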

Train an adapter (GRPO)

POST /v1/train/rl

For reward-driven training, set policy=peft-lora: the RL run then trains one plain PEFT adapter with GRPO, directly against a verifiable environment. The run writes the same standard adapter directory (adapter_model.safetensors + adapter_config.json), which you can register below and score with POST /v1/eval/adapter.

Parameter	Type	Description
`family` (required)	string	Base model family: `qwen3-4b`, `qwen3-0.6b`, or `mistral-7b`.
`policy`	string	Set to `peft-lora` to train a classical PEFT LoRA adapter (default is `hypernet`).
`lora_rank`	integer	LoRA rank (r) for the trained adapter; overrides the family preset.
`init_checkpoint_dir`	string	Warm-start from an existing PEFT adapter directory — e.g. an SFT checkpoint — instead of a fresh initialization.
`environment`	string	Verifiable environment providing the reward signal. Default `niah`.
`prompts`	integer	Number of environment prompts to train on.
`group_size`	integer	GRPO group size — completions sampled per prompt.
`lr`	float	Learning rate.
`kl_beta`	float	KL penalty coefficient against the reference policy.
`seed`	integer	Seed for reproducible task generation.
`device`	string	Compute device: `auto`, `cuda`, `metal`, or `cpu`.
`checkpoint_dir`	string	Artifact destination for the trained adapter directory.

curl

curl -X POST http://localhost:8080/v1/train/rl \
  -H "Content-Type: application/json" \
  -d '{
    "family": "qwen3-4b",
    "policy": "peft-lora",
    "lora_rank": 16,
    "prompts": 64
  }'

Response (202 Accepted):

json

{
  "job_id": "train-1718102400",
  "checkpoint_dir": "~/hybrie-mounts/d2l-artifacts/train-1718102400/"
}

See RL Training (GRPO) for the full training mechanics — group-relative advantages, environments, cancellation — and Training Jobs for tracking the run.
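
For orientation, group-relative advantages in the commonly described GRPO formulation normalize each completion's reward against the other completions sampled for the same prompt. The sketch below shows that calculation for one group of size group_size; the exact normalization HybrIE uses is not stated here, so treat the epsilon and the use of population standard deviation as assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Normalize rewards within one prompt's group: (r - mean) / (std + eps)."""
    if not rewards:
        raise ValueError("a group needs at least one completion")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled completions, binary verifiable reward.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient signal, which is one reason group size matters.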

Four ways to produce a PEFT-format adapter: SFT training via /v1/train/sft (the primary supervised route), GRPO RL training via /v1/train/rl with policy=peft-lora, exporting from the D2L pipeline (its adapter artifacts are PEFT-format), or registering an externally trained adapter below.

Register an external adapter

POST /v1/adapters

Adapters trained outside the platform can be registered directly with the runtime — point artifact_ref at the adapter weights and the runtime serves them like any other registered adapter.

Parameter	Type	Description
`id` (required)	string	Stable identifier used to reference the adapter at load and inference time.
`version`	string	Version of the adapter artifact. Versions are immutable once registered.
`base_model` (required)	string	Base model family the adapter was trained on: `qwen3-4b`, `qwen3-0.6b`, or `mistral-7b`.
`rank` (required)	integer	LoRA rank (r): the dimensionality of the low-rank update matrices.
`alpha`	integer	LoRA alpha scaling factor applied to the update.
`target_modules`	string[]	Model modules the LoRA matrices attach to (e.g. attention projections).
`artifact_ref`	string	Path or URI to the adapter weights.

curl

curl -X POST http://localhost:8080/v1/adapters \
  -H "Content-Type: application/json" \
  -d '{
    "id": "refund-policy",
    "version": "v3",
    "base_model": "qwen3-4b",
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "artifact_ref": "/var/hybrie/adapters/refund-policy/v3"
  }'
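
Rather than typing rank, alpha, and target modules by hand, the registration body can be derived from the adapter's own adapter_config.json, which in the PEFT format carries `r`, `lora_alpha`, and `target_modules`. The sketch below is an illustration under that assumption; `load_adapter_config` and `registration_payload` are hypothetical helper names.

```python
import json
from pathlib import Path

FAMILIES = {"qwen3-4b", "qwen3-0.6b", "mistral-7b"}


def load_adapter_config(adapter_dir) -> dict:
    """Read adapter_config.json from a PEFT adapter directory."""
    path = Path(adapter_dir) / "adapter_config.json"
    return json.loads(path.read_text(encoding="utf-8"))


def registration_payload(config: dict, adapter_id: str, version: str,
                         base_model: str, artifact_ref: str) -> dict:
    """Build a POST /v1/adapters body from a parsed adapter_config.json."""
    if base_model not in FAMILIES:
        raise ValueError(f"unsupported base_model: {base_model!r}")
    modules = config["target_modules"]
    if isinstance(modules, str):
        modules = [modules]  # some configs store a single pattern string
    return {
        "id": adapter_id,
        "version": version,
        "base_model": base_model,
        "rank": config["r"],
        "alpha": config["lora_alpha"],
        "target_modules": sorted(modules),
        "artifact_ref": artifact_ref,
    }
```

Deriving the values this way keeps the registered metadata consistent with the weights the runtime will actually load.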

Score it

POST /v1/eval/adapter

Score any PEFT adapter directory on held-out NIAH (needle-in-a-haystack) tasks against the zero-LoRA baseline. The report includes adapter_acc, base_acc, and lift, plus the adapter's own r / lora_alpha / target_modules, so the score is self-describing. See Evaluation for the full parameter and report reference.

bash

stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>
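
A report like this lends itself to a simple promotion gate. The sketch below assumes the report is a flat JSON object with numeric `adapter_acc`, `base_acc`, and `lift` fields and optional `r` / `lora_alpha` keys; the exact layout is documented under Evaluation, so adjust the key paths if it differs. The function names and the threshold are hypothetical.

```python
REQUIRED_FIELDS = ("adapter_acc", "base_acc", "lift")


def passes_lift_gate(report: dict, min_lift: float = 0.0) -> bool:
    """Return True when the reported lift clears the chosen threshold."""
    missing = [name for name in REQUIRED_FIELDS if name not in report]
    if missing:
        raise KeyError(f"report is missing fields: {missing}")
    return report["lift"] > min_lift


def describe(report: dict) -> str:
    """One-line summary keeping the adapter's hyperparameters next to the score."""
    return (
        f"adapter_acc={report['adapter_acc']} base_acc={report['base_acc']} "
        f"lift={report['lift']} r={report.get('r')} lora_alpha={report.get('lora_alpha')}"
    )
```

The gate reads `lift` as reported rather than recomputing it, since how lift is derived from the two accuracies is not defined on this page.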

Adapter registry

The runtime keeps a registry of adapters it knows how to serve.

GET /v1/adapters

List all registered adapters. The response also reports the serving configuration: lora_mode, max_active_loras, and loaded_loras[] — the adapters currently loaded into the runtime.

GET /v1/adapters/:id

Metadata for a single adapter — base model, rank/alpha, target modules, and versions.

GET /v1/adapters/status

Health of the adapter subsystem: what is registered, what is currently loaded, and serving readiness.
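
The serving fields in the list response are enough to tell whether another adapter can be loaded without evicting one. This sketch assumes `max_active_loras` is an integer and `loaded_loras` is a JSON array (the shape of its entries is not needed here); `free_lora_slots` is a hypothetical helper.

```python
def free_lora_slots(listing: dict) -> int:
    """Slots left before the runtime reaches max_active_loras (never negative)."""
    loaded = listing.get("loaded_loras", [])
    return max(0, listing["max_active_loras"] - len(loaded))


def can_load_another(listing: dict) -> bool:
    """True when at least one adapter slot is still free."""
    return free_lora_slots(listing) > 0
```

A check like this can run against a parsed GET /v1/adapters response before a load is attempted.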

From the CLI

bash

# Train a classical PEFT LoRA adapter with SFT — the standard supervised route
stimulir lab train sft --family qwen3-4b --lora-rank 16 --epochs 3

# Or train with GRPO against a verifiable environment
stimulir lab train rl --family qwen3-4b --policy peft-lora --lora-rank 16 --prompts 64

# Score it on held-out NIAH
stimulir lab eval adapter --family qwen3-4b --adapter-dir <dir>

# List registered adapters
stimulir lab adapters list

Full GRPO mechanics — environments, KL penalty, cancellation — in RL Training (GRPO).
Serve adapters at runtime — load, unload, and per-request routing — in Hot-swap Inference.
Internalize documents as adapters with a hypernetwork in Doc-to-LoRA (Context Internalization).
Track asynchronous training runs in Training Jobs.