Policies¶

A policy is a rule the rollout engine evaluates before pushing a configuration to an agent. If the policy denies, the push does not happen and the audit log records why.

Policies are tenant-scoped — what one tenant requires is none of another tenant's business.

Built-in policies¶

Two policies ship out of the box and are enforced by default:

Policy	What it forbids
`default-deny-exporter-removal`	Removing an exporter that agents are currently sending data through, unless a manual override is given
`default-deny-tls-insecure-on-non-localhost`	An exporter with `tls.insecure: true` to a non-localhost endpoint

Both can be relaxed per tenant if your environment has a real reason.
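The check behind `default-deny-tls-insecure-on-non-localhost` can be pictured with a short Python sketch. This is not the engine's implementation: the function names, the set of hostnames treated as localhost, and the exporter dictionary shape (`endpoint`, `tls.insecure`) are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: these are the names the rule treats as "localhost".
LOCALHOST_NAMES = {"localhost", "127.0.0.1", "::1"}


def endpoint_host(endpoint: str) -> str:
    """Extract the host from 'host:port' or 'scheme://host:port/path'."""
    # urlsplit only recognises the host part when a '//' prefix is present.
    target = endpoint if "://" in endpoint else "//" + endpoint
    return (urlsplit(target).hostname or "").lower()


def violates_tls_policy(exporter: dict) -> bool:
    """Return True when tls.insecure is set and the endpoint is not localhost."""
    insecure = bool(exporter.get("tls", {}).get("insecure", False))
    return insecure and endpoint_host(exporter.get("endpoint", "")) not in LOCALHOST_NAMES
```

An exporter with `tls.insecure: true` pointing at `localhost:4317` passes this check, while the same setting pointing at `collector.example.com:4317` does not.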

Where policies run¶

The policy engine runs:

when a rollout is created (precondition check),
when a rollout starts pushing to each agent (per-agent evaluation),
when an auto-apply is queued.

If the precondition check fails, the rollout does not start. If a per-agent evaluation fails, the engine skips that one agent and records a Rollout.PushDenied audit event.
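The difference between the two rollout checks can be sketched in Python. The names `run_rollout` and `RolloutResult`, and the representation of policies as plain callables, are hypothetical; only the Rollout.PushDenied event name comes from this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RolloutResult:
    started: bool
    pushed: List[str] = field(default_factory=list)
    audit: List[Tuple[str, str]] = field(default_factory=list)


def run_rollout(
    config: dict,
    agents: List[str],
    precondition: Callable[[dict], bool],
    per_agent: Callable[[dict, str], bool],
) -> RolloutResult:
    """Apply the precondition once, then evaluate each agent separately."""
    result = RolloutResult(started=False)
    if not precondition(config):
        # A failed precondition stops the whole rollout before any push.
        return result
    result.started = True
    for agent in agents:
        if per_agent(config, agent):
            result.pushed.append(agent)
        else:
            # A per-agent denial skips only this agent and is audited.
            result.audit.append(("Rollout.PushDenied", agent))
    return result
```

In this sketch a denial for one agent never aborts the loop, which mirrors the skip-and-record behaviour described above.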

Listing policies¶

Settings → Policies lists every policy on the active tenant:

name,
type (Built-in or Custom),
status (Draft / Approved / Active / Retired),
last edit author and approver,
audit-log link.

Creating a custom policy¶

Settings → Policies → New policy:

Name — unique per tenant.
Description — free text. Reviewers will read this; explain intent.
Body — the DSL expression (see Custom policy DSL).
Save as draft.

Drafts run in the policy engine in a "shadow" mode — the engine evaluates them but never blocks; instead, every "would-deny" is recorded as a Policy.WouldDeny audit event so you can dry-run a new policy before turning it on.
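As a rough illustration of shadow mode, the Python sketch below contrasts a Draft policy with an Active one. The function name and the list used as an audit log are hypothetical; the Policy.WouldDeny event name is taken from this page.

```python
def push_allowed(status: str, policy_allows: bool, audit_log: list) -> bool:
    """Decide whether a push proceeds, given one policy's verdict."""
    if status not in ("Draft", "Active"):
        # Other statuses are outside the scope of this sketch.
        raise ValueError(f"status not covered by this sketch: {status}")
    if policy_allows:
        return True
    if status == "Draft":
        # Shadow mode: record the would-be denial but never block.
        audit_log.append("Policy.WouldDeny")
        return True
    # Active: the denial is enforced.
    return False
```

Reviewing the recorded Policy.WouldDeny events for a draft shows which pushes it would have blocked had it been Active.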

Approval flow¶

A draft policy must be Approved by a different operator (the four-eyes principle) before it goes Active. See Approval flows.

Active policies¶

Active policies block in real time. Once a policy is Active, the engine stops recording shadow "would-deny" events for it and enforces its decisions instead. The audit-log row for the activation includes the policy version's hash for tamper-evidence.
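This page does not specify how the policy version's hash is computed. As one possible approach, the Python sketch below hashes a canonical JSON encoding with SHA-256; the algorithm, the field names, and the function name are all assumptions.

```python
import hashlib
import json


def policy_version_hash(name: str, version: int, body: str) -> str:
    """Hash a canonical encoding of a policy version (illustrative only)."""
    # sort_keys and fixed separators make the encoding deterministic.
    payload = json.dumps(
        {"name": name, "version": version, "body": body},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Recomputing the hash from the stored policy body and comparing it with the value in the audit log would reveal any edit made after activation.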

Retiring a policy¶

Retire keeps the policy in the database for the audit trail but takes it out of evaluation. Retire is reversible (re-Activate). Hard delete is not offered — the audit history of what the policy did is itself audit material.
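The lifecycle described so far (Draft, Approved, Active, Retired, with re-activation and no hard delete) can be modelled as a small state machine. The Python sketch below is illustrative: the class, the function, and the exact transition table are assumptions, and the real approval rules are described on Approval flows.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed transition table; note there is no transition that deletes a policy.
ALLOWED_TRANSITIONS = {
    ("Draft", "Approved"),
    ("Approved", "Active"),
    ("Active", "Retired"),
    ("Retired", "Active"),  # Retire is reversible via re-Activate.
}


@dataclass
class PolicyRecord:
    name: str
    author: str
    status: str = "Draft"
    approver: Optional[str] = None


def transition(policy: PolicyRecord, new_status: str, operator: str) -> None:
    """Move a policy to new_status, enforcing the four-eyes rule on approval."""
    if (policy.status, new_status) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"cannot go from {policy.status} to {new_status}")
    if new_status == "Approved":
        if operator == policy.author:
            raise PermissionError("approver must differ from the author")
        policy.approver = operator
    policy.status = new_status
```

Because the table has no delete transition, a retired policy record stays available for the audit trail.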

Performance and safety¶

The policy engine evaluates expressions with a 50 ms wall-clock budget per evaluation. An expression that times out is treated as deny, not allow; the engine fails closed by design. Expressions are cached in compiled form, so every evaluation after the first reuses the parse tree.
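The combination of a time budget, fail-closed behaviour, and a compiled-expression cache can be sketched in Python. This is not how the engine is implemented: Python's built-in `compile` and `eval` stand in for the policy DSL compiler and evaluator, and the function names, pool size, and cache size are assumptions. Treating evaluation errors as deny is also an assumption.

```python
import concurrent.futures
from functools import lru_cache

BUDGET_SECONDS = 0.050  # the 50 ms wall-clock budget

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


@lru_cache(maxsize=256)
def compile_expression(source: str):
    """Compile once per distinct source; later calls hit the cache."""
    # Stand-in for the DSL compiler: a Python expression, not the real DSL.
    return compile(source, "<policy>", "eval")


def evaluate(source: str, context: dict, budget: float = BUDGET_SECONDS) -> bool:
    """Return True for allow; a timeout or an error yields deny."""
    code = compile_expression(source)
    future = _pool.submit(eval, code, {"__builtins__": {}}, dict(context))
    try:
        return bool(future.result(timeout=budget))
    except concurrent.futures.TimeoutError:
        # Fail closed. The worker thread is not cancelled here; a real
        # engine would need an evaluator that can be interrupted.
        return False
    except Exception:
        # Assumption: a broken expression is also treated as deny.
        return False
```

Evaluating the same source string twice compiles it only once, and an expression that runs past the budget returns deny rather than allow.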

For 1 000 agents in a single rollout step, total policy evaluation overhead is a few hundred milliseconds. Custom policies that walk the entire YAML AST are slower; the rule of thumb is "tens of milliseconds for a tight rule, low hundreds for a heavy one".

Examples¶

A handful of common custom policies tenants write:

"No exporter to a non-EU endpoint." — checks the destination hostname against an allow-list.
"Every traces pipeline must go through tail_sampling." — checks pipeline composition.
"Hostmetrics interval ≥ 60 s." — caps cardinality.
"No debug exporter in production." — gated by group label.

Walkthroughs of each are on Custom policy DSL.
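To make the logic of the tail_sampling rule concrete, here is a plain Python sketch of the check. It is not DSL syntax. It assumes the configuration follows the OpenTelemetry Collector layout (`service.pipelines`, with pipeline and component IDs of the form `type` or `type/name`), and the function name is hypothetical.

```python
from typing import List


def traces_pipelines_missing_tail_sampling(config: dict) -> List[str]:
    """Return the names of traces pipelines that lack a tail_sampling processor."""
    pipelines = config.get("service", {}).get("pipelines", {})
    missing = []
    for name, pipeline in pipelines.items():
        # "traces/edge" is a traces pipeline named "edge".
        if name.split("/", 1)[0] != "traces":
            continue
        processors = pipeline.get("processors", [])
        # Processor IDs may carry a "/name" suffix, e.g. "tail_sampling/errors".
        if not any(p.split("/", 1)[0] == "tail_sampling" for p in processors):
            missing.append(name)
    return missing
```

A policy built on this logic would deny the push whenever the returned list is not empty.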