Policies

A policy is a rule the rollout engine evaluates before pushing a configuration to an agent. If the policy denies, the push does not happen and the audit log records why.

Policies are tenant-scoped — what one tenant requires is none of another tenant's business.

Built-in policies

Two policies ship out of the box and are enforced by default:

Policy | What it forbids
default-deny-exporter-removal | Removing, without a manual override, an exporter that agents are currently sending data through
default-deny-tls-insecure-on-non-localhost | An exporter with tls.insecure: true that points at a non-localhost endpoint

Both can be relaxed per tenant if your environment has a real reason.
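
To make the second built-in concrete, here is a minimal Python sketch of the kind of check it describes. It is not the engine's implementation: the function names, the exporter dictionary shape, and the set of hosts treated as "localhost" are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Assumption: which hosts count as "localhost" is not specified by the docs.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def endpoint_host(endpoint: str) -> str:
    """Extract the host from 'host:port' or 'scheme://host:port'."""
    if "://" not in endpoint:
        endpoint = "//" + endpoint  # lets urlsplit treat the value as a netloc
    return (urlsplit(endpoint).hostname or "").lower()


def violates_tls_insecure(exporter: dict) -> bool:
    """True if the exporter sets tls.insecure: true for a non-localhost endpoint."""
    insecure = bool((exporter.get("tls") or {}).get("insecure", False))
    host = endpoint_host(exporter.get("endpoint", ""))
    # A missing or unparseable endpoint is not in LOCAL_HOSTS, so it is denied.
    return insecure and host not in LOCAL_HOSTS
```

With this sketch, an exporter pointing at localhost:4317 with tls.insecure: true passes, while the same setting on collector.example.com:4317 is flagged.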

Where policies run

The policy engine runs:

  • when a rollout is created (precondition check),
  • when a rollout starts pushing to each agent (per-agent evaluation),
  • when an auto-apply is queued.

If the precondition check fails, the rollout does not start. If a per-agent evaluation fails, the engine skips that one agent and records a Rollout.PushDenied audit event.
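
The difference between the two failure modes can be sketched in Python as follows. This is an illustrative model only: the policy signature, the RolloutResult type, and the refused_by field are hypothetical, and only the Rollout.PushDenied event name comes from this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical policy signature: (config, agent id or None) -> True to allow.
PolicyFn = Callable[[dict, Optional[str]], bool]


@dataclass
class RolloutResult:
    started: bool = False
    refused_by: Optional[str] = None  # policy that failed the precondition check
    pushed: List[str] = field(default_factory=list)
    audit: List[Tuple[str, str, str]] = field(default_factory=list)


def run_rollout(config: dict, agents: List[str],
                policies: Dict[str, PolicyFn]) -> RolloutResult:
    result = RolloutResult()
    # Precondition check: a single denial refuses the whole rollout.
    for name, policy in policies.items():
        if not policy(config, None):
            result.refused_by = name
            return result
    result.started = True
    # Per-agent evaluation: a denial skips only that agent and is audited.
    for agent in agents:
        denied_by = next(
            (n for n, p in policies.items() if not p(config, agent)), None)
        if denied_by is None:
            result.pushed.append(agent)
        else:
            result.audit.append(("Rollout.PushDenied", agent, denied_by))
    return result
```

A policy that only objects to specific agents lets the rollout start and produces one audit row per skipped agent; a policy that objects to the configuration itself stops the rollout before any push.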

Listing policies

Settings → Policies lists every policy on the active tenant:

  • name,
  • type (Built-in or Custom),
  • status (Draft / Approved / Active / Retired),
  • last edit author and approver,
  • audit-log link.

Creating a custom policy

Settings → Policies → New policy:

  1. Name — unique per tenant.
  2. Description — free text. Reviewers will read this; explain intent.
  3. Body — the DSL expression (see Custom policy DSL).
  4. Save as draft.

Drafts run in the policy engine in shadow mode: the engine evaluates them but never blocks. Every would-deny is instead recorded as a Policy.WouldDeny audit event, so you can dry-run a new policy before turning it on.
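
As a rough model of shadow mode, the Python sketch below turns a policy's verdict into either a block or an audit row depending on the policy's status. The function name and the audit row shape are hypothetical. Treating an Approved but not yet Active policy like a draft is also an assumption; this page only states the behavior for Draft, Active, and Retired.

```python
from typing import Dict, List


def apply_verdict(policy_name: str, status: str, would_deny: bool,
                  audit_log: List[Dict[str, str]]) -> bool:
    """Return True if this policy blocks the push."""
    if status == "Retired" or not would_deny:
        return False  # retired policies are out of evaluation; allows never block
    if status == "Active":
        return True   # active policies block in real time
    # Draft (and, by assumption, Approved): record the would-deny, never block.
    audit_log.append({"event": "Policy.WouldDeny", "policy": policy_name})
    return False
```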

Approval flow

A draft policy must be Approved by a different operator (the four-eyes principle) before it goes Active. See Approval flows.

Active policies

Active policies block in real time: once a policy is activated, shadow evaluation stops and a deny actually prevents the push. The audit-log row for the activation includes the hash of the policy version as tamper evidence.
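
One way such a version hash could work is sketched below in Python. The hash algorithm (SHA-256), the hashed fields, and the canonical JSON encoding are assumptions; this page does not specify how the engine computes the hash.

```python
import hashlib
import json


def policy_version_hash(name: str, version: int, body: str) -> str:
    """Hash a policy version over a canonical encoding of its content."""
    # Sorted keys and fixed separators make equal content hash identically.
    payload = json.dumps(
        {"name": name, "version": version, "body": body},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def matches_audit_row(name: str, version: int, body: str, recorded: str) -> bool:
    """True if the stored policy still matches the hash in the audit log."""
    return policy_version_hash(name, version, body) == recorded
```

Recomputing the hash from the stored policy and comparing it with the audit row reveals any edit made to the body after activation.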

Retiring a policy

Retire keeps the policy in the database for the audit trail but takes it out of evaluation. Retire is reversible (re-Activate). Hard delete is not offered — the audit history of what the policy did is itself audit material.
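
The lifecycle described above can be summarized as a small state machine. The Python sketch below is a model under stated assumptions, not the product's code: the PolicyRecord type and the transition function are hypothetical, the approver is required to differ from the author, and only the transitions this page mentions are allowed. There is deliberately no delete transition.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed transition set, taken from the statuses and flows on this page.
ALLOWED = {
    ("Draft", "Approved"),
    ("Approved", "Active"),
    ("Active", "Retired"),
    ("Retired", "Active"),  # retiring is reversible
}


@dataclass
class PolicyRecord:
    name: str
    author: str
    status: str = "Draft"
    approver: Optional[str] = None


def transition(policy: PolicyRecord, new_status: str, operator: str) -> PolicyRecord:
    if (policy.status, new_status) not in ALLOWED:
        raise ValueError(f"{policy.status} -> {new_status} is not allowed")
    if new_status == "Approved":
        # Four-eyes principle: the approver must be a different operator.
        if operator == policy.author:
            raise PermissionError("approver must differ from the author")
        policy.approver = operator
    policy.status = new_status
    return policy
```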

Performance and safety

The policy engine evaluates expressions with a 50 ms wall-clock budget per evaluation. An expression that times out is treated as deny, not allow: the engine is fail-closed by design. Expressions are cached in compiled form, so every evaluation after the first reuses the parse tree instead of parsing the expression again.
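
The two mechanisms, a time budget that fails closed and a compiled-form cache, can be illustrated with the Python sketch below. It is a stand-in, not the engine: a Python expression over a config variable takes the place of the policy DSL, the names compile_rule and evaluate are hypothetical, and the worker thread is abandoned rather than cancelled when the budget expires.

```python
import concurrent.futures
import functools
import time

BUDGET_SECONDS = 0.050  # the 50 ms wall-clock budget per evaluation
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


@functools.lru_cache(maxsize=256)
def compile_rule(source: str):
    """Stand-in compiler: the rule is a Python expression over `config`.

    eval() is used only for illustration and is not a sandbox; never feed it
    untrusted input. The cache means a given source is compiled only once.
    """
    code = compile(source, "<policy>", "eval")
    return lambda config: bool(eval(code, {"__builtins__": {}}, {"config": config}))


def evaluate(rule, config: dict) -> bool:
    """Return True for allow and False for deny; a timeout counts as deny."""
    future = _pool.submit(rule, config)
    try:
        return bool(future.result(timeout=BUDGET_SECONDS))
    except concurrent.futures.TimeoutError:
        return False  # fail-closed: no answer within the budget means deny
```

Calling compile_rule twice with the same source returns the same cached object, and a rule that sleeps past the budget is denied even though it would eventually have returned allow.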

For 1 000 agents in a single rollout step, total policy evaluation overhead is a few hundred milliseconds. Custom policies that walk the entire YAML AST are slower; the rule of thumb is "tens of milliseconds for a tight rule, low hundreds for a heavy one".

Examples

A handful of common custom policies tenants write:

  • "No exporter to a non-EU endpoint." — checks the destination hostname against an allow-list.
  • "Every traces pipeline must go through tail_sampling." — checks pipeline composition.
  • "Hostmetrics interval ≥ 60 s." — caps cardinality.
  • "No debug exporter in production." — gated by group label.

Walkthroughs of each are on Custom policy DSL.
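
Independently of the DSL syntax, the logic behind two of these rules can be sketched in plain Python over a parsed collector configuration. The function names are hypothetical, the duration parser covers only simple forms such as 30s, 1m, or 1m30s, and treating a missing collection_interval as a violation is an assumption that is stricter than a real rule may need to be.

```python
import re

_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text: str) -> float:
    """Parse durations such as '30s', '1m', or '1m30s' into seconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(float(n) * _UNITS[u] for n, u in parts)


def hostmetrics_interval_ok(config: dict, minimum_s: float = 60.0) -> bool:
    """Allow only if every hostmetrics receiver collects at >= minimum_s."""
    for name, body in (config.get("receivers") or {}).items():
        if name.split("/")[0] != "hostmetrics":
            continue  # 'hostmetrics/foo' is a named instance of the same type
        interval = (body or {}).get("collection_interval")
        # Assumption: an unset interval is denied rather than defaulted.
        if interval is None or parse_duration(interval) < minimum_s:
            return False
    return True


def traces_use_tail_sampling(config: dict) -> bool:
    """Allow only if every traces pipeline lists a tail_sampling processor."""
    pipelines = (config.get("service") or {}).get("pipelines") or {}
    for name, body in pipelines.items():
        if name.split("/")[0] != "traces":
            continue
        processors = (body or {}).get("processors") or []
        if not any(p.split("/")[0] == "tail_sampling" for p in processors):
            return False
    return True
```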