Agents do not connect

Symptoms: an agent you have just configured never appears in Fleet, or appears once and disconnects, or shows up briefly and goes red.

Walk this list in order — each step's evidence informs the next.

1. Network reachability

From the agent host:

curl -fsS -o /dev/null -w "%{http_code}\n" \
  https://AMPORA_HOST/health/live
  • 200: network and TLS are fine; skip to step 3.
  • Could not resolve host: DNS issue.
  • Connection refused: nothing listens at the host:port.
  • SSL handshake failed: TLS issue (see step 2).
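
The outcomes above correspond to curl exit statuses, so the check can be scripted. The following is a minimal sketch, not an Ampora tool: the function name classify_curl_exit is hypothetical, and the exit codes are the ones listed in curl's man page.

```shell
#!/bin/sh
# Map a curl exit status to the layer that most likely failed.
# Exit codes are taken from the EXIT CODES section of the curl man page.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok: HTTP request succeeded" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "tcp: failed to connect to host:port" ;;
    22) echo "http: server answered with status >= 400" ;;
    28) echo "timeout: no answer within the time limit" ;;
    35) echo "tls: handshake failed (see step 2)" ;;
    60) echo "tls: certificate not trusted (see step 2)" ;;
    *)  echo "other: curl exit $1" ;;
  esac
}

# Usage (AMPORA_HOST is the placeholder used throughout this page):
#   curl -fsS -o /dev/null --max-time 10 "https://AMPORA_HOST/health/live"
#   classify_curl_exit "$?"
```

Exit status 22 only appears because the probe uses -f; without that flag curl exits 0 on any HTTP response.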

If the network path goes through a proxy, the agent's collector image must respect the standard HTTPS_PROXY env vars, and the proxy must allow long-lived WebSocket upgrades.

2. TLS

A TLS error usually means one of:

  • the certificate is for a different name (CN/SAN mismatch) — fix the hostname in the agent config;
  • the certificate is signed by a CA the agent does not trust — install the CA bundle on the agent host (ca-certificates-style package);
  • the certificate has expired — renew it.

Run a non-Ampora TLS check:

openssl s_client -connect AMPORA_HOST:443 -servername AMPORA_HOST < /dev/null

The output starts with the certificate chain. Confirm the subject name (CN/SAN), the expiry date, and that the chain leads to a CA the agent trusts.
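
To read those fields without scanning the full s_client output, the leaf certificate can be piped into openssl x509. This is a sketch under assumptions: summarize_cert is a hypothetical helper name, and the -ext option needs OpenSSL 1.1.1 or newer, so the function falls back quietly when it is unavailable.

```shell
#!/bin/sh
# Print subject, issuer, and validity dates of the first PEM certificate
# found on stdin. Non-PEM text around the certificate is ignored by openssl.
summarize_cert() {
  pem=$(cat)
  printf '%s\n' "$pem" | openssl x509 -noout -subject -issuer -dates || return 1
  # -ext requires OpenSSL 1.1.1+; older builds skip the SAN listing.
  printf '%s\n' "$pem" | openssl x509 -noout -ext subjectAltName 2>/dev/null \
    || echo "(subjectAltName not shown)"
}

# Usage:
#   openssl s_client -connect AMPORA_HOST:443 -servername AMPORA_HOST \
#     </dev/null 2>/dev/null | summarize_cert
```

The notAfter line answers the expiry question; the subject and subjectAltName lines show which names the certificate covers.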

3. Bootstrap token

A bootstrap token authenticates the first connection. Check:

  • Did you copy the token in full and unmodified? The token format is amp-bs.<base32>.<crc>; a CRC mismatch usually means the token was truncated.
  • Is the token still within its TTL? The Tokens UI shows expiry.
  • Has the token already been redeemed? Tokens are single-use; redemption shows up in the audit log. Re-issue if so.
  • Was the token revoked? The audit log will say.
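
A quick shape check catches most copy-and-paste damage before the token reaches the agent config. The sketch below is illustrative only: check_token_shape is a hypothetical name, the base32 field is assumed to use the RFC 4648 alphabet (A-Z and 2-7, either case), the CRC field is assumed to be alphanumeric, and the CRC itself is not verified because its algorithm is not stated here.

```shell
#!/bin/sh
# Shape check only: "amp-bs" prefix, three dot-separated fields, no whitespace.
# ASSUMPTION: base32 field is [A-Za-z2-7]+, CRC field is [A-Za-z0-9]+.
# The CRC value is NOT recomputed.
check_token_shape() {
  tok=$1
  case "$tok" in
    ''|*[[:space:]]*) echo "empty or contains whitespace"; return 1 ;;
  esac
  if printf '%s\n' "$tok" | grep -Eq '^amp-bs\.[A-Za-z2-7]+\.[A-Za-z0-9]+$'; then
    echo "shape ok"
  else
    echo "unexpected shape (truncated or altered?)"
    return 1
  fi
}

# Usage:
#   check_token_shape "amp-bs.XXXXXXX.YYYY"
```

A token that passes this check can still be expired, redeemed, or revoked; those states are only visible in the Tokens UI and the audit log.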

The agent's Authorization header must use the literal Bearer scheme:

extensions:
  opamp:
    server:
      ws:
        endpoint: wss://AMPORA_HOST/v1/opamp
        headers:
          Authorization: "Bearer amp-bs.XXXXXXX.YYYY"

A common mistake is Authorization: amp-bs.... (no Bearer prefix) — Ampora returns 401.

4. mTLS cert (after first connect)

If the agent connected once and then fails, it is now on the mTLS path and the bootstrap token is no longer accepted. Check that:

  • the cert lives where the opamp_extension expects it (check the extension's data directory);
  • the cert is still valid (openssl x509 -in cert.pem -noout -dates);
  • the cert has not been revoked (Identities UI; CRL/OCSP).

If the cert is wrong, re-bootstrap: revoke the existing identity and issue a new bootstrap token.
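
The validity check can also report certificates that are about to expire, using the -checkend option of openssl x509. This is a minimal sketch: cert_expiry_status is a hypothetical helper, the 7-day default margin is an arbitrary choice, and cert.pem stands for wherever the extension's data directory keeps the certificate.

```shell
#!/bin/sh
# Report whether a PEM certificate is valid now and for a further margin.
# $1: path to the certificate, $2: margin in seconds (default 7 days).
cert_expiry_status() {
  cert=$1
  margin=${2:-604800}
  # -checkend N exits non-zero if the cert expires within N seconds.
  if ! openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1; then
    echo "expired or unreadable"
    return 1
  fi
  if ! openssl x509 -in "$cert" -noout -checkend "$margin" >/dev/null 2>&1; then
    echo "expires within ${margin}s"
    return 2
  fi
  echo "valid"
}

# Usage:
#   cert_expiry_status cert.pem
```

Revocation is not covered by this check; that still has to be confirmed in the Identities UI or through CRL/OCSP.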

5. Reverse proxy

The proxy must:

  • terminate TLS,
  • forward Upgrade: websocket and Connection: upgrade headers,
  • not buffer the WebSocket frames (in Nginx: proxy_buffering off),
  • allow long read/send timeouts (≥ 1 hour),
  • forward X-SSL-Client-Cert (or equivalent) so Ampora sees the mTLS client cert.

The most common single misconfiguration: missing Upgrade / Connection headers. Test with:

curl -i -sS --http1.1 \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://AMPORA_HOST/v1/opamp

A correct response is HTTP/1.1 101 Switching Protocols (or 401 if unauthenticated, which is also good — it means the upgrade reached Ampora).
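
The interpretation of that status line can be scripted for repeated tests. The following sketch assumes nothing about Ampora beyond the two status codes named above; classify_upgrade_response is a hypothetical name, and any status other than 101 or 401 is only flagged as unexpected, not diagnosed.

```shell
#!/bin/sh
# Read an HTTP response (status line and headers) on stdin and interpret
# the status code on its first line for the WebSocket upgrade test.
classify_upgrade_response() {
  status=$(awk 'NR==1 { print $2; exit }')
  case "$status" in
    101) echo "101: upgrade completed end to end" ;;
    401) echo "401: upgrade reached Ampora, request was unauthenticated" ;;
    "")  echo "no HTTP response"; return 1 ;;
    *)   echo "$status: unexpected, check Upgrade/Connection forwarding on the proxy"
         return 1 ;;
  esac
}

# Usage (--http1.1 keeps curl from negotiating HTTP/2, where the Upgrade
# header does not apply; --max-time stops curl waiting after a 101):
#   curl -i -sS --http1.1 --max-time 5 \
#     -H "Upgrade: websocket" -H "Connection: Upgrade" \
#     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
#     -H "Sec-WebSocket-Version: 13" \
#     https://AMPORA_HOST/v1/opamp | classify_upgrade_response
```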

6. Plaintext bootstrap is off

Production deployments reject bootstrap on plain ws://. The agent must use wss://. This default is the right one — only flip OpAmp:BootstrapPlaintextAllowed=true for local development.

7. Capability gating

If the agent connects but never receives a config, check capabilities:

  • The agent must signal AcceptsRemoteConfig to receive configs.
  • The agent must signal ReportsEffectiveConfig for Ampora to detect drift.
  • The agent must signal ReportsHealth for health gates to do anything.

The Capabilities chip on the agent detail page shows what the agent told Ampora.
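
On the wire, OpAMP agents report capabilities as a bitmask. If only the raw number is at hand (for example from an agent-side debug log), the three capabilities above can be tested with shell arithmetic. This is a sketch: the bit values are those of the AgentCapabilities enum in the OpAMP specification, missing_capabilities is a hypothetical helper, and whether Ampora exposes the raw bitmask anywhere is an assumption.

```shell
#!/bin/sh
# Bit values from the AgentCapabilities enum in the OpAMP specification.
ACCEPTS_REMOTE_CONFIG=$((0x2))
REPORTS_EFFECTIVE_CONFIG=$((0x4))
REPORTS_HEALTH=$((0x800))

# Print the names of the three capabilities that are NOT set in the
# bitmask given as $1 (decimal or 0x-prefixed hex). Prints an empty
# line if all three are present.
missing_capabilities() {
  caps=$(($1))
  missing=""
  [ $((caps & ACCEPTS_REMOTE_CONFIG)) -ne 0 ] || missing="$missing AcceptsRemoteConfig"
  [ $((caps & REPORTS_EFFECTIVE_CONFIG)) -ne 0 ] || missing="$missing ReportsEffectiveConfig"
  [ $((caps & REPORTS_HEALTH)) -ne 0 ] || missing="$missing ReportsHealth"
  echo "${missing# }"
}

# Usage:
#   missing_capabilities 0x3    # prints: ReportsEffectiveConfig ReportsHealth
```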

8. Server-side rejection

If the agent gets to the server but is rejected, two sources help:

  • ampora_opamp_frames_rejected_total — every rejected frame, with a reason label (oversize, malformed, cap_mismatch, auth_failed).
  • The audit log records every authentication failure with the source IP and the reason.

Match the timestamps; the cause is in one of the two.
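
To pull just the rejection counter out of a metrics scrape, a small filter is enough. This sketch assumes the metrics are available in Prometheus text format; the /metrics path in the usage line is an assumption, and rejected_frames is a hypothetical helper name.

```shell
#!/bin/sh
# Filter Prometheus text-format metrics on stdin down to the series of the
# rejected-frames counter, one line per label set. HELP/TYPE comment lines
# and other metrics with a longer name are excluded.
rejected_frames() {
  grep -E '^ampora_opamp_frames_rejected_total(\{| )'
}

# Usage (the /metrics path is an assumption; adjust to your deployment):
#   curl -fsS "https://AMPORA_HOST/metrics" | rejected_frames
```

Running it twice a minute apart shows which reason label is still increasing while the agent retries.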

9. Self-Agent test

Ampora ships an in-process Self-Agent that connects to the same OpAMP endpoint as a real agent would. If the Self-Agent of the same Ampora instance is healthy on the Fleet page but external agents fail, the problem is on the network path between the agent and the server, not in Ampora itself.

Still stuck?

Open an issue with:

  • the Ampora version (footer in the UI),
  • the agent type and version,
  • the agent_description JSON the agent reports (visible in audit),
  • the value in ampora_opamp_frames_rejected_total with labels,
  • the relevant audit-log rows (rounding timestamps to a 5-minute window is fine).