High Availability
Running multiple ctrl-exec instances sharing state for redundancy.
ctrl-exec is designed so that all persistent state lives on disk in known paths and the ctrl-exec process holds no runtime state. Any number of ctrl-exec instances sharing the same state files can serve requests interchangeably.
What Must Be Shared
For active/passive or active/active operation, three paths must be replicated or shared across all ctrl-exec instances:
/etc/ctrl-exec/ - CA key and certificate, ctrl-exec key and certificate. All instances must present the same ctrl-exec cert; a divergent instance will be rejected by agents after a cert rotation.
/var/lib/ctrl-exec/agents/ - Agent registry. Agents continue to connect even if the registry is temporarily stale, but will not appear in list-agents until their entry is present.
/var/lib/ctrl-exec/rotation.json - Cert rotation state and per-agent serial tracking.
Lock files (/var/lib/ctrl-exec/locks/) are per-instance and must not be shared.
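One way to confirm that instances really hold the same shared state is to compute a single digest over the three shared paths on each node and compare the results. The following is a minimal sketch, not part of ctrl-exec: the function name tree_digest and its root parameter are hypothetical, and only the three paths come from the list above. The locks directory is deliberately left out because it is per-instance.

```python
import hashlib
from pathlib import Path

# The three paths listed above as shared state. The locks directory is
# per-instance and is intentionally not included.
SHARED_PATHS = [
    "/etc/ctrl-exec",
    "/var/lib/ctrl-exec/agents",
    "/var/lib/ctrl-exec/rotation.json",
]


def tree_digest(paths=SHARED_PATHS, root="/"):
    """Return one SHA-256 hex digest covering every file under `paths`.

    `root` (hypothetical parameter) lets the same logic run against a copy
    of the tree, such as a mounted snapshot, instead of the live filesystem.
    """
    h = hashlib.sha256()
    root = Path(root)
    for p in paths:
        base = root / p.lstrip("/")
        if base.is_file():
            files = [base]
        elif base.is_dir():
            files = sorted(f for f in base.rglob("*") if f.is_file())
        else:
            files = []  # a missing path contributes nothing to the digest
        for f in files:
            # Hash the relative path and then the content, so both renames
            # and content changes alter the digest.
            h.update(str(f.relative_to(root)).encode())
            h.update(b"\0")
            h.update(f.read_bytes())
            h.update(b"\0")
    return h.hexdigest()
```

Running tree_digest() on each instance and comparing the hex strings gives a quick divergence check. Reading /etc/ctrl-exec normally requires privileges, because it contains private keys.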
Replication Approaches
A shared filesystem (NFS or DRBD) is the simplest approach — all instances read and write the same files. For environments where a shared mount is inconvenient, rsync the three paths from primary to standby after pairing or rotation events. In cloud environments, the agent registry can be stored in object storage. The CA key must not go in object storage — it belongs in a secrets manager or encrypted block volume.
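The rsync approach can be sketched as a small wrapper that builds one rsync invocation per shared path. This is an illustration under stated assumptions: the standby hostname is a placeholder, SSH access from primary to standby is assumed, and mirroring deletions with --delete on the directories is a choice of this sketch rather than something the text above prescribes. The only rsync options used are -a and --delete.

```python
import subprocess

# Shared paths from this document. Trailing slashes on directories make
# rsync copy the directory contents rather than nesting the directory.
SHARED_PATHS = [
    "/etc/ctrl-exec/",
    "/var/lib/ctrl-exec/agents/",
    "/var/lib/ctrl-exec/rotation.json",
]


def rsync_commands(standby, paths=SHARED_PATHS):
    """Build one rsync argument list per shared path, primary -> standby."""
    cmds = []
    for path in paths:
        cmd = ["rsync", "-a"]
        if path.endswith("/"):
            # Assumption of this sketch: files removed on the primary
            # should also be removed on the standby.
            cmd.append("--delete")
        cmd += [path, f"{standby}:{path}"]
        cmds.append(cmd)
    return cmds


def sync_to_standby(standby, dry_run=True):
    """Print the commands (dry_run=True) or execute them in order."""
    for cmd in rsync_commands(standby):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling sync_to_standby("standby.example.internal") prints the three commands without running them. Because /var/lib/ctrl-exec/locks/ is not among the paths, the per-instance lock files are never copied.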
Load Balancing
Each mTLS connection to port 7443 is self-contained — no session state must be pinned to a specific instance. Any TCP/L4 load balancer works: HAProxy in passthrough mode, keepalived for two-node VRRP failover, or DNS round-robin.
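Because balancing happens at L4, a plain TCP connect is the kind of health probe such a balancer performs. The sketch below shows that check in isolation. The function name tcp_alive is hypothetical, and it only confirms that something accepts connections on the port. It does not perform the mTLS handshake.

```python
import socket


def tcp_alive(host, port=7443, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A monitoring script could call tcp_alive() for each instance address and alert when fewer than the expected number respond.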
Active/Passive Failover
Promote the standby by starting its ctrl-exec services and moving the virtual IP or DNS entry. Agents reconnect transparently on their next request — no re-pairing required. The standby presents the same ctrl-exec cert and serial as the primary.
Active/Active Considerations
Running multiple instances on port 7443 simultaneously is supported for run and ping. Concurrency lock files are per-instance, so two instances can run the same script on the same agent at the same time. If this matters, route requests for a given agent through the same instance.
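If requests are dispatched by a caller that knows the target agent, one way to keep each agent on a single instance is a deterministic hash of the agent name. The sketch below uses rendezvous (highest-random-weight) hashing. This is a technique chosen for illustration, not something ctrl-exec provides, and the function name and instance labels are hypothetical.

```python
import hashlib


def instance_for_agent(agent, instances):
    """Pick one instance for an agent using rendezvous hashing.

    Every caller that uses the same instance list gets the same answer,
    and removing an instance only remaps the agents that were assigned
    to that instance.
    """
    if not instances:
        raise ValueError("no instances available")

    def weight(instance):
        digest = hashlib.sha256(f"{instance}|{agent}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instances, key=weight)
```

For example, instance_for_agent("web-01", ["ce-1", "ce-2"]) always returns the same one of the two labels, so the per-instance lock on that node sees every run for web-01.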
Pairing mode and cert rotation should run on one designated node at a time. The pairing queue and ca.serial are not designed for concurrent write access from multiple instances.
Cert Rotation in an HA Setup
All instances must present the new cert before any agent processes its serial update. After running rotate-cert on the designated node, sync /etc/ctrl-exec/ to all other instances immediately and restart them.
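After the sync and restart, it can be worth confirming that no instance is still holding the old cert. The sketch below compares SHA-256 fingerprints of PEM-encoded certificates. How the PEM text is collected from each instance is left open, and the function names and instance labels are hypothetical.

```python
import hashlib
import ssl


def cert_fingerprint(pem_text):
    """Return the SHA-256 hex fingerprint of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def divergent_instances(certs_by_instance, reference):
    """List instances whose cert differs from the reference instance's cert.

    certs_by_instance maps an instance label to the PEM text of the
    ctrl-exec cert found on that instance.
    """
    expected = cert_fingerprint(certs_by_instance[reference])
    return sorted(
        name
        for name, pem in certs_by_instance.items()
        if cert_fingerprint(pem) != expected
    )
```

An empty result from divergent_instances(), with the designated rotation node as the reference, indicates that every instance holds the same cert.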
What HA Does Not Solve
HA increases availability; it does not limit the blast radius of a CA key compromise. The CA is the single root of trust, shared across all instances. An attacker with the CA key can issue valid agent certificates regardless of how many ctrl-exec instances exist.
Reference Documentation
Complete HA guide — shared filesystem considerations, rsync procedures, promotion runbook, split-brain prevention: HIGH-AVAILABILITY