ctrl-exec - Security Operations
Operational security guidance, monitoring, incident response, and known limitations
ctrl-exec - Security Operations
This document covers the operational security posture of a running ctrl-exec deployment: what to monitor, how to respond to incidents, known limitations, and deployment-specific guidance. For the system's security model and architecture, see SECURITY.md.
ctrl-exec Host Security
The security of the entire fleet depends on the security of the ctrl-exec host. The CA key, ctrl-exec cert and key, full agent registry, and lock files all reside there. An attacker with root access to the ctrl-exec host can issue arbitrary agent certificates and connect to any agent.
Treat the ctrl-exec host as a privileged infrastructure node:
- Restrict interactive login to named administrators only; no shared accounts
- Audit all access via system auth logs (
/var/log/auth.logor equivalent) - Keep the ctrl-exec host off the general network; access via bastion or VPN
- Apply OS-level hardening (no unnecessary services, up-to-date packages)
- Do not run untrusted workloads on the ctrl-exec host
The ctrl-exec group grants CLI access to the ctrl-exec binary and read
access to the agent registry at /var/lib/ctrl-exec/agents/. This includes
each agent's hostname and IP address. Membership of the ctrl-exec group
is a privilege; treat it accordingly.
Token and Credential Lifecycle
ctrl-exec has no built-in token management. Tokens are arbitrary strings that
callers include in requests; they are forwarded to the auth hook as
ENVEXEC_TOKEN and to agents via the request body. All token issuance,
validation, expiry, and revocation logic lives in the auth hook.
The username field is a caller-supplied string with no structural meaning
within ctrl-exec. The ctrl-exec does not authenticate it, and it is not
verified to match any local or remote identity. Its purpose is to carry an
identity assertion that an auth hook can forward to an external authentication
service alongside the token. A hook that grants elevated permissions based
solely on the value of username, without verifying it through the token or
another mechanism, can be trivially bypassed by any caller that sets the field
to a privileged value.
The recommended pattern for identity-bearing requests:
- The caller supplies a
tokenthat encodes or binds to an identity (e.g. a signed JWT, an opaque token registered in an identity service, or an API key issued to a specific service account) - The caller also supplies
usernameas an advisory hint - The hook validates the token against an identity service; the service returns the authorised identity associated with that token
- The hook compares the authorised identity against the asserted
usernameonly as an additional consistency check, not as the primary access control basis - Privilege decisions are made on the validated token identity, not the asserted username
This pattern allows hooks to go beyond static local accounts: any identity service that can validate a token can be used. The hook does not need to maintain its own user database; it delegates to the identity service.
Token revocation for a compromised service: update the auth hook to reject the service's token. If the hook validates against a central service, revoke the token there. No ctrl-exec restart is required; the hook's own logic takes effect on the next request.
Auth Hook Security
- Hook update path
-
Do not push auth hook updates via
ctrl-exec run. If the hook is replaced by a script that a compromised token can invoke, the hook that validates that token can be overwritten. Update hooks through direct filesystem access, configuration management tooling (Ansible, Salt, Puppet), or a dedicated privileged deployment channel that does not pass through ctrl-exec itself. - Token exposure in hook logging
-
The token is available in the hook's
ENVEXEC_TOKENenvironment variable and in the JSON object on stdin. Do not log environment variables within the hook; log only specific fields from stdin. A hook that logsenvoutput exposes the token in syslog, where it may be accessible to non-root users depending on syslog permissions. - External validation service availability
- If the hook validates tokens against an external service, failure to reach that service must result in a denied request (exit code 1 or 2). Do not fail open. The operational impact of blocking all requests during a validation service outage is preferable to authorising unvalidated requests. Design the validation service for high availability if ctrl-exec operations are time-critical.
- Two-token pattern
- The ctrl-exec-side hook and the agent-side hook are independent and can validate different tokens. A higher-assurance deployment can issue separate credentials for the ctrl-exec-to-hook path and the agent-to-hook path. The ctrl-exec validates a ctrl-exec-level token; the agent validates a forwarded per-operation token. This is a supported configuration - the token is forwarded from ctrl-exec to agent in the request body and is available to both hooks.
- Agent hook scope
-
The agent hook only runs for
runrequests.pingrequests do not invoke the agent hook. An agent hook cannot restrict which sources may call thepingendpoint; source-based restrictions on the agent useallowed_ipsinagent.conforENVEXEC_SOURCE_IPin the hook forrunrequests. The absence of ahostsfield on the agent side is intentional: the agent is unaware of which other agents are targeted in the same invocation. - Allowlist information in hook responses
-
The agent hook is called after allowlist validation. A denied hook response
therefore confirms to the caller that the script name exists in the allowlist
(a non-existent script would have been rejected earlier with a different
error code). This is a known characteristic of the execution order. Operators
who need to conceal allowlist contents should note that hook denial does not
prevent an authorised caller from querying
/capabilitiesto enumerate the full allowlist.
Sensitive Script Output
Scripts that return credentials, key material, or other sensitive data will
have that data included in the API response and stored in the result file at
/var/lib/ctrl-exec/runs/<reqid>.json for 24 hours. The result directory
is 0770 root:ctrl-exec - readable by all members of the ctrl-exec group.
Options for handling sensitive output:
- Write sensitive data to a local file on the agent rather than stdout; return only a status code and path from the script
- Have the script encrypt the output before writing to stdout; the caller decrypts client-side
- Restrict
GET /status/{reqid}callers to the original caller only via the auth hook (not built in; hook logic required)
Result retrieval at GET /status/{reqid} is not currently logged with the
caller's identity. All authenticated API callers can retrieve any result by
reqid. A hook that restricts this must infer the original caller from the
token and compare it to the stored result's caller context (not provided by
the API directly - requires a lookup in the hook's own store).
update-ctrl-exec-serial Security
The update-ctrl-exec-serial script validates that its argument is a
lowercase hex string of 8–40 characters before writing to
/etc/ctrl-exec-agent/ctrl-exec-serial. Arguments that fail the hex pattern
check or fall outside the length range are rejected with a non-zero exit and
an error message; no file is written.
Despite this validation, an API caller with access to this script can still
write a plausible-looking but incorrect hex serial, causing all subsequent
/run and /ping operations to return 403 until the correct serial is
restored. The auth hook should restrict invocation of update-ctrl-exec-serial
to privileged tokens only. A standard operator token should not be able to call
this script. Use a separate token issued to the ctrl-exec's own rotation
machinery, and block it for all other callers in the hook.
Call rate limiting per agent
Even with token restriction, a rotation machinery bug or misconfigured caller
could issue rapid successive calls to update-ctrl-exec-serial. Each call
writes the serial file and sends SIGHUP, clearing all rate-limit state on the
agent. The following hook pattern adds a per-agent time-window limit on top of
the token restriction.
The hook uses a state file in a directory writable only by the hook's runtime user. The state file records the last accepted call time per agent hostname. Calls within the window are rejected with exit code 1 (deny, hook error logged).
#!/bin/bash
# Auth hook with rate-limit on update-ctrl-exec-serial
TOKEN_ROTATION="${ROTATION_TOKEN:-}" # set in hook environment or config
RATE_DIR="/var/lib/ctrl-exec/hook-rate"
WINDOW_SECONDS=300 # one call per agent per 5 minutes
# Only apply rate-limit logic to the target script
if [ "$ENVEXEC_SCRIPT" != "update-ctrl-exec-serial" ]; then
# Pass all other scripts through to normal token validation
if [ "$ENVEXEC_TOKEN" = "$TOKEN_ROTATION" ]; then exit 0; fi
exit 1
fi
# Rotation token required
if [ "$ENVEXEC_TOKEN" != "$TOKEN_ROTATION" ]; then
exit 1
fi
# Rate limit: one successful call per agent per window
mkdir -p "$RATE_DIR"
STATE_FILE="$RATE_DIR/${DISPATCHER_HOST///_}.last"
NOW=$(date +%s)
if [ -f "$STATE_FILE" ]; then
LAST=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
ELAPSED=$(( NOW - LAST ))
if [ "$ELAPSED" -lt "$WINDOW_SECONDS" ]; then
echo "rate-limited: last call ${ELAPSED}s ago, window ${WINDOW_SECONDS}s" >&2
exit 1
fi
fi
echo "$NOW" > "$STATE_FILE"
exit 0
DISPATCHER_HOST is the target agent hostname as recorded in the registry —
it is not caller-supplied and cannot be spoofed. The state directory should
be 0700 owned by the user the hook runs as. The hook should be set 0700
with root ownership; its parent directory should not be writable by the
ctrl-exec process.
Note that SIGHUP from update-ctrl-exec-serial clears rate-limit state
in the agent's connection limiter, not in this hook's state file. The two
mechanisms are independent.
CA Compromise Recovery
If the CA private key is compromised, every cert signed by it must be treated as untrusted. An attacker with the CA key can issue valid agent certificates and connect to any agent as if they were the ctrl-exec.
Recovery procedure:
Take agents offline or isolate them immediately from all network sources. The priority is preventing the attacker from using newly-issued certs before recovery completes.
On the ctrl-exec host, regenerate the CA:
```bash
Back up the compromised material first for forensics
cp -a /etc/ctrl-exec /etc/ctrl-exec.compromised.$(date +%Y%m%d)
ctrl-exec setup-ca # generates new CA key and cert ctrl-exec setup-ctrl-exec # generates new ctrl-exec cert signed by new CA ```
Distribute the new CA cert to all agents. This cannot be done via ctrl-exec (the agents do not trust the new CA yet). Use SSH or configuration management tooling to push
/etc/ctrl-exec/ca.crtto/etc/ctrl-exec-agent/ca.crton each agent.Re-pair every agent. The agent certs signed by the old CA are no longer valid:
```bash
On each agent host
rm /etc/ctrl-exec-agent/agent.{key,crt} ctrl-exec-agent request-pairing --dispatcher
``` Once all agents are re-paired, decommission the compromised CA material. Ensure the old CA cert is removed from all trust stores.
Investigate how the CA key was accessed: review ctrl-exec host auth logs, check for unauthorised access to
/etc/ctrl-exec/ca.key, and determine the scope of the compromise before returning to normal operations.
This procedure affects the entire fleet. Test the re-pairing path before a
real incident - the orchestrated pairing flow (--background mode) is
designed for bulk re-pairing scenarios.
Monitoring and Alerting
ctrl-exec's structured logging provides the data for detection. Alerting must be configured in the operator's log management tooling (Graylog, Elasticsearch, Loki, syslog-ng filters, etc.). ctrl-exec does not include a monitoring component.
The complete alert pattern reference — covering security events, execution failures, rotation events, and configuration problems — is in LOGGING.md.
Key security-relevant actions to alert on: rate-block, serial-reject,
revoked-cert, ip-block, deny (repeated, same PEER). Key rotation
signals: serial-stale, serial-broadcast-fail (repeated for same agent),
cert-rotation-fail. These are documented in full in LOGGING.md.
Operational signals worth alerting on:
- All agents returning
ACTION=serial-rejectsimultaneously after a rotation indicates the rotation broadcast failed or was corrupted. Runctrl-exec serial-statusandctrl-exec rotate-certimmediately. - A sudden increase in
ACTION=run EXIT=non-zeroacross multiple agents may indicate a script was modified or a dependency broke. Correlate with deployment events. Note that non-zero exit is logged at INFO priority on both ctrl-exec and agent — alert on the EXIT value itself, not the log priority level.
cert_overlap_dayscalibration-
The default overlap window is 30 days. If agents in your fleet are routinely
offline for maintenance or hibernation longer than this,
stalestatus becomes normal background noise rather than a signal. Setcert_overlap_daysinctrl-exec.confto a value above the maximum observed downtime for your fleet. A stale alert only has diagnostic value if it is unexpected.
Known Limitations
- Request result access
-
GET /status/{reqid}returns stored run results to any authenticated caller, not only the original submitter. Result access is not logged with the caller's identity. Reqid format provides limited enumeration resistance (see reqid entropy below). Sensitive results should not be left in the result store; design scripts to minimise what they return via stdout if the results will be stored. - Rate state persistence
-
Rate limit state is held in memory and cleared on SIGHUP or agent restart.
update-ctrl-exec-serialsends SIGHUP as part of normal rotation, which clears all rate blocks. The window between the SIGHUP and the next connection is milliseconds in practice - not operationally meaningful - but operators should be aware that a serial update resets rate state on all agents. Persistent rate state across reloads is not currently supported. MemoryDenyWriteExecuteand JIT runtimes-
The agent systemd unit sets
MemoryDenyWriteExecute=yes. This is safe for the current bash-only script inventory but will cause silent failures if a JIT-compiled runtime (Java, Node.js, Python with JIT) is added to the allowlist. There is no mechanism to detect this conflict at allowlist load time. When adding a new script whose interpreter is a JIT runtime, removeMemoryDenyWriteExecute=yesfrom the unit file before deploying. - No ctrl-exec-side agent cert revocation
-
The revocation list on agents covers certs presented to the agent. There
is no equivalent mechanism on the ctrl-exec side to block a stolen agent
cert from connecting to the ctrl-exec. An agent that has been decommissioned
via
ctrl-exec unpairhas its cert left technically valid until natural expiry. See Unpairing and Decommission below for the recommended workflow to close this window promptly.
Unpairing and Decommission
ctrl-exec unpair <hostname> removes the agent from the registry. The agent
will no longer receive cert renewals and will become stale when the overlap
window expires. However, the agent's certificate remains cryptographically
valid until its natural expiry date, which is printed by the unpair command.
During that window, a host holding a copy of the agent cert and key can still
connect to the ctrl-exec on port 7443.
The recommended workflow after unpairing is:
Run
ctrl-exec unpair <hostname>. Note the expiry date printed.Obtain the agent cert serial:
bash openssl x509 -noout -serial -in /etc/ctrl-exec-agent/agent.crtIf you no longer have access to the agent host, retrieve the serial from the registry record before unpairing, or from the ctrl-exec's CA serial log if available.
Add the serial to the revocation list on every agent that the decommissioned host could have reached. The format
serial=DEADBEEF(direct openssl output) is accepted as-is:bash echo "serial=DEADBEEF" >> /etc/ctrl-exec-agent/revoked-serials systemctl reload ctrl-exec-agentFor fleet-wide distribution, use
ctrl-exec runto push the serial append and SIGHUP to all remaining agents before the unpairing takes effect.Verify the serial appears in the revocation list on the affected agents:
bash grep -i DEADBEEF /etc/ctrl-exec-agent/revoked-serialsDecommission or reimage the host promptly. Do not leave a host with a valid agent cert and key accessible after unpairing — revocation on the agents closes the inbound path, but the cert could be extracted and used elsewhere if the host is not secured.
If the agent cert and key have been confirmed destroyed (host reimaged, disk wiped), steps 2–4 are optional but recommended as defence in depth.
The revocation list is checked on every incoming mTLS connection before any request is processed. Once the reload completes on each agent, the decommissioned cert is blocked immediately on reconnect. Any in-flight connection established before the reload completes will run to completion — restart the agent service rather than reloading if in-flight connections must also be terminated.
Docker-Specific Security
- Docker socket access
-
Any user or process with access to the Docker socket on the ctrl-exec host
can start a container with the
ctrl-exec-datavolume mounted and read the CA private key. Restrict Docker socket access to root and any explicitly designated operators. Do not grant Docker socket access to services running on the ctrl-exec host that do not require it. This is the most significant additional risk of a containerised deployment versus a bare-metal install. - Stale pairing request
- The ctrl-exec's pairing queue automatically cleans up requests older than 10 minutes. In the Docker workflow, the agent container exits after sending its pairing request and must be restarted by the operator after approval. If the 10-minute window expires before the container is restarted and re-triggers the pairing, the request is silently deleted. Recovery: restart the agent container; it will send a fresh pairing request.
DISPATCHER_HOSTtrust-
The
DISPATCHER_HOSTenvironment variable in the agent container determines which host receives the pairing request including the agent's CSR. If this variable is misconfigured to point at an attacker-controlled host, the attacker receives the CSR and can return a certificate signed by their own CA. The agent stores whatever cert is returned. All subsequent operations use the attacker's CA as the trust anchor. VerifyDISPATCHER_HOSTpoints at the correct ctrl-exec before starting agent containers. For production deployments, consider settingDISPATCHER_HOSTin a compose file under version control rather than passing it as a runtime variable. allowed_ipsin containerised deployments-
In a Docker network, all containers on the same network can reach port 7443
on the agent container. Set
allowed_ipsinagent.confto the ctrl-exec container's IP or subnet to limit which containers can connect to the agent. The ctrl-exec container's IP is stable within a compose deployment (Docker assigns IPs deterministically by service name). Example:
ini
allowed_ips = 172.18.0.0/16
For tighter control, use Docker network policies or pin the ctrl-exec
container's IP in the compose file and use an exact IP in allowed_ips.
- Volume backup
-
All persistent state is on named volumes. Back up both
ctrl-exec-data(CA key, ctrl-exec cert) andctrl-exec-registry(agent registry) on the ctrl-exec side. On the agent side,agent-datacontains the agent cert and key. Loss ofagent-datarequires re-pairing that agent. Loss ofctrl-exec-datarequires regenerating the CA and re-pairing the entire fleet. Treat volume backup with the same priority as the CA key backup described in SECURITY.md.