Recipe: Tamper response
The problem
What should happen when the SDK detects that the policy bundle has been altered? The honest answer is "it depends on how critical the workload is." A dev laptop benefits from a loud warning. A financial-trading agent needs a quarantine.
The policy
version: '1'
# Tamper response.
#
# The four `default_on_tamper` settings select what the SDK does when
# it detects a corrupt or altered policy bundle, or when the machine
# is already in a local quarantine state. The rules below are the
# same for every tamper posture; what differs is the setting applied
# at the org level.
settings:
  default_action: deny
  default_on_missing: deny
  # The `default_on_tamper` value is what this recipe cycles through
  # in each scenario. Choose ONE of:
  #   warn       -- log the tamper, still honor the rule decision.
  #   deny       -- deny the one call whose eval triggered the check.
  #   deny-all   -- deny every call until bundle re-verifies.
  #   quarantine -- enter quarantine state; operator recovery required.
  default_on_tamper: quarantine
rules:
  - id: allow-safe-read
    allow: 'data:read'
    reason: 'Regular data reads are allowed under a clean bundle.'
Why it works
Each signed policy bundle carries default_on_tamper as a top-level
field. Every SDK honors the same four-value enum, so an org policy
authored once behaves identically in Python, Node, Go, the Gateway,
and the coding-agent hook.
- `warn` logs a tamper event and continues to evaluate the rules. Useful during rollout of a new signing key or during low-stakes dev work.
- `deny` returns a single deny on the call whose pre-eval check detected tamper, with `reason_code: BUNDLE_TAMPERED`. Good for "stop this one action" use-cases.
- `deny-all` denies every future call until the bundle re-verifies. Equivalent to "the kill switch is pulled, but we did not quarantine the machine."
- `quarantine` is the heaviest response: the SDK writes a local quarantine marker and returns `MACHINE_QUARANTINED` on every call, even after the bundle is replaced. Recovery requires an operator to clear the quarantine state (or to re-enroll the device).
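The four postures can be modeled as a small decision function. This is a minimal sketch for reasoning about the behavior, not the SDK's implementation: `OnTamper`, `MachineState`, `decide`, and the `NO_RULE_MATCH` reason code are hypothetical names introduced here. Because the tamper flag is passed in on every call, `deny` and `deny-all` look the same in this model; in the SDK they differ in how long the denial persists.

```python
from dataclasses import dataclass
from enum import Enum


class OnTamper(str, Enum):
    WARN = "warn"
    DENY = "deny"
    DENY_ALL = "deny-all"
    QUARANTINE = "quarantine"


@dataclass
class MachineState:
    """Hypothetical local state; the real SDK's storage is not shown here."""
    quarantined: bool = False   # sticky: survives bundle replacement
    tamper_events: int = 0      # stand-in for audit-log entries


def decide(posture, bundle_tampered, rule_allows, state):
    """Return (decision, reason_code) for one call under the given posture."""
    # A quarantine marker wins over everything, even a clean bundle.
    if state.quarantined:
        return ("deny", "MACHINE_QUARANTINED")

    if bundle_tampered:
        state.tamper_events += 1  # every posture logs the tamper
        if posture is OnTamper.QUARANTINE:
            state.quarantined = True
            return ("deny", "MACHINE_QUARANTINED")
        if posture in (OnTamper.DENY, OnTamper.DENY_ALL):
            return ("deny", "BUNDLE_TAMPERED")
        # OnTamper.WARN falls through to normal rule evaluation.

    if rule_allows:
        return ("allow", "RULE_MATCH")
    # Placeholder reason code, assumed for this sketch only.
    return ("deny", "NO_RULE_MATCH")
```

Calling `decide` twice with the same `MachineState`, first with a tampered bundle under `quarantine` and then with a clean one, shows the stickiness: the second call is still denied with `MACHINE_QUARANTINED`.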
What gets blocked
| default_on_tamper | Bundle state | Agent call | Decision | reason_code |
|---|---|---|---|---|
| deny | tampered | data:read | deny | BUNDLE_TAMPERED |
| deny-all | tampered | data:read | deny | BUNDLE_TAMPERED |
| quarantine | quarantined | data:read | deny | MACHINE_QUARANTINED |
What gets allowed
| default_on_tamper | Bundle state | Agent call | Decision | reason_code |
|---|---|---|---|---|
| warn | tampered | data:read | allow | RULE_MATCH (tamper logged separately) |
| quarantine | clean | data:read | allow | RULE_MATCH |
| deny | clean | data:read | allow | RULE_MATCH |
Test it yourself
curl -O https://docs.controlzero.ai/recipes/tamper-response/policy.yaml
curl -O https://docs.controlzero.ai/recipes/tamper-response/scenarios.json
# The scenarios file uses `tamper_state` on each case so one YAML
# covers all four postures plus the clean baseline.
controlzero test-policy policy.yaml --scenarios scenarios.json
To actually test tamper response in a live SDK, corrupt a byte in the cached bundle file and watch the audit log. Detailed steps live in Enforcement Behavior under "Tamper detection."
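Corrupting a byte can be scripted so the original bundle is easy to restore. The helper below is a sketch under assumptions: `corrupt_copy` is a hypothetical name, and the location of the cached bundle file is not given here, so the path is left for the caller to supply. Run it only on a test machine, especially if `default_on_tamper` is `quarantine`.

```python
import shutil
from pathlib import Path


def corrupt_copy(bundle_path, offset=0):
    """Back up the bundle next to itself, then flip one byte in place.

    Returns the backup path so the original can be restored afterwards.
    """
    src = Path(bundle_path)
    backup = src.with_name(src.name + ".bak")
    shutil.copy2(src, backup)  # keep an untouched copy

    data = bytearray(src.read_bytes())
    if not data:
        raise ValueError("bundle file is empty")

    # XOR with 0xFF inverts every bit of the chosen byte, so the
    # content is guaranteed to differ from the original.
    data[offset % len(data)] ^= 0xFF
    src.write_bytes(bytes(data))
    return backup
```

To undo the change, copy the returned backup path over the bundle file again.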
Caveats
- `quarantine` is sticky. Once a machine is quarantined, re-enrolling the device or clearing the quarantine file is an operator action, not something the agent can self-heal. Document the recovery path before you flip this setting on for production.
- A legitimate bundle rotation can trip tamper detection if the SDK is behind the rotation. Roll new signing keys with overlap: add the new key, wait for the SDK refresh interval, then retire the old key.
- `warn` is the default (pre-#228 Phase 2 behavior). If you need a stricter default org-wide, set it once at the org level and let individual projects override if needed.
- The Gateway surface cannot emit `MACHINE_QUARANTINED` (it has no per-machine state). If your fleet is heavy-Gateway, prefer `deny-all` for the same kill-switch behavior.
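The key-rotation caveat is easier to see with a toy model of a trusted-key set. This sketch assumes HMAC-SHA256 as a stand-in for the real signature scheme, which is not described here; `sign`, `verify`, and the key values are hypothetical.

```python
import hashlib
import hmac


def sign(key, bundle):
    """Stand-in signature: HMAC-SHA256 over the bundle bytes."""
    return hmac.new(key, bundle, hashlib.sha256).hexdigest()


def verify(bundle, signature, trusted_keys):
    """True if any currently trusted key produces this signature."""
    return any(
        hmac.compare_digest(sign(key, bundle), signature)
        for key in trusted_keys
    )


OLD_KEY = b"old-signing-key"  # hypothetical key material
NEW_KEY = b"new-signing-key"
bundle = b"version: '1'\n"

old_signature = sign(OLD_KEY, bundle)
new_signature = sign(NEW_KEY, bundle)

# An SDK that has not refreshed yet trusts only the old key, so a
# bundle signed with the new key looks tampered.
stale_sdk_accepts_new = verify(bundle, new_signature, [OLD_KEY])  # False

# During the overlap window both keys are trusted, so bundles signed
# with either key verify.
overlap_accepts_old = verify(bundle, old_signature, [OLD_KEY, NEW_KEY])  # True
overlap_accepts_new = verify(bundle, new_signature, [OLD_KEY, NEW_KEY])  # True

# Once the old key is retired, only new-key signatures verify.
retired_accepts_old = verify(bundle, old_signature, [NEW_KEY])  # False
```

The overlap window is what keeps a slow-to-refresh SDK from treating a legitimate rotation as tamper, which matters most under `quarantine` because that state does not clear on its own.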