Recipe: Tamper response

The problem

What should happen when the SDK detects that the policy bundle has been altered? The honest answer is "it depends on the severity of the workload." A dev laptop benefits from a loud warn. A financial- trading agent needs a quarantine.

The policy

version: '1'
# Tamper response.
#
# The four `default_on_tamper` settings select what the SDK does when
# it detects a corrupt or altered policy bundle, or when the machine
# is already in a local quarantine state. The rules below are the
# same for every tamper posture; what differs is the setting applied
# at the org level.
settings:
  default_action: deny
  default_on_missing: deny
  # The `default_on_tamper` value is what this recipe cycles through
  # in each scenario. Choose ONE of:
  #   warn       -- log the tamper, still honor the rule decision.
  #   deny       -- deny the one call whose eval triggered the check.
  #   deny-all   -- deny every call until bundle re-verifies.
  #   quarantine -- enter quarantine state; operator recovery required.
  default_on_tamper: quarantine
rules:
  - id: allow-safe-read
    allow: 'data:read'
    reason: 'Regular data reads are allowed under a clean bundle.'

Why it works

Each signed policy bundle carries default_on_tamper as a top-level field. Every SDK honors the same four-value enum, so an org policy authored once behaves identically in Python, Node, the Gateway, and the coding-agent hook.

warn logs a tamper event and continues to evaluate the rules. Useful during rollout of a new signing key or during low-stakes dev work.
deny returns a single deny on the call whose pre-eval check detected tamper, with reason_code: BUNDLE_TAMPERED. Good for "stop this one action" use-cases.
deny-all denies every future call until the bundle re-verifies. Equivalent to "the kill switch is pulled, but we did not quarantine the machine."
quarantine is the heaviest response: the SDK writes a local quarantine marker and returns MACHINE_QUARANTINED on every call, even after the bundle is replaced. Recovery requires an operator to clear the quarantine state (or to re-enroll the device).

What gets blocked

`default_on_tamper`	Bundle state	Agent call	Decision	reason_code
`deny`	tampered	`data:read`	deny	`BUNDLE_TAMPERED`
`deny-all`	tampered	`data:read`	deny	`BUNDLE_TAMPERED`
`quarantine`	quarantined	`data:read`	deny	`MACHINE_QUARANTINED`

What gets allowed

`default_on_tamper`	Bundle state	Agent call	Decision	reason_code
`warn`	tampered	`data:read`	allow	`RULE_MATCH` (tamper logged separately)
`quarantine`	clean	`data:read`	allow	`RULE_MATCH`
`deny`	clean	`data:read`	allow	`RULE_MATCH`

Test it yourself

curl -O https://docs.controlzero.ai/recipes/tamper-response/policy.yaml
curl -O https://docs.controlzero.ai/recipes/tamper-response/scenarios.json

# The scenarios file uses `tamper_state` on each case so one YAML
# covers all four postures plus the clean baseline.
controlzero test-policy policy.yaml --scenarios scenarios.json

To actually test tamper response in a live SDK, corrupt a byte in the cached bundle file and watch the audit log. Detailed steps live in Enforcement Behavior under "Tamper detection."

Caveats

quarantine is sticky. Once a machine is quarantined, re-enrolling the device or clearing the quarantine file is an operator action, not something the agent can self-heal. Document the recovery path before you flip this setting on for production.
A legitimate bundle rotation can trip tamper detection if the SDK is behind the rotation. Roll new signing keys with overlap: add the new key, wait for the SDK refresh interval, then retire the old key.
warn is the default (pre-#228 Phase 2 behavior). If you need a stricter default org-wide, set it once at the org level and let individual projects override if needed.
The Gateway surface cannot emit MACHINE_QUARANTINED (it has no per-machine state). If your fleet is heavy-Gateway, prefer deny-all for the same kill-switch behavior.

The problem​

The policy​

Why it works​

What gets blocked​

What gets allowed​

Test it yourself​

Caveats​

Related recipes​