Blueprint: The Fortified SRE

Protecting Production Infrastructure from Destructive Agent Actions

Agents with access to terminal tools or cloud APIs are powerful, and that same access makes them a serious risk to production environments. A single hallucinated or accidental command, such as rm -rf / or db.dropDatabase(), can take down an entire company.

This blueprint demonstrates how to implement Pattern-Based Action Gating and Dynamic Risk ABAC to protect critical infrastructure.

Architecture

1. Master Policy Definition

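Create a policy that gates the sys:execute tool on a session risk attribute (danger_level) and denies every sys:* action to the intern group. The pattern matching itself runs in the pre-flight check shown in the implementation, which sets the danger level that this policy evaluates.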

{
  "name": "sre-safety-policy",
  "priority": 5000,
  "rules": [
    {
      "id": "block-destructive-commands",
      "effect": "deny",
      "principals": ["*"],
      "actions": ["sys:execute"],
      "resources": ["*"],
      "conditions": {
        "danger_level": "critical"
      }
    },
    {
      "id": "deny-unauthorized-sre",
      "effect": "deny",
      "principals": ["group:intern"],
      "actions": ["sys:*"],
      "resources": ["*"]
    }
  ]
}
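To make the rule semantics concrete, here is a minimal sketch of how such a policy could be evaluated. It is not Control Zero's actual engine: the evaluate function, the glob-style wildcard matching, and the allow-by-default fallback are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Same rules as the policy above, written as a Python literal.
POLICY = {
    "name": "sre-safety-policy",
    "priority": 5000,
    "rules": [
        {
            "id": "block-destructive-commands",
            "effect": "deny",
            "principals": ["*"],
            "actions": ["sys:execute"],
            "resources": ["*"],
            "conditions": {"danger_level": "critical"},
        },
        {
            "id": "deny-unauthorized-sre",
            "effect": "deny",
            "principals": ["group:intern"],
            "actions": ["sys:*"],
            "resources": ["*"],
        },
    ],
}


def _matches(patterns, value):
    """True if value matches any glob-style pattern (assumed semantics)."""
    return any(fnmatchcase(value, p) for p in patterns)


def evaluate(policy, principal, action, resource, attributes):
    """Return (decision, rule_id) for the first rule that matches.

    ASSUMPTION: a request that matches no rule is allowed here. A real
    gateway may default to deny or consult other policies instead.
    """
    for rule in policy["rules"]:
        conditions = rule.get("conditions", {})
        if (
            _matches(rule["principals"], principal)
            and _matches(rule["actions"], action)
            and _matches(rule["resources"], resource)
            and all(attributes.get(k) == v for k, v in conditions.items())
        ):
            return rule["effect"], rule["id"]
    return "allow", None


# An SRE running a critical command is denied by the first rule.
decision = evaluate(
    POLICY, "group:sre-ops", "sys:execute", "host-1", {"danger_level": "critical"}
)
```

Note that the first rule applies to every principal, so the danger level alone is enough to block a command regardless of who requested it.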

2. Implementation

Python Prototype

from openai import OpenAI

# Initialize Client pointing to Control Zero Gateway
client = OpenAI(
    api_key="ignored", # Real key injected by Gateway secret storage
    base_url="http://cz-gateway:8001/v1",
    default_headers={
        "X-ControlZero-User-ID": "sre-user-1",
        "X-ControlZero-User-Group": "sre-ops",
        "X-ControlZero-Environment": "production"
    }
)

def run_command(command: str):
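    # Pre-flight: assess danger level (this can be done by a small local LLM or regex).
    # Plain substring matching is crude: it is case-insensitive here, but it will
    # also flag harmless commands that merely contain "drop" or "delete".
    lowered = command.lower()
    is_destructive = any(p in lowered for p in ("rm -rf", "drop", "delete"))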

    headers = {
        "X-ControlZero-Danger-Level": "critical" if is_destructive else "low"
    }

    try:
        response = client.chat.completions.create(
            model="gpt-5.4",
            messages=[{"role": "user", "content": f"Execute this: {command}"}],
            tools=[{
                "type": "function",
                "function": {
                    "name": "sys:execute",
                    "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}}
                }
            }],
            extra_headers=headers # Inject risk context
        )
        return response
    except Exception as e:
        return f"Governance Intervention: {e}"

# Scenario A: Safe command
print(run_command("ls -la")) # Allowed

# Scenario B: Destructive command (accidental hallucination)
print(run_command("rm -rf /app/data")) # BLOCKED by Control Zero
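The pre-flight check in the prototype uses plain substring matching, and its comment notes that a regex could do the job. A minimal sketch of a regex-based classifier follows. The pattern list and the danger_level function name are hypothetical and far from exhaustive; treat them as a starting point, not as a complete deny list.

```python
import re

# Hypothetical patterns for illustration only; extend them to cover the
# commands your agents can actually emit.
DESTRUCTIVE_PATTERNS = [
    # rm with a recursive short flag: rm -rf, rm -fr, rm -f -r, rm -R
    re.compile(r"\brm\s+(?:-\w+\s+)*-\w*r\w*", re.IGNORECASE),
    # SQL: DROP DATABASE / TABLE / SCHEMA
    re.compile(r"\bdrop\s+(?:database|table|schema)\b", re.IGNORECASE),
    # SQL: DELETE FROM ...
    re.compile(r"\bdelete\s+from\b", re.IGNORECASE),
    # MongoDB shell: db.dropDatabase()
    re.compile(r"\.dropDatabase\s*\(", re.IGNORECASE),
]


def danger_level(command: str) -> str:
    """Return "critical" if any destructive pattern matches, else "low"."""
    if any(p.search(command) for p in DESTRUCTIVE_PATTERNS):
        return "critical"
    return "low"


# The result can be sent as the X-ControlZero-Danger-Level header value.
level = danger_level("rm -rf /app/data")
```

Anchoring on word boundaries and required whitespace avoids the false positives of substring matching: a file name such as dropbox_notes.txt no longer trips the check. A client-side classifier is still only advisory, because a pattern that is missing from the list lets the command through with a low danger level.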

3. Validation Checklist

Identity Check: Verify that a user in the intern group is blocked from even simple ls commands.
Risk Check: Verify that rm -rf is blocked even when requested by an admin user if the X-ControlZero-Danger-Level header is set to critical.
Audit Trail: Check the audit log store for entries where policy_decision = 'deny'.
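The audit trail check can be scripted. The sketch below assumes the audit log is reachable as a SQL table; the table name audit_log, every column except policy_decision, and the sample rows are hypothetical stand-ins, with an in-memory SQLite database playing the role of the real store.

```python
import sqlite3

# ASSUMPTION: hypothetical schema. Only the policy_decision field is named
# in the checklist; the other columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log ("
    "user_id TEXT, action TEXT, rule_id TEXT, policy_decision TEXT)"
)
# Made-up sample rows standing in for real gateway audit entries.
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [
        ("sre-user-1", "sys:execute", None, "allow"),
        ("sre-user-1", "sys:execute", "block-destructive-commands", "deny"),
    ],
)


def denied_entries(connection):
    """Return every audit row where the policy decision was 'deny'."""
    return connection.execute(
        "SELECT user_id, action, rule_id FROM audit_log "
        "WHERE policy_decision = 'deny'"
    ).fetchall()


denied = denied_entries(conn)
```

After running Scenario B, a check like this should return at least one row for the blocked command; an empty result would suggest the request never reached the policy engine.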
