Gateway Proxy
Supported modes: Hosted, Hybrid. Available in: Free, Solo, Teams.
If you are building a new AI app from scratch, the SDK gives you finer control than the gateway. If you are governing developer tools (Claude Code, Cursor, Codex CLI), use coding hooks instead -- they intercept tool calls at the assistant level, not at the LLM API level.
Control Zero Gateway is a transparent proxy that sits between your AI agents and LLM providers. It intercepts every request, evaluates tool calls against your policies, and blocks unauthorized actions, all without changing your agent's code.
How It Works
The gateway operates in two phases:
- Pre-flight (request guard): Before forwarding to the LLM, the gateway checks model blocking rules, estimates cost against budget caps, and scans for PII in prompts.
- Response interception: After the LLM responds, every tool_use (Anthropic) or function_call (OpenAI) block is evaluated against your policies. Denied tool calls are replaced inline with a policy denial message. Both streaming and non-streaming responses are supported.
Quick Start
Change your LLM provider base URL to point to the Control Zero gateway:
Anthropic (Claude)
# Before
ANTHROPIC_BASE_URL=https://api.anthropic.com
# After
ANTHROPIC_BASE_URL=https://gateway.controlzero.ai
Add the headers:
X-ControlZero-API-Key: cz_live_xxx
X-ControlZero-Agent-ID: my-first-agent
X-ControlZero-API-Key (required) is your project key from the dashboard. X-ControlZero-Agent-ID (optional) labels the caller for audit attribution. It defaults to <provider>-direct (e.g. anthropic-direct) if omitted.
OpenAI
# Before
OPENAI_BASE_URL=https://api.openai.com
# After
OPENAI_BASE_URL=https://gateway.controlzero.ai/v1
Add the same Control Zero headers as above.
That is it. No SDK installation, no code changes. Your existing agent code keeps working. The gateway enforces your policies transparently.
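As a rough illustration of the Quick Start, the sketch below assembles the URL and headers for an Anthropic-style request routed through the gateway, using only the Python standard library. The helper name `gateway_request_parts` is hypothetical. The header names, the default base URL, and the `/v1/messages` path are taken from this page. The `content-type` header is an assumption based on the provider expecting JSON bodies. If you use a provider SDK, pointing its base URL at the gateway and adding the two Control Zero headers should have the same effect.

```python
from typing import Dict, Optional, Tuple

GATEWAY_BASE_URL = "https://gateway.controlzero.ai"  # from the Quick Start


def gateway_request_parts(
    cz_api_key: str,
    provider_api_key: str,
    agent_id: Optional[str] = None,
    base_url: str = GATEWAY_BASE_URL,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an Anthropic Messages call via the gateway."""
    headers = {
        "x-api-key": provider_api_key,        # provider credential, unchanged
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",   # assumption: JSON request body
        "X-ControlZero-API-Key": cz_api_key,  # required
    }
    if agent_id:
        # Optional: when omitted, the gateway labels the caller "<provider>-direct".
        headers["X-ControlZero-Agent-ID"] = agent_id
    return base_url.rstrip("/") + "/v1/messages", headers


url, headers = gateway_request_parts(
    "cz_live_xxx", "sk-ant-xxx", agent_id="my-first-agent"
)
print(url)
```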
What the agent receives when a tool call is blocked
When the gateway blocks a request, the agent receives an HTTP 403 response with a JSON body:
{
  "error": "policy_denied",
  "reason": "<human-readable reason from the matching policy>",
  "policy_id": "<id of the policy that matched>"
}
Handle this in your agent code the same way you'd handle an API error from the LLM provider.
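One way to handle the denial in agent code is to recognize the documented 403 body and surface its fields. The sketch below is a minimal example under that assumption. `PolicyDenial`, `parse_policy_denial`, and the sample values (`pol_123`, the reason text) are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyDenial:
    reason: str
    policy_id: str


def parse_policy_denial(status: int, body: str) -> Optional[PolicyDenial]:
    """Return a PolicyDenial if the response matches the documented shape."""
    if status != 403:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # a 403 that does not carry the documented JSON body
    if not isinstance(payload, dict) or payload.get("error") != "policy_denied":
        return None
    return PolicyDenial(
        reason=str(payload.get("reason", "")),
        policy_id=str(payload.get("policy_id", "")),
    )


# Hypothetical sample body, shaped like the JSON shown above.
sample = '{"error": "policy_denied", "reason": "model not allowed", "policy_id": "pol_123"}'
denial = parse_policy_denial(403, sample)
if denial is not None:
    print(f"Blocked by {denial.policy_id}: {denial.reason}")
```

Returning `None` for anything that does not match keeps other 403 responses on the agent's normal error path.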
Features
Pre-flight Request Guard
Before forwarding to the LLM provider, the gateway runs these checks:
- Model blocking: Deny requests to unauthorized models (e.g., block agents from using expensive models).
- Cost estimation: Reject if estimated token cost exceeds your budget cap.
- PII detection: Detect, mask, or block PII in prompts. Configurable via CZ_GATEWAY_PII_ACTION (detect, mask, or block).
If the policy engine is unavailable and CZ_GATEWAY_FAIL_CLOSED is enabled (the default), all requests are blocked. No silent failures.
Response-Side DLP
The gateway performs bidirectional PII scanning: both requests sent to the LLM and responses received from it are inspected. Response-side DLP catches model outputs that contain sensitive data such as credit card numbers, national ID numbers, health records, or API keys.
Three modes are available, controlled by CZ_GATEWAY_PII_ACTION:
| Mode | Behavior |
|---|---|
| detect | Log the finding in the audit trail but return the response unmodified. |
| mask | Replace detected PII with redaction tokens (e.g., [REDACTED:SSN]). |
| block | Return a policy denial response and drop the entire LLM output. |
Response DLP supports 59 patterns across 6 locales. See Locale-Aware DLP for details on locale configuration and pattern coverage.
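To make the three modes concrete, here is a toy sketch of how detect, mask, and block differ for a single pattern. It is not the gateway's implementation. The SSN regular expression and the function `apply_pii_action` are assumptions for illustration, and only the `[REDACTED:SSN]` token format comes from the table above.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern for illustration only: US SSN written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
VALID_ACTIONS = ("detect", "mask", "block")


def apply_pii_action(text: str, action: str) -> Tuple[Optional[str], List[str]]:
    """Return (text_to_return, findings); text is None when output is dropped."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown PII action: {action!r}")
    findings = SSN_PATTERN.findall(text)
    if not findings or action == "detect":
        return text, findings  # detect: record findings, leave text unmodified
    if action == "mask":
        return SSN_PATTERN.sub("[REDACTED:SSN]", text), findings
    return None, findings      # block: drop the entire output


print(apply_pii_action("SSN 123-45-6789", "mask"))
```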
Rate Limiting
The gateway enforces configurable rate limits at three scopes:
| Scope | Environment Variable | Default | Description |
|---|---|---|---|
| Per-user | CZ_GATEWAY_RATE_LIMIT_PER_USER | 100 | Maximum requests per minute per user identity. |
| Per-org | CZ_GATEWAY_RATE_LIMIT_PER_ORG | 1000 | Maximum requests per minute per organization. |
| Per-provider | CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER | 500 | Maximum requests per minute per upstream provider (fallback when no per-provider override is set). |
Per-provider overrides are supported via CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME>, for example CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_OPENAI=2000 or CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_ANTHROPIC=1500. When a request's provider has a specific override set, that value is used; otherwise the fallback CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER applies.
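The override lookup described above can be sketched as a two-step resolution: check the provider-specific variable first, then the fallback. The function name `provider_rate_limit` is hypothetical, and upper-casing the provider name to build the variable suffix is an assumption based on the `_OPENAI` and `_ANTHROPIC` examples.

```python
from typing import Mapping

FALLBACK_VAR = "CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER"
DEFAULT_PER_PROVIDER = 500  # documented default, requests per minute


def provider_rate_limit(provider: str, env: Mapping[str, str]) -> int:
    """Resolve the requests-per-minute limit for one upstream provider."""
    override = env.get(f"{FALLBACK_VAR}_{provider.upper()}")
    if override:
        return int(override)  # provider-specific override wins
    return int(env.get(FALLBACK_VAR, DEFAULT_PER_PROVIDER))


env = {
    "CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER": "700",
    "CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_OPENAI": "2000",
}
print(provider_rate_limit("openai", env), provider_rate_limit("anthropic", env))
```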
Rate limit state is stored in a Redis-compatible in-memory cache (CZ_GATEWAY_REDIS_URL, default redis://localhost:6379) and shared across gateway instances using a 60-second sliding window. When the cache is unreachable, rate limiting fails open: requests are allowed through and a warning is logged.
When a limit is exceeded, the gateway returns HTTP 429 with a Retry-After header along with X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.
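On the client side, a 429 can be handled by waiting for the interval given in `Retry-After`. The sketch below is one possible approach, not a prescribed client. The helper name `retry_delay_seconds` and the one-second fallback are assumptions. The HTTP specification allows the header to carry either a number of seconds or an HTTP date, so both forms are parsed. This page does not say which form the gateway sends.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return how long to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # Assumes a mapping keyed by the canonical header name.
    value = headers.get("Retry-After", "").strip()
    if not value:
        return 1.0  # assumed fallback when the header is missing
    if value.isdigit():
        return float(value)  # delta-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 1.0
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(retry_delay_seconds(429, {"Retry-After": "12"}))
```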
Correlation IDs
Every request processed by the gateway is assigned a correlation ID. If the caller includes an X-Request-ID header, the gateway preserves it. Otherwise, a new unique ID is generated.
The correlation ID is:
- Returned in the X-Request-ID response header.
- Included in every audit log entry for the request.
- Propagated to upstream LLM providers where supported.
Use correlation IDs to trace a single request across your agent, the gateway, and the LLM provider.
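If you want the same ID in your agent's own logs, you can set `X-Request-ID` yourself before sending the request. A minimal sketch follows. The helper `with_request_id` is hypothetical, and using a UUID4 string as the ID format is an assumption, since this page does not specify a format.

```python
import uuid
from typing import Dict, Mapping


def with_request_id(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers that is guaranteed to carry X-Request-ID."""
    out = dict(headers)
    # HTTP header names are case-insensitive, so compare in lowercase.
    if not any(name.lower() == "x-request-id" for name in out):
        out["X-Request-ID"] = str(uuid.uuid4())  # assumed ID format
    return out


outgoing = with_request_id({"X-ControlZero-API-Key": "cz_live_xxx"})
print(outgoing["X-Request-ID"])  # log this value alongside the agent's own logs
```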
Prometheus Metrics
The gateway exposes a /metrics endpoint in Prometheus exposition format. Scrape it with any Prometheus-compatible collector.
Available metric families:
| Metric | Type | Description |
|---|---|---|
| cz_gateway_requests_total | Counter | Total requests by provider, model, and status code. |
| cz_gateway_request_duration_seconds | Histogram | Request latency distribution. |
| cz_gateway_policy_evaluations_total | Counter | Policy evaluations by decision (allow/deny). |
| cz_gateway_pii_detections_total | Counter | PII detections by type and direction (req/resp). |
| cz_gateway_rate_limit_hits_total | Counter | Rate limit rejections by scope. |
| cz_gateway_upstream_errors_total | Counter | Upstream provider errors by provider and status. |
| cz_gateway_active_connections | Gauge | Current active connections. |
Enable or disable the metrics endpoint with CZ_GATEWAY_METRICS_ENABLED (default: true). Change the listen port with CZ_GATEWAY_METRICS_PORT (default: 9090).
Once scraped, the counters and histograms above can be charted in Grafana dashboards.
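For a quick check without a full Prometheus setup, the exposition text can be parsed directly. The sketch below sums one metric by a label. The label name `decision` and the sample values are assumptions, because this page lists the metric families but not their exact label names. The parser is deliberately simple and does not handle escaped quotes inside label values.

```python
import re
from typing import Dict

# Matches sample lines such as: name{label="value",other="x"} 12
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(exposition: str, metric: str, label: str) -> Dict[str, float]:
    """Sum the samples of one metric, grouped by the value of one label."""
    totals: Dict[str, float] = {}
    for line in exposition.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels") or ""))
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + float(match.group("value"))
    return totals


# Hypothetical scrape output for illustration.
sample_metrics = """\
# TYPE cz_gateway_policy_evaluations_total counter
cz_gateway_policy_evaluations_total{decision="allow"} 40
cz_gateway_policy_evaluations_total{decision="deny"} 2
cz_gateway_requests_total{provider="openai",status="200"} 42
"""
print(sum_by_label(sample_metrics, "cz_gateway_policy_evaluations_total", "decision"))
```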
Tool Call Interception
After the LLM responds:
- Every tool_use block (Anthropic) or function_call (OpenAI) is evaluated against your policies.
- Denied tool calls are replaced inline with a policy denial message.
- Each decision is logged separately for auditing.
- Both streaming and non-streaming responses are supported.
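The steps above can be pictured with a small sketch over an Anthropic-style content list, where tool calls appear as blocks with `"type": "tool_use"`. This is only a conceptual illustration, not the gateway's code. The function `filter_tool_calls`, the `is_allowed` callback, and the wording of the replacement text block are all assumptions. The `enforce` flag mirrors the idea of evaluating and logging without modifying the response.

```python
from typing import Any, Callable, Dict, List, Tuple

Block = Dict[str, Any]


def filter_tool_calls(
    content: List[Block],
    is_allowed: Callable[[str, Dict[str, Any]], bool],
    enforce: bool = True,
) -> Tuple[List[Block], List[Dict[str, str]]]:
    """Evaluate each tool_use block; return (new_content, decisions)."""
    new_content: List[Block] = []
    decisions: List[Dict[str, str]] = []
    for block in content:
        if block.get("type") != "tool_use":
            new_content.append(block)  # text and other blocks pass through
            continue
        allowed = is_allowed(block["name"], block.get("input", {}))
        decisions.append(
            {"tool": block["name"], "decision": "allow" if allowed else "deny"}
        )
        if allowed or not enforce:
            new_content.append(block)  # with enforce=False, denied calls are kept
        else:
            # Assumed wording; the real denial message format is not shown here.
            new_content.append(
                {"type": "text", "text": f"[policy denied] tool '{block['name']}' was blocked"}
            )
    return new_content, decisions


content = [
    {"type": "text", "text": "Cleaning up."},
    {"type": "tool_use", "id": "t1", "name": "delete_file", "input": {"path": "/tmp/x"}},
    {"type": "tool_use", "id": "t2", "name": "read_file", "input": {"path": "/tmp/y"}},
]
filtered, decisions = filter_tool_calls(content, lambda name, _input: name != "delete_file")
print(decisions)
```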
Supported Providers
| Provider | Gateway Path | Protocol |
|---|---|---|
| Anthropic (Claude) | /v1/messages | Anthropic Messages API |
| OpenAI (GPT) | /v1/chat/completions | OpenAI Chat Completions |
| Google AI (Gemini) | /google/v1/chat/completions | OpenAI-compatible |
| Ollama | /ollama/v1/chat/completions | OpenAI-compatible |
| DeepSeek | /deepseek/chat/completions | OpenAI-compatible |
| MoonshotAI | /moonshot/v1/chat/completions | OpenAI-compatible |
| HuggingFace TGI | /huggingface/v1/chat/completions | OpenAI-compatible (no tool interception) |
| Mistral | /mistral/v1/chat/completions | OpenAI-compatible |
| Cohere | /cohere/v1/chat/completions | OpenAI-compatible |
Google AI, Ollama, DeepSeek, MoonshotAI, HuggingFace, Mistral, and Cohere are disabled by default. Enable them with environment variables:
CZ_GATEWAY_GOOGLE_ENABLED=true
CZ_GATEWAY_OLLAMA_ENABLED=true
CZ_GATEWAY_DEEPSEEK_ENABLED=true
CZ_GATEWAY_MOONSHOT_ENABLED=true
CZ_GATEWAY_HUGGINGFACE_ENABLED=true
CZ_GATEWAY_MISTRAL_ENABLED=true
CZ_GATEWAY_COHERE_ENABLED=true
Identity and Context Headers
| Header | Required | Description |
|---|---|---|
| X-ControlZero-Agent-ID | Optional | Identifies the agent making the call; defaults to <provider>-direct if omitted |
| X-ControlZero-API-Key | Yes | Control Zero API key (cz_live_ or cz_test_) |
| X-ControlZero-Identity-Token | Optional | JWT with user claims for principal resolution |
| X-ControlZero-User-ID | Optional | User identifier for policy scoping |
| X-ControlZero-User-Group | Optional | User group for RBAC policy evaluation |
Fail-Closed Mode
If the gateway cannot reach the Control Zero backend, or the policy bundle is expired or tampered with, it blocks ALL requests by default. This is controlled by the CZ_GATEWAY_FAIL_CLOSED setting (default: true).
The gateway also runs periodic integrity self-checks on the loaded policy bundle (configurable via CZ_GATEWAY_INTEGRITY_CHECK_INTERVAL_SECONDS, default: 60s). If the bundle checksum fails, traffic is blocked and an alert is sent.
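The fail-closed conditions can be summarized as a single decision. The sketch below is an illustration under stated assumptions, not the gateway's logic. SHA-256 as the checksum algorithm, the function name `should_block_traffic`, and the behavior when fail-closed is disabled are all assumed. The 86400-second maximum age is the documented default of CZ_GATEWAY_POLICY_MAX_AGE_SECONDS.

```python
import hashlib
import hmac


def should_block_traffic(
    bundle: bytes,
    expected_sha256: str,
    bundle_age_seconds: float,
    backend_reachable: bool,
    fail_closed: bool = True,
    max_age_seconds: float = 86400,
) -> bool:
    """Return True when all traffic should be blocked."""
    actual = hashlib.sha256(bundle).hexdigest()  # assumed checksum algorithm
    checksum_ok = hmac.compare_digest(actual, expected_sha256)
    healthy = (
        backend_reachable
        and checksum_ok
        and bundle_age_seconds <= max_age_seconds
    )
    # Assumption: with fail_closed disabled, traffic is never blocked here.
    return fail_closed and not healthy


bundle = b"example-policy-bundle"  # hypothetical bundle contents
digest = hashlib.sha256(bundle).hexdigest()
print(should_block_traffic(bundle, digest, 120, backend_reachable=True))
```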
Tamper Detection
Policy bundles are encrypted at rest and cryptographically signed. The gateway verifies the signature and checksum on every load and at regular intervals. Tampering triggers fail-closed mode and an alert.
Audit Logging
Every proxied request is logged to the immutable audit trail with:
- Provider, model, and token usage
- Tool calls detected in the response
- Policy decisions (allow/deny) for each tool call
- Latency, status codes, and error information
- Agent ID and user identity context
Self-Hosted Deployment
The gateway runs as a Docker container:
docker run -d \
  -p 8000:8000 \
  -e CZ_GATEWAY_CZ_API_KEY=cz_live_xxx \
  -e CZ_GATEWAY_CZ_BACKEND_URL=https://api.controlzero.ai \
  -e CZ_GATEWAY_ANTHROPIC_API_KEY=sk-ant-xxx \
  -e CZ_GATEWAY_OPENAI_API_KEY=sk-xxx \
  controlzero/gateway:latest
Environment Variables
All gateway settings use the CZ_GATEWAY_ prefix:
| Variable | Default | Description |
|---|---|---|
| CZ_GATEWAY_CZ_API_KEY | (required) | Your Control Zero API key |
| CZ_GATEWAY_CZ_BACKEND_URL | http://control-zero-backend:8080 | Control Zero backend URL |
| CZ_GATEWAY_ANTHROPIC_API_URL | https://api.anthropic.com | Anthropic upstream URL |
| CZ_GATEWAY_ANTHROPIC_API_KEY | (empty) | Anthropic API key (injected if set) |
| CZ_GATEWAY_OPENAI_API_URL | https://api.openai.com | OpenAI upstream URL |
| CZ_GATEWAY_OPENAI_API_KEY | (empty) | OpenAI API key (injected if set) |
| CZ_GATEWAY_FAIL_CLOSED | true | Block all traffic when policies unavailable |
| CZ_GATEWAY_ENFORCE_TOOL_POLICIES | true | Enforce policies on tool calls (false = shadow mode) |
| CZ_GATEWAY_ENFORCE_LLM_POLICIES | true | Enforce pre-flight checks (model/cost/PII) |
| CZ_GATEWAY_PII_ACTION | detect | PII handling: detect, mask, or block |
| CZ_GATEWAY_POLICY_REFRESH_INTERVAL_SECONDS | 300 | How often to re-pull policies |
| CZ_GATEWAY_POLICY_MAX_AGE_SECONDS | 86400 | Max bundle age before fail-closed |
| CZ_GATEWAY_ALERT_WEBHOOK_URL | (empty) | Slack webhook for tamper/failure alerts |
| CZ_GATEWAY_DLP_LOCALES | default | Comma-separated DLP locales (see Locale-Aware DLP) |
| CZ_GATEWAY_RATE_LIMIT_PER_USER | 100 | Per-user rate limit (requests per minute) |
| CZ_GATEWAY_RATE_LIMIT_PER_ORG | 1000 | Per-org rate limit (requests per minute) |
| CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER | 500 | Per-provider rate limit fallback (requests per minute) |
| CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME> | (unset) | Per-provider override (e.g. _OPENAI, _ANTHROPIC) |
| CZ_GATEWAY_REDIS_URL | redis://localhost:6379 | URL of the in-memory cache backing the sliding-window rate limiter |
| CZ_GATEWAY_METRICS_ENABLED | true | Enable Prometheus metrics endpoint |
| CZ_GATEWAY_METRICS_PORT | 9090 | Metrics endpoint listen port |
| CZ_GATEWAY_GOOGLE_API_KEY | (empty) | Google AI API key (injected if set) |
| CZ_GATEWAY_MISTRAL_API_KEY | (empty) | Mistral API key (injected if set) |
| CZ_GATEWAY_COHERE_API_KEY | (empty) | Cohere API key (injected if set) |
| CZ_GATEWAY_PORT | 8000 | Gateway listen port |
Shadow Mode
Set CZ_GATEWAY_ENFORCE_TOOL_POLICIES=false to run in shadow mode. The gateway evaluates every tool call against policies and logs the decision, but does not modify responses. Use this to audit what would be blocked before enabling enforcement.
Docker Compose
services:
  cz-gateway:
    image: controlzero/gateway:latest
    ports:
      - '8000:8000'
    environment:
      CZ_GATEWAY_CZ_API_KEY: cz_live_xxx
      CZ_GATEWAY_CZ_BACKEND_URL: https://api.controlzero.ai
      CZ_GATEWAY_ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      CZ_GATEWAY_OPENAI_API_KEY: ${OPENAI_API_KEY}
    restart: unless-stopped
Gateway vs SDK
| | Gateway | SDK |
|---|---|---|
| Code changes | None. Change base URL only | Install package, wrap tool calls |
| Works with | Any agent that calls LLM APIs | Python, Node.js, Go |
| Enforcement point | Network layer (proxy) | Application layer (in-process) |
| Latency | Network hop to gateway | Local, in-process evaluation |
| Best for | Existing agents, quick rollout | New agents, tightest integration |
Both approaches enforce the same policies defined in your dashboard. You can use them together. The gateway handles LLM-level enforcement while the SDK handles application-level tool governance.
Multi-tenant mode: per-request API keys
For platforms proxying requests on behalf of multiple Control Zero customers, each request can carry its own project API key via the X-ControlZero-API-Key header. The gateway resolves project context per request and applies the corresponding policy bundle, cached for 5 minutes.
curl https://gateway.controlzero.ai/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "X-ControlZero-API-Key: cz_live_tenant_specific_key" \
-H "anthropic-version: 2023-06-01" \
-d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'
When the header is absent, the gateway falls back to its configured CZ_GATEWAY_CZ_API_KEY.
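The per-request key selection with fallback can be sketched as follows. The function name `resolve_project_key` is hypothetical, and treating an empty header value as absent is an assumption. Only the header name and the CZ_GATEWAY_CZ_API_KEY fallback come from this page.

```python
from typing import Mapping, Optional

API_KEY_HEADER = "x-controlzero-api-key"  # compared case-insensitively


def resolve_project_key(
    headers: Mapping[str, str], env: Mapping[str, str]
) -> Optional[str]:
    """Pick the per-request key if present, else the gateway's configured key."""
    for name, value in headers.items():
        if name.lower() == API_KEY_HEADER and value:
            return value  # tenant-specific key carried by this request
    return env.get("CZ_GATEWAY_CZ_API_KEY") or None


gateway_env = {"CZ_GATEWAY_CZ_API_KEY": "cz_live_default"}  # hypothetical value
print(resolve_project_key({"X-ControlZero-API-Key": "cz_live_tenant_specific_key"}, gateway_env))
print(resolve_project_key({}, gateway_env))
```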
Next Steps
- Quick Start: Get up and running in 5 minutes.
- Policies: Learn how to write policies.
- Locale-Aware DLP: Configure region-specific PII detection patterns.
- Governing MCP tool calls: Govern MCP server and tool access.
- CLI Scanner: Scan projects for governance gaps in CI/CD.