Skip to main content

Gateway Proxy

Supported modes: Hosted Hybrid Available in: Free Solo Teams

When NOT to use this

If you are building a new AI app from scratch, the SDK gives you finer control than the gateway. If you are governing developer tools (Claude Code, Cursor, Codex CLI), use coding hooks instead -- they intercept tool calls at the assistant level, not at the LLM API level.

Control Zero Gateway is a transparent proxy that sits between your AI agents and LLM providers. It intercepts every request, evaluates tool calls against your policies, and blocks unauthorized actions, all without changing your agent's code.

How It Works

The gateway operates in two phases:

  1. Pre-flight (request guard): Before forwarding to the LLM, the gateway checks model blocking rules, estimates cost against budget caps, and scans for PII in prompts.
  2. Response interception: After the LLM responds, every tool_use (Anthropic) or function_call (OpenAI) block is evaluated against your policies. Denied tool calls are replaced inline with a policy denial message. Both streaming and non-streaming responses are supported.

Quick Start

Change your LLM provider base URL to point to the Control Zero gateway:

Anthropic (Claude)

# Before
ANTHROPIC_BASE_URL=https://api.anthropic.com

# After
ANTHROPIC_BASE_URL=https://gateway.controlzero.ai

Add the headers:

X-ControlZero-API-Key: cz_live_xxx
X-ControlZero-Agent-ID: my-first-agent
  • X-ControlZero-API-Key (required) is your project key from the dashboard.
  • X-ControlZero-Agent-ID (optional) labels the caller for audit attribution. Defaults to <provider>-direct (e.g. anthropic-direct) if omitted.

OpenAI

# Before
OPENAI_BASE_URL=https://api.openai.com

# After
OPENAI_BASE_URL=https://gateway.controlzero.ai/v1

Add the same Control Zero headers as above.

That is it. No SDK installation, no code changes. Your existing agent code keeps working. The gateway enforces your policies transparently.

What the agent receives when a tool call is blocked

When the gateway blocks a request, the agent receives an HTTP 403 response with a JSON body:

{
"error": "policy_denied",
"reason": "<human-readable reason from the matching policy>",
"policy_id": "<id of the policy that matched>"
}

Handle this in your agent code the same way you'd handle an API error from the LLM provider.

Features

Pre-flight Request Guard

Before forwarding to the LLM provider, the gateway runs these checks:

  • Model blocking: Deny requests to unauthorized models (e.g., block agents from using expensive models).
  • Cost estimation: Reject if estimated token cost exceeds your budget cap.
  • PII detection: Detect, mask, or block PII in prompts. Configurable via CZ_GATEWAY_PII_ACTION (detect, mask, or block).

If the policy engine is unavailable and fail_closed is enabled (the default), all requests are blocked. No silent failures.

Response-Side DLP

The gateway performs bidirectional PII scanning: both requests sent to the LLM and responses received from it are inspected. Response-side DLP catches model outputs that contain sensitive data such as credit card numbers, national ID numbers, health records, or API keys.

Three modes are available, controlled by CZ_GATEWAY_PII_ACTION:

ModeBehavior
detectLog the finding in the audit trail but return the response unmodified.
maskReplace detected PII with redaction tokens (e.g., [REDACTED:SSN]).
blockReturn a policy denial response and drop the entire LLM output.

Response DLP supports 59 patterns across 6 locales. See Locale-Aware DLP for details on locale configuration and pattern coverage.

Rate Limiting

The gateway enforces configurable rate limits at three scopes:

ScopeEnvironment VariableDefaultDescription
Per-userCZ_GATEWAY_RATE_LIMIT_PER_USER100Maximum requests per minute per user identity.
Per-orgCZ_GATEWAY_RATE_LIMIT_PER_ORG1000Maximum requests per minute per organization.
Per-providerCZ_GATEWAY_RATE_LIMIT_PER_PROVIDER500Maximum requests per minute per upstream provider (fallback when no per-provider override is set).

Per-provider overrides are supported via CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME>, for example CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_OPENAI=2000 or CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_ANTHROPIC=1500. When a request's provider has a specific override set, that value is used; otherwise the fallback CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER applies.

Rate limit state is stored in an in-memory cache (CZ_GATEWAY_REDIS_URL, default redis://localhost:6379 — Redis-compatible URL scheme) and shared across gateway instances using a 60-second sliding window. When the cache is unreachable, rate limiting fails open and requests are allowed through with a warning logged.

When a limit is exceeded, the gateway returns HTTP 429 with a Retry-After header along with X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.

Correlation IDs

Every request processed by the gateway is assigned a correlation ID. If the caller includes an X-Request-ID header, the gateway preserves it. Otherwise, a new unique ID is generated.

The correlation ID is:

  • Returned in the X-Request-ID response header.
  • Included in every audit log entry for the request.
  • Propagated to upstream LLM providers where supported.

Use correlation IDs to trace a single request across your agent, the gateway, and the LLM provider.

Prometheus Metrics

The gateway exposes a /metrics endpoint in Prometheus exposition format. Scrape it with any Prometheus-compatible collector.

Available metric families:

MetricTypeDescription
cz_gateway_requests_totalCounterTotal requests by provider, model, and status code.
cz_gateway_request_duration_secondsHistogramRequest latency distribution.
cz_gateway_policy_evaluations_totalCounterPolicy evaluations by decision (allow/deny).
cz_gateway_pii_detections_totalCounterPII detections by type and direction (req/resp).
cz_gateway_rate_limit_hits_totalCounterRate limit rejections by scope.
cz_gateway_upstream_errors_totalCounterUpstream provider errors by provider and status.
cz_gateway_active_connectionsGaugeCurrent active connections.

Enable or disable the metrics endpoint with CZ_GATEWAY_METRICS_ENABLED (default: true). Change the listen port with CZ_GATEWAY_METRICS_PORT (default: 9090).

Scrape the /metrics endpoint with Prometheus and build Grafana dashboards from the counters and histograms above.

Tool Call Interception

After the LLM responds:

  • Every tool_use block (Anthropic) or function_call (OpenAI) is evaluated against your policies.
  • Denied tool calls are replaced inline with a policy denial message.
  • Each decision is logged separately for auditing.
  • Both streaming and non-streaming responses are supported.

Supported Providers

ProviderGateway PathProtocol
Anthropic (Claude)/v1/messagesAnthropic Messages API
OpenAI (GPT)/v1/chat/completionsOpenAI Chat Completions
Google AI (Gemini)/google/v1/chat/completionsOpenAI-compatible
Ollama/ollama/v1/chat/completionsOpenAI-compatible
DeepSeek/deepseek/chat/completionsOpenAI-compatible
MoonshotAI/moonshot/v1/chat/completionsOpenAI-compatible
HuggingFace TGI/huggingface/v1/chat/completionsOpenAI-compatible (no tool interception)
Mistral/mistral/v1/chat/completionsOpenAI-compatible
Cohere/cohere/v1/chat/completionsOpenAI-compatible

Google AI, Ollama, DeepSeek, MoonshotAI, HuggingFace, Mistral, and Cohere are disabled by default. Enable them with environment variables:

CZ_GATEWAY_GOOGLE_ENABLED=true
CZ_GATEWAY_OLLAMA_ENABLED=true
CZ_GATEWAY_DEEPSEEK_ENABLED=true
CZ_GATEWAY_MOONSHOT_ENABLED=true
CZ_GATEWAY_HUGGINGFACE_ENABLED=true
CZ_GATEWAY_MISTRAL_ENABLED=true
CZ_GATEWAY_COHERE_ENABLED=true

Identity and Context Headers

HeaderRequiredDescription
X-ControlZero-Agent-IDYesIdentifies the agent making the call
X-ControlZero-API-KeyYesControl Zero API key (cz_live_ or cz_test_)
X-ControlZero-Identity-TokenOptionalJWT with user claims for principal resolution
X-ControlZero-User-IDOptionalUser identifier for policy scoping
X-ControlZero-User-GroupOptionalUser group for RBAC policy evaluation

Fail-Closed Mode

If the gateway cannot reach the Control Zero backend, or the policy bundle is expired or tampered with, it blocks ALL requests by default. This is controlled by the CZ_GATEWAY_FAIL_CLOSED setting (default: true).

The gateway also runs periodic integrity self-checks on the loaded policy bundle (configurable via CZ_GATEWAY_INTEGRITY_CHECK_INTERVAL_SECONDS, default: 60s). If the bundle checksum fails, traffic is blocked and an alert is sent.

Tamper Detection

Policy bundles are encrypted at rest and cryptographically signed. The gateway verifies the signature and checksum on every load and at regular intervals. Tampering triggers fail-closed mode and an alert.

Audit Logging

Every proxied request is logged to the immutable audit trail with:

  • Provider, model, and token usage
  • Tool calls detected in the response
  • Policy decisions (allow/deny) for each tool call
  • Latency, status codes, and error information
  • Agent ID and user identity context

Self-Hosted Deployment

The gateway runs as a Docker container:

docker run -d \
-p 8000:8000 \
-e CZ_GATEWAY_CZ_API_KEY=cz_live_xxx \
-e CZ_GATEWAY_CZ_BACKEND_URL=https://api.controlzero.ai \
-e CZ_GATEWAY_ANTHROPIC_API_KEY=sk-ant-xxx \
-e CZ_GATEWAY_OPENAI_API_KEY=sk-xxx \
controlzero/gateway:latest

Environment Variables

All gateway settings use the CZ_GATEWAY_ prefix:

VariableDefaultDescription
CZ_GATEWAY_CZ_API_KEY(required)Your Control Zero API key
CZ_GATEWAY_CZ_BACKEND_URLhttp://control-zero-backend:8080Control Zero backend URL
CZ_GATEWAY_ANTHROPIC_API_URLhttps://api.anthropic.comAnthropic upstream URL
CZ_GATEWAY_ANTHROPIC_API_KEY(empty)Anthropic API key (injected if set)
CZ_GATEWAY_OPENAI_API_URLhttps://api.openai.comOpenAI upstream URL
CZ_GATEWAY_OPENAI_API_KEY(empty)OpenAI API key (injected if set)
CZ_GATEWAY_FAIL_CLOSEDtrueBlock all traffic when policies unavailable
CZ_GATEWAY_ENFORCE_TOOL_POLICIEStrueEnforce policies on tool calls (false = shadow mode)
CZ_GATEWAY_ENFORCE_LLM_POLICIEStrueEnforce pre-flight checks (model/cost/PII)
CZ_GATEWAY_PII_ACTIONdetectPII handling: detect, mask, or block
CZ_GATEWAY_POLICY_REFRESH_INTERVAL_SECONDS300How often to re-pull policies
CZ_GATEWAY_POLICY_MAX_AGE_SECONDS86400Max bundle age before fail-closed
CZ_GATEWAY_ALERT_WEBHOOK_URL(empty)Slack webhook for tamper/failure alerts
CZ_GATEWAY_DLP_LOCALESdefaultComma-separated DLP locales (see Locale-Aware DLP)
CZ_GATEWAY_RATE_LIMIT_PER_USER100Per-user rate limit (requests per minute)
CZ_GATEWAY_RATE_LIMIT_PER_ORG1000Per-org rate limit (requests per minute)
CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER500Per-provider rate limit fallback (requests per minute)
CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME>(unset)Per-provider override (e.g. _OPENAI, _ANTHROPIC)
CZ_GATEWAY_REDIS_URLredis://localhost:6379URL of the in-memory cache backing the sliding-window rate limiter
CZ_GATEWAY_METRICS_ENABLEDtrueEnable Prometheus metrics endpoint
CZ_GATEWAY_METRICS_PORT9090Metrics endpoint listen port
CZ_GATEWAY_GOOGLE_API_KEY(empty)Google AI API key (injected if set)
CZ_GATEWAY_MISTRAL_API_KEY(empty)Mistral API key (injected if set)
CZ_GATEWAY_COHERE_API_KEY(empty)Cohere API key (injected if set)
CZ_GATEWAY_PORT8000Gateway listen port

Shadow Mode

Set CZ_GATEWAY_ENFORCE_TOOL_POLICIES=false to run in shadow mode. The gateway evaluates every tool call against policies and logs the decision, but does not modify responses. Use this to audit what would be blocked before enabling enforcement.

Docker Compose

services:
cz-gateway:
image: controlzero/gateway:latest
ports:
- '8000:8000'
environment:
CZ_GATEWAY_CZ_API_KEY: cz_live_xxx
CZ_GATEWAY_CZ_BACKEND_URL: https://api.controlzero.ai
CZ_GATEWAY_ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
CZ_GATEWAY_OPENAI_API_KEY: ${OPENAI_API_KEY}
restart: unless-stopped

Gateway vs SDK

GatewaySDK
Code changesNone. Change base URL onlyInstall package, wrap tool calls
Works withAny agent that calls LLM APIsPython, Node.js, Go
Enforcement pointNetwork layer (proxy)Application layer (in-process)
LatencyNetwork hop to gatewayLocal, in-process evaluation
Best forExisting agents, quick rolloutNew agents, tightest integration

Both approaches enforce the same policies defined in your dashboard. You can use them together. The gateway handles LLM-level enforcement while the SDK handles application-level tool governance.

Multi-tenant mode: per-request API keys

For platforms proxying requests on behalf of multiple Control Zero customers, each request can carry its own project API key via the X-ControlZero-API-Key header. The gateway resolves project context per request and applies the corresponding policy bundle, cached for 5 minutes.

curl https://gateway.controlzero.ai/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "X-ControlZero-API-Key: cz_live_tenant_specific_key" \
-H "anthropic-version: 2023-06-01" \
-d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'

When the header is absent, the gateway falls back to its configured CZ_GATEWAY_CZ_API_KEY.

Next Steps