Gateway Proxy

Supported modes: Hosted Hybrid Available in: Free Solo Teams

When NOT to use this

If you are building a new AI app from scratch, the SDK gives you finer control than the gateway. If you are governing developer tools (Claude Code, Cursor, Codex CLI), use coding hooks instead -- they intercept tool calls at the assistant level, not at the LLM API level.

Control Zero Gateway is a transparent proxy that sits between your AI agents and LLM providers. It intercepts every request, evaluates tool calls against your policies, and blocks unauthorized actions, all without changing your agent's code.

How It Works

The gateway operates in two phases:

Pre-flight (request guard): Before forwarding to the LLM, the gateway checks model blocking rules, estimates cost against budget caps, and scans for PII in prompts.
Response interception: After the LLM responds, every tool_use (Anthropic) or function_call (OpenAI) block is evaluated against your policies. Denied tool calls are replaced inline with a policy denial message. Both streaming and non-streaming responses are supported.

Quick Start

Change your LLM provider base URL to point to the Control Zero gateway:

Anthropic (Claude)

# Before
ANTHROPIC_BASE_URL=https://api.anthropic.com

# After
ANTHROPIC_BASE_URL=https://gateway.controlzero.ai

Add the headers:

X-ControlZero-API-Key: cz_live_xxx
X-ControlZero-Agent-ID: my-first-agent

X-ControlZero-API-Key (required) is your project key from the dashboard.
X-ControlZero-Agent-ID (optional) labels the caller for audit attribution. Defaults to <provider>-direct (e.g. anthropic-direct) if omitted.

OpenAI

# Before
OPENAI_BASE_URL=https://api.openai.com

# After
OPENAI_BASE_URL=https://gateway.controlzero.ai/v1

Add the same Control Zero headers as above.

That is it. No SDK installation, no code changes. Your existing agent code keeps working. The gateway enforces your policies transparently.

What the agent receives when a tool call is blocked

When the gateway blocks a request, the agent receives an HTTP 403 response with a JSON body:

{
  "error": "policy_denied",
  "reason": "<human-readable reason from the matching policy>",
  "policy_id": "<id of the policy that matched>"
}

Handle this in your agent code the same way you'd handle an API error from the LLM provider.

Features

Pre-flight Request Guard

Before forwarding to the LLM provider, the gateway runs these checks:

Model blocking: Deny requests to unauthorized models (e.g., block agents from using expensive models).
Cost estimation: Reject if estimated token cost exceeds your budget cap.
PII detection: Detect, mask, or block PII in prompts. Configurable via CZ_GATEWAY_PII_ACTION (detect, mask, or block).

If the policy engine is unavailable and fail_closed is enabled (the default), all requests are blocked. No silent failures.

Response-Side DLP

The gateway performs bidirectional PII scanning: both requests sent to the LLM and responses received from it are inspected. Response-side DLP catches model outputs that contain sensitive data such as credit card numbers, national ID numbers, health records, or API keys.

Three modes are available, controlled by CZ_GATEWAY_PII_ACTION:

Mode	Behavior
`detect`	Log the finding in the audit trail but return the response unmodified.
`mask`	Replace detected PII with redaction tokens (e.g., `[REDACTED:SSN]`).
`block`	Return a policy denial response and drop the entire LLM output.

Response DLP supports 59 patterns across 6 locales. See Locale-Aware DLP for details on locale configuration and pattern coverage.

Rate Limiting

The gateway enforces configurable rate limits at three scopes:

Scope	Environment Variable	Default	Description
Per-user	`CZ_GATEWAY_RATE_LIMIT_PER_USER`	`100`	Maximum requests per minute per user identity.
Per-org	`CZ_GATEWAY_RATE_LIMIT_PER_ORG`	`1000`	Maximum requests per minute per organization.
Per-provider	`CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER`	`500`	Maximum requests per minute per upstream provider (fallback when no per-provider override is set).

Per-provider overrides are supported via CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME>, for example CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_OPENAI=2000 or CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_ANTHROPIC=1500. When a request's provider has a specific override set, that value is used; otherwise the fallback CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER applies.

Rate limit state is stored in an in-memory cache (CZ_GATEWAY_REDIS_URL, default redis://localhost:6379 — Redis-compatible URL scheme) and shared across gateway instances using a 60-second sliding window. When the cache is unreachable, rate limiting fails open and requests are allowed through with a warning logged.

When a limit is exceeded, the gateway returns HTTP 429 with a Retry-After header along with X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers.

Correlation IDs

Every request processed by the gateway is assigned a correlation ID. If the caller includes an X-Request-ID header, the gateway preserves it. Otherwise, a new unique ID is generated.

The correlation ID is:

Returned in the X-Request-ID response header.
Included in every audit log entry for the request.
Propagated to upstream LLM providers where supported.

Use correlation IDs to trace a single request across your agent, the gateway, and the LLM provider.

Prometheus Metrics

The gateway exposes a /metrics endpoint in Prometheus exposition format. Scrape it with any Prometheus-compatible collector.

Available metric families:

Metric	Type	Description
`cz_gateway_requests_total`	Counter	Total requests by provider, model, and status code.
`cz_gateway_request_duration_seconds`	Histogram	Request latency distribution.
`cz_gateway_policy_evaluations_total`	Counter	Policy evaluations by decision (allow/deny).
`cz_gateway_pii_detections_total`	Counter	PII detections by type and direction (req/resp).
`cz_gateway_rate_limit_hits_total`	Counter	Rate limit rejections by scope.
`cz_gateway_upstream_errors_total`	Counter	Upstream provider errors by provider and status.
`cz_gateway_active_connections`	Gauge	Current active connections.

Enable or disable the metrics endpoint with CZ_GATEWAY_METRICS_ENABLED (default: true). Change the listen port with CZ_GATEWAY_METRICS_PORT (default: 9090).

Scrape the /metrics endpoint with Prometheus and build Grafana dashboards from the counters and histograms above.

Tool Call Interception

After the LLM responds:

Every tool_use block (Anthropic) or function_call (OpenAI) is evaluated against your policies.
Denied tool calls are replaced inline with a policy denial message.
Each decision is logged separately for auditing.
Both streaming and non-streaming responses are supported.

Supported Providers

Provider	Gateway Path	Protocol
Anthropic (Claude)	`/v1/messages`	Anthropic Messages API
OpenAI (GPT)	`/v1/chat/completions`	OpenAI Chat Completions
Google AI (Gemini)	`/google/v1/chat/completions`	OpenAI-compatible
Ollama	`/ollama/v1/chat/completions`	OpenAI-compatible
DeepSeek	`/deepseek/chat/completions`	OpenAI-compatible
MoonshotAI	`/moonshot/v1/chat/completions`	OpenAI-compatible
HuggingFace TGI	`/huggingface/v1/chat/completions`	OpenAI-compatible (no tool interception)
Mistral	`/mistral/v1/chat/completions`	OpenAI-compatible
Cohere	`/cohere/v1/chat/completions`	OpenAI-compatible

Google AI, Ollama, DeepSeek, MoonshotAI, HuggingFace, Mistral, and Cohere are disabled by default. Enable them with environment variables:

CZ_GATEWAY_GOOGLE_ENABLED=true
CZ_GATEWAY_OLLAMA_ENABLED=true
CZ_GATEWAY_DEEPSEEK_ENABLED=true
CZ_GATEWAY_MOONSHOT_ENABLED=true
CZ_GATEWAY_HUGGINGFACE_ENABLED=true
CZ_GATEWAY_MISTRAL_ENABLED=true
CZ_GATEWAY_COHERE_ENABLED=true

Identity and Context Headers

Header	Required	Description
`X-ControlZero-Agent-ID`	Yes	Identifies the agent making the call
`X-ControlZero-API-Key`	Yes	Control Zero API key (`cz_live_` or `cz_test_`)
`X-ControlZero-Identity-Token`	Optional	JWT with user claims for principal resolution
`X-ControlZero-User-ID`	Optional	User identifier for policy scoping
`X-ControlZero-User-Group`	Optional	User group for RBAC policy evaluation

Fail-Closed Mode

If the gateway cannot reach the Control Zero backend, or the policy bundle is expired or tampered with, it blocks ALL requests by default. This is controlled by the CZ_GATEWAY_FAIL_CLOSED setting (default: true).

The gateway also runs periodic integrity self-checks on the loaded policy bundle (configurable via CZ_GATEWAY_INTEGRITY_CHECK_INTERVAL_SECONDS, default: 60s). If the bundle checksum fails, traffic is blocked and an alert is sent.

Tamper Detection

Policy bundles are encrypted at rest and cryptographically signed. The gateway verifies the signature and checksum on every load and at regular intervals. Tampering triggers fail-closed mode and an alert.

Audit Logging

Every proxied request is logged to the immutable audit trail with:

Provider, model, and token usage
Tool calls detected in the response
Policy decisions (allow/deny) for each tool call
Latency, status codes, and error information
Agent ID and user identity context

Self-Hosted Deployment

The gateway runs as a Docker container:

docker run -d \
  -p 8000:8000 \
  -e CZ_GATEWAY_CZ_API_KEY=cz_live_xxx \
  -e CZ_GATEWAY_CZ_BACKEND_URL=https://api.controlzero.ai \
  -e CZ_GATEWAY_ANTHROPIC_API_KEY=sk-ant-xxx \
  -e CZ_GATEWAY_OPENAI_API_KEY=sk-xxx \
  controlzero/gateway:latest

Environment Variables

All gateway settings use the CZ_GATEWAY_ prefix:

Variable	Default	Description
`CZ_GATEWAY_CZ_API_KEY`	(required)	Your Control Zero API key
`CZ_GATEWAY_CZ_BACKEND_URL`	`http://control-zero-backend:8080`	Control Zero backend URL
`CZ_GATEWAY_ANTHROPIC_API_URL`	`https://api.anthropic.com`	Anthropic upstream URL
`CZ_GATEWAY_ANTHROPIC_API_KEY`	(empty)	Anthropic API key (injected if set)
`CZ_GATEWAY_OPENAI_API_URL`	`https://api.openai.com`	OpenAI upstream URL
`CZ_GATEWAY_OPENAI_API_KEY`	(empty)	OpenAI API key (injected if set)
`CZ_GATEWAY_FAIL_CLOSED`	`true`	Block all traffic when policies unavailable
`CZ_GATEWAY_ENFORCE_TOOL_POLICIES`	`true`	Enforce policies on tool calls (`false` = shadow mode)
`CZ_GATEWAY_ENFORCE_LLM_POLICIES`	`true`	Enforce pre-flight checks (model/cost/PII)
`CZ_GATEWAY_PII_ACTION`	`detect`	PII handling: `detect`, `mask`, or `block`
`CZ_GATEWAY_POLICY_REFRESH_INTERVAL_SECONDS`	`300`	How often to re-pull policies
`CZ_GATEWAY_POLICY_MAX_AGE_SECONDS`	`86400`	Max bundle age before fail-closed
`CZ_GATEWAY_ALERT_WEBHOOK_URL`	(empty)	Slack webhook for tamper/failure alerts
`CZ_GATEWAY_DLP_LOCALES`	`default`	Comma-separated DLP locales (see Locale-Aware DLP)
`CZ_GATEWAY_RATE_LIMIT_PER_USER`	`100`	Per-user rate limit (requests per minute)
`CZ_GATEWAY_RATE_LIMIT_PER_ORG`	`1000`	Per-org rate limit (requests per minute)
`CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER`	`500`	Per-provider rate limit fallback (requests per minute)
`CZ_GATEWAY_RATE_LIMIT_PER_PROVIDER_<NAME>`	(unset)	Per-provider override (e.g. `_OPENAI`, `_ANTHROPIC`)
`CZ_GATEWAY_REDIS_URL`	`redis://localhost:6379`	URL of the in-memory cache backing the sliding-window rate limiter
`CZ_GATEWAY_METRICS_ENABLED`	`true`	Enable Prometheus metrics endpoint
`CZ_GATEWAY_METRICS_PORT`	`9090`	Metrics endpoint listen port
`CZ_GATEWAY_GOOGLE_API_KEY`	(empty)	Google AI API key (injected if set)
`CZ_GATEWAY_MISTRAL_API_KEY`	(empty)	Mistral API key (injected if set)
`CZ_GATEWAY_COHERE_API_KEY`	(empty)	Cohere API key (injected if set)
`CZ_GATEWAY_PORT`	`8000`	Gateway listen port

Shadow Mode

Set CZ_GATEWAY_ENFORCE_TOOL_POLICIES=false to run in shadow mode. The gateway evaluates every tool call against policies and logs the decision, but does not modify responses. Use this to audit what would be blocked before enabling enforcement.

Docker Compose

services:
  cz-gateway:
    image: controlzero/gateway:latest
    ports:
      - '8000:8000'
    environment:
      CZ_GATEWAY_CZ_API_KEY: cz_live_xxx
      CZ_GATEWAY_CZ_BACKEND_URL: https://api.controlzero.ai
      CZ_GATEWAY_ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      CZ_GATEWAY_OPENAI_API_KEY: ${OPENAI_API_KEY}
    restart: unless-stopped

Gateway vs SDK

	Gateway	SDK
Code changes	None. Change base URL only	Install package, wrap tool calls
Works with	Any agent that calls LLM APIs	Python, Node.js, Go
Enforcement point	Network layer (proxy)	Application layer (in-process)
Latency	Network hop to gateway	Local, in-process evaluation
Best for	Existing agents, quick rollout	New agents, tightest integration

Both approaches enforce the same policies defined in your dashboard. You can use them together. The gateway handles LLM-level enforcement while the SDK handles application-level tool governance.

Multi-tenant mode: per-request API keys

For platforms proxying requests on behalf of multiple Control Zero customers, each request can carry its own project API key via the X-ControlZero-API-Key header. The gateway resolves project context per request and applies the corresponding policy bundle, cached for 5 minutes.

curl https://gateway.controlzero.ai/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "X-ControlZero-API-Key: cz_live_tenant_specific_key" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model": "claude-sonnet-4-6", "max_tokens": 1024, "messages": [...]}'

When the header is absent, the gateway falls back to its configured CZ_GATEWAY_CZ_API_KEY.

Next Steps

Quick Start: Get up and running in 5 minutes.
Policies: Learn how to write policies.
Locale-Aware DLP: Configure region-specific PII detection patterns.
Governing MCP tool calls: Govern MCP server and tool access.
CLI Scanner: Scan projects for governance gaps in CI/CD.

How It Works​

Quick Start​

Anthropic (Claude)​

OpenAI​

What the agent receives when a tool call is blocked​

Features​

Pre-flight Request Guard​

Response-Side DLP​

Rate Limiting​

Correlation IDs​

Prometheus Metrics​

Tool Call Interception​

Supported Providers​

Identity and Context Headers​

Fail-Closed Mode​

Tamper Detection​

Audit Logging​

Self-Hosted Deployment​

Environment Variables​

Shadow Mode​

Docker Compose​

Gateway vs SDK​

Multi-tenant mode: per-request API keys​

Next Steps​