# Building a RAG Pipeline with Control Zero
This guide shows how to add governance to a Retrieval-Augmented Generation (RAG) pipeline. Control Zero automatically enforces which models agents can use and provides manual enforcement for custom actions like vector store access.
## What You Will Build
A RAG system where:
- LLM calls are automatically enforced via `wrap_openai()`
- Vector store access is enforced with manual `enforce()` calls (since vector stores are not standard LLM clients)
- Policies control which data sources agents can query and which models they can use
- Every decision is logged for audit
## Architecture
Notice the two enforcement points:
- **Automatic:** `wrap_openai()` handles LLM model governance.
- **Manual:** `cz.enforce()` handles vector store access (since it is a custom data source, not an LLM API).
## Setup
```bash
pip install controlzero openai chromadb
```
```python
import controlzero
from controlzero.integrations.openai import wrap_openai
import openai
import chromadb

# Initialize Control Zero and wrap the OpenAI client
cz = controlzero.init()
client = wrap_openai(openai.OpenAI(), cz)

# Initialize the vector store
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("documents")
```
## Define the Policy
In the Control Zero dashboard, create this policy:
```json
{
  "name": "rag-pipeline-policy",
  "description": "Governance for RAG: control data access and model usage",
  "rules": [
    { "effect": "allow", "action": "llm.generate", "resource": "model/gpt-4" },
    {
      "effect": "allow",
      "action": "embedding.generate",
      "resource": "model/text-embedding-3-small"
    },
    { "effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*" },
    { "effect": "allow", "action": "data.read", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.write", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*" }
  ]
}
```
What this policy means:
- LLM calls with GPT-4 are allowed (auto-enforced by wrapper).
- Embeddings with `text-embedding-3-small` are allowed (auto-enforced by wrapper).
- GPT-4 Turbo is blocked.
- Reading from the `documents` collection is allowed.
- Writing to any collection is blocked (read-only agents).
- Reading from `internal-*` collections is blocked.
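This guide doesn't specify how Control Zero orders rules or expands wildcards. As a mental model only, a first-match-wins, default-deny evaluator with glob-style resource patterns reproduces the outcomes described above. The `is_allowed` helper and the evaluation semantics are illustrative assumptions, not Control Zero's implementation:

```python
from fnmatch import fnmatch

# The same rules as the dashboard policy above
RULES = [
    {"effect": "allow", "action": "llm.generate", "resource": "model/gpt-4"},
    {"effect": "allow", "action": "embedding.generate", "resource": "model/text-embedding-3-small"},
    {"effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*"},
    {"effect": "allow", "action": "data.read", "resource": "vectorstore/documents"},
    {"effect": "deny", "action": "data.write", "resource": "vectorstore/documents"},
    {"effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*"},
]

def is_allowed(action: str, resource: str) -> bool:
    """First matching rule wins; if nothing matches, deny by default."""
    for rule in RULES:
        if rule["action"] == action and fnmatch(resource, rule["resource"]):
            return rule["effect"] == "allow"
    return False  # no rule matched: default deny

print(is_allowed("llm.generate", "model/gpt-4"))           # True
print(is_allowed("llm.generate", "model/gpt-4-turbo"))     # False
print(is_allowed("data.read", "vectorstore/internal-hr"))  # False
print(is_allowed("data.write", "vectorstore/documents"))   # False
```

Under this model, "writing to any collection is blocked" holds even without the explicit deny rule, because no `allow` rule exists for `data.write` and unmatched requests fall through to deny.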
## Implementation
### Retrieve with Policy Enforcement
Vector store access is a custom action. Use `enforce()` to check the policy:
```python
def retrieve(query: str, agent_id: str, n_results: int = 5) -> list[str]:
    """Retrieve relevant documents with policy enforcement."""
    # Manual enforce: check whether this agent can read from the vector store
    cz.enforce(
        action="data.read",
        resource="vectorstore/documents",
        context={"agent_id": agent_id},
    )

    # Generate the query embedding (auto-enforced by wrap_openai)
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_embedding = response.data[0].embedding

    # Search the vector store
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    return results["documents"][0] if results["documents"] else []
```
### Generate with Automatic Enforcement
LLM generation is automatically enforced by the wrapper -- no `enforce()` call is needed:
```python
def generate_answer(query: str, context: list[str]) -> str:
    """Generate an answer. Model governance is automatic via wrap_openai."""
    context_text = "\n\n".join(context)

    # This call is automatically checked against your policy
    # because the client is wrapped with wrap_openai()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question based only on the provided context. "
                    "If the context does not contain enough information, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context_text}\n\nQuestion: {query}",
            },
        ],
    )
    return response.choices[0].message.content
```
## Full Pipeline
```python
def rag_query(query: str, agent_id: str = "rag-agent") -> str:
    """Complete RAG pipeline with governance at every step."""
    try:
        context = retrieve(query, agent_id)
        if not context:
            return "No relevant documents found."
        return generate_answer(query, context)
    except controlzero.PolicyViolationError as e:
        return f"Blocked by policy: {e.message}"

# Usage
answer = rag_query("What were Q4 revenue numbers?")
print(answer)
```
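The `except` branch is the important part: policy denials surface as exceptions, not silent empty results. To exercise that shape without a live policy server, the same pattern can be run against a stand-in exception class. The stub below only mimics the `.message` attribute used above; the real class comes from the `controlzero` package:

```python
class PolicyViolationError(Exception):
    """Stand-in for controlzero.PolicyViolationError (assumed .message attribute)."""
    def __init__(self, message: str):
        super().__init__(message)
        self.message = message

def guarded_query(query: str, allowed: bool) -> str:
    """Mimics rag_query's error handling with a toggle instead of a live check."""
    try:
        if not allowed:
            raise PolicyViolationError("data.read on vectorstore/documents denied")
        return f"answer for: {query}"
    except PolicyViolationError as e:
        return f"Blocked by policy: {e.message}"

print(guarded_query("Q4 revenue?", allowed=True))   # answer for: Q4 revenue?
print(guarded_query("Q4 revenue?", allowed=False))  # Blocked by policy: data.read on vectorstore/documents denied
```

Catching the violation at the pipeline boundary, rather than inside `retrieve()`, keeps one place where denials become user-facing messages.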
## What Happens at Runtime
| Step | What Happens | Enforcement |
|---|---|---|
| 1. Retrieve docs | `cz.enforce("data.read", "vectorstore/documents")` | Manual -- custom data source |
| 2. Generate embedding | `client.embeddings.create(model="text-embedding-3-small")` | Automatic -- `wrap_openai` |
| 3. Generate answer | `client.chat.completions.create(model="gpt-4")` | Automatic -- `wrap_openai` |
## When to Use Manual `enforce()` vs Auto-Wrapping
| Scenario | Method | Why |
|---|---|---|
| LLM API calls (OpenAI, Anthropic) | Auto-wrap the client | The SDK extracts model names automatically |
| Vector store queries | Manual `enforce()` | Custom data source, not a standard LLM API |
| Database access | Manual `enforce()` | Custom data source |
| File operations | Manual `enforce()` | Custom action |
| MCP tool calls | Manual `enforce()` | Custom protocol action |
The rule of thumb: if Control Zero has a wrapper for your client, use it. For everything else, call `enforce()` yourself.
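If the same manual check recurs across many custom actions, a small decorator can keep the enforcement call out of business logic. This `governed` helper is a hypothetical convenience, not part of Control Zero's API; it assumes only a callable with `enforce()`'s keyword signature, so a recording stub stands in for `cz.enforce` here:

```python
from functools import wraps

def governed(enforce, action: str, resource: str):
    """Hypothetical decorator: run a policy check before the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, agent_id: str = "rag-agent", **kwargs):
            # Check the policy first; a denial raises before fn ever runs
            enforce(action=action, resource=resource, context={"agent_id": agent_id})
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Recording stub in place of cz.enforce, so the sketch is self-contained
calls = []
def fake_enforce(**kwargs):
    calls.append(kwargs)

@governed(fake_enforce, action="data.read", resource="vectorstore/documents")
def fetch_docs(query: str) -> list[str]:
    return ["doc-1", "doc-2"]

print(fetch_docs("q4 numbers"))  # ['doc-1', 'doc-2']
print(calls[0]["action"])        # data.read
```

In a real pipeline you would pass `cz.enforce` instead of `fake_enforce`, and the decorated function would raise `controlzero.PolicyViolationError` on a denied check.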
## Next Steps
- Semantic Search Guide -- Simpler retrieval-only setup.
- LangChain Integration -- Build RAG with LangChain's callback handler.
- Policies -- Construct complex policies in the dashboard.