Building a RAG Pipeline with Control Zero

This guide shows how to add governance to a Retrieval-Augmented Generation (RAG) pipeline. Control Zero enforces LLM model usage automatically via client wrapping, and lets you manually enforce custom actions such as vector store access.

What You Will Build

A RAG system where:

  • LLM calls are automatically enforced via wrap_openai()
  • Vector store access is enforced with manual enforce() calls (since vector stores are not standard LLM clients)
  • Policies control which data sources agents can query and which models they can use
  • Every decision is logged for audit

Architecture

Notice the two enforcement points:

  • Automatic: wrap_openai() handles LLM model governance.
  • Manual: cz.enforce() handles vector store access (since it is a custom data source, not an LLM API).

Setup

pip install controlzero openai chromadb

import controlzero
from controlzero.integrations.openai import wrap_openai
import openai
import chromadb

# Initialize Control Zero and wrap the OpenAI client
cz = controlzero.init()
client = wrap_openai(openai.OpenAI(), cz)

# Initialize vector store
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("documents")

Define the Policy

In the Control Zero dashboard, create this policy:

{
  "name": "rag-pipeline-policy",
  "description": "Governance for RAG: control data access and model usage",
  "rules": [
    { "effect": "allow", "action": "llm.generate", "resource": "model/gpt-4" },
    {
      "effect": "allow",
      "action": "embedding.generate",
      "resource": "model/text-embedding-3-small"
    },
    { "effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*" },
    { "effect": "allow", "action": "data.read", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.write", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*" }
  ]
}
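Before pasting the policy into the dashboard, a quick local check catches JSON typos. This sketch simply re-parses the policy above with the standard library and verifies each rule has the three fields the policy uses; it does not validate against any official Control Zero schema.

```python
import json

# The policy document from above, kept as a string for a quick syntax check.
policy_json = """
{
  "name": "rag-pipeline-policy",
  "description": "Governance for RAG: control data access and model usage",
  "rules": [
    { "effect": "allow", "action": "llm.generate", "resource": "model/gpt-4" },
    { "effect": "allow", "action": "embedding.generate", "resource": "model/text-embedding-3-small" },
    { "effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*" },
    { "effect": "allow", "action": "data.read", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.write", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*" }
  ]
}
"""

policy = json.loads(policy_json)  # raises ValueError on malformed JSON

# Every rule in this policy uses exactly these three fields.
for rule in policy["rules"]:
    assert set(rule) == {"effect", "action", "resource"}, rule
    assert rule["effect"] in ("allow", "deny"), rule

print(f"{policy['name']}: {len(policy['rules'])} rules OK")
```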

What this policy means:

  • LLM calls with GPT-4 are allowed (auto-enforced by wrapper).
  • Embeddings with text-embedding-3-small are allowed (auto-enforced by wrapper).
  • GPT-4 Turbo is blocked.
  • Reading from the documents collection is allowed.
  • Writing to the documents collection is blocked (read-only agents).
  • Reading from internal-* collections is blocked.
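To build intuition for how wildcard rules like model/gpt-4-turbo* and vectorstore/internal-* behave, here is a minimal, illustrative matcher. This is not Control Zero's actual evaluation engine: it applies shell-style wildcards and assumes deny overrides allow, with a default deny when no rule matches (check the dashboard's documented precedence for the real semantics).

```python
from fnmatch import fnmatch

# The rules from the policy above, as (effect, action, resource-pattern) tuples.
RULES = [
    ("allow", "llm.generate", "model/gpt-4"),
    ("allow", "embedding.generate", "model/text-embedding-3-small"),
    ("deny", "llm.generate", "model/gpt-4-turbo*"),
    ("allow", "data.read", "vectorstore/documents"),
    ("deny", "data.write", "vectorstore/documents"),
    ("deny", "data.read", "vectorstore/internal-*"),
]

def is_allowed(action: str, resource: str) -> bool:
    """Deny wins over allow; no matching rule means deny by default."""
    matched = [
        effect
        for effect, rule_action, pattern in RULES
        if rule_action == action and fnmatch(resource, pattern)
    ]
    if "deny" in matched:
        return False
    return "allow" in matched

print(is_allowed("llm.generate", "model/gpt-4"))             # True
print(is_allowed("llm.generate", "model/gpt-4-turbo-2024"))  # False: wildcard deny
print(is_allowed("data.read", "vectorstore/internal-hr"))    # False: wildcard deny
```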

Implementation

Retrieve with Policy Enforcement

Vector store access is a custom action. Use enforce() to check the policy:

def retrieve(query: str, agent_id: str, n_results: int = 5) -> list[str]:
    """Retrieve relevant documents with policy enforcement."""

    # Manual enforce: check if this agent can read from the vector store
    cz.enforce(
        action="data.read",
        resource="vectorstore/documents",
        context={"agent_id": agent_id},
    )

    # Generate query embedding (auto-enforced by wrap_openai)
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_embedding = response.data[0].embedding

    # Search the vector store
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    return results["documents"][0] if results["documents"] else []

Generate with Automatic Enforcement

LLM generation is automatically enforced by the wrapper -- no enforce() needed:

def generate_answer(query: str, context: list[str]) -> str:
    """Generate an answer. Model governance is automatic via wrap_openai."""

    context_text = "\n\n".join(context)

    # This call is automatically checked against your policy
    # because the client is wrapped with wrap_openai()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question based only on the provided context. "
                    "If the context does not contain enough information, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context_text}\n\nQuestion: {query}",
            },
        ],
    )
    return response.choices[0].message.content

Full Pipeline

def rag_query(query: str, agent_id: str = "rag-agent") -> str:
    """Complete RAG pipeline with governance at every step."""
    try:
        context = retrieve(query, agent_id)
        if not context:
            return "No relevant documents found."

        return generate_answer(query, context)

    except controlzero.PolicyViolationError as e:
        return f"Blocked by policy: {e.message}"


# Usage
answer = rag_query("What were Q4 revenue numbers?")
print(answer)

What Happens at Runtime

| Step | What Happens | Enforcement |
| --- | --- | --- |
| 1. Retrieve docs | cz.enforce("data.read", "vectorstore/documents") | Manual -- custom data source |
| 2. Generate embedding | client.embeddings.create(model="text-embedding-3-small") | Automatic -- wrap_openai |
| 3. Generate answer | client.chat.completions.create(model="gpt-4") | Automatic -- wrap_openai |

When to Use Manual enforce() vs Auto-Wrapping

| Scenario | Method | Why |
| --- | --- | --- |
| LLM API calls (OpenAI, Anthropic) | Auto-wrap the client | The SDK extracts model names automatically |
| Vector store queries | Manual enforce() | Custom data source, not a standard LLM API |
| Database access | Manual enforce() | Custom data source |
| File operations | Manual enforce() | Custom action |
| MCP tool calls | Manual enforce() | Custom protocol action |

The rule: if Control Zero has a wrapper for it, use it. For everything else, use enforce().

Next Steps