# Building a RAG Pipeline with Control Zero
This guide shows how to add governance to a Retrieval-Augmented Generation (RAG) pipeline. Control Zero automatically enforces which models agents can use and provides manual enforcement for custom actions like vector store access.
## What You Will Build
A RAG system where:
- LLM calls are automatically enforced via `wrap_openai()`
- Vector store access is enforced with manual `enforce()` calls (since vector stores are not standard LLM clients)
- Policies control which data sources agents can query and which models they can use
- Every decision is logged for audit
## Architecture
Notice the two enforcement points:
- **Automatic:** `wrap_openai()` handles LLM model governance.
- **Manual:** `cz.enforce()` handles vector store access (since it is a custom data source, not an LLM API).
## Setup
```bash
pip install controlzero openai chromadb
```
```python
import controlzero
from controlzero.integrations.openai import wrap_openai
import openai
import chromadb

# Initialize Control Zero and wrap the OpenAI client
cz = controlzero.init()
client = wrap_openai(openai.OpenAI(), cz)

# Initialize the vector store
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("documents")
```
## Define the Policy
In the Control Zero dashboard, create this policy:
```json
{
  "name": "rag-pipeline-policy",
  "description": "Governance for RAG: control data access and model usage",
  "rules": [
    { "effect": "allow", "action": "llm.generate", "resource": "model/gpt-4" },
    {
      "effect": "allow",
      "action": "embedding.generate",
      "resource": "model/text-embedding-3-small"
    },
    { "effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*" },
    { "effect": "allow", "action": "data.read", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.write", "resource": "vectorstore/documents" },
    { "effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*" }
  ]
}
```
What this policy means:
- LLM calls with GPT-4 are allowed (auto-enforced by wrapper).
- Embeddings with `text-embedding-3-small` are allowed (auto-enforced by wrapper).
- GPT-4 Turbo is blocked.
- Reading from the `documents` collection is allowed.
- Writing to any collection is blocked (read-only agents).
- Reading from `internal-*` collections is blocked.
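This guide doesn't specify how Control Zero orders rules or expands wildcards. As a mental model only, a first-match-wins, default-deny evaluator with glob-style resource patterns reproduces the outcomes described above. The `is_allowed` helper and the evaluation semantics are illustrative assumptions, not Control Zero's implementation:

```python
from fnmatch import fnmatch

# The same rules as the dashboard policy above
RULES = [
    {"effect": "allow", "action": "llm.generate", "resource": "model/gpt-4"},
    {"effect": "allow", "action": "embedding.generate", "resource": "model/text-embedding-3-small"},
    {"effect": "deny", "action": "llm.generate", "resource": "model/gpt-4-turbo*"},
    {"effect": "allow", "action": "data.read", "resource": "vectorstore/documents"},
    {"effect": "deny", "action": "data.write", "resource": "vectorstore/documents"},
    {"effect": "deny", "action": "data.read", "resource": "vectorstore/internal-*"},
]

def is_allowed(action: str, resource: str) -> bool:
    """First matching rule wins; if nothing matches, deny by default."""
    for rule in RULES:
        if rule["action"] == action and fnmatch(resource, rule["resource"]):
            return rule["effect"] == "allow"
    return False  # no rule matched: default deny

print(is_allowed("llm.generate", "model/gpt-4"))           # True
print(is_allowed("llm.generate", "model/gpt-4-turbo"))     # False
print(is_allowed("data.read", "vectorstore/internal-hr"))  # False
print(is_allowed("data.write", "vectorstore/documents"))   # False
```

Under this model, "writing to any collection is blocked" holds even without the explicit deny rule, because no `allow` rule exists for `data.write` and unmatched requests fall through to deny.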
## Implementation
### Retrieve with Policy Enforcement
Vector store access is a custom action. Use `enforce()` to check the policy:
```python
def retrieve(query: str, agent_id: str, n_results: int = 5) -> list[str]:
    """Retrieve relevant documents with policy enforcement."""
    # Manual enforce: check whether this agent can read from the vector store
    cz.enforce(
        action="data.read",
        resource="vectorstore/documents",
        context={"agent_id": agent_id},
    )

    # Generate the query embedding (auto-enforced by wrap_openai)
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_embedding = response.data[0].embedding

    # Search the vector store
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
    )
    return results["documents"][0] if results["documents"] else []
```
### Generate with Automatic Enforcement
LLM generation is automatically enforced by the wrapper -- no `enforce()` call is needed:
```python
def generate_answer(query: str, context: list[str]) -> str:
    """Generate an answer. Model governance is automatic via wrap_openai."""
    context_text = "\n\n".join(context)

    # This call is automatically checked against your policy
    # because the client is wrapped with wrap_openai()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question based only on the provided context. "
                    "If the context does not contain enough information, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context_text}\n\nQuestion: {query}",
            },
        ],
    )
    return response.choices[0].message.content
```
## Full Pipeline
```python
def rag_query(query: str, agent_id: str = "rag-agent") -> str:
    """Complete RAG pipeline with governance at every step."""
    try:
        context = retrieve(query, agent_id)
        if not context:
            return "No relevant documents found."
        return generate_answer(query, context)
    except controlzero.PolicyViolationError as e:
        return f"Blocked by policy: {e.message}"

# Usage
answer = rag_query("What were Q4 revenue numbers?")
print(answer)
```
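The `except` branch is the important part: policy denials surface as exceptions, not silent empty results. To exercise that shape without a live policy server, the same pattern can be run against a stand-in exception class. The stub below only mimics the `.message` attribute used above; the real class comes from the `controlzero` package:

```python
class PolicyViolationError(Exception):
    """Stand-in for controlzero.PolicyViolationError (assumed .message attribute)."""
    def __init__(self, message: str):
        super().__init__(message)
        self.message = message

def guarded_query(query: str, allowed: bool) -> str:
    """Mimics rag_query's error handling with a toggle instead of a live check."""
    try:
        if not allowed:
            raise PolicyViolationError("data.read on vectorstore/documents denied")
        return f"answer for: {query}"
    except PolicyViolationError as e:
        return f"Blocked by policy: {e.message}"

print(guarded_query("Q4 revenue?", allowed=True))   # answer for: Q4 revenue?
print(guarded_query("Q4 revenue?", allowed=False))  # Blocked by policy: data.read on vectorstore/documents denied
```

Catching the violation at the pipeline boundary, rather than inside `retrieve()`, keeps one place where denials become user-facing messages.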
## What Happens at Runtime
| Step | What Happens | Enforcement |
|---|---|---|
| 1. Retrieve docs | `cz.enforce("data.read", "vectorstore/documents")` | Manual -- custom data source |
| 2. Generate embedding | `client.embeddings.create(model="text-embedding-3-small")` | Automatic -- `wrap_openai` |
| 3. Generate answer | `client.chat.completions.create(model="gpt-4")` | Automatic -- `wrap_openai` |
## When to Use Manual `enforce()` vs Auto-Wrapping
| Scenario | Method | Why |
|---|---|---|
| LLM API calls (OpenAI, Anthropic) | Auto-wrap the client | The SDK extracts model names automatically |
| Vector store queries | Manual `enforce()` | Custom data source, not a standard LLM API |
| Database access | Manual `enforce()` | Custom data source |
| File operations | Manual `enforce()` | Custom action |
| MCP tool calls | Manual `enforce()` | Custom protocol action |
The rule of thumb: if Control Zero has a wrapper for your client, use it. For everything else, call `enforce()` yourself.
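If the same manual check recurs across many custom actions, a small decorator can keep the enforcement call out of business logic. This `governed` helper is a hypothetical convenience, not part of Control Zero's API; it assumes only a callable with `enforce()`'s keyword signature, so a recording stub stands in for `cz.enforce` here:

```python
from functools import wraps

def governed(enforce, action: str, resource: str):
    """Hypothetical decorator: run a policy check before the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, agent_id: str = "rag-agent", **kwargs):
            # Check the policy first; a denial raises before fn ever runs
            enforce(action=action, resource=resource, context={"agent_id": agent_id})
            return fn(*args, **kwargs)
        return wrapper
    return decorator

# Recording stub in place of cz.enforce, so the sketch is self-contained
calls = []
def fake_enforce(**kwargs):
    calls.append(kwargs)

@governed(fake_enforce, action="data.read", resource="vectorstore/documents")
def fetch_docs(query: str) -> list[str]:
    return ["doc-1", "doc-2"]

print(fetch_docs("q4 numbers"))  # ['doc-1', 'doc-2']
print(calls[0]["action"])        # data.read
```

In a real pipeline you would pass `cz.enforce` instead of `fake_enforce`, and the decorated function would raise `controlzero.PolicyViolationError` on a denied check.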
## Next Steps
- Semantic Search Guide -- Simpler retrieval-only setup.
- LangChain Integration -- Build RAG with LangChain's callback handler.
- Policies -- Construct complex policies in the dashboard.