Air-Gapped LLM VPC Deployment
Stepfault deploys open-weights LLMs (Llama, Mistral, Qwen) inside a client-controlled, air-gapped VPC using Docker and Kubernetes. No prompt, model weight, or document crosses the network boundary, so Controlled Unclassified Information (CUI) stays inside the client's boundary and there is no egress path to a public API.
1. Why air-gapped instead of a public API
Public multi-tenant APIs receive prompts and uploads that the provider may retain, including for model refinement. Sending a trade secret to such a service can undermine its protected status, and transmitting ITAR-controlled technical data outside the authorized boundary can be a violation in itself. In a private VPC, the model weights, prompts, and documents stay under the client's sole control.
2. Reference architecture
- Containerized inference (vLLM / Ollama) on the client's GPU fleet or a dedicated Azure landing zone
- Kubernetes network policies enforcing zero egress from the inference namespace
- pgvector / Weaviate retrieval co-located inside the same isolation boundary
- MAPOS orchestration with policy guards, serialization gates, and HITL checkpoints
- OpenTelemetry spans exported only to in-boundary collectors for audit replay
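The zero-egress rule in the list above could be expressed as a Kubernetes NetworkPolicy. The following is a minimal sketch that builds the manifests as plain Python dicts and prints them as JSON, which kubectl accepts as well as YAML. The namespace `inference`, the policy names, the `app` labels, and port 5432 are assumptions chosen for illustration and are not taken from the Stepfault deployment. Enforcement also depends on the cluster running a CNI plugin that implements NetworkPolicy.

```python
import json


def deny_all_egress(namespace: str, name: str = "deny-all-egress") -> dict:
    """Build a NetworkPolicy that blocks all egress from every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Declaring the Egress type with no egress rules denies all outbound traffic,
            # including DNS lookups.
            "policyTypes": ["Egress"],
        },
    }


def allow_egress_to_pods(namespace: str, name: str, source_labels: dict,
                         target_labels: dict, port: int) -> dict:
    """Build a NetworkPolicy that lets selected pods reach in-namespace pods on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": source_labels},
            "policyTypes": ["Egress"],
            "egress": [{
                # A bare podSelector here matches pods in the policy's own namespace only.
                "to": [{"podSelector": {"matchLabels": target_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical names: namespace "inference", labels app=retrieval-agent / app=pgvector.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            deny_all_egress("inference"),
            allow_egress_to_pods("inference", "retrieval-to-pgvector",
                                 {"app": "retrieval-agent"}, {"app": "pgvector"}, 5432),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

NetworkPolicies are additive, so the deny-all policy sets the default and each in-boundary connection (retrieval to the vector store, agents to the telemetry collector) needs its own narrow allow rule.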
3. Data sovereignty controls
- No outbound DNS or internet route from model-serving pods
- Encrypted-at-rest weights and vector stores under client-held keys
- Per-handoff audit logging for compliance evidence generation
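One way to make per-handoff audit records usable as compliance evidence is to chain them with hashes so that later tampering is detectable. The sketch below is an assumption about how such a log could work, not a description of the Stepfault implementation: the `AuditLog` class, its field names, and the choice to store only a SHA-256 digest of each payload (so controlled content never lands in the log) are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def append(self, source: str, target: str, payload: bytes, ts: str) -> dict:
        """Record one handoff; only a digest of the payload is stored."""
        record = {
            "source": source,
            "target": target,
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
            "ts": ts,  # timestamp supplied by the caller
        }
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"record": record, "prev": prev, "hash": _digest(prev, record)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain and report whether every link still matches."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("mapos-orchestrator", "retrieval-agent", b"query text", "2024-01-01T00:00:00Z")
    log.append("retrieval-agent", "mapos-orchestrator", b"result text", "2024-01-01T00:00:01Z")
    print(log.verify())
    log.entries[0]["record"]["target"] = "codegen-agent"  # simulate tampering
    print(log.verify())
```

Because each hash covers the previous one, editing or deleting an earlier entry invalidates every entry after it, which is what lets an auditor trust a replay of the log.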
4. Deployment topology
[ Client VPC ]
ingress-validator -> mapos-orchestrator -> policy-guard
                            |
       +--------------------+--------------------+
       |                    |                    |
retrieval-agent       codegen-agent      compliance-agent
       |                    |                    |
    pgvector             sandbox         audit-log (otel)
egress: DENY ALL
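The request path in the diagram can be sketched as a small dispatch pipeline. This is an illustrative stand-in only: the component names come from the diagram, but the function signatures, the request shape (`task` and `prompt` fields), the allow-list rule, and the stub agents are assumptions, and nothing here reflects the actual MAPOS API.

```python
from typing import Callable, Dict, List


class PolicyViolation(Exception):
    """Raised when the policy guard rejects a request."""


def ingress_validator(request: dict) -> dict:
    """Reject malformed requests before they reach the orchestrator."""
    for key in ("task", "prompt"):
        if not isinstance(request.get(key), str) or not request[key]:
            raise ValueError(f"missing or empty field: {key}")
    return request


def policy_guard(request: dict, allowed_tasks: set) -> None:
    """Permit only tasks that map to a known agent (hypothetical rule)."""
    if request["task"] not in allowed_tasks:
        raise PolicyViolation(f"task not permitted: {request['task']}")


# Stub agents standing in for the three branches of the diagram.
AGENTS: Dict[str, Callable[[str], str]] = {
    "retrieval": lambda prompt: f"[retrieval-agent] searched pgvector for: {prompt}",
    "codegen": lambda prompt: f"[codegen-agent] ran in sandbox: {prompt}",
    "compliance": lambda prompt: f"[compliance-agent] logged check: {prompt}",
}


def orchestrate(request: dict, handoffs: List[str]) -> str:
    """Validate, apply the policy guard, then dispatch to one agent.

    Each handoff is appended to `handoffs` before the next stage runs,
    so a rejected request still leaves a trace.
    """
    handoffs.append("ingress-validator -> mapos-orchestrator")
    request = ingress_validator(request)
    handoffs.append("mapos-orchestrator -> policy-guard")
    policy_guard(request, set(AGENTS))
    handoffs.append(f"policy-guard -> {request['task']}-agent")
    return AGENTS[request["task"]](request["prompt"])


if __name__ == "__main__":
    trace: List[str] = []
    print(orchestrate({"task": "codegen", "prompt": "write a unit test"}, trace))
    print(trace)
```

In a real deployment each stage would be a separate service inside the VPC, and the `handoffs` list would be replaced by the in-boundary audit log.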