Air-Gapped LLM VPC Deployment

Stepfault deploys open-weights LLMs (Llama, Mistral, Qwen) inside a client-controlled, air-gapped VPC using Docker and Kubernetes. No prompt, weight, or document crosses the network boundary. This keeps Controlled Unclassified Information (CUI) within the client's environment and eliminates the egress risk that comes with sending data to a public API.

1. Why air-gapped instead of public API

Public multi-tenant APIs ingest prompts and uploads that the provider may retain for model refinement. That retention can destroy trade-secret protection, and transmitting controlled technical data to such a service can breach ITAR boundaries the moment it is sent. A private VPC keeps the model weights and data entirely under client ownership and control.

2. Reference architecture

  • Containerized inference (vLLM / Ollama) on the client's GPU fleet or in a dedicated Azure landing zone
  • Kubernetes network policies enforcing zero egress from the inference namespace
  • pgvector / Weaviate retrieval co-located inside the same isolation boundary
  • MAPOS orchestration with policy guards, serialization gates, and HITL checkpoints
  • OpenTelemetry spans exported only to in-boundary collectors for audit replay
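The zero-egress rule in the second bullet maps onto a standard Kubernetes NetworkPolicy. The sketch below is a minimal illustration, not Stepfault's actual manifest: it builds a default-deny-egress policy as a Python dict and prints it as JSON, which kubectl apply -f accepts just as it accepts YAML. The namespace name inference and the policy name deny-all-egress are hypothetical. Because the policy lists Egress under policyTypes but defines no egress rules, every outbound connection from the selected pods is denied, DNS lookups included. NetworkPolicy is enforced only when the cluster's network plugin supports it (Calico and Cilium do, for example).

```python
import json


def deny_all_egress_policy(namespace: str, name: str = "deny-all-egress") -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace`
    and permits no outbound traffic at all (DNS on port 53 included)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Declaring Egress with no "egress" rules denies all outbound traffic.
            "policyTypes": ["Egress"],
        },
    }


if __name__ == "__main__":
    # "inference" is a hypothetical namespace name used for illustration.
    print(json.dumps(deny_all_egress_policy("inference"), indent=2))
```

Policies are additive, so if serving pods must reach in-boundary stores such as pgvector in another namespace, narrow egress rules for exactly those destinations would be layered on top of this default deny.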

3. Data sovereignty controls

  • No outbound DNS or internet route from model-serving pods
  • Encrypted-at-rest weights and vector stores under client-held keys
  • Per-handoff audit logging for compliance evidence generation
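Per-handoff audit logging is more useful as compliance evidence when the log is tamper-evident. One common way to get that is hash chaining; this is an assumed design, not something the controls above prescribe. In the sketch below each record stores a SHA-256 digest of the payload rather than the payload itself, plus the hash of the previous record, so editing any earlier entry breaks verification. The AuditLog class, GENESIS constant, and field names are hypothetical; a production record would also carry timestamps and OpenTelemetry span IDs, omitted here to keep the example deterministic.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Placeholder "previous hash" for the first record in the chain (hypothetical).
GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so the hash is stable.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def append(self, source: str, target: str, payload: bytes) -> dict:
        """Record one agent-to-agent handoff, chained to the previous record."""
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {
            "seq": len(self.records),
            "source": source,
            "target": target,
            # Store only a digest so the log itself holds no raw prompt data.
            "payload_sha256": hashlib.sha256(payload).hexdigest(),
            "prev": prev,
        }
        body["hash"] = _digest(body)  # computed before "hash" is added
        self.records.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered record fails the check."""
        prev = GENESIS
        for i, rec in enumerate(self.records):
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["seq"] != i or rec["prev"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("mapos-orchestrator", "policy-guard", b"task-1")
    log.append("policy-guard", "retrieval-agent", b"task-1")
    print(log.verify())  # True
    log.records[0]["target"] = "codegen-agent"  # simulate tampering
    print(log.verify())  # False
```

Chaining detects modification of existing entries; detecting truncation of the newest entries would additionally require anchoring the latest hash somewhere the log writer cannot alter.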

4. Deployment topology

[ Client VPC ]
  ingress-validator -> mapos-orchestrator -> policy-guard
                                   |
              +--------------------+--------------------+
              |                    |                    |
        retrieval-agent      codegen-agent      compliance-agent
              |                    |                    |
           pgvector            sandbox             audit-log (otel)
  egress: DENY ALL
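The topology can also be written down as data and checked mechanically at design time. The sketch below is a hypothetical check, not part of MAPOS: it encodes the diagram's components and handoffs (reading the fan-out as leaving mapos-orchestrator, which is one interpretation of the drawing) and reports any handoff that starts inside the boundary and ends outside it, mirroring the egress: DENY ALL line. The names BOUNDARY, EDGES, and egress_violations are illustrative only. A static check like this complements runtime enforcement at the network layer; it does not replace it.

```python
from __future__ import annotations

from collections.abc import Iterable

# Components inside the Client VPC, as read from the topology diagram.
BOUNDARY = frozenset({
    "ingress-validator", "mapos-orchestrator", "policy-guard",
    "retrieval-agent", "codegen-agent", "compliance-agent",
    "pgvector", "sandbox", "audit-log",
})

# Directed handoffs; the fan-out is interpreted as leaving mapos-orchestrator.
EDGES = [
    ("ingress-validator", "mapos-orchestrator"),
    ("mapos-orchestrator", "policy-guard"),
    ("mapos-orchestrator", "retrieval-agent"),
    ("mapos-orchestrator", "codegen-agent"),
    ("mapos-orchestrator", "compliance-agent"),
    ("retrieval-agent", "pgvector"),
    ("codegen-agent", "sandbox"),
    ("compliance-agent", "audit-log"),
]


def egress_violations(
    edges: Iterable[tuple[str, str]], boundary: frozenset[str]
) -> list[tuple[str, str]]:
    """Return every edge that leaves the boundary (inside source, outside target).
    Inbound edges from outside are not flagged; this checks egress only."""
    return [(s, t) for s, t in edges if s in boundary and t not in boundary]


if __name__ == "__main__":
    print(egress_violations(EDGES, BOUNDARY))  # []
    rogue = EDGES + [("codegen-agent", "api.example.com")]
    print(egress_violations(rogue, BOUNDARY))  # [('codegen-agent', 'api.example.com')]
```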
