Working draft — Sancto CS is finalizing the long version with redacted example policies from three of our clients.
Why AI startups specifically struggle
Standard SOC 2 templates assume your data goes to known sub-processors (AWS, Stripe, Datadog). AI products send customer data to OpenAI, Anthropic, Google, fine-tuning pipelines, vector DBs, and sometimes open-source models on Modal/Replicate. Every one of those is a separate sub-processor your auditor will ask about — and most teams haven't documented the data flow.
The other surprise: training data. If you've ever fine-tuned a model, the auditor wants to see how customer data was scrubbed, who approved the run, and whether any of it leaked into a model that serves other customers.
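A minimal sketch of what "scrubbed, with an approver on record" could look like before data reaches a fine-tuning job. The `scrub_record` helper, the `[EMAIL]` placeholder, and the email-only pattern are illustrative assumptions, not a complete PII scrubber.

```python
import re

# Hypothetical pattern: catches common email shapes only. Real scrubbing
# would need to cover more PII types (names, phone numbers, account IDs).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_record(text, approver):
    """Redact emails and return the cleaned text plus an evidence entry."""
    cleaned, redactions = EMAIL_RE.subn("[EMAIL]", text)
    # The evidence entry is the kind of artifact an auditor can sample:
    # who signed off, and how many redactions were applied.
    evidence = {"approved_by": approver, "redactions": redactions}
    return cleaned, evidence


cleaned, evidence = scrub_record(
    "Ticket from jane@example.com about billing", "security-lead"
)
print(cleaned)
print(evidence)
```

Keeping the evidence entry next to the cleaned text means the approval trail is produced by the same step that does the scrubbing, rather than reconstructed later.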
Days 1–30: foundation
- Pick the compliance automation platform (we recommend Vanta or Drata; Secureframe is fine too)
- Document every sub-processor, including all LLM APIs, vector DBs, fine-tuning services
- Draft data flow diagrams for each AI feature: input → preprocessing → model call → output → storage
- Set up SSO, MFA everywhere, password manager rollout
- Inventory all devices and turn on automated patching
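The sub-processor inventory and the data flow diagrams can be cross-checked mechanically. The sketch below assumes a hand-maintained inventory and one flow per AI feature; the class names, fields, and the vendor `ExampleVectorDB` are hypothetical and only show the shape of the check.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubProcessor:
    name: str
    purpose: str
    data_categories: Tuple[str, ...]
    dpa_on_file: bool  # data processing agreement signed and stored


@dataclass
class DataFlow:
    feature: str
    # Ordered (stage, vendor) pairs following
    # input -> preprocessing -> model call -> output -> storage.
    # vendor is None when the stage runs on your own infrastructure.
    stages: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def undocumented_vendors(flows, inventory):
    """Return vendors that appear in a data flow but not in the inventory."""
    known = {sp.name for sp in inventory}
    used = {vendor for flow in flows for _, vendor in flow.stages if vendor}
    return sorted(used - known)


inventory = [
    SubProcessor("AWS", "hosting", ("all customer data",), True),
    SubProcessor("OpenAI", "LLM inference", ("prompts", "completions"), True),
]
flows = [
    DataFlow(
        "support-summarizer",
        [
            ("input", None),
            ("preprocessing", "AWS"),
            ("model call", "OpenAI"),
            ("output", None),
            ("storage", "ExampleVectorDB"),  # hypothetical vendor name
        ],
    )
]

print(undocumented_vendors(flows, inventory))
```

Here the check reports `ExampleVectorDB`, because the feature stores data with a vendor that was never added to the inventory. Running a check like this in CI is one way to keep the list from drifting as features ship.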
Days 31–60: controls
- Access reviews (quarterly) — automate via Vanta/Drata
- Model governance policy: which models are approved, who approves fine-tuning, how PII is handled
- Customer data isolation: each tenant's vectors live in a separate namespace, and no query ever reads across tenants
- Logging and monitoring: who accessed what, when, and what the model returned
- Incident response runbook — including "model returned wrong/harmful output"
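Two of the controls above, the approved-model list and per-tenant namespaces, are easiest to evidence when they are enforced in code rather than by convention. This is a minimal in-memory sketch; `TenantVectorStore`, `APPROVED_MODELS`, and the model name are hypothetical stand-ins for whatever vector DB and model registry you actually use.

```python
from collections import defaultdict

# Hypothetical allow-list; in practice this mirrors the governance policy.
APPROVED_MODELS = {"example-model-v1"}


def require_approved(model):
    """Refuse to call any model that is not on the approved list."""
    if model not in APPROVED_MODELS:
        raise PermissionError(f"model {model!r} is not approved")
    return model


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TenantVectorStore:
    """Toy store where every read and write is scoped to one tenant."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def upsert(self, tenant_id, doc_id, vector):
        self._namespaces[tenant_id].append((doc_id, vector))

    def query(self, tenant_id, vector, top_k=3):
        # There is deliberately no code path that searches all namespaces.
        if not tenant_id:
            raise ValueError("tenant_id is required")
        candidates = self._namespaces.get(tenant_id, [])
        ranked = sorted(candidates, key=lambda item: -dot(item[1], vector))
        return [doc_id for doc_id, _ in ranked[:top_k]]


store = TenantVectorStore()
store.upsert("tenant-a", "a-doc", [1.0, 0.0])
store.upsert("tenant-b", "b-doc", [1.0, 0.0])

print(store.query("tenant-a", [1.0, 0.0]))
```

The design point is that the tenant ID is a required argument of `query`, so a forgotten filter fails loudly instead of silently returning another tenant's documents.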
Days 61–90: audit prep
- Pick an auditor. Boutique firms are usually cheaper but can be slower; the Type II observation window runs 3–12 months
- Run an internal readiness check (Vanta does this automatically)
- Pre-write your security page (saves dozens of customer questionnaires later)
- Schedule the audit. Type I (a point-in-time check of control design) first if you need a fast checkmark; Type II (controls operating over the observation window) if you can wait 3+ months
What auditors flag specifically about AI
- "Do you log every prompt sent to the LLM?" → They want yes. Even if you don't store prompt text by default, you should be able to turn that on.
- "How do you prevent prompt injection from exposing other tenants' data?" → Per-tenant retrieval scoping enforced in code, with system prompts as a second layer rather than the only defense
- "Is customer data used to train your models?" → Have a written policy. Default to no.
- "What happens to OpenAI/Anthropic's logs?" → You need their zero-retention agreement on file
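For the prompt-logging question, one way to answer "yes" without storing raw text by default is to log a hash of each prompt and response and make full-text storage a switch. This is a sketch under that assumption; the `audit_record` function and its field names are hypothetical, not a format any auditor mandates.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(tenant_id, user_id, model, prompt, response, store_text=False):
    """Build one JSON log line per LLM call.

    Hashes are always recorded, so a specific prompt can later be matched
    against the log. Raw text is included only when store_text is True.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "user_id": user_id,
        "model": model,
        "prompt_sha256": _sha256(prompt),
        "response_sha256": _sha256(response),
    }
    if store_text:
        record["prompt"] = prompt
        record["response"] = response
    return json.dumps(record, sort_keys=True)


line = audit_record("tenant-a", "user-42", "example-model-v1", "hello", "hi there")
print(line)
```

Passing `store_text=True` adds the raw prompt and response to the same record, which is the "you should be able to" half of the answer.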
The fastest path to Type II for an AI startup is Type I now, observation window starting tomorrow, Type II in 6 months. Don't wait for "perfect."