A senior-built integration layer that sits between your app and the LLM providers — with provider routing, fallback chains, rate-limiting, prompt versioning, observability and hard cost caps. So your CTO can sleep, and your finance team doesn't get a R250k surprise on the 5th.
We've cleaned up enough of these that we built a generic fix. Sound familiar?
Direct openai.create() calls sprinkled across services. No retries. No timeouts. No observability. When OpenAI hiccups, your whole feature is down.
One developer writes a recursive prompt. One Friday afternoon. By Monday, the team's R8k/mo budget hits R85k — and finance discovers it on the credit-card statement.
SDK calls baked into 40 files. New provider drops a better model? You quote 6 weeks to migrate. By the time you ship, that "better model" has been deprecated.
Your best prompt is a 280-line string literal. Edit needs a deploy. PM wants A/B. Three versions in git, none in production. Nobody trusts the analytics.
Sitect's gateway is opinionated. These eight capabilities ship with every integration — you turn each one on or off in config, you don't have to write any of them.
Per-route preference order with automatic fallback when the primary returns 429 / 500 / timeout.
prefer: [...]
Per-call, per-user, per-team and per-month spend ceilings. Hit the cap, the gateway throws, not the bill.
budget: 0.05
Token-bucket per-key + per-user limiting. Burst absorbed; sustained abuse politely 429'd. Never lets the upstream limit hit your customer.
rate: 60/min
Semantic + exact-match caching for repeat queries. Cuts spend & latency on FAQ-style workloads by 60–80%.
cache: "semantic"
Prompts versioned in Git or our UI. Edit a prompt, hot-reload without deploy. A/B test two versions. Roll back instantly.
prompt: "summary@v3"
Every call traced with token in, token out, cost, model, latency, user. Exports OpenTelemetry for Grafana / Datadog.
span.set(...)
Names, emails, ID numbers, phone numbers stripped before the prompt leaves your tenant. POPIA Section 19 by default.
redact: ["pii"]
Regression tests for prompts. Catch quality drops before customers do. Plug into CI and fail the build on a model swap that hurts your golden set.
eval run --suite
Same feature. Same provider. Same guarantees. Two very different on-call experiences.
// Direct OpenAI call with manual retries
async function summarise(text) {
  let attempt = 0;
  while (attempt < 3) {
    try {
      const r = await openai.chat.completions
        .create({ model: "gpt-4o", messages: [...] });
      // no cost tracking · no logging
      // no fallback to claude · no cache
      return r.choices[0].message.content;
    } catch (e) {
      if (e.status === 429) {
        await sleep(attempt * 1000);
        attempt++;
      } else throw e; // dies on 500s
    }
  }
  throw new Error("out of retries");
}
// repeat ×40 across the codebase…
// All of the above, built in.
async function summarise(text) {
  return (await sitect.complete({
    prompt: "summary@v3",
    inputs: { text },
    prefer: ["claude-3.5-sonnet", "gpt-4o"],
    cache: "semantic",
    budget: 0.05,
  })).text;
}
// → Auto-retries · auto-fallback · cached · logged
// → Cost-capped · PII-redacted · OpenTelemetry-traced
// → A/B-able · versioned · eval-tested
// ✓ 200 OK · 380ms · R0.0024
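To make the "hit the cap, the gateway throws" behaviour concrete, here is a minimal sketch of a per-call budget guard. It is not the Sitect implementation: `BudgetExceededError`, `estimateCost`, `assertWithinBudget` and the `PRICE_PER_1K` table are hypothetical names, the prices are made-up illustrative numbers, and the cap is assumed to be in the same currency unit as the price table.

```typescript
// Hypothetical error type raised when a call would exceed its spend ceiling.
class BudgetExceededError extends Error {
  estimated: number;
  cap: number;
  constructor(estimated: number, cap: number) {
    super(`estimated cost ${estimated.toFixed(4)} exceeds cap ${cap.toFixed(4)}`);
    this.name = "BudgetExceededError";
    this.estimated = estimated;
    this.cap = cap;
  }
}

// Illustrative per-1k-token prices only; NOT real provider pricing.
const PRICE_PER_1K: Record<string, { input: number; output: number }> = {
  "model-a": { input: 0.01, output: 0.03 },
};

// Worst-case cost: all input tokens plus the maximum allowed output tokens.
function estimateCost(model: string, inputTokens: number, maxOutputTokens: number): number {
  const price = PRICE_PER_1K[model];
  if (!price) throw new Error(`no price configured for ${model}`);
  return (inputTokens / 1000) * price.input + (maxOutputTokens / 1000) * price.output;
}

// Throws before any provider call is made if the worst case is over the cap;
// otherwise returns the estimate so the caller can log it.
function assertWithinBudget(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
  cap: number,
): number {
  const estimated = estimateCost(model, inputTokens, maxOutputTokens);
  if (estimated > cap) throw new BudgetExceededError(estimated, cap);
  return estimated;
}
```

Checking the worst case up front means a runaway prompt fails fast in the caller rather than showing up on the invoice. Per-user or per-month ceilings would need a running total on top of this check.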
The gateway abstracts over every major LLM provider. Same API, different model — pick by cost, latency, accuracy or data residency.
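One way to picture "same API, different model" with automatic fallback is a loop over an ordered provider list. The sketch below is an assumption-laden illustration, not the gateway's code: the `Provider` interface, `ProviderError`, `isRetryable`, `withTimeout` and `completeWithFallback` are hypothetical names, a timeout is modelled as a synthetic 504, and any 5xx status (not only 500) is treated as a reason to try the next provider.

```typescript
// Hypothetical provider abstraction; each real SDK would sit behind one of these.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

class ProviderError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "ProviderError";
    this.status = status;
  }
}

// Rate limits and server-side failures are worth retrying elsewhere.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: number } | null)?.status;
  return status === 429 || (typeof status === "number" && status >= 500);
}

// Rejects with a synthetic 504 if the promise does not settle within `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new ProviderError(504, `timed out after ${ms}ms`)),
      ms,
    );
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Tries providers in preference order; stops at the first success.
async function completeWithFallback(
  providers: Provider[],
  prompt: string,
  timeoutMs: number = 10000,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    try {
      const text = await withTimeout(provider.complete(prompt), timeoutMs);
      return { provider: provider.name, text };
    } catch (err) {
      lastError = err;
      // A 4xx other than 429 is the caller's problem; another provider won't help.
      if (!isRetryable(err)) throw err;
    }
  }
  throw lastError;
}
```

Non-retryable errors are rethrown immediately so that a malformed request does not burn through every provider in the chain.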
The gateway is a thin service you run alongside your existing app — Docker image, K8s manifest, or our managed cloud. Sits between you and the providers, takes the configuration as code.
Every call streams metrics into the dashboard your CTO checks before the standup. No more "is the model down?" pinging in Slack. No more end-of-month bill surprises.
No SaaS lock-in. The gateway, the SDK, the dashboard: all open-source and licensed to you, hosted on your infra, in your registry.
The core gateway. Stateless, horizontally scalable, ships with a Helm chart and Terraform module. Sub-100MB image.
JS/TS, Python, PHP, Go and .NET clients — typed, tested, with auto-retries and traces wired in. Drop-in OpenAI-compatible mode for zero-migration.
Web UI for the dashboard, prompt registry editor, A/B test config, eval runs and audit log search. Hosted on your domain.
Prompts moved out of code and into a registry. Version control, draft/prod separation, A/B traffic split, instant rollback.
30+ test cases tuned for your domain. Runs on every prompt change and every provider swap. CI-integrated.
Markdown runbook for the top 12 incident types. Sitect on-call hand-off for the first 30 days, then your team owns it.
The final cut-over is usually a single PR your team reviews. Most of the engagement is in the audit and the architecture review — the wiring is the easy part.
Map every existing LLM call-site. Identify retry-gaps, missing limits, runaway prompts, lock-in points.
Pick deployment shape (your K8s, your VM, our cloud). Define routes, budgets, redaction rules, prompts to migrate.
Gateway deployed, SDK installed, call-sites migrated. Cut-over PR with feature flags so you roll in 10%/50%/100%.
Eval suite written, your team trained on the dashboard. Sitect on-call hand-off for 30 days post-launch.
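The staged 10%/50%/100% cut-over mentioned in the phases above is commonly done with deterministic bucketing, so a given user stays on the same side of the flag as the percentage grows. The following is a minimal sketch under that assumption; `fnv1a`, `bucketFor` and `useGateway` are hypothetical helpers, not part of any Sitect SDK, and the hash works on UTF-16 code units, which is adequate for ASCII user IDs.

```typescript
// FNV-1a style 32-bit hash: small, deterministic, no dependencies.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Maps a stable key (for example a user ID) to a bucket in [0, 100).
function bucketFor(key: string): number {
  return fnv1a(key) % 100;
}

// True if this key should take the new gateway path at the given rollout level.
function useGateway(key: string, rolloutPercent: number): boolean {
  return bucketFor(key) < rolloutPercent;
}
```

Because the bucket depends only on the key, anyone routed through the gateway at 10% is still routed through it at 50% and 100%, which keeps before/after comparisons clean during the rollout.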
Aggregated across our deployed gateways. Your numbers will be yours — these are the order-of-magnitude bracket.
Build fee is fixed in writing. Once shipped, you own the code — no per-call platform fee, no SaaS subscription, no lock-in. LLM provider tokens billed at provider cost direct to your accounts.
If yours isn't here, ping us — we'll answer with a code sample, not a sales call.
LiteLLM is provider routing. OpenRouter is a hosted multi-provider gateway. Helicone is observability. The Sitect gateway is opinionated and bundles all three plus PII redaction, prompt registry, eval suite and an SA-tuned dashboard. You'll see common building blocks from those tools under the hood; we don't reinvent for the sake of reinventing.
Point your openai SDK at https://gw.your-domain.com/v1 and it'll route through us with no code change. You miss out on some advanced features (prompt registry, semantic cache) until you adopt the native SDK, but you get retries, cost caps and observability for free on day one.
Drop your repo (or a screenshare of your current LLM call-sites). One of our seniors will spend 30 minutes finding the 3 biggest risks in your setup: gaps in retries, cost-runaway prompts, missing redaction, vendor lock-in. Written follow-up within 48h. No sales pitch.