060 279 5587 info@sitect.co.za 139 Davies Street, Doornfontein, Johannesburg, 2001 Gauteng, SA

How we cut an SA retailer's WhatsApp support response time by 87%

A field report from the NorthStar Retail engagement: what we built, what worked, what we'd do differently. 8,000 monthly messages, ~33-second median reply.

26 Apr 2026 · 5 min read

Read the full case study on our Case Studies page. This is the engineering-side post.

The stack

  • Twilio WhatsApp Business as the channel
  • Laravel as the orchestration backbone
  • pgvector on Postgres for RAG (we tried Pinecone first, but it was overkill at this scale)
  • gpt-4o-mini for triage, gpt-4o for full replies
  • A small Vue dashboard for the support team to take over conversations
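
To give a feel for how little retrieval needs at this scale, here is a minimal sketch of a pgvector nearest-neighbour query. It is written in Python for brevity (the production code is Laravel), and the table name kb_chunks, its columns, and the choice of cosine distance are all assumptions for illustration, not the actual schema. The `<=>` operator is pgvector's cosine-distance operator.

```python
from typing import Sequence, Tuple


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding as a pgvector text literal, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_retrieval_query(embedding: Sequence[float], k: int = 5) -> Tuple[str, tuple]:
    """Return a parameterised SQL string and its parameters.

    kb_chunks, content and embedding are hypothetical names.
    The query is only built here, not executed.
    """
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM kb_chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    return sql, (literal, literal, k)
```

The returned pair can be passed to any Postgres driver that uses %s placeholders. Keeping the knowledge base in the same Postgres instance as the rest of the application data is what made a separate vector service unnecessary here.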

What worked

The intent-classification step in front of the main reply path. We classify every inbound message into one of ~30 intents before deciding whether to answer in full, ask a clarifying question, or escalate. This made the system feel faster and more accurate than a single big prompt did.
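
The answer / clarify / escalate decision can be sketched roughly as follows. This is an illustration in Python, not the Laravel code we shipped: the intent names, the idea of routing on a confidence score, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"      # hand off to the full-reply model
    CLARIFY = "clarify"    # ask the customer a clarifying question
    ESCALATE = "escalate"  # route to a human agent


@dataclass
class Classification:
    intent: str        # one of the ~30 intents
    confidence: float  # 0.0 to 1.0, as reported by the triage step


# Hypothetical values, chosen only for illustration.
ESCALATE_INTENTS = {"complaint", "refund_dispute"}
ANSWER_THRESHOLD = 0.80
CLARIFY_THRESHOLD = 0.50


def route(c: Classification) -> Action:
    """Decide what to do with a classified inbound message."""
    if c.intent in ESCALATE_INTENTS:
        return Action.ESCALATE
    if c.confidence >= ANSWER_THRESHOLD:
        return Action.ANSWER
    if c.confidence >= CLARIFY_THRESHOLD:
        return Action.CLARIFY
    # Too uncertain even to ask a useful question: give it to a person.
    return Action.ESCALATE
```

Because the routing is plain code rather than part of a prompt, changing a threshold or adding an always-escalate intent does not require touching the reply model at all.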

The "draft for human" mode, which the team can toggle when they want eyes on every reply. The team now spends about 12% of its time in that mode, mostly during sales weekends.
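
Conceptually the toggle is a single branch at dispatch time. The sketch below shows the idea in Python with in-memory lists standing in for the real send and review queues; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Outbox:
    """In-memory stand-in for the send path; all names are hypothetical."""
    draft_for_human: bool = False
    sent: List[Tuple[str, str]] = field(default_factory=list)
    pending_review: List[Tuple[str, str]] = field(default_factory=list)

    def dispatch(self, conversation_id: str, reply: str) -> str:
        """Send the reply, or hold it for an agent when draft mode is on."""
        item = (conversation_id, reply)
        if self.draft_for_human:
            self.pending_review.append(item)
            return "queued"
        self.sent.append(item)
        return "sent"

    def approve(self, conversation_id: str, edited_reply: Optional[str] = None) -> bool:
        """Release the first held draft for a conversation, optionally edited."""
        for i, (cid, draft) in enumerate(self.pending_review):
            if cid == conversation_id:
                del self.pending_review[i]
                final = edited_reply if edited_reply is not None else draft
                self.sent.append((cid, final))
                return True
        return False
```

With the flag off, replies go straight out; with it on, nothing reaches the customer until an agent calls approve.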

What we'd do differently

We over-built the eval suite at the start. A simpler "every 20th reply gets reviewed by a senior agent, scored 1–5" loop would have caught the same drift earlier and cost less to maintain.
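
That simpler loop fits in a few lines. The sketch below is in Python; the every-20th sampling and the 1–5 scale come from the description above, while the rolling window size and the alert threshold are assumptions picked for the example.

```python
from statistics import mean
from typing import Sequence

REVIEW_EVERY = 20   # every 20th reply goes to a senior agent
WINDOW = 10         # hypothetical: number of recent scores to average
ALERT_BELOW = 4.0   # hypothetical: rolling mean that triggers an alert


def needs_review(reply_number: int) -> bool:
    """True for replies 20, 40, 60, ... (reply_number counts from 1)."""
    return reply_number % REVIEW_EVERY == 0


def drift_alert(scores: Sequence[int],
                window: int = WINDOW,
                alert_below: float = ALERT_BELOW) -> bool:
    """Flag drift when the mean of the most recent scores drops too low."""
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score out of range 1-5: {s}")
    if len(scores) < window:
        return False  # not enough reviews yet to judge
    return mean(scores[-window:]) < alert_below
```

If each of the roughly 8,000 monthly messages produced one reply, this would mean about 400 reviews a month, which is a rough estimate rather than a measured figure.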

We under-invested in the take-over UX. For the first week the team had to refresh the dashboard constantly, until we added live updates via Pusher. Listen to the support team early.