We are not religious about model providers. We use Claude, we use Gemini, we use Llama via Bedrock. But when a South African client asks us "which model should we put behind our customer chatbot", our default is still OpenAI's GPT-4o and gpt-4o-mini.
Here's why, after about 50,000 production calls split across providers.
Latency from SA matters
Claude is hosted primarily in US-East. Round-trip from JHB averages 380ms on the first token, 220ms steady. OpenAI's primary region is also US but via Azure's South African POP it's 180ms first token. For a chat widget where the user is watching for the response, that 200ms difference is felt.
SA English vs International English
GPT-4o handles Afrikaans-flavoured English, isiZulu transliteration and our number conventions ("R10k" for ten thousand rand) noticeably better than Claude in our blind A/B testing. Not by a huge margin — but consistently.
JSON-mode reliability
For production agents that must return structured JSON, gpt-4o's response_format: json_object is rock-solid. Claude's structured output via XML tags is good but not as predictable in our hands.
Cost
GPT-4o-mini is roughly half the cost of Haiku and a tenth the cost of Sonnet for short-turn conversational workloads. For a SA SME running 10k chats a month, that's a real budget line.
None of this is forever. We re-evaluate every quarter. If you want our current per-use-case recommendation, drop us a line.