What is changing: CNBC reported this week that OpenAI and Anthropic — the principal beneficiaries of the "spend more on AI" mentality that fuelled exponential growth through 2024 and 2025 — are now facing a shift in customer behaviour. Businesses are demanding clear returns before spending more. Cheaper alternatives are absorbing traffic that would previously have gone to frontier models. And the platforms themselves are building model routing into their products, normalising the idea that not every task needs the most powerful model.

What "tokenmaxxing" was and why it is ending

"Tokenmaxxing" is the informal term for the enterprise behaviour of routing every AI task through the most capable (and most expensive) frontier model available, on the assumption that more capability means better outcomes. During 2023 and 2024, this was a reasonable heuristic — frontier models significantly outperformed mid-tier alternatives on most tasks, and the cost premium was acceptable when AI was being proved out.

In 2026, three things have changed. First, mid-tier and open-weight models have become dramatically more capable, narrowing the gap with frontier models on routine tasks. Second, prices for cheaper tiers have fallen far enough that the gap between tiers is now significant and measurable in business budgets. Third, business leaders who lived through the "AI at all costs" investment thesis are now asking for ROI, not experiments.

The result is a behavioural shift that is visible in traffic data, pricing decisions, and product strategy across the major AI providers.

Model routing: what it is and how it works

Model routing is the practice of analysing each task and directing it to the least powerful model that can still complete it well, rather than always using the most powerful model available. The principle is straightforward. A simple task, such as drafting a brief email reply, reformatting a table, or answering a common customer FAQ, does not require the same model as a complex task, such as analysing a 200-page contract, generating a detailed financial plan, or reasoning across multiple conflicting data sources.
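As a rough illustration of the routing idea, the sketch below sends each task to the cheapest tier assumed to be sufficient. The tier names, the word-count thresholds, and the `needs_multi_source_reasoning` flag are all hypothetical choices made for this example. Real routing layers are likely to use richer signals than prompt length.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    # Hypothetical flag set by the caller for tasks that must reconcile
    # several conflicting sources.
    needs_multi_source_reasoning: bool = False


def route(task: Task, mid_threshold: int = 200, frontier_threshold: int = 2000) -> str:
    """Return the cheapest tier assumed sufficient for the task.

    Tier names ("small", "mid", "frontier") and both thresholds are
    illustrative assumptions, not real product identifiers or tuned values.
    """
    word_count = len(task.prompt.split())
    if task.needs_multi_source_reasoning or word_count > frontier_threshold:
        return "frontier"
    if word_count > mid_threshold:
        return "mid"
    return "small"


if __name__ == "__main__":
    print(route(Task("Reply to confirm Tuesday's appointment.")))
    print(route(Task("Compare these reports.", needs_multi_source_reasoning=True)))
```

A short email reply lands on the small tier, while anything flagged as multi-source reasoning goes to the frontier tier regardless of length.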

Microsoft has already built automatic model routing into GitHub Copilot, and enterprise AI platforms are adding routing layers to their products. In 2026, roughly 95% of enterprise AI usage still runs on frontier models, but that share is expected to fall rapidly as routing becomes a standard feature of AI infrastructure.

The DeepSeek signal

AI startup Lindy moved 100% of its traffic away from Anthropic's Claude models to DeepSeek — a Chinese company producing cheaper open-weight alternatives — and reported dramatic cost savings. This is an extreme case, but it illustrates that the performance gap between frontier and non-frontier models has closed enough for some production workloads to shift entirely. UK businesses with API-based AI workflows should at minimum test whether a cheaper model produces acceptable output for their specific tasks.
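One way to run that test is to score a cheaper model against your own sample prompts before moving any traffic. The sketch below is a minimal harness under stated assumptions: `model` is any function that takes a prompt and returns text, `is_acceptable` is a check you write yourself, and the stand-in model and check shown here are placeholders rather than real API calls.

```python
from typing import Callable, Iterable


def acceptance_rate(
    model: Callable[[str], str],
    prompts: Iterable[str],
    is_acceptable: Callable[[str, str], bool],
) -> float:
    """Return the fraction of prompts whose output passes the caller's check."""
    prompt_list = list(prompts)
    if not prompt_list:
        raise ValueError("need at least one prompt to evaluate")
    passed = sum(1 for p in prompt_list if is_acceptable(p, model(p)))
    return passed / len(prompt_list)


# Placeholder model: a real test would call the provider's API here.
def stand_in_cheaper_model(prompt: str) -> str:
    return f"Thanks for your message about: {prompt}"


# Placeholder check: a real one might be a human review or a rubric.
def mentions_the_topic(prompt: str, output: str) -> bool:
    return prompt.lower() in output.lower()


if __name__ == "__main__":
    samples = ["boiler service", "invoice query", "appointment change"]
    print(acceptance_rate(stand_in_cheaper_model, samples, mentions_the_topic))
```

Running the same prompts and the same check against the current model gives a like-for-like comparison for the specific tasks that matter to the business.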

What the efficiency pivot means for UK service businesses

Most UK small service businesses — chimney sweeps, tradespeople, accountants, consultants, care operators — are not running AI through an API with metered per-token costs. They are using ChatGPT, Claude.ai, or similar subscription tools with flat monthly fees. For these businesses, "model routing" is less directly relevant as a cost mechanism.

However, the efficiency pivot translates into two practical signals for UK service businesses:

  • Stop chasing the newest model; start using your current model more consistently. The ROI from AI comes from repetition and integration, not from always having the latest version. The businesses reporting 68% efficiency gains from AI are not necessarily using better tools. They are using whatever tools they have, consistently, on real work.
  • When evaluating new AI tools or upgrades, ask "what specifically do I need this capability for?" The tokenmaxxing logic — "get the best, use it for everything" — is being replaced by task-specific evaluation. Apply that logic to your own tool decisions. If you are paying for a premium tier of any AI tool, list three specific tasks that tier enables which the standard tier cannot. If you cannot name three, the standard tier is probably sufficient.

The cost trajectory for Q3 and beyond

With GPT-5.6 Luna launching at $1 per million input tokens, Gemini 3.5 Pro targeting lower enterprise pricing, and open-weight alternatives improving monthly, AI costs are on a consistent downward trajectory. For UK businesses using AI on subscriptions today, this is less immediately relevant — subscription pricing is less volatile than API pricing. But for any business considering building AI-assisted workflows at scale (processing every customer enquiry, generating every quote, summarising every appointment), the cost picture will be materially more favourable within 12 months.
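To make the per-token arithmetic concrete, the sketch below estimates a monthly input-token bill. The $1 per million input tokens figure is the one quoted above. The enquiry volume and tokens per enquiry are hypothetical, and output-token charges are left out because no price for them is given here.

```python
def monthly_input_cost(
    requests_per_month: int,
    input_tokens_per_request: int,
    price_per_million_input_tokens: float,
) -> float:
    """Estimate monthly spend on input tokens only, in the price's currency."""
    total_tokens = requests_per_month * input_tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_input_tokens


if __name__ == "__main__":
    # Hypothetical workload: 2,000 customer enquiries a month at roughly
    # 1,500 input tokens each, priced at $1 per million input tokens.
    cost = monthly_input_cost(2_000, 1_500, 1.0)
    print(f"${cost:.2f} per month")  # prints "$3.00 per month"
```

Under those assumptions the input side of processing every enquiry costs about $3 a month, which is why workflows at this scale become easier to justify as prices fall.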

Operator action: audit your AI spend in 20 minutes

Step 1 — List every AI tool you pay for (5 minutes). Include subscriptions, annual plans, API costs. Total the monthly spend. Many business owners underestimate this number because tools are spread across cards and departments.
Step 2 — For each tool, name one specific task it does well for your business (10 minutes). If you cannot name a specific task, that tool is likely experimental spend rather than operational spend. Decide whether to drop it or commit to testing it on a real workflow within 30 days.
Step 3 — Identify the 20% of your AI use that drives 80% of the value (5 minutes). That is where to deepen investment. Everything else is either experimental or should be rationalised. The efficiency pivot is not about spending less on AI — it is about concentrating spend where the returns are clearest.
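For anyone who prefers a script to a spreadsheet, Steps 1 and 2 can be sketched as below. The tool names and prices are invented for illustration, and the rule that a tool with no named task counts as experimental spend is taken directly from Step 2.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Tool:
    name: str
    monthly_cost_gbp: float
    # The one specific task this tool does well, or None if you cannot name one.
    named_task: Optional[str] = None


def audit(tools: List[Tool]) -> Tuple[float, List[str]]:
    """Return total monthly spend and the names of tools with no named task."""
    total = sum(t.monthly_cost_gbp for t in tools)
    experimental = [t.name for t in tools if not t.named_task]
    return total, experimental


if __name__ == "__main__":
    # Hypothetical tools and prices, for illustration only.
    my_tools = [
        Tool("Chat assistant subscription", 20.0, "drafting customer quotes"),
        Tool("Transcription add-on", 15.0),
    ]
    total, experimental = audit(my_tools)
    print(f"Total monthly spend: £{total:.2f}")
    print("Drop or test within 30 days:", experimental)
```

Step 3 remains a judgment call. The script only surfaces the total and the tools that need a decision.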