The $90M Wake-Up Call: How AT&T Cut AI Costs by 90% While Everyone Else Burns Cash

Notion
4 min read
News · AI · LLM · Big-Tech

Here's a number that should terrify every CTO: 8 billion tokens per day.

That's what AT&T was processing through its AI systems. At typical enterprise rates, that's potentially $90 million+ annually in token costs alone. Think about that for a second—more than most startups raise in their entire lifetime, burned through just by asking questions of AI.
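The headline number checks out on the back of an envelope. The per-token rate below is an illustrative frontier-model ballpark, not AT&T's actual pricing:

```python
# Back-of-the-envelope check on the $90M figure (illustrative rate,
# not AT&T's actual contract pricing).
tokens_per_day = 8_000_000_000
price_per_1k = 0.03  # ~$0.03 per 1K tokens, frontier-model ballpark

annual_cost = tokens_per_day * 365 / 1000 * price_per_1k
print(f"${annual_cost / 1e6:.0f}M / year")  # -> $88M / year
```

At that volume, even small per-token rate differences move the annual bill by tens of millions.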

[Image: AT&T AI orchestration architecture]

The Problem Every Enterprise Is About to Hit

Most companies are still in the "throw everything at GPT-4" phase of their AI journey. It's like using a sledgehammer to hang a picture frame. Sure, it works, but you're destroying your wall (and your budget) in the process.

AT&T's chief data officer Andy Markus realized they couldn't keep routing every simple query through massive reasoning models. Not every question needs a PhD to answer it.

The Multi-Agent Stack That Changed Everything

Here's where it gets interesting. Instead of one giant model doing everything, AT&T built a hierarchy:

┌─────────────────────────────────┐
│        LLM "Super Agent"        │
│      (Routes & Orchestrates)    │
└──────┬───────┬───────┬───────┬──┘
       ▼       ▼       ▼       ▼
     SLM-1   SLM-2   SLM-3   SLM-4
   (Simple)(Specific)(Fast) (Cheap)
They used LangChain to build "super agents" that direct traffic to smaller, specialized models. The result? A 90% cost reduction. From potential nine-figure annual costs down to something actually sustainable.
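The core routing idea is simple enough to sketch without any framework. The sketch below is illustrative, not AT&T's actual code: the tier names, keywords, and per-token rates are all assumptions, and a production router would use a classifier model rather than keyword matching:

```python
# Toy sketch of a "super agent" router: classify each query and
# dispatch it to the cheapest model tier that can handle it.
# Tier names and rates are hypothetical.

MODEL_TIERS = {
    "slm-fast":   {"cost_per_1k_tokens": 0.0002},  # small general model
    "slm-domain": {"cost_per_1k_tokens": 0.0010},  # fine-tuned specialist
    "llm-large":  {"cost_per_1k_tokens": 0.0300},  # frontier reasoning model
}

def route(query: str) -> str:
    """Pick the cheapest adequate tier for a query (toy heuristic)."""
    q = query.lower()
    if any(kw in q for kw in ("prove", "analyze", "multi-step")):
        return "llm-large"   # genuine reasoning -> big model
    if any(kw in q for kw in ("billing", "outage", "account")):
        return "slm-domain"  # known domain -> specialist
    return "slm-fast"        # default: simple lookup/chat

def cost(query: str, tokens: int) -> float:
    """Dollar cost of answering a query at a given token count."""
    tier = route(query)
    return tokens / 1000 * MODEL_TIERS[tier]["cost_per_1k_tokens"]
```

The savings come from the fact that most enterprise traffic falls into the bottom two tiers, where tokens are one to two orders of magnitude cheaper.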

Meanwhile, The Open Source Revolution Just Went Nuclear

While AT&T was optimizing costs, Alibaba's Qwen team dropped a bomb: four new models that match Claude 3.5 Sonnet performance but run locally on your hardware.

[Image: Alibaba Qwen models]

Three of the four models (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B) are Apache 2.0 licensed. Translation: free for commercial use, no API costs, complete control.

Do you see where this is going?

The Pattern Emerging Across Enterprise AI

Look at what else dropped this week:

Anthropic launched Claude Cowork, admitting that "the hype around enterprise AI agents in 2025 turned out to be mostly premature." Their new approach? Focus on narrow, specific use cases instead of trying to boil the ocean.

Gong released Mission Andromeda with open Model Context Protocol connections—even to their rivals. Why? Because enterprises don't want vendor lock-in anymore.

Guidde is training AI agents on video instead of documentation, because let's be honest—nobody reads the manual anyway.

The theme? Pragmatism over hype. Orchestration over raw power. Economics over capabilities.

What This Means for You

If you're building with AI right now, here's the brutal truth: the "use the biggest model for everything" strategy is about to bankrupt you.

The winners will be those who:

  • Route intelligently (like AT&T)
  • Mix proprietary and open-source models (as Qwen adopters soon will)
  • Optimize for cost per task, not just accuracy
  • Build multi-agent systems instead of monolithic solutions

Old Approach: Every Query → GPT-4 → $$$$

New Approach: Router → Right-Sized Model → $

Cost Difference: 90%
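The 90% figure falls out of simple blended-cost math. With assumed rates (a big model ~30x the price of a small one) and most traffic routed down, the blend collapses:

```python
# How a router reaches ~90% savings (toy numbers, assumed rates).
big, small = 0.030, 0.001  # $ per 1K tokens
share_small = 0.95         # fraction of queries the router sends down

old = big                                          # everything hits the big model
new = share_small * small + (1 - share_small) * big
savings = 1 - new / old
print(f"{savings:.0%}")    # -> 92%
```

The exact number depends on the price gap and the routing split, but any enterprise where most queries are simple gets into the same range.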

The Uncomfortable Question

Here's what keeps me up at night: How many companies are currently burning millions on AI without even knowing it?

AT&T had the data infrastructure to catch this. Most companies don't. They're just watching their cloud bills explode and assuming "that's the cost of doing AI."

Hot take: Within 18 months, "AI cost optimization" will be a bigger market than AI development itself. The gold rush is over. Now comes the accounting.

Are you measuring your tokens? Do you know what each AI interaction actually costs you? Or are you just hoping the magic happens before the money runs out?
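Answering those questions takes nothing more sophisticated than a ledger. A minimal sketch, with hypothetical model names and rates:

```python
# Minimal per-call cost tracking: log tokens and dollars for every
# AI call so the bill is never a surprise. Names/rates are hypothetical.
from collections import defaultdict

PRICES = {"big-model": 0.030, "small-model": 0.001}  # $ per 1K tokens
ledger = defaultdict(lambda: {"tokens": 0, "dollars": 0.0})

def record(model: str, tokens: int) -> None:
    """Tally usage and cost for one AI call."""
    ledger[model]["tokens"] += tokens
    ledger[model]["dollars"] += tokens / 1000 * PRICES[model]

record("big-model", 12_000)
record("small-model", 250_000)
for model, row in ledger.items():
    print(model, row)
```

Even this crude version answers the question most teams can't: which workloads are driving the bill.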

Because if AT&T—with virtually unlimited resources—had to make this change, what does that say about the rest of us?