Inference Economics: How the Training-to-Inference Shift Changes Everything
The AI industry is shifting from a training-dominated phase (where the biggest spend goes to training new models) to an inference-dominated phase (where the biggest spend goes to running models for users at scale). This shift fundamentally changes the semiconductor demand profile: training favors massive GPUs with maximum compute and memory bandwidth, while inference favors efficiency, throughput per watt, and cost per query. The companies that win the training phase (NVIDIA) may not be the same ones that dominate inference. Custom ASICs (Google TPU, AWS Trainium and Inferentia), inference-optimized architectures (Groq, Cerebras), and edge inference chips (Qualcomm, MediaTek) all gain relevance.
Training vs. Inference: Different Economics
Training is largely a fixed cost: you train a model once (or retrain it periodically) on massive GPU clusters. The economics favor raw compute power, and whoever can field the most FLOPs wins. This is NVIDIA's domain: the A100, H100, and B200 are all optimized for training throughput.
Inference is a variable cost: every user query, every API call, every agent action consumes inference compute. As AI adoption scales, inference volume grows with usage while training spend stays comparatively flat. The economics shift from "maximum compute" to "minimum cost per query" and "maximum throughput per watt."
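The asymmetry is easy to see with back-of-the-envelope arithmetic. A minimal sketch, using entirely hypothetical cost figures (no real model or provider is being estimated):

```python
# Fixed training cost vs. variable inference cost: a toy model.
# Every number here is a hypothetical placeholder, not an estimate
# for any real model or deployment.

TRAINING_COST_USD = 100e6      # one-time cost of a training run (assumed)
COST_PER_QUERY_USD = 0.002     # marginal inference cost per query (assumed)
QUERIES_PER_DAY = 200e6        # sustained serving volume (assumed)

# Inference spend scales linearly with usage...
daily_inference_usd = COST_PER_QUERY_USD * QUERIES_PER_DAY

# ...so cumulative inference spend overtakes the entire training bill
# after a fixed number of days at this volume.
breakeven_days = TRAINING_COST_USD / daily_inference_usd

print(f"Daily inference spend: ${daily_inference_usd:,.0f}")
print(f"Inference overtakes the training bill after {breakeven_days:.0f} days")
```

At these placeholder numbers, well under a year of serving matches the entire training bill, which is why cost per query, not peak FLOPs, becomes the binding constraint.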
This distinction matters enormously for investors. In a training-dominated world, you buy NVIDIA and NVIDIA's supply chain. In an inference-dominated world, the competitive landscape fragments and the value chain shifts.
Who Benefits from the Inference Shift
Custom ASICs gain share because hyperscalers (Google, Amazon, Meta, Microsoft) can design chips optimized for their specific inference workloads at a lower cost per query than general-purpose GPUs. Google's TPU v5e is tuned for cost-efficient serving, AWS pairs the training-oriented Trainium with the inference-focused Inferentia, and Meta's MTIA targets inference for recommendation workloads.
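To see why a workload-specific chip can win on cost per query, consider the energy component alone. The power, throughput, and electricity figures below are illustrative assumptions, not measured specs for any real chip:

```python
# Toy cost-per-query comparison on energy alone. All figures are
# hypothetical placeholders; real values depend heavily on the model,
# batch size, and serving stack.

def energy_cost_per_query(power_w: float, qps: float, usd_per_kwh: float) -> float:
    """Electricity cost of serving a single query."""
    joules_per_query = power_w / qps          # energy drawn per query
    kwh_per_query = joules_per_query / 3.6e6  # 1 kWh = 3.6e6 joules
    return kwh_per_query * usd_per_kwh

ELECTRICITY_USD_PER_KWH = 0.08  # assumed industrial rate

# Hypothetical accelerators: the ASIC gives up peak FLOPs for serving efficiency.
gpu_cost = energy_cost_per_query(power_w=700, qps=50, usd_per_kwh=ELECTRICITY_USD_PER_KWH)
asic_cost = energy_cost_per_query(power_w=300, qps=60, usd_per_kwh=ELECTRICITY_USD_PER_KWH)

print(f"GPU:  ${gpu_cost:.2e} per query")
print(f"ASIC: ${asic_cost:.2e} per query")
print(f"ASIC is {gpu_cost / asic_cost:.1f}x cheaper on energy alone")
```

Energy is only one line item; amortized chip cost, memory, and networking matter too, but the same per-query framing applies to each.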
NVIDIA adapts by shipping inference-optimized configurations (L40S, H100 NVL) and deepening its software moat (TensorRT-LLM, Triton Inference Server). NVIDIA won't lose inference entirely, but its share of inference is likely to be lower than its share of training.
Edge inference becomes relevant as models shrink enough to run on-device. Qualcomm, MediaTek, and Apple's custom silicon benefit from running AI locally rather than in the cloud.
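What "shrink enough" means is mostly a memory question: weight footprint is roughly parameter count times bytes per parameter. The 7B model size and quantization levels below are illustrative, not tied to any product:

```python
# Back-of-the-envelope weight footprint: parameters x bytes per parameter.
# The 7B size and quantization levels are illustrative examples.

def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"7B model at {bits}-bit: {weight_memory_gb(7, bits):.1f} GB")
# 16-bit: 14.0 GB (datacenter-class); 4-bit: 3.5 GB (within phone RAM budgets)
```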
What This Means for the Functional Index
Closelook tracks the training-to-inference ratio through the Compute layer of the Functional Index. As inference dominates, the weight of custom ASIC and inference-focused companies increases relative to pure GPU plays. The index adapts to reflect this structural shift.
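As a hypothetical illustration of such re-weighting (the article does not specify Closelook's actual formula, so the linear blend and every weight below are assumptions):

```python
# Hypothetical re-weighting sketch. Closelook's actual methodology is not
# disclosed here; the linear blend and all weights are assumed for illustration.

def blended_weights(inference_share, training_w, inference_w):
    """Interpolate between training-era and inference-era layer weights."""
    return {name: (1 - inference_share) * training_w[name]
                  + inference_share * inference_w[name]
            for name in training_w}

training_era  = {"gpu": 0.70, "custom_asic": 0.20, "edge": 0.10}  # assumed
inference_era = {"gpu": 0.40, "custom_asic": 0.40, "edge": 0.20}  # assumed

# At a 30% inference share of compute spend, GPU names still dominate;
# at 70%, the gap to custom ASICs narrows sharply.
for share in (0.3, 0.7):
    print(f"{share:.0%}: {blended_weights(share, training_era, inference_era)}")
```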