April 2025

April 2025

From: Brian, Tobias, and Gaby

Subject: Token exponentials.

Welcome to the first ever two-month edition of our Infrastructure newsletter: Token Exponentials & Compute Revolutions.

As always, we look forward to hearing from you: corrections, suggestions, questions, and, most of all, connections to builders focused on these themes.

Machines of Loving Grace by Anthropic CEO Dario Amodei defines the attributes of “powerful AI.” Then he explores what such an AI could herald for biology, neuroscience, politics, the economy and human meaning. As the name suggests, he imagines a glorious outcome for the AI Revolution.

We’re now accustomed to hints of powerful AI: a Nobel Prize for protein discovery, 2M+ new materials discovered, pervasive labor automation (most apparently in software engineering), and the latest reasoning models. For Dario, powerful AI will be a step-function from this: AI that is smarter than Nobel prize winners in all fields, capable of spending years on a single task, accessible through various modalities, able to collaborate effortlessly with other AI, and take action on the world. While avoiding techno-utopian tropes, he explores concretely how this AI could extend life, improve mental and physical health while solving our most vexing psycho-political-economic problems. The essay reflects his optimism about the potential of intelligence and AI to become very, very smart. He envisions powerful AI being akin to a “country of geniuses in a datacenter.”

I took the essay in during a vacation, when the mind is a bit free to wander, and I started imagining what said datacenters would need to be like. The compute required to reach this land of geniuses felt unimaginable or perhaps unreachable, barring fantastical breakthroughs.

With this macro backdrop, we looked to the micro: the token. Satya Nadella said during Microsoft’s latest earnings call: “We processed over 100 trillion tokens this quarter, up 5X year-over-year – including a record 50 trillion tokens last month alone.” If Microsoft continues their pace for five years, they’ll process nearly 1.25 quintillions tokens in 2030.

‍

If we assume Microsoft is 50% of the market (they have OpenAI compute afterall) and growth continues at a 5x clip, we’ll process 2.5 quintillion tokens in 2030. That’s 2.5 trillion million. For the visual learners in the group:

As we were writing this newsletter and getting excited about token exponentials, Google I/O happened. Sundar Pichai had a mic drop moment when this slide came to the screen –

‍

Jaw-dropping growth and scale: 480T tokens/month, up 49x! While we knew we were being conservative in our estimates and aggressive on assuming Microsoft was 50% of the market, we had grossly underestimated Google’s introduction of AI into their product suite.

So, let’s make an update. In March, Microsoft processed 50T tokens. Google processed ~400T (480T in April). Let’s say Google plus Microsoft is 50% of the market, processing 450T tokens a month. If we still use a conservative 5x (a 10th of Google growth) year over year growth estimate, we’d grow to 168 quintillion tokens processed in 2030. Our earlier estimate was 2.5 quintillion!

We’ve entered a phase of astounding and accelerating growth in token usage, and we are wildly underprepared.

Compute will try to keep up, but there’s no way. At 100 tokens/second, a single fully-utilized GPU can process 3.1B tokens a year. While this year’s 800T tokens could be processed with 250K GPUs (and NVIDIA is going to ship 4M+ chips), 168 quintillion tokens in 2030 would require 53B GPUs. A 100x improvement in the efficiency of the underlying models or hardware would require 535M GPUs - 100x the number of GPUs NVIDIA will ship in 2025.

Breakthroughs are needed.

‍

Why tokens matter:

The token is the atomic unit of the AI era. At the end of the day, models consume and produce tokens. A token is a word, sub-word, or symbol that the model turns into a number for computation and then back from numbers into readable text, images or sounds. Tokens underpin the economics of AI, with model providers charging on a per-token basis and cost/token being a north-star metric in price compression, especially on inference.

Token throughput per GPU—how many tokens a chip can crunch each second—plus the watts each token consumes, sets the unit economics of inference. Hardware improvements and smarter software have already collapsed the cost of processing 1 million tokens from $180 to $0.75 in 18 months from 2023-24; Sam Altman expects a 10x drop in price every twelve months. Importantly, that’s for the same task and there are powerful forces driving token volumes and costs up that are leaving efficiency gains in tokenomics in the dust.

‍

Fueling the token exponential:

There are four key inputs to token growth, and they are compounding and accelerating one another:

Human user growth: AI adoption has hit escape velocity and we will see a majority of the human population using AI within five years. Today, ~10% of the world is using OpenAI, but 7.2 billion people have smartphones. By 2030, all new smartphones coming to market will have robust AI functionality.
Agent user growth: A single person might orchestrate 100 agents acting on their behalf across their personal and professional lives. Machines/service accounts outnumber human identities 82:1 today, and while we don’t see that ratio yet with agents, we’re not far from that world.
Token load per task: Reasoning models use 20x more tokens and are up to 150x more expensive than non-reasoning models. Prompts themselves are also getting longer – Google’s latest earnings call confirmed their AI Mode queries are twice as long as traditional Search queries.
Tasks being done by AI: First came copywriting and then coding, but we’re seeing signs of more complex labor automation across legal, accounting, logistics, and finance. Furthermore, media, robotics, and scientific research are all undergoing a process of AI infiltration. Teenagers are now consulting ChatGPT on life questions; my three year old daughter is now accustomed to getting answers to questions from ChatGPT! And Gaby’s dad, seven decades away from Lila, is writing a novel with AI that has 65,000 words already!

These forces will continue to grow because of model improvements, product breakthroughs and the natural process of tech adoption. Perhaps, in the near future, all of these tokens will power a “country full of geniuses in a data center,” if we could get the chips…and the power.

‍

Turning electrons into tokens:

In a literal sense, a GPU and even an entire data center carries out the function of converting electricity into tokens. And AI is already taxing the grid. If the trends we outlined above continue, with 53B GPUs required to process 169 quintillion tokens in 2030, we’d need over 37.5K GW to power them (assuming 700 watts per H100). Today, we have ~1,200 GW of power in the US in total. Right now, about 4.4% of electricity is consumed by data centers. At 37.5K GW, we’d need 100% of a grid that is 30x bigger than today’s infrastructure. Impossible, unbelievable, and in need of radical change.

In early 2023, chip shortages crippled the industry and companies were buying GPUs off black markets. The market stabilized through increased supply, better utlization and improved accessibility resulting in a steady decrease in price per token. These dynamics exposed the power bottleneck.

In response, hyperscalers are purchasing large scale power contracts as existing power infrastructure cannot meet current data center demand. Microsoft is restarting Three Mile Island, Meta is buying geothermal power, and Google is partnering with startups to accelerate nuclear power deployment. Power challenges explain, in part, the success of the neo-cloud. Companies like Coreweave, Nebius, and Crusoe were already in the business of power optimization for bitcoin mining. They’re all now playing the AI token game.

‍

Alleviating the token exponential:

To be clear, there extremely powerful levers to move to keep apace with the token exponential:

Improved hardware: GPUs are improving (FLOPs/second/GPU improving 1.35x a year), and breakthrough hardware will deliver astounding token throughput.
Smaller models: Models with fewer parameters that still deliver desired outputs can deliver more tokens with fewer FLOPs.
Model architecture: New designs like Mixture of Experts, which enable less work per token but the same overall model capacity, will drive efficiency.
On-device/edge inference: We are moving to a world of local by default and cloud when necessary. This will require both model and hardware improvements, including breakthroughs in model compression, edge-optimized runtimes, and low-power hardware.
Improved GPU utilization: Currently, utilization rates sit between 30-50%. Tech that virtualizes and orchestrates GPU resources, such as RDMA-based networking and workload scheduling, can enable multiple jobs/users to share GPUs and reduce idle time.

All of this will help. But 462 quadrillion tokens a day is no joke.

‍

The token exponential becomes an S-curve… or a bell curve?

Compute capacity will lag behind token demand for the foreseeable future, but we will hit AI saturation, eventually. Token growth drivers will slow: use cases, model advancements, user adoption and usage rates will all plateau. We’re in an era of radical exuberance that feels both rational and irrational, but the frenetic experimentation and investment will all pass. As Carlota Perez explains in her theory of technological disruption, we’ll eventually reach a steady state, in a new world. At that point, compute will be able to catch up.

While our S-curve envisions token demand climbing for years before compute catches up, Yann LeCun is skeptical. He views today’s transformer models as essentially high-powered guessing machines that cannot, by definition, reason like geniuses; transformer-based intelligence will eventually underwhelm, and we’ll have overbuilt compute. Then perhaps model breakthroughs that are more efficient will emerge. In that scenario, our S-curve falls into a bell-curve.

While no one knows how this will play out, booms and busts are likely. But even quantum leaps in model architectures will require many years (the Transformer came in 2017) to work their way into daily usage. With that runway, the existing backlog of transformer-based workloads will keep today’s GPUs busy, while the industry figures out fresh hardware.

Any ways you cut the deck, we are at the beginning of the AI era, and we need specialized compute solutions.

‍

Opportunities for startups

When we invested in Etched just two years ago, we made a bet that the inference market would grow dramatically faster than expected, making specialized hardware economically essential, nay essential, for AI at scale. While ASICs like Sohu are fundamental to solving the demand problem today, we need more.

To get to a world where powerful AI can build a “country of geniuses in a data center”, we’ll need radical innovations across the entire compute value chain. When people have access to a team of PhDs working for them, digital companions, personalized media, robots, etc. and AI adoption is pervasive, the compute needs will be astronomical. This will require more chips, manufacturing capacity, more energy, more talent, and a ruthless focus on building for the flood that’s coming.

Investing in this space at seed is complex: capital intensive, massive incumbents, unpredictable supply chains, and many moving targets around compute need driven by model architectures and specific use cases. Most VCs stay away. Indeed, US venture funding into the semiconductor industry dropped from 8% in 1995 to 4% in 2005 and sub 1% in 2015. This is starting to change, and we predict it will change radically over the next decade to meet the moment.

We’re looking for opportunities aligned with our thesis that radical breakthroughs are needed:

New materials: There will be a generation of materials beyond silicon; we’re particularly interested in gallium nitride for power, alternative materials for substrates, and photonics for memory.
New architectures: New model architectures, like the distillation architectures that accelerated DeepSeek, and particularly architectures best suited for multimodal and physical AI models, are a huge opportunity.
Infrastructure to reduce the strain on compute: There will be more improvements in interconnection, power accessibility, data center reliability, utilization, and the operation of data centers.
Compute at the edge: Enabling technology that moves workloads away from GPUs and closer to where data is being collected at the edge will reduce the need for reliance on hyperscaler compute.
Orchestration solutions: As compute becomes more heterogeneous, and quality of compute varies dramatically depending on how workloads are managed, there will be emergent solutions in workload orchestration.
Radically novel solutions that don’t fit neatly into any of these buckets: We’re thinking about harnessing energy from the ocean for underwater data centers and putting GPUs in space. Quantum. And technologies that looks crazy today will not in a few years.

‍

Compute will look radically different in a decade if we are to realize the potential of powerful AI, and we’re looking for founders bold enough to take on this daunting challenge. We’re also looking for network nodes who are passionate about this space. Please forward this note to your nerdiest compute friend!

Many thanks to Phil Brown (Meta), Ben Chess (ex-OpenAI), Rob Wachen (Etched), Max Hjelm (Coreweave) and many others who helped shape our perspectives as thought partners for this newsletter.

‍

Best,

Brian, Gaby & Tobias

‍

‍