The Data Center Moves to Your Machine

The right goal for an AI system is to deliver the most token value per watt, for each user. That sounds simple. It isn't, because three things pull against each other.

Accuracy demands the most capable models, which are expensive to run. Privacy demands that some work never leave your machine. Cost and energy demand that you don't spend a frontier model's compute on a task a smaller one can handle. You cannot maximize one without respecting the others.

Balancing all three is an orchestration problem. And orchestration is the thing Perplexity has always done.

Today we announced the next step for Personal Computer: the first hybrid local-server inference orchestrator. It reasons about what work should run on your device and what work should go to agents in the cloud, and it routes each part of a task to the right place automatically.

From models to compute

Perplexity began by orchestrating tools and sources to produce accurate, cited answers. Computer expanded that into a harness for hundreds of agents, spun up across more than twenty frontier models, with the right one chosen for each task. Now we can extend the same idea to compute itself: which model, where it runs, and why.

Hybrid agentic inference is for work that includes sensitive data, such as financial records, health information, and personal files, but still needs powerful AI. A compact model runs locally on your device and determines when sensitive data should stay there too.

Meanwhile, work that needs a frontier model's full capability runs on the server. Most real tasks are a mix, so Personal Computer splits them and coordinates the parts. Unlike tools that ask you to pick local or cloud up front, this happens on its own, task by task.
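To make the split concrete, here is a minimal sketch of per-subtask routing. It is an illustration only, not Personal Computer's actual logic: the `Subtask` type, the `needs_frontier` flag, the keyword list, and the rule that sensitive work always stays on the device are all assumptions made for this example. A simple keyword check stands in for the compact local model.

```python
from dataclasses import dataclass

# Hypothetical markers standing in for the compact local model's judgment.
SENSITIVE_MARKERS = ("bank", "salary", "diagnosis", "passport")


@dataclass
class Subtask:
    text: str
    needs_frontier: bool = False  # assumed flag: does this step need a frontier model?


def is_sensitive(text: str) -> bool:
    """Stand-in for the local classifier: flag text that mentions a sensitive marker."""
    lowered = text.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def route(subtask: Subtask) -> str:
    """Return "local" or "server" for a single subtask.

    Assumed policy: sensitive work stays on the device; everything else
    goes to the server only when it needs a frontier model.
    """
    if is_sensitive(subtask.text):
        return "local"
    return "server" if subtask.needs_frontier else "local"


def plan(subtasks: list[Subtask]) -> list[tuple[str, str]]:
    """Route every part of a mixed task independently."""
    return [(s.text, route(s)) for s in subtasks]


if __name__ == "__main__":
    task = [
        Subtask("Summarize my bank statement", needs_frontier=True),
        Subtask("Research market trends", needs_frontier=True),
        Subtask("Rename files in a folder"),
    ]
    for text, target in plan(task):
        print(f"{target:6} <- {text}")
```

In this sketch the bank statement stays local even though it is marked as needing a frontier model, the research step goes to the server, and the routine file rename stays local. A real orchestrator would presumably weigh hardware capability and cost as well, which this example leaves out.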

The device is the data center

For years, more capable silicon mostly meant faster apps and longer battery life. That changes when the chip can run real inference. The better the local hardware, the more the orchestrator can keep on your machine, and the more it can reserve the server for the work that genuinely requires it.

We unveiled this with Intel, and the same model-agnostic harness runs across other local silicon, including NVIDIA's RTX Spark. The race for local compute is on. As chips advance, only Perplexity has the agentic harness and applied inference engineering for truly seamless orchestration.

This also changes the math for everyone watching the compute shortage. When sensitive work and routine work move onto the devices people already own, you don't need to build as much centralized infrastructure to serve it. It changes what sovereignty looks like, too: important data can stay in its own jurisdiction without a country standing up a data center to keep it there.

People would rather own a data center in their laptop than build on one they don't control.

The right architecture for efficiency

There's a reason this architecture fits Perplexity. Our business has always been accurate AI, not maximizing the tokens we sell. That's the right incentive for optimizing value per watt: we win when the answer is right and the work gets done, not when it consumes more compute.

Hybrid AI has been an industry ambition for a long time. Personal Computer with local inference, coming in July, is the first product that makes it real, and the first to treat compute as one more thing to orchestrate intelligently, across your machine and the cloud.
