Should We Move Data and Computation to the Cloud, and When?
Introduction
In the last decade we have seen an extraordinary jump in both raw computation power and affordability on consumer devices. Three-nanometre systems-on-chip, with integrated GPUs and dedicated neural engines, make current laptops and even tablets fast enough to train small neural networks, exploiting unified memory architectures that let the CPU and GPU share up to half a terabyte of RAM. At the same time, flagship smartphones ship with dedicated NPUs that run AI models locally, using accelerators that rival the efficiency of past data-centre hardware.
Meanwhile, an even more dramatic shift has taken place in the opposite direction: away from the device and into large-scale data centres. Cloud providers now expose seemingly unbounded pools of CPUs, GPUs and TPUs, surrounded by petabytes of storage and high-bandwidth networks, and rent them by the second through APIs. Tasks that were once unthinkable outside of supercomputers, such as training foundation models, running billion-parameter language models or performing real-time fraud detection on global transaction data, are now accessible to anyone willing to pay a cloud bill. In many cases this is more cost-effective than buying the hardware outright, both because accelerators improve so quickly that purchased hardware becomes obsolete within a few years and because expensive training runs happen only occasionally rather than continuously. On top of this, even non-expert users already rely on this “remote brain” without noticing: search, maps, photo storage, email, video streaming and collaborative editing are all thin interfaces on top of remote computation.
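The rent-versus-buy argument reduces to simple break-even arithmetic. As a rough sketch, with entirely hypothetical prices and a deliberately short useful life for accelerators:

```python
# Illustrative break-even between buying a GPU server and renting cloud
# capacity. All numbers are hypothetical assumptions, not real quotes.
PURCHASE_PRICE = 250_000       # one multi-GPU server, paid upfront
USEFUL_LIFE_YEARS = 3          # accelerators age out quickly
CLOUD_RATE_PER_HOUR = 40.0     # renting an equivalent instance

owned_cost_per_year = PURCHASE_PRICE / USEFUL_LIFE_YEARS

def cheaper_to_rent(hours_used_per_year: float) -> bool:
    """Renting wins while yearly rental cost stays below yearly depreciation."""
    return hours_used_per_year * CLOUD_RATE_PER_HOUR < owned_cost_per_year

# Break-even utilisation under these assumptions: about 2,083 hours per
# year (~24% utilisation). Occasional training sits well below that line.
break_even_hours = owned_cost_per_year / CLOUD_RATE_PER_HOUR
```

The exact numbers do not matter; what matters is that occasional, bursty workloads fall far short of the utilisation an owner would need to amortise hardware that depreciates this fast.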
This raises a question: if both ends of the spectrum keep improving, where should data and computation actually live? One vision does not negate the other, but I argue that most computation (including many lightweight tasks) and most data will eventually migrate to the cloud, turning personal devices into increasingly slim, almost disposable clients: screens, sensors and a bit of cache. This essay argues for this predominantly cloud-centred future and clarifies when it still makes sense to keep computation and data local.
A Cloud-Centred Future
Andy Jassy, long-time CEO of AWS, has argued for almost a decade that in the “fullness of time” very few companies will own their own data centres, with most workloads moving to the cloud and only a small fraction remaining on-premises for special cases. The logic is straightforward: accelerators are expensive and age fast, so it is more efficient for a handful of large platforms to buy them in bulk and keep them busy across thousands of customers than for every company to own a few GPUs that sit idle and must be replaced after a few years.
A similar argument appears in a bearish thesis on Nvidia by Michael Burry, who points out that hyperscalers are replacing GPUs increasingly quickly while simultaneously extending depreciation schedules in their financial reports. In his view, this mismatch inflates current earnings and may be a signal of a valuation bubble.
AI strengthens this argument. On-device neural engines are impressive, but they can only run relatively small models with limited context. In contrast, cloud providers can host hundred-billion-parameter models, continuously updated with fresh data and fine-tuned for different domains, and share them across millions of users. Industry whitepapers on generative-AI smartphones explicitly assume this device–cloud split as the default pattern for powerful assistants. Sundar Pichai has described this transition as a move from a “mobile-first” world to an “AI-first” one, where the computer behaves more like a persistent assistant following the user across devices. Such an assistant realistically needs to live where the data, compute and coordination capabilities are: in the cloud.
We are already partway there. When we use search, maps, email or collaborative documents, almost all meaningful work happens on remote servers; personal devices act mainly as interfaces. Cloud gaming and remote desktop services extend this logic to the extreme by streaming final pixels from a data centre. Elon Musk has made this view explicit by predicting that smartphones will become edge nodes for AI video inference: essentially screens and radios connected to large cloud models that decide what to display. If this vision is even approximately correct, insisting that personal devices remain powerful, self-contained general-purpose computers begins to look more like inherited habit than technical necessity.
The Limits of Offloading
Arguing that most computation will move to the cloud does not imply that all of it should. Physical and social constraints require some intelligence to remain close to the user. The first is latency and reliability. The speed of light is non-negotiable: even with ideal networks, a round trip to a distant data centre takes tens of milliseconds, often more in practice. For some tasks this is acceptable; for others it is disastrous. An augmented-reality headset that waits hundreds of milliseconds before rendering a visual cue causes discomfort. A vehicle that loses connectivity in a tunnel cannot afford to forget how to brake. These constraints motivate edge computing: smaller clusters placed near users, such as at cell towers, factories or hospitals, replicating some cloud economies of scale while meeting real-time requirements.
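The speed-of-light bound is easy to make concrete. Signals in optical fibre travel at roughly two-thirds of c, about 200 km per millisecond, which gives a hard floor on round-trip time regardless of how good the network is. The distances below are illustrative:

```python
# Lower bound on network round-trip time from the speed of light in fibre.
# Distances are illustrative; real RTTs add routing, queuing and server time.
SPEED_IN_FIBRE_KM_PER_MS = 200.0   # roughly 2/3 of c in glass

def min_rtt_ms(distance_km: float) -> float:
    """Best-case round trip: there and back, with zero processing delay."""
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_MS

print(min_rtt_ms(2000))   # 20.0 ms: fine for search, fatal for AR at 90 Hz
print(min_rtt_ms(50))     # 0.5 ms: an edge site nearby fits real-time budgets
```

A data centre 2,000 km away therefore costs at least 20 ms per round trip before any real-world overhead, while an edge site 50 km away stays well inside the frame budget of interactive workloads, which is exactly the gap edge computing exists to close.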
The second constraint is privacy and sovereignty. Not all data is equal. Entertainment preferences are fundamentally different from health records or financial transactions. Regulations increasingly impose residency and processing constraints that require sensitive data to remain on-device or within specific jurisdictions. As a result, many modern systems adopt hybrid designs: pre-process and anonymise data locally, send only compressed representations to the cloud, and keep raw data under user or organisational control. This does not contradict a cloud-centred future, but it makes it layered rather than absolute.
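The hybrid pattern described above can be sketched in a few lines. The field names and the summarisation step are hypothetical, chosen only to show the shape of the design: raw identifiers and raw measurements stay on-device, and only a pseudonymised, compressed representation is uploaded.

```python
# Sketch of the hybrid design: anonymise and summarise locally, send only
# the compact result to the cloud. Field names are hypothetical examples.
import hashlib

def anonymise_locally(record: dict) -> dict:
    """Runs on-device: raw identifiers and raw samples never leave the machine."""
    return {
        # one-way pseudonym so the cloud can group records, not identify users
        "user": hashlib.sha256(record["user_id"].encode()).hexdigest()[:16],
        # coarse summary in place of the raw measurements
        "avg_heart_rate": sum(record["heart_rate_samples"])
                          / len(record["heart_rate_samples"]),
    }

raw = {"user_id": "alice@example.com", "heart_rate_samples": [61, 64, 66, 69]}
payload = anonymise_locally(raw)   # only this dict would be uploaded
```

The cloud side still gets enough signal for aggregate learning, while the sensitive originals remain under user or organisational control, which is the layered arrangement the paragraph describes.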
A related issue is concentration of power. Today, access to the most capable language models is mediated by a handful of large companies, none primarily organised around data privacy. Even when free tiers exist, incentives favour retaining prompts and outputs for training future models. This creates real risks: accidental leakage of proprietary code, overfitting to living artists’ styles, or resurfacing sensitive information. In response, a counter-movement is pushing powerful AI back onto personally owned machines. Projects such as George Hotz’s tiny corp, which sells the tinybox as a personal compute cluster for local training and inference, explicitly position themselves as alternatives to cloud oligopolies. Whether this model can ever rival hyperscale infrastructure is unclear, but the willingness to spend significant sums on local AI hardware signals that privacy concerns are not merely theoretical.
A third constraint is energy. Data centres are far more efficient than billions of scattered devices at converting electricity into computation, but they are not free. Recent estimates suggest that data centres and transmission networks already account for around one percent of global electricity use, with AI workloads expected to increase this share. At the same time, pushing heavy models entirely onto devices would drain batteries and accelerate hardware wear for work that could be centralised and shared. The relevant question is therefore not “cloud or local?”, but where in the stack it makes sense to spend energy for a given task under carbon and efficiency constraints.
Conclusion
A productive way to approach the original question is to avoid ideology and focus on trade-offs. For any task, three variables matter: data sensitivity, latency and reliability requirements, and workload size or burstiness. Together, these determine not only where computation should live, but also when it should migrate.
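The three-variable rubric can be written down as a toy placement function. The thresholds below are illustrative assumptions, not prescriptions; the point is only that the decision is a small, explicit function of sensitivity, latency and workload shape:

```python
# A toy version of the trade-off rubric: score a task on the three
# variables named in the text and pick a placement. Thresholds are
# illustrative, not prescriptive.
from dataclasses import dataclass

@dataclass
class Task:
    sensitive_data: bool        # regulated or personal data?
    max_latency_ms: float       # how fast must a response arrive?
    bursty_heavy_compute: bool  # occasional large jobs, idle otherwise?

def place(task: Task) -> str:
    if task.sensitive_data:
        return "local"   # keep raw data under user or organisational control
    if task.max_latency_ms < 20:
        return "edge"    # a round trip to a distant data centre will not fit
    if task.bursty_heavy_compute:
        return "cloud"   # occasional heavy jobs: rent shared accelerators
    return "cloud"       # default: shared infrastructure that scales and upgrades

place(Task(sensitive_data=False, max_latency_ms=10, bursty_heavy_compute=False))  # -> "edge"
```

Real systems would weigh these variables continuously rather than with hard cut-offs, and a task can migrate when any of the three changes, which is the "when" part of the question.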
Under this view, a cloud-centred future does not imply that phones and laptops become useless shells. Instead, their role shifts. Immediate interaction, local context and privacy-critical logic remain near the user, while heavy computation—large models, long-term state and cross-user learning—moves into shared infrastructure that can scale and upgrade independently. As networks improve and edge deployments proliferate, more workloads will cross the threshold where offloading is not just cheaper but better. At that point, the idea that personal devices are “just screens, sensors and a bit of cache” stops being science fiction and becomes a reasonable extrapolation of trends already under way.