Your robot cannot be smart, fast, and free all at once. Evolution has already solved that problem.
Here is a constraint that builders of physical AI rarely state out loud, although all of them are quietly fighting it. A robot's brain wants three things at once. It wants to be smart: able to reason like a frontier model about an unfamiliar environment. It wants to be fast: able to respond within the tight, predictable timing that physical control loops require. And it wants to be free: able to keep working when the network fails, the warehouse Wi-Fi is down, or the machine operates where there is no signal.
Achieving all three in a single computing unit is impossible. You can pick any two.
To be clear, bounded autonomy already works well. Industrial arms, drones, and other narrowly scoped systems can be fast and offline because their tasks are tightly defined. The trilemma bites at the frontier: you cannot combine frontier-level general reasoning, deterministic real-time response, and complete offline autonomy in the same power-constrained system for the same control loop.
A frontier-scale model is smart, and if the robot streams its sensors to a data center the model can also be fast, but the robot then depends on a network and is not free. Shrink that model to fit on a 15-watt embedded device and it becomes fast and free, yet it gives up intelligence. Run the large model in the cloud and query it only occasionally, and you get intelligence and freedom, but never speed. You can hold only two of the three corners at a time. I call this the embodied trilemma, and it is the underlying reason the edge/cloud decision is the hardest architectural choice in robotics. Many teams treat it as a deployment detail, but it is closer to a fundamental principle.
You cannot bypass the trilemma.
This trilemma is not just a trend or a temporary hardware limitation that can be ignored; it arises directly from physics and power constraints.
Frontier reasoning quality currently lives in models that demand tens of gigabytes of memory and data-center-grade accelerators, hardware that a mobile robot cannot practically carry on a battery. Being "smart" therefore forces a choice: either connect the robot to the data center over a network and give up freedom, or run a smaller onboard model and give up intelligence.
Real-time control is even less negotiable. A wide-area network round trip adds 30 to 100 milliseconds of latency, and its variability matters more than its average. A control loop that occasionally stalls is worse than one that is consistently slow, because controllers are tuned for deterministic timing. The moment "fast" relies on a network, you have also given up "free," because the network now sits inside your control loop.
Thus, the triangle remains intact. Techniques like quantization, distillation, and improved accelerators may adjust the corners, but they do not eliminate them. Any assertion to the contrary typically obscures the corner that was sacrificed.
Quantifying the triangle.
Quantifying the constraint is helpful because once you outline the timing, the corners become tangible.
Begin with latency. The total delay of a perception-to-action decision made in the cloud comprises several components:
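$$L_{\text{cloud}} = t_{\text{capture}} + t_{\text{encode}} + t_{\text{uplink}} + t_{\text{inference}} + t_{\text{downlink}} + t_{\text{decode}}.$$
In contrast, if the same decision is made onboard, most of those terms disappear:
$$L_{\text{edge}} = t_{\text{capture}} + t_{\text{inference,local}}.$$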
The difference does not come from inference time, which may actually be shorter in the cloud on better hardware. It comes from the network terms, $t_{\text{uplink}} + t_{\text{downlink}}$, and, more significantly, from their variance. In a measured cloud robotics setup over a fast wired connection, round trips averaged about 30 milliseconds, while real-world deployments often see 100 to 300 milliseconds, with wireless links running higher still. Edge processing, by contrast, keeps the equivalent hop to approximately 1 to 5 milliseconds because no data leaves the machine.
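The two sums above can be sketched in a few lines of Python. This is only an illustration: the function names and every millisecond value below are hypothetical, chosen so that the network round trip is about 30 ms and cloud inference is faster than local inference, as the text describes.

```python
def cloud_latency_ms(capture, encode, uplink, inference, downlink, decode):
    """Total perception-to-action delay when the decision is made in the cloud."""
    return capture + encode + uplink + inference + downlink + decode


def edge_latency_ms(capture, inference_local):
    """Total perception-to-action delay when the decision is made onboard."""
    return capture + inference_local


# Hypothetical component timings in milliseconds (not measurements).
cloud_total = cloud_latency_ms(
    capture=2.0, encode=3.0, uplink=15.0,
    inference=8.0, downlink=15.0, decode=2.0,
)
edge_total = edge_latency_ms(capture=2.0, inference_local=20.0)

# Share of the cloud path that is pure network transit.
network_share = (15.0 + 15.0) / cloud_total

print(f"cloud: {cloud_total} ms, edge: {edge_total} ms")
print(f"network share of cloud path: {network_share:.0%}")
```

With these made-up numbers the cloud model infers more than twice as fast as the onboard one, yet the cloud path is still about twice as slow end to end, because two thirds of its delay is network transit.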
Now state the rule that determines whether a loop is viable on a particular path. A control loop with a timing budget of $L_{\text{budget}}$ can run on a given compute path only if:
$$L_{\text{path}} + k \cdot \sigma_{\text{jitter}} \le L_{\text{budget}},$$
where $\sigma_{\text{jitter}}$ is the standard deviation of the path's latency and $k$ is the safety factor needed for deterministic performance. The $k \cdot \sigma_{\text{jitter}}$ term is the insidious one. Studies on teleoperation indicate that a link holding a steady 100 milliseconds is workable, but one that swings between 30 and 200 milliseconds produces erratic movement, because the controller cannot compensate for a delay it cannot predict. For reflex loops, the budget is 1 to 10 milliseconds. No wide-area path can meet this condition. The mathematics is what forbids it, not the designer.
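A minimal sketch of that viability rule, under stated assumptions: the path latency is taken to be the mean of a set of latency samples, the jitter is their population standard deviation, and k = 3 is an arbitrary illustrative safety factor. The function name, the sample values, and the 150 ms teleoperation budget are all hypothetical.

```python
import statistics


def path_is_viable(latency_samples_ms, budget_ms, k=3.0):
    """Check mean latency + k * jitter <= budget for one compute path.

    Assumes the path latency is the sample mean and the jitter is the
    population standard deviation of the same samples.
    """
    mean_latency = statistics.fmean(latency_samples_ms)
    jitter = statistics.pstdev(latency_samples_ms)
    return mean_latency + k * jitter <= budget_ms


# Hypothetical latency samples in milliseconds.
steady_link = [100.0, 100.0, 100.0, 100.0]   # constant 100 ms
jittery_link = [30.0, 200.0, 30.0, 200.0]    # swings between 30 and 200 ms
onboard_path = [2.0, 3.0, 2.0, 3.0]          # stays on the machine
best_case_wan = [30.0, 30.0, 30.0, 30.0]     # wide-area link with zero jitter

# Teleoperation-style loop with an assumed 150 ms budget.
print(path_is_viable(steady_link, budget_ms=150.0))    # True
print(path_is_viable(jittery_link, budget_ms=150.0))   # False

# Reflex loop at the loose end of the 1 to 10 ms budget.
print(path_is_viable(onboard_path, budget_ms=10.0))    # True
print(path_is_viable(best_case_wan, budget_ms=10.0))   # False
```

The jittery link has a mean only 15 ms above the steady one, but its standard deviation of 85 ms pushes the left-hand side to 370 ms. The last case shows that even a perfectly steady 30 ms wide-area link fails a reflex budget on the mean alone.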
Control loop | Timing budget | Onboard path (~1-5 ms) | Wide-area path (~30-300 ms)
Reflex (motor control, emergency
