NVIDIA Introduces Vera CPU, Boasts 88 Cores Engineered for Running Independent AI Systems at Scale

NVIDIA engineers built the Vera CPU around 88 custom cores, dubbed Olympus, based on the Armv9.2 architecture with NVIDIA-specific modifications. Instead of dividing everything across numerous smaller chiplets, they’ve put it all on a single piece of silicon, cutting out the signal delays that come from shuttling data between multiple dies.
Spatial multithreading lets each Olympus core run two full-fledged threads at the same time, rather than switching back and forth in little slices. The pipelines, caches, and registers are physically partitioned so that each thread gets its own dedicated resources, and the second thread runs alongside the first without disturbance. That combination yields 176 threads from 88 cores. The cores also decode instructions in batches of ten, use a neural branch predictor that makes two conditional-branch predictions per cycle, and carry a prefetch unit tuned for graph data, so it can pull in exactly what the code needs before the code even knows it is asking for it.
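As a rough illustration of the thread math and the resource-partitioning idea, here is a toy sketch in Python. The core and thread counts come from the article; the per-core resource budgets are invented for illustration and are not actual Olympus parameters.

```python
# Spatial multithreading in miniature: each core's resources are split into
# fixed, dedicated slices per thread instead of being shared and contended.
CORES = 88
THREADS_PER_CORE = 2

# Hypothetical per-core budgets (illustrative numbers only).
core_resources = {"pipeline_slots": 10, "l1_cache_kb": 128, "registers": 192}

# Each thread owns exactly half of every resource, so one thread's activity
# cannot starve the other.
per_thread = {name: total // THREADS_PER_CORE
              for name, total in core_resources.items()}

print(CORES * THREADS_PER_CORE)  # 176 hardware threads
print(per_thread)
```

The key contrast with conventional SMT is that the split is static: neither thread can grab the other's slice, which is what keeps the two threads from interfering.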
Memory-wise, you’re talking 1.5 terabytes of LPDDR5X. Under full load, each core averages 13.6 gigabytes per second, for a per-chip aggregate bandwidth of roughly 1.2 terabytes per second. Compared with a typical data-center CPU, this configuration moves twice as much data while the memory subsystem consumes half the power. All of it hangs off a second-generation coherency fabric, so there are no NUMA zones to worry about: every core always sees the same view of memory.
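The per-chip bandwidth figure follows directly from the per-core number; a quick sanity check using only the values quoted above:

```python
# Back-of-the-envelope check of the quoted per-chip memory bandwidth.
cores = 88
per_core_gbps = 13.6  # GB/s sustained per core under full load (from the article)

aggregate_gbps = cores * per_core_gbps
print(f"{aggregate_gbps:.0f} GB/s ~= {aggregate_gbps / 1000:.1f} TB/s")
# prints: 1197 GB/s ~= 1.2 TB/s
```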
When it comes to connecting to GPUs, you’re looking at NVLink-C2C links, which carry 1.8 terabytes per second of coherent traffic. That’s more than seven times quicker than the latest PCIe generation, while standard PCIe 6.0 and CXL 3.1 connectivity remains available for other devices. Vera is also the first CPU to natively support FP8 math, which speeds up AI calculations without sacrificing accuracy.
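A rough ratio check behind the "seven times quicker" comparison. The NVLink-C2C number is from the article; the PCIe 6.0 x16 figure (about 256 GB/s with both directions combined) is an assumed spec-sheet value, not something the article states.

```python
# Comparing the quoted NVLink-C2C bandwidth against a PCIe 6.0 x16 link.
nvlink_c2c_gbs = 1800   # 1.8 TB/s coherent bandwidth (from the article)
pcie6_x16_gbs = 256     # ~256 GB/s bidirectional, assumed spec value

print(f"{nvlink_c2c_gbs / pcie6_x16_gbs:.1f}x")  # prints: 7.0x
```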
Rack-level scaling takes this to an entirely new level. With 256 liquid-cooled Vera processors packed into a single cabinet, plus networking and storage accelerators, total memory across the rack exceeds 400 terabytes. With that many processors, you’re looking at an aggregate memory throughput of roughly 300 terabytes per second, not to mention a colossal number of threads (45,056, to be exact) all running simultaneously. That means more than 22,500 separate environments can run at the same time, all at full speed and without interference. Real-world tests showed scripting, compilation, data processing, and other tasks running 1.8 to 2.2 times quicker than on NVIDIA’s previous Grace processors in the same rack, with certain workloads seeing a six-fold increase in CPU throughput over prior designs.
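The rack-level totals fall out of the per-chip numbers. Note that the Vera chips alone account for 384 TB of memory, so the 400+ TB rack figure presumably includes memory on the networking and storage accelerators as well.

```python
# Rack-level totals implied by the per-chip numbers quoted in the article.
chips_per_rack = 256
threads_per_chip = 176   # 88 cores x 2 threads (spatial multithreading)
mem_per_chip_tb = 1.5    # LPDDR5X per Vera chip
bw_per_chip_tbs = 1.2    # aggregate memory bandwidth per chip

print(chips_per_rack * threads_per_chip)              # prints: 45056
print(chips_per_rack * mem_per_chip_tb)               # prints: 384.0 (TB from the CPUs alone)
print(f"{chips_per_rack * bw_per_chip_tbs:.0f} TB/s") # prints: 307 TB/s
```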
Production is currently running at full capacity, and partners plan to begin shipping systems based on Vera in the second half of 2026. Early deployments are aimed at hyperscale clouds, enterprise analytics clusters, and high-performance computing facilities, which require a dependable CPU to keep up with the demands of all those large GPU arrays.