Every six months, the TOP500 team releases a list of the 500 most powerful supercomputers in the world based on their results in the long-running Linpack benchmark. These machines are typically owned by governments or are built as public-private partnerships between various government and industry partners who share costs and computer time. This year, Nvidia has made its own entry — and no, I don’t mean that Nvidia is powering someone else’s system, or that the company collaborated with a different firm. Nvidia built its own frickin’ supercomputer.
The new DGX SaturnV contains 60,512 cores (the machine relies on Intel’s Xeon E5-2698v4 for its CPUs) and 63,488GB of RAM. The machine is actually a cluster of 124 DGX-1 systems. That’s the AI processing “supercomputer in a box” that Nvidia unveiled earlier this year, and the first machine to feature the company’s Pascal GPUs (the full GP100 configuration). According to Nvidia, the new machine is 2.3x more energy efficient than the closest Xeon Phi system of equivalent performance, and it delivers 9.46GFLOPS/watt, a 42% improvement over the most efficient system unveiled last June. That’s a huge improvement in a relatively short period of time, though I do want to note an important caveat to these kinds of figures. One thing we’ve covered in previous discussions of exascale computing is that scaling compute clusters up creates very different constraints than we typically consider when talking about desktops or even servers. Factors like interconnect power consumption, total memory loadout, and memory architecture all have a significant effect on these metrics.
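For readers who want to check what those relative claims imply, here is a rough back-of-the-envelope sketch in Python. It uses only the three figures quoted above (9.46GFLOPS/watt, the 42% improvement, and the 2.3x advantage); the function and variable names are made up for illustration, and the results are implied baselines from Nvidia’s own claims, not measured numbers from the list.

```python
# Figures quoted in the article: efficiency in GFLOPS per watt,
# plus Nvidia's two relative claims.
SATURNV_GFLOPS_PER_WATT = 9.46
IMPROVEMENT_OVER_JUNE_LEADER = 0.42  # "a 42% improvement"
ADVANTAGE_OVER_XEON_PHI = 2.3        # "2.3x more energy efficient"


def implied_baseline(value, ratio):
    """Return the baseline that `value` exceeds by the factor `ratio`."""
    return value / ratio


# A 42% improvement means SaturnV is 1.42 times the June leader.
june_leader = implied_baseline(
    SATURNV_GFLOPS_PER_WATT, 1 + IMPROVEMENT_OVER_JUNE_LEADER
)
# A 2.3x advantage means SaturnV is 2.3 times the Xeon Phi system.
xeon_phi = implied_baseline(SATURNV_GFLOPS_PER_WATT, ADVANTAGE_OVER_XEON_PHI)

print(f"Implied June leader: {june_leader:.2f} GFLOPS/watt")
print(f"Implied Xeon Phi system: {xeon_phi:.2f} GFLOPS/watt")
```

In other words, the claims imply a previous best of roughly 6.7GFLOPS/watt and a comparable Xeon Phi system at roughly 4.1GFLOPS/watt. Those are whole-system figures, so the caveats about interconnect and memory configuration apply to them as well.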
In other words: Nvidia’s new performance/watt metrics are great for Pascal and a huge achievement, but I don’t think we can read much into the potential power efficiency of Xeon Phi without seeing something closer to an apples-to-apples comparison. It’s also interesting that Nvidia chose Intel Xeons rather than OpenPOWER for its own power efficiency push, despite being a fairly vocal supporter of OpenPOWER and NVLink. Given that the new supercomputer relies on Nvidia’s DGX-1, however, it probably made more sense to build its own server clusters around the x86 ecosystem rather than trying to launch a new AI and compute platform around Power at this time.
As for what it intends to do with its new supercomputer, Nvidia writes: “We’re also training neural networks to understand chipset design and very-large-scale-integration, so our engineers can work more quickly and efficiently. Yes, we’re using GPUs to help us design GPUs.” Are Xzibit memes too old to share these days? Last week, we dug into the history of the G80, Nvidia’s first unified shader architecture and first modern GPU, to mark its tenth anniversary, so check it out if you want some information on how Nvidia has gone from a gaming-focused company to a player in the HPC and compute spaces.