Nvidia’s Pascal GP100 GPU: massive bandwidth, enormous double-precision performance

For the past year, enthusiasts have been chomping at the bit waiting for the next generation of graphics cards to arrive. The 28nm node has persisted for far longer than any previous generation, and while both AMD and Nvidia have introduced multiple products on that node, customers have clearly wanted the power efficiency and performance improvements that the 14/16nm node could provide. Today, Nvidia showcased the full HPC version of Pascal and detailed what the card would offer compared with its previous Maxwell and Kepler products.

Pascal’s renewed focus on high-speed compute

When Nvidia designed Maxwell, it made the decision to remove much of the double-precision floating point capability that was baked into its previous Kepler architecture. The old Tesla K40, based on the GK110 GPU, was capable of up to 1.68 TFLOPS of double-precision throughput, while the Tesla M40, which used the Maxwell GM200, could only reach 213 GFLOPS. The M40 still had an advantage over the K40 in single-precision floating point, but double-precision performance was sharply curtailed. As we discussed last week when AMD launched its FirePro S9300 x2, this limited the kinds of workloads where the M40 could excel.
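Both headline numbers follow from the usual peak-throughput formula: FP64 units × 2 FLOPs per fused multiply-add × boost clock. The sketch below reproduces them. The unit counts (960 FP64 units in the K40's GK110; GM200's 1/32 FP64 rate applied to the M40's 3,072 cores) and the boost clocks (875 MHz and 1,114 MHz) come from Nvidia's spec sheets, not from this article, so treat them as assumptions.

```python
def peak_gflops(units: int, boost_mhz: float) -> float:
    """Peak throughput in GFLOPS, counting each fused multiply-add (FMA) as 2 FLOPs."""
    return units * 2 * boost_mhz / 1000.0

# Assumed spec-sheet figures (not stated in the article):
K40_FP64_UNITS = 15 * 64        # GK110 in Tesla K40: 15 SMX x 64 FP64 units = 960
K40_BOOST_MHZ = 875
M40_FP64_UNITS = 3072 // 32     # GM200 in Tesla M40: FP64 runs at 1/32 of the FP32 rate
M40_BOOST_MHZ = 1114

k40 = peak_gflops(K40_FP64_UNITS, K40_BOOST_MHZ)
m40 = peak_gflops(M40_FP64_UNITS, M40_BOOST_MHZ)
print(f"Tesla K40 FP64: {k40:.1f} GFLOPS")   # 1680.0
print(f"Tesla M40 FP64: {m40:.1f} GFLOPS")   # 213.9
print(f"K40 advantage: {k40 / m40:.1f}x")    # 7.9x
```

The M40 result rounds to the article's 213 GFLOPS figure, and the roughly 8x gap is the double-precision regression the paragraph describes.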

Pascal’s current GP100 variant adds back all the double-precision floating point capability that Maxwell was missing, then stuffs some more in, just for good measure. The chart below compares Kepler, Maxwell, and Pascal. Note that Nvidia’s developer blog post states that Pascal can include up to 60 SMs, while the variant described below has just 56.

[Chart: Kepler, Maxwell, and Pascal compared]
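To put the 56-SM versus 60-SM distinction in numbers, here is a rough peak-FP64 estimate using the same units × 2 × clock formula. The 64 cores per SM come from this article; the half-rate FP64 ratio (32 FP64 units per SM) and the 1,480 MHz boost clock are assumptions taken from Nvidia's published P100 specs, and the 60-SM case is hypothetical, assuming the same clock.

```python
FP32_CORES_PER_SM = 64                        # from the article
FP64_UNITS_PER_SM = FP32_CORES_PER_SM // 2    # assumption: GP100 runs FP64 at half the FP32 rate
BOOST_MHZ = 1480                              # assumption: Tesla P100 boost clock

def gp100_fp64_tflops(sms: int, boost_mhz: float = BOOST_MHZ) -> float:
    """Peak FP64 TFLOPS: FP64 units x 2 FLOPs per FMA x clock (MHz -> TFLOPS)."""
    return sms * FP64_UNITS_PER_SM * 2 * boost_mhz / 1e6

shipping = gp100_fp64_tflops(56)   # the 56-SM variant described here
full = gp100_fp64_tflops(60)       # hypothetical full 60-SM part at the same clock
print(f"56 SMs: {shipping:.2f} TFLOPS, 60 SMs: {full:.2f} TFLOPS")   # 5.30 and 5.68
print(f"vs. Tesla K40 (1.68 TFLOPS): {shipping / 1.68:.1f}x")         # 3.2x
```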

One interesting aspect of Pascal’s design is that Nvidia has again reduced the number of streaming cores in each processing block, or SM, and adopted the same ratio that AMD uses, with each compute block containing 64 processors. The total number of streaming processors has increased 17%, as has the number of texture units. There’s no word yet on ROP counts, but if Nvidia follows its historic pattern, GP100 should have at least 96 ROPs and possibly 128. Base clock is also up 40% over Maxwell. Tesla clocks are typically more conservative than their desktop counterparts, so the fact that Nvidia squeezed a 40% clock jump out of this silicon suggests we can look forward to similar gains when Pascal comes to the consumer market.
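The 17% and 40% figures can be checked with simple ratios. In this sketch, Pascal's 56 SMs × 64 cores come from the article; the 4 texture units per Pascal SM, the Tesla M40's 24 SMs × 128 cores and 8 texture units per SM, and the base clocks (1,328 MHz for P100, 948 MHz for M40) are assumed spec-sheet values.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return (new / old - 1) * 100

# Pascal: 56 SMs x 64 cores from the article; texture units per SM and clocks are assumptions.
pascal = {"cores": 56 * 64, "texture_units": 56 * 4, "base_mhz": 1328}
# Maxwell (Tesla M40 / GM200): all values are assumed spec-sheet figures.
maxwell = {"cores": 24 * 128, "texture_units": 24 * 8, "base_mhz": 948}

for key in pascal:
    gain = pct_increase(maxwell[key], pascal[key])
    print(f"{key}: {maxwell[key]} -> {pascal[key]} (+{gain:.0f}%)")

# Peak shader throughput scales with cores x clock, so the two gains compound.
combined = (pascal["cores"] * pascal["base_mhz"]) / (maxwell["cores"] * maxwell["base_mhz"])
print(f"cores x clock: {combined:.2f}x")   # 1.63x
```

Because the gains multiply, a modest rise in core count paired with a large clock increase yields roughly 1.6x the theoretical shader throughput, before any per-clock efficiency improvements.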

The memory interface is the largest generational upgrade. GP100’s HBM2 offers a 4096-bit bus and 720 GB/s of memory bandwidth, compared with the 336 GB/s available on the highest-end Maxwell card, the Titan X.
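Peak memory bandwidth is bus width × per-pin data rate ÷ 8 bits per byte. The sketch below applies that formula in both directions. The Titan X's 384-bit GDDR5 bus at 7 Gbps per pin is an assumption from its spec sheet; the HBM2 per-pin rate is not quoted in the article, so it is derived here from the 4096-bit and 720 GB/s figures.

```python
def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8

def per_pin_gbps(bus_bits: int, bandwidth: float) -> float:
    """Invert the formula: the per-pin data rate implied by a given bandwidth."""
    return bandwidth * 8 / bus_bits

titan_x = bandwidth_gbs(384, 7.0)      # assumption: 384-bit GDDR5 at 7 Gbps per pin
hbm2_rate = per_pin_gbps(4096, 720)    # bus width and bandwidth from the article
print(f"Titan X: {titan_x:.0f} GB/s")                            # 336 GB/s
print(f"GP100 HBM2 implied per-pin rate: {hbm2_rate:.2f} Gbps")  # 1.41 Gbps
print(f"Bandwidth ratio: {720 / titan_x:.2f}x")                  # 2.14x
```

The comparison shows HBM2's design choice: it reaches more than twice the bandwidth with a per-pin rate roughly one fifth of GDDR5's, by making the bus more than ten times wider.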

[Figure: GP100 SM diagram]

Pascal also utilizes a simpler datapath organization, superior scheduling with better power efficiency, overlapped load/store instructions, support for Nvidia’s NVLink interface, support for 16-bit floating point (half precision), and improved atomic functions. GP100 also supports ECC memory natively, meaning there’s no performance or storage penalty for activating the feature.
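Half precision (IEEE 754 binary16) stores a value in 2 bytes instead of FP64's 8, keeping only about three significant decimal digits. The sketch below illustrates that tradeoff on the CPU using Python's `struct` format code `"e"` (binary16); it does not touch the GPU, and the helper name `to_fp16` is ours.

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float (FP64) through IEEE 754 binary16 (half precision)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(len(struct.pack("<e", 0.1)), "bytes vs", len(struct.pack("<d", 0.1)))  # 2 bytes vs 8
print(to_fp16(0.1))      # 0.0999755859375, the nearest half-precision value to 0.1
print(to_fp16(1000.3))   # 1000.5: near 1000, half-precision values are spaced 0.5 apart
```

Halving or quartering storage per value is why FP16 is attractive for bandwidth-bound workloads such as deep learning, provided the reduced precision is acceptable.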

[Figure: 8-GPU hybrid cube mesh]

One note on NVLink: There’s been confusion over where and how this bus is used. For the most part, NVLink is a method of connecting multiple GPUs to each other, especially cross-connections in a multi-socket system, where forcing GPUs attached to two different CPUs to talk to each other would significantly degrade performance.
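One way to see why GPU-to-GPU links matter is to count hops in an 8-GPU NVLink mesh. This sketch assumes a hybrid cube-mesh layout that the article does not spell out: two fully connected quads (GPUs 0-3 and 4-7), plus one link from each GPU to its counterpart in the other quad, for four links per GPU. A breadth-first search then gives the minimum hop count between every pair.

```python
from collections import deque

def hybrid_cube_mesh() -> dict[int, set[int]]:
    """Assumed 8-GPU layout: two fully connected quads (0-3 and 4-7),
    plus one link from each GPU to its counterpart in the other quad."""
    links: dict[int, set[int]] = {g: set() for g in range(8)}
    for base in (0, 4):
        quad = range(base, base + 4)
        for a in quad:
            for b in quad:
                if a != b:
                    links[a].add(b)
    for g in range(4):
        links[g].add(g + 4)
        links[g + 4].add(g)
    return links

def hops_from(src: int, links: dict[int, set[int]]) -> dict[int, int]:
    """Breadth-first search: minimum number of NVLink hops from src to every GPU."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

mesh = hybrid_cube_mesh()
print({g: len(n) for g, n in mesh.items()})                   # 4 links per GPU
print(max(max(hops_from(g, mesh).values()) for g in mesh))    # worst case: 2 hops
```

Under this assumed layout, any GPU reaches any other in at most two NVLink hops without crossing a CPU socket.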

NVLink can be used to connect the GPU to the CPU directly, but Nvidia’s blog post specifies that this is only applicable to POWER processors.

[Figure: four-GPU system with dual NVLink-capable CPUs]

The diagram above is described as follows: “The [above] figure highlights an example of a four-GPU system with dual NVLink-capable CPUs connected with NVLink. In this configuration, each GPU has 120 combined GB/s bidirectional bandwidth to the other 3 GPUs in the system, and 40 GB/s bidirectional bandwidth to a CPU.”
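The quoted figures are consistent with a simple link budget. This sketch assumes four NVLink links per GP100 and 40 GB/s of bidirectional bandwidth per link (neither is stated in the article); the function name `split_links` is hypothetical.

```python
LINK_GBS = 40        # assumption: bidirectional bandwidth of one NVLink link, in GB/s
LINKS_PER_GPU = 4    # assumption: NVLink links available on each GP100

def split_links(links_to_gpus: int, links_to_cpu: int) -> tuple[int, int, int]:
    """Return (GB/s to peer GPUs, GB/s to CPU, GB/s left unused) for one GPU."""
    used = links_to_gpus + links_to_cpu
    if used > LINKS_PER_GPU:
        raise ValueError("more links requested than the GPU provides")
    return (links_to_gpus * LINK_GBS, links_to_cpu * LINK_GBS, (LINKS_PER_GPU - used) * LINK_GBS)

# Four-GPU, dual-CPU system from the quote: one link to each of 3 peers, one to a CPU.
print(split_links(links_to_gpus=3, links_to_cpu=1))   # (120, 40, 0)
```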

Nvidia is also claiming that Pascal will offer “Compute Preemption” with a significantly improved computing model. This is one area where Team Green has notably lagged AMD, whose asynchronous compute performance has been much stronger than anything Nvidia has brought to bear. Asynchronous compute and compute preemption are not the same thing, however, so we’ll have to wait for shipping hardware to see how Pascal compares with AMD’s implementation and what the differences are.

An impressive leap forward for HPC, but no consumer launch date yet

It’s obvious that Pascal will significantly improve Nvidia’s HPC position, and that matters because the company has huge plans for deep learning, self-driving cars, and other HPC workloads. Pascal looks like it’ll be a potent match for Intel’s Xeon Phi, Nvidia’s primary competition in this space.

Nvidia has remained mum on consumer launch dates, however, so we’ll have to wait and see when this tech makes it to the mass market. Rumors we’ve heard in other contexts suggest that HBM2 hardware won’t hit the consumer market until later this year due to high initial prices for first run equipment. It’s entirely possible that Nvidia is using GP100 to fill out its initial high-end products, but will only move to the HBM2 standard for upper-end consumer tiers in the back half of 2016.

When those cards do arrive, they should be a significant upgrade over Maxwell. Pascal’s core counts aren’t much higher than Maxwell’s, but the improved clock speeds will drive performance higher, and that’s before any improvement from efficiency gains. If you’re in the market for a new GPU this year, I strongly advise waiting, if possible, to see what Nvidia and AMD ship in the consumer space.



Source: extremetech.com

Last modified on 06/04/2016


About Author

Samer Hmouda

I was born and raised in Kuwait, and I earned my law degree from the Lebanese University in 1994.

Technology is my passion, so I read about and teach myself many things related to networking and technology. The Internet has been one of my most helpful learning tools: I watch many videos and read many articles on the web. Now I will try to help others the same way others have helped me, without expecting any thanks.

Samer H.

Website: www.middnet.net/