Well, today at 9:00 AM EST I bought the new NVIDIA Pascal Titan X. According to wccftech, it may at times hit 12 TFlops, due to something similar to Intel's Turbo Boost. It should arrive tomorrow, so we'll see what it does. Will the new Titan work out of the box with GPUGRID, or will I have to wait for CUDA 8?
|ID: 44082 | Rating: 0|
You (like all of us with lesser Pascal GPUs) have to wait for the public release of CUDA 8.
|ID: 44084 | Rating: 0|
I was afraid of as much; I was hoping someone had found a workaround :)
|ID: 44085 | Rating: 0|
You can expect 25-30% more performance from a new Titan X versus a GTX1080, for a ~70% higher price. That's actually "kind of reasonable" in the Titan world, as the previous ones were even more expensive relative to the best high-end GPU.
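To put that in price/performance terms, a quick Python sketch using the rough figures above:

    # Value check: Titan X (Pascal) vs GTX1080, using the rough
    # 25-30% performance and ~70% price premiums quoted above.
    price_ratio = 1.70
    for perf_ratio in (1.25, 1.30):
        print(f"perf/price vs GTX1080: {perf_ratio / price_ratio:.2f}")
    # ~0.74-0.76: you pay ~70% more for 25-30% more throughput.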
|ID: 44089 | Rating: 0|
The GTX980Ti didn't scale well here (nor, arguably, did the GTX980), and the GTX1080 (and possibly the GTX1070) outperforms the GTX980Ti in most trials, so it's likely that the GTX1080 won't scale well here either, never mind the Titan X (Pascal).
GPU        ns/day   %      Watts   Perf/W %   GFlops (SP, ref boost)   GFlops %
Titan X    ?        ?      250     ?          10974                    124
GTX1080    135.6    100    180     100        8873                     100
GTX1070    125.8    92.8   150     111.4      6463                     72.8
GTX1060    84.8     62.5   120     93.75      4372                     49.3
GTX980Ti   119.4    88.1   250     63.4       6060                     68.3
GTX980     94.1     69.4   165     75.7       4981                     56.1
GTX970     76.7     56.6   145     70.3       3920                     44.2
GTX960     47.0     34.7   120     51.6       2413                     27.2
Observed performance per Watt as a relative percentage, worked for the GTX1070 vs the GTX1080:
Observed relative performance = 125.8/135.6 = 92.8%
Relative power usage = 150W/180W = 83.3%
Performance/Watt = 92.8/83.3 = 111.4%
GFlops (SP) = 2 × shaders × clock speed (GHz). Reference boost frequencies used!
Note that both series boost higher than reference values, but boost varies by model and conditions, and can be controlled and constrained.
While this is based on actual observed performance, it's still somewhat theoretical. To be accurate you would need actual observed power usage and actual boosted GFlops (rather than figures calculated from reference clocks). That said, it's still a good indicator.
Numbers taken from AnandTech’s GPU 2016 Benchmarks, http://www.anandtech.com/bench/product/1715?vs=1714
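If anyone wants to reproduce the table, here's the calculation as a small Python sketch. The shader counts and reference boost clocks are NVIDIA's published figures; the ns/day and Watt numbers are the observed values above.

    # SP GFlops = 2 * shaders * reference boost clock (GHz).
    # Relative perf/W = (ns/day ratio) / (Watts ratio), GTX1080 = 100%.
    gpus = {
        # name: (shaders, ref boost MHz, observed ns/day, TDP Watts)
        "Titan X (Pascal)": (3584, 1531, None,  250),
        "GTX1080":          (2560, 1733, 135.6, 180),
        "GTX1070":          (1920, 1683, 125.8, 150),
    }
    base_ns_day, base_watts = gpus["GTX1080"][2], gpus["GTX1080"][3]

    for name, (shaders, mhz, ns_day, watts) in gpus.items():
        gflops = 2 * shaders * mhz / 1000      # 10974, 8873, 6463
        line = f"{name}: {gflops:.0f} GFlops"
        if ns_day is not None:
            rel_perf = ns_day / base_ns_day * 100   # 92.8% for the 1070
            rel_power = watts / base_watts * 100    # 83.3% for the 1070
            # Prints ~111.3% for the 1070; the table's 111.4% comes
            # from dividing the already-rounded 92.8 by 83.333.
            line += f", perf/W {rel_perf / rel_power * 100:.1f}%"
        print(line)

As an aside, hitting wccftech's 12 TFlops claim from the opening post would need the Titan X to boost to roughly 12000/(2 × 3584) ≈ 1.67 GHz, well above its 1531 MHz reference boost, which is the sort of thing GPU Boost can do on its own given thermal headroom.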
Although the primary observation is that the GTX1070 offers the best performance/Watt, it's likely that both it and the GTX1080 could be tweaked significantly for performance/Watt by capping power and/or temperatures, and it's also possible to run two apps on one big GPU to improve overall throughput (when there is an abundance of tasks).
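For the two-apps-on-one-GPU idea, BOINC handles that through an app_config.xml in the project directory. Here's a minimal sketch that writes one out from Python; the app name "acemdlong" is an assumption on my part, so check against the app names your BOINC client actually reports.

    # Sketch: write a BOINC app_config.xml that runs two tasks per GPU.
    # gpu_usage 0.5 tells the BOINC scheduler each task needs half a
    # GPU, so it will schedule two at once. The app name "acemdlong"
    # is assumed; adjust it to whatever your client lists.
    APP_CONFIG = """<app_config>
      <app>
        <name>acemdlong</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>
    """

    # Place the file in the GPUGrid project directory, then re-read
    # config files from the BOINC manager.
    with open("app_config.xml", "w") as f:
        f.write(APP_CONFIG)

Power capping is simpler still: on cards and drivers that allow it, nvidia-smi's --power-limit option (run with admin rights) will hold the card at a chosen Wattage.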
With more basic apps such as Einstein (CUDA 3.2 & 5.5) and MilkyWay you may see performance scale more linearly with the Pascals' GFlops (as these apps don't utilize the more complex CUDA instruction sets).
GPUGrid is more similar to Folding, but the app is different, so it may bottleneck in different places. For that reason the performance chart will likely look similar, but the choicest card(s) might differ...
Other hardware factors: the Titan X has 3MB of L2 cache whereas the GTX1080 has 2MB, and the Titan X's bus width (and ROP count) relative to its shader count is slightly (7%) higher, so there are fewer potential hardware bottlenecks. That should help it scale, but it still has 24% more raw GFlops to feed than a GTX1080.
Note that the GTX1060's (GP106) cache is only 1.5MB, which might explain its slightly poorer showing at Folding. While 1.5MB is likely to be a factor at GPUGrid too, how significant it is remains to be seen.
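To put a rough number on the bus-width point, here's bus width per shader relative to the GTX1080 (a crude bandwidth-per-core proxy; memory type and clocks also matter, so treat it as illustrative):

    # Memory bus width per shader, relative to the GTX1080.
    gpus = {
        "Titan X (Pascal)": (384, 3584),  # bus bits, shaders
        "GTX1080":          (256, 2560),
        "GTX1060":          (192, 1280),
    }
    base = 256 / 2560  # GTX1080 = 100%

    for name, (bus, shaders) in gpus.items():
        rel = (bus / shaders) / base * 100
        print(f"{name}: {rel:.0f}%")  # Titan X ~107%, i.e. the 7% above

Interestingly the GTX1060 comes out at 150%, proportionally more bus per shader than either big card, which fits the suggestion that its smaller 1.5MB L2, not raw bandwidth, is the suspect at Folding.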
PS: the Titan X (Pascal) isn't the full-fat GP102; the Quadro P6000 has two more SMs, for 3840 CUDA cores (not that I recommend either card for here – both are far too costly).
|ID: 44090 | Rating: 0|
That is a great comparison. Thanks.
|ID: 44091 | Rating: 0|
With more basic apps such as Einstein (CUDA 3.2 & 5.5)
Don't let the CUDA version fool you: Einstein uses complex and carefully optimized code. They're not using advanced library functions from NVIDIA; instead they do the complicated stuff on their own or with other libraries.
And currently they stream through their complete array(s) with each operation, the way a GPU is classically supposed to work. This makes their code significantly dependent on GPU memory bandwidth (my GTX970 runs at >80% memory controller utilization at 3500 MHz), which means any bigger GPU doesn't scale as well as its GFlops suggest; it's slowed down according to its memory bandwidth. There are other factors too: e.g. the AMD Fury is not the home run at Einstein one would expect given its massive bandwidth, because a driver bug prevents it from running more than one task concurrently, which is not enough to saturate a fast GPU.
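You can turn that into a crude roofline-style bound: a streaming kernel scales with the smaller of the compute ratio and the bandwidth ratio relative to some reference card. A sketch with spec-sheet bandwidths; where any given Einstein kernel actually sits between the two limits is an assumption, not a measurement:

    # Roofline-style scaling bound for a memory-streaming workload,
    # relative to a GTX970. GFlops are the reference-boost values
    # from the table above; bandwidths are spec-sheet figures.
    gpus = {
        "GTX970":  (3920, 224),   # SP GFlops, GB/s
        "GTX1080": (8873, 320),
    }
    ref_flops, ref_bw = gpus["GTX970"]

    for name, (gflops, bw) in gpus.items():
        compute = gflops / ref_flops   # 1080: 2.26x the compute
        bandwidth = bw / ref_bw        # 1080: only 1.43x the bandwidth
        print(f"{name}: scaling bound ~{min(compute, bandwidth):.2f}x")

So on a fully bandwidth-bound code a GTX1080 buys you roughly 1.4x a GTX970, nowhere near the 2.3x its GFlops suggest.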
Pascals are OK at Einstein, especially with eco tuning, but they are not the home runs their raw GFlops suggest.
Scanning for our furry friends since Jan 2002
|ID: 44097 | Rating: 0|
So far I have found scaling to be pretty good in Folding@home when using their newer Core 21: I get about 1.07 million points per day (PPD), about double what I got with my 980Ti. However, on their older Core 18 the scaling isn't nearly as good. Although no one at F@H has ever confirmed this, I have always found that, oddly enough, PPD divided by 100k comes out very close to the number of teraflops the card should be delivering.
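As a quick check on that rule of thumb, using the reference-boost TFlops from the table earlier in the thread (the 980Ti PPD below is inferred from "about double", so treat it as an estimate):

    # PPD per TFlop check, Folding@home Core 21.
    # TFlops are reference-boost values from the table above; the
    # 980Ti PPD is an estimate (about half the Titan X's 1.07M).
    cards = {
        "Titan X (Pascal)": (1_070_000, 10.974),
        "GTX980Ti":         (535_000,    6.060),
    }

    for name, (ppd, tflops) in cards.items():
        print(f"{name}: ~{ppd / tflops / 1000:.0f}k PPD per TFlop")

Both land in the ballpark of 100k PPD per TFlop, consistent with the observation above.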
|ID: 44100 | Rating: 0|