CUDA-Z Info+Bench (Nvidia only) - updated Dec 2015

mitch_de · June 26, 2011

Netkas found out:

In short: That’s trash, not drivers!

Gringo Vermelho · July 1, 2011

CUDA works on Fermi again on 10.6.8 with the latest CUDA 4.0.19 drivers + June 24 Quadro 4000 drivers. No modifications needed.

http://www.insanelymac.com/forum/index.php...t&p=1705325

CUDA-Z works, but the CUDA preference pane didn't tell me about the 4.0.19 drivers.

Haven't tried anything OpenCL yet.

Carstiman · July 1, 2011

"the CUDA preference pane didn't tell me about the 4.0.19 drivers."

Hi, CUDA preference works for me with 4.0.19

Graebags · July 4, 2011

Worked for me with 4.0.19 until the first reboot. Now it wants update! update! update! Damn!

I think it's about the same with 511 as with DP4, despite the new Nvidia GPU and CUDA drivers.

meroy · July 19, 2011

Hi all. My GTX 295 Co-op edition is up and running in Snow Leopard 10.6.8 using Apple's native drivers and the OpenGL framework. There are 2 GPUs on this card, hence double the results.

I added a post to a thread on insanelymac with a solution for GTX 295 owners who want to update to 10.6.8:

http://www.insanelymac.com/forum/index.php...p;#entry1716804

Basically, there are changes made in 10.6.8 which render the existing EFI string useless unless 2 extensions are installed in /Extra/Extensions.

The following is the result of combining both GPUs on the GTX 295 card for the N-Body demo in CUDA 4.0.

./nbody -numdevices=2 -n=61440 -benchmark

[nbody] starting...
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64	   (use double precision floating point values for simulation)
-numdevices=N (use first N CUDA devices for simulation)

> Windowed mode
> Simulation data stored in system memory
> Single precision floating point simulation
> 2 Devices used for simulation
> Compute 1.3 CUDA device: [GeForce GTX 295]
> Compute 1.3 CUDA device: [GeForce GTX 295]

61440 bodies, total time for 10 iterations: 697.011 ms
= 54.158 billion interactions per second
= 1083.161 single-precision GFLOP/s at 20 flops per interaction

[nbody] test results...
PASSED
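
The figures nbody prints can be reproduced from the body count, the iteration count and the elapsed time. Here is a minimal Python sketch, assuming the benchmark counts every body against every body (n x n interactions per iteration), which is consistent with the numbers above. The function name nbody_rates is made up for illustration and is not part of the CUDA SDK.

```python
def nbody_rates(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for one nbody benchmark run."""
    # Assumption: all-pairs simulation, so n * n interactions per iteration.
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    giga_interactions = interactions / seconds / 1e9
    return giga_interactions, giga_interactions * flops_per_interaction


# Values taken from the GTX 295 run above (2 GPUs).
bips, gflops = nbody_rates(61440, 10, 697.011)
print(f"{bips:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The last digit can differ slightly from what nbody prints, because the program rounds the elapsed time before displaying it.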

mitch_de · July 19, 2011

Great nbody result!

Here is my 9600GT:

nbody -n=61440 -benchmark
[nbody] starting...
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-numdevices=N (use first N CUDA devices for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 1.1 CUDA device: [GeForce 9600 GT]
61440 bodies, total time for 10 iterations: 4812.126 ms
= 7.845 billion interactions per second
= 156.890 single-precision GFLOP/s at 20 flops per interaction
[nbody] test results...
PASSED
GA_EP35:nbody ami$

For others: compiled nbody attached!

Use nbody -n=61440 -benchmark to compare our results.

Without the -benchmark parameter you will get a window that shows the simulation running.

I added the nbody results and the nbody binary to the first post.

Happy nbody benching, which is much more useful than the small (bandwidth test) CUDA-Z performance values.

nbody.zip

mm67 · July 19, 2011

GTX 460 results:

[nbody] starting...
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-numdevices=N (use first N CUDA devices for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 2.1 CUDA device: [GeForce GTX 460]
61440 bodies, total time for 10 iterations: 2008.823 ms
= 18.791 billion interactions per second
= 375.829 single-precision GFLOP/s at 20 flops per interaction
[nbody] test results...
PASSED

meroy · July 19, 2011

My EVGA GTX 295 Co-op edition is overclocked and has been running like that for several years now. These settings have allowed it to match a GTX 285 (times 2) in CUDA performance. I'm going to restore the 2 ROMs (one per GPU) to the factory settings and report back the N-body results for the standard clocks.

Using the factory clocks, the N-body results with both GPUs running simultaneously (via -numdevices=2) are:

./nbody -n=61440 -numdevices=2 -benchmark
[nbody] starting...
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64	   (use double precision floating point values for simulation)
-numdevices=N (use first N CUDA devices for simulation)

> Windowed mode
> Simulation data stored in system memory
> Single precision floating point simulation
> 2 Devices used for simulation
> Compute 1.3 CUDA device: [GeForce GTX 295]
> Compute 1.3 CUDA device: [GeForce GTX 295]
61440 bodies, total time for 10 iterations: 847.679 ms
= 44.532 billion interactions per second
= 890.638 single-precision GFLOP/s at 20 flops per interaction
[nbody] test results...
PASSED

And the N-body result for one GPU on the GTX 295:

./nbody -n=61440 -benchmark
[nbody] starting...
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64	   (use double precision floating point values for simulation)
-numdevices=N (use first N CUDA devices for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 1.3 CUDA device: [GeForce GTX 295]
61440 bodies, total time for 10 iterations: 1687.764 ms
= 22.366 billion interactions per second
= 447.322 single-precision GFLOP/s at 20 flops per interaction
[nbody] test results...
PASSED
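
The two factory-clock runs above make it easy to see how well nbody scales across both GPUs. This is a small Python sketch of that arithmetic; the function name scaling is only illustrative.

```python
def scaling(single_gflops, multi_gflops, num_devices):
    """Return (speedup, parallel efficiency) of a multi-GPU run."""
    speedup = multi_gflops / single_gflops
    return speedup, speedup / num_devices


# GFLOP/s values from the two factory-clock GTX 295 runs above.
speedup, efficiency = scaling(447.322, 890.638, 2)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

An efficiency close to 100% means the second GPU contributes almost its full throughput at this body count.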

mitch_de · July 19, 2011

I will collect all nbody results at the bottom of the first post for easy comparison.
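
For anyone who wants to keep their own list, here is a minimal Python sketch that ranks the single-precision results posted in the thread so far. The labels are shortened by hand and the helper name ranked is made up for illustration.

```python
# Single-precision GFLOP/s, nbody -n=61440 -benchmark, as posted above.
results = [
    ("GTX 295 (2 GPUs, overclocked)", 1083.161),
    ("GeForce 9600 GT", 156.890),
    ("GeForce GTX 460", 375.829),
    ("GTX 295 (2 GPUs, factory clocks)", 890.638),
    ("GTX 295 (1 GPU, factory clocks)", 447.322),
]


def ranked(entries):
    """Sort (name, gflops) pairs from fastest to slowest."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


for name, gflops in ranked(results):
    print(f"{name:<34}{gflops:>10.3f}")
```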

myrorym · July 19, 2011

10.6.7 : CUDA 4.0.17 : Driver 256.02.05.f1

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
> Compute 2.0 CUDA device: [GeForce GTX 470]
61440 bodies, total time for 10 iterations: 1393.516 ms
= 27.089 billion interactions per second
= 541.777 single-precision GFLOP/s at 20 flops per interaction
[nbody] test results...
PASSED

I'll test 10.6.8 or 10.7 with updated drivers soon.

TY

meroy · July 19, 2011

Hi all,

Just wanted to chime in to say that I finally got sleep working on my system with the GTX 295 card.

http://www.insanelymac.com/forum/index.php...t&p=1717189

Note: The culprit was the device-type. Each GPU is different. The one with the display connections is set to NVDA,Parent and the other to NVDA,GeForce.

mitch_de · July 20, 2011

CUDA speed should depend only on the CUDA driver version; the OS X version (OpenGL version) should not affect speed.

meroy · July 20, 2011

One can even benchmark double-precision via the N-Body CUDA demo. GTX 4xx owners should see a much larger increase in performance compared to GTX 2xx cards.

Here are my results:

./nbody -fp64 -n=30720 -benchmark

GTX 295 OC'ed:

-- One GPU: 66.966 double-precision GFLOP/s

-- Two GPUs via -numdevices=2: 128.672 double-precision GFLOP/s

GTX 295 (standard clocks):

-- One GPU: 55.010 double-precision GFLOP/s

-- Two GPUs via -numdevices=2: 106.478 double-precision GFLOP/s

This is where a GTX 4xx can shine over a GTX 2xx variant.

Try -n=15360 if -n=30720 reports unspecified launch failure.

Not all cards support double-precision.

meroy · July 20, 2011

Lion will be out soon and folks will be able to benchmark the GTX 5xx series.

The following is taken from my Windows 7 box running a pair of GTX 560's. They came factory OC'd at 900/1800/2004 (4008) 1.012 volts. However, I under-clocked/under-volted them down to 855/1710/2100 (4200) 0.987 volts.

Single-Precision: ./nbody -n=61440 -benchmark

-- One GPU: 548.804 single-precision GFLOP/s

-- Two GPUs via -numdevices=2: 1068.541 single-precision GFLOP/s

Double-Precision: ./nbody -fp64 -n=30720 -benchmark

-- One GPU: 89.702 double-precision GFLOP/s

-- Two GPUs via -numdevices=2: 166.125 double-precision GFLOP/s

I wish NVIDIA would one day make a single-PCB card containing 2 GTX 560's, for a good balance between compute power and electric power consumption.
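
To put the double-precision numbers in perspective, here is a rough Python sketch comparing them with the single-precision results for the same cards (one GPU each). Keep in mind that the single-precision runs used -n=61440 and the double-precision runs -n=30720, so the ratio is only an approximation. The function name is made up for illustration.

```python
def dp_to_sp_ratio(dp_gflops, sp_gflops):
    """Fraction of single-precision throughput reached in double precision."""
    return dp_gflops / sp_gflops


# One-GPU results posted in this thread: (double-precision, single-precision).
cards = {
    "GTX 295 (factory clocks)": (55.010, 447.322),
    "GTX 560 (855 MHz core)": (89.702, 548.804),
}

for name, (dp, sp) in cards.items():
    print(f"{name}: DP is {dp_to_sp_ratio(dp, sp):.1%} of SP")
```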

hiphopboy · July 21, 2011

Hoping for 4.0.0.20 today, with support for the final Lion release.

Wayang-NT · July 22, 2011

new CUDA 4.0.21 ....

hiphopboy · July 22, 2011

Oh! Thanks Wayang! I waited a week for this.

Lord_Jeremy · January 17, 2012

When I run nbody I get the following:

dyld: Library not loaded: @rpath/libcudart.dylib
 Referenced from: /Users/lord_jeremy/Applications/./nbody
 Reason: image not found

I've got the nVidia CUDA package 4.1.25 installed and CUDA-Z shows benchmark info so I presume it's functioning correctly. Any thoughts?

Gringo Vermelho · January 17, 2012

Not sure, I think you have to install the CUDA SDK or tools (or whatever) as well in order to use nbody. Everything is available on the CUDA download page.

Lord_Jeremy · January 18, 2012

Yep, that was it. Thanks!
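
The dyld error above means the loader could not find libcudart.dylib, which comes with the CUDA toolkit rather than with the driver package. Here is a small Python sketch to check whether the library is present. It assumes the toolkit installs into /usr/local/cuda/lib (the usual default on OS X, but verify on your system) and also looks in any directories listed in DYLD_LIBRARY_PATH. The helper name find_libcudart is made up for illustration.

```python
import os


def find_libcudart(candidate_dirs, name="libcudart.dylib"):
    """Return the first existing path to the CUDA runtime library, or None."""
    for directory in candidate_dirs:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


# Assumption: default toolkit location; adjust if you installed elsewhere.
candidates = ["/usr/local/cuda/lib"]
candidates += [d for d in os.environ.get("DYLD_LIBRARY_PATH", "").split(":") if d]

found = find_libcudart(candidates)
print(found or "libcudart.dylib not found; the CUDA toolkit is probably not installed")
```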

Cavendish Qi · August 6, 2012

Thanks for the info:
10.8, NVidia GT 540M, 1G, CUDA Driver 5.0.17

CUDA-Z results:
nbody result:
https://gist.github.com/3278149
https://gist.github.com/3278246

mitch_de · October 4, 2012

NEW (beta) Sep-2012 Version available!

http://sourceforge.n...es/cuda-z/Beta/

Select the .dmg to download the OS X version.

RobertX · October 5, 2012

...ahhh, in comes the smell of something sweet and new... :rolleyes:

mitch_de · October 5, 2012

Performance tab: Device to Device speed shows the VRAM speed.

In your case (GT 520), you can see that the value is low because of the narrow memory bus of the GT 520 (and of the GT 420, GT 620 and other cards that use a 64/128-bit bus instead of 256/384-bit).

Host to Device and Device to Host values (VRAM copy/access over PCI-E) are mostly limited by PCI-E speed and are much lower than the transfer speed of the on-board VRAM.
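
The effect of bus width on VRAM bandwidth can be estimated with simple arithmetic: theoretical peak bandwidth is the bus width in bytes multiplied by the memory transfer rate. Here is a minimal Python sketch; the transfer rate below is a hypothetical value chosen only to compare bus widths, not the spec of any particular card.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Hypothetical effective memory rate, identical for every bus width.
RATE = 2.0e9  # transfers per second

for bits in (64, 128, 256, 384):
    print(f"{bits:>3}-bit bus: {peak_bandwidth_gb_s(bits, RATE):6.1f} GB/s")
```

At the same memory clock, a 256-bit card has four times the theoretical bandwidth of a 64-bit card, which is why the Device to Device value differs so much between them.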

maximus · October 17, 2014

I am having problems with CUDA on Yosemite PB5.
