mitch_de (Author) Posted June 26, 2011

Netkas found out: http://netkas.org/?p=846
In short: That’s trash, not drivers!
Gringo Vermelho Posted July 1, 2011

CUDA works on Fermi again on 10.6.8 with the latest CUDA 4.0.19 drivers + June 24 Quadro 4000 drivers. No modifications needed.
http://www.insanelymac.com/forum/index.php...t&p=1705325
CUDA-Z works, but the CUDA preference pane didn't tell me about the 4.0.19 drivers. Haven't tried anything OpenCL yet.
Carstiman Posted July 1, 2011

Quote: "the CUDA preference pane didn't tell me about the 4.0.19 drivers."

Hi, the CUDA preference pane works for me with 4.0.19.
Graebags Posted July 4, 2011

Worked for me with 4.0.19 until the first reboot. Now it wants update! update! update! Damn! I think it's about the same with 511 as with DP4, despite the new NVIDIA GPU and CUDA drivers.
meroy Posted July 19, 2011

Hi all. My GTX 295 Co-op edition is up and running in Snow Leopard 10.6.8 using Apple's native drivers in 10.6.8 and the OpenGL framework. There are 2 GPUs on this card, so the results are doubled.

I added a post to a thread on insanelymac to give a solution for GTX 295 owners who want to update to 10.6.8:
http://www.insanelymac.com/forum/index.php...p;#entry1716804

Basically, changes made in 10.6.8 render the existing EFI string useless unless 2 extensions are installed in /Extra/Extensions.

The following is the result of combining both GPUs on the GTX 295 card for the N-Body demo in CUDA 4.0.

    ./nbody -numdevices=2 -n=61440 -benchmark
    [nbody] starting...
    Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
        -fullscreen (run n-body simulation in fullscreen mode)
        -fp64 (use double precision floating point values for simulation)
        -numdevices=N (use first N CUDA devices for simulation)
    > Windowed mode
    > Simulation data stored in system memory
    > Single precision floating point simulation
    > 2 Devices used for simulation
    > Compute 1.3 CUDA device: [GeForce GTX 295]
    > Compute 1.3 CUDA device: [GeForce GTX 295]
    61440 bodies, total time for 10 iterations: 697.011 ms
    = 54.158 billion interactions per second
    = 1083.161 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED
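The two throughput figures in the benchmark output above appear to follow directly from the body count, the iteration count, and the total time, treating each iteration as an all-pairs pass (bodies squared). The following is a minimal Python sketch of that arithmetic; the function name `nbody_throughput` is made up for illustration and is not part of the CUDA SDK.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs n-body run.

    Assumes every body interacts with every body once per iteration,
    i.e. num_bodies ** 2 interactions per iteration.
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_second = interactions / seconds
    billions = interactions_per_second / 1e9
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return billions, gflops


# Figures taken from the dual-GPU GTX 295 run in the post above.
billions, gflops = nbody_throughput(61440, 10, 697.011)
print(f"{billions:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The computed values land within rounding distance of the reported 54.158 and 1083.161; the small remainder is presumably because the printed time is itself rounded to three decimals.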
mitch_de (Author) Posted July 19, 2011

Great nbody result! Here is my 9600GT:

    nbody -n=61440 -benchmark
    [nbody] starting...
    Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
        -fullscreen (run n-body simulation in fullscreen mode)
        -fp64 (use double precision floating point values for simulation)
        -numdevices=N (use first N CUDA devices for simulation)
    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    > Compute 1.1 CUDA device: [GeForce 9600 GT]
    61440 bodies, total time for 10 iterations: 4812.126 ms
    = 7.845 billion interactions per second
    = 156.890 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED
    GA_EP35:nbody ami$

For others: compiled nbody attached! Use nbody -n=61440 -benchmark to compare our results. Without the -benchmark parameter you will get a window that shows the simulation running.

I added the nbody results and nbody to the first post. Happy nbody benching, which is much more useful than the small (bandwidth test) CUDA-Z performance values.

Attachment: nbody.zip
mm67 Posted July 19, 2011

GTX 460 results:

    [nbody] starting...
    Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
        -fullscreen (run n-body simulation in fullscreen mode)
        -fp64 (use double precision floating point values for simulation)
        -numdevices=N (use first N CUDA devices for simulation)
    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    > Compute 2.1 CUDA device: [GeForce GTX 460]
    61440 bodies, total time for 10 iterations: 2008.823 ms
    = 18.791 billion interactions per second
    = 375.829 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED
meroy Posted July 19, 2011

My EVGA GTX 295 Co-op edition is over-clocked and has been running like that for several years now. The settings have allowed it to match a GTX 285 (times 2) in CUDA performance. I'm going to restore the 2 ROMs (one per GPU) to the factory settings and report back the N-body results for the standard clocks.

Using the factory clocks, the N-body results against both GPUs running simultaneously (via -numdevices=2) are:

    ./nbody -n=61440 -numdevices=2 -benchmark
    [nbody] starting...
    Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
        -fullscreen (run n-body simulation in fullscreen mode)
        -fp64 (use double precision floating point values for simulation)
        -numdevices=N (use first N CUDA devices for simulation)
    > Windowed mode
    > Simulation data stored in system memory
    > Single precision floating point simulation
    > 2 Devices used for simulation
    > Compute 1.3 CUDA device: [GeForce GTX 295]
    > Compute 1.3 CUDA device: [GeForce GTX 295]
    61440 bodies, total time for 10 iterations: 847.679 ms
    = 44.532 billion interactions per second
    = 890.638 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED

And the N-body result against one GPU on the GTX 295:

    ./nbody -n=61440 -benchmark
    [nbody] starting...
    Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
        -fullscreen (run n-body simulation in fullscreen mode)
        -fp64 (use double precision floating point values for simulation)
        -numdevices=N (use first N CUDA devices for simulation)
    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    > Compute 1.3 CUDA device: [GeForce GTX 295]
    61440 bodies, total time for 10 iterations: 1687.764 ms
    = 22.366 billion interactions per second
    = 447.322 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED
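With both a one-GPU and a two-GPU figure at factory clocks in the post above, it is easy to check how close the second GPU comes to doubling throughput. A small Python sketch, assuming "efficiency" is defined as the multi-GPU result divided by N times the single-GPU result (the helper name is illustrative only):

```python
def scaling_efficiency(multi_gpu_gflops, single_gpu_gflops, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return multi_gpu_gflops / (single_gpu_gflops * num_gpus)


# GTX 295 at factory clocks, single precision, figures from the post above.
efficiency = scaling_efficiency(890.638, 447.322, 2)
print(f"Two-GPU scaling efficiency: {efficiency:.1%}")
```

For these numbers the result is a little under 100%, so the two GPUs on the card scale almost linearly in this benchmark.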
mitch_de (Author) Posted July 19, 2011

I will collect all nbody results at the bottom of the first post for easy comparison.
myrorym Posted July 19, 2011

10.6.7 : CUDA 4.0.17 : Driver 256.02.05.f1

    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    > Compute 2.0 CUDA device: [GeForce GTX 470]
    61440 bodies, total time for 10 iterations: 1393.516 ms
    = 27.089 billion interactions per second
    = 541.777 single-precision GFLOP/s at 20 flops per interaction
    [nbody] test results...
    PASSED

I will test 10.6.8 or 10.7 with updated drivers soon. TY
meroy Posted July 19, 2011

Hi all,

Just wanted to chime in to say that I finally got sleep working on my system with the GTX 295 card.
http://www.insanelymac.com/forum/index.php...t&p=1717189

Note: The culprit was the device-type. Each GPU is different. The one with the display connections is set to NVDA,Parent and the other to NVDA,GeForce.
mitch_de (Author) Posted July 20, 2011

CUDA speed should depend only on the CUDA driver version; the OS X version (OpenGL version) should not make any difference in speed.
meroy Posted July 20, 2011

One can even benchmark double-precision via the N-Body CUDA demo. GTX 4xx owners will be able to see a much larger increase in performance when comparing to GTX 2xx cards. Here are my results:

./nbody -fp64 -n=30720 -benchmark

GTX 295 OC'ed:
-- One GPU: 66.966 double-precision GFLOP/s
-- Two GPUs via -numdevices=2: 128.672 double-precision GFLOP/s

GTX 295 (standard clocks):
-- One GPU: 55.010 double-precision GFLOP/s
-- Two GPUs via -numdevices=2: 106.478 double-precision GFLOP/s

This is where a GTX 4xx can shine over a GTX 2xx variant. Try -n=15360 if -n=30720 reports unspecified launch failure. Not all cards support double-precision.
meroy Posted July 20, 2011

Lion will be out soon and folks will be able to benchmark the GTX 5xx series. The following is taken from my Windows 7 box running a pair of GTX 560's. They came factory OC'd at 900/1800/2004 (4008) 1.012 volts. However, I under-clocked/under-volted them down to 855/1710/2100 (4200) 0.987 volts.

Single-Precision: ./nbody -n=61440 -benchmark
-- One GPU: 548.804 single-precision GFLOP/s
-- Two GPUs via -numdevices=2: 1068.541 single-precision GFLOP/s

Double-Precision: ./nbody -fp64 -n=30720 -benchmark
-- One GPU: 89.702 double-precision GFLOP/s
-- Two GPUs via -numdevices=2: 166.125 double-precision GFLOP/s

I wish NVIDIA would one day make a single-PCB card containing 2 GTX 560's, for a good balance between compute power and electric power utilization.
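One way to read the double-precision numbers is as a fraction of each card's single-precision result. The Python sketch below uses only the one-GPU figures meroy posted in this thread; note that the single-precision runs used -n=61440 and the double-precision runs used -n=30720, so the ratio is a rough indication rather than a like-for-like comparison. The dictionary layout and names are illustrative.

```python
# One-GPU nbody results reported in this thread, in GFLOP/s.
# "sp" runs used -n=61440, "dp" runs used -fp64 -n=30720.
results = {
    "GTX 295 (standard clocks)": {"sp": 447.322, "dp": 55.010},
    "GTX 560 (855/1710/2100)": {"sp": 548.804, "dp": 89.702},
}


def dp_to_sp_ratio(entry):
    """Double-precision throughput as a fraction of single-precision throughput."""
    return entry["dp"] / entry["sp"]


for card, entry in results.items():
    print(f"{card}: DP is {dp_to_sp_ratio(entry):.1%} of SP")
```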
hiphopboy Posted July 21, 2011

Hoping for 4.0.0.20 today, with support for the final Lion release.
Wayang-NT Posted July 22, 2011

New CUDA 4.0.21 ....
hiphopboy Posted July 22, 2011

Oh! Thanks, WaYang! I waited 1 week for this.
Lord_Jeremy Posted January 17, 2012

When I run nbody I get the following:

    dyld: Library not loaded: @rpath/libcudart.dylib
      Referenced from: /Users/lord_jeremy/Applications/./nbody
      Reason: image not found

I've got the NVIDIA CUDA package 4.1.25 installed and CUDA-Z shows benchmark info, so I presume it's functioning correctly. Any thoughts?
Gringo Vermelho Posted January 17, 2012

Not sure, I think you have to install the CUDA SDK or tools (or whatever) as well in order to use nbody. Everything is available on the CUDA download page.
Lord_Jeremy Posted January 18, 2012

Yep, that was it. Thanks!
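The dyld error above means the loader could not find libcudart.dylib, the CUDA runtime library that ships with the toolkit rather than with the driver package. A quick way to check whether the runtime is present is to look for the file in the likely locations. The Python sketch below is only an illustration: it assumes the toolkit's usual install directory on OS X is /usr/local/cuda/lib and also checks any directories listed in DYLD_LIBRARY_PATH; adjust the list if your install differs.

```python
import os


def find_cudart(candidate_dirs, lib_name="libcudart.dylib"):
    """Return the first directory in candidate_dirs that contains lib_name, else None."""
    for directory in candidate_dirs:
        if os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None


if __name__ == "__main__":
    # Assumed default toolkit location, followed by DYLD_LIBRARY_PATH entries.
    env_dirs = [p for p in os.environ.get("DYLD_LIBRARY_PATH", "").split(":") if p]
    found = find_cudart(["/usr/local/cuda/lib"] + env_dirs)
    if found:
        print(f"libcudart.dylib found in {found}")
    else:
        print("libcudart.dylib not found; the CUDA toolkit may not be installed")
```

If the library exists but nbody still fails to start, adding its directory to DYLD_LIBRARY_PATH before launching nbody is the usual next thing to try.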
Cavendish Qi Posted August 6, 2012

Thanks for the info:
10.8, NVIDIA GT 540M, 1G, CUDA Driver 5.0.17

CUDA-Z results and nbody result:
https://gist.github.com/3278149
https://gist.github.com/3278246
mitch_de (Author) Posted October 4, 2012

NEW (beta) Sep-2012 version available!
http://sourceforge.n...es/cuda-z/Beta/
Select the .dmg to download the OS X version.
RobertX Posted October 5, 2012

...ahhh, in comes the smell of something sweet and new...
mitch_de (Author) Posted October 5, 2012

Performance tab: Device to Device speed shows the VRAM speed. In your case (GT 520) you can see that this value is not good, because of the limited VRAM bus width of the GT 520 (and the GT 420, GT 620 and others), which use 64/128 bit instead of 256/384 bit.

Host to Device and Device to Host values (VRAM copy/access over PCI-E) are mostly limited by PCI-E speed and are much lower than the on-board VRAM transfer speed.
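To see why bus width matters so much for the Device to Device figure, theoretical peak VRAM bandwidth can be estimated as bus width in bytes times the effective memory transfer rate. The Python sketch below uses a made-up effective rate of 1800 MT/s for every bus width purely for illustration; it is not the specification of any particular card, and measured CUDA-Z values will come in below the theoretical peak.

```python
def peak_vram_bandwidth_gbs(bus_width_bits, effective_rate_mtps):
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mtps * 1e6 / 1e9


# Hypothetical effective memory rate, identical for all rows so that
# only the bus width changes between them.
ASSUMED_RATE_MTPS = 1800

for width in (64, 128, 256, 384):
    bandwidth = peak_vram_bandwidth_gbs(width, ASSUMED_RATE_MTPS)
    print(f"{width:3d}-bit bus: {bandwidth:.1f} GB/s")
```

At the same memory clock, a 256-bit bus offers four times the peak bandwidth of a 64-bit bus, which matches the point made in the post above.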
maximus Posted October 17, 2014

I am having problems with CUDA on Yosemite PB5.