mitch_de (author), posted November 21, 2015:

Yep. The faster the GPU (more and faster compute units), the higher you can set numbodies and the more GFLOPS you get, because fast GPUs are not fully loaded by "only" 32K bodies. Low-end or midrange GPUs will not get higher GFLOPS from more than 32K bodies. The same rule holds for all GPUs: more bodies = fewer FPS.

I think if you get more than 10 FPS with 32K, you can try benchmarking with 64K bodies and see whether the GFLOPS go up. Even 128K bodies should work on very fast GPUs, though I don't know for sure.

EDIT: Yep, 128K also works on my low-end GT 740. Same GFLOPS as with 64K, around 332 GFLOPS, but only 1.0 FPS; with 256K, 0.2 FPS.

So if you have any GPU that is not low-end (GT 2/4/5/610, ..20, ..30), better start with 64K bodies to get close to the maximum GFLOPS.

32768 = 32K
65536 = 64K
131072 = 128K
262144 = 256K (maybe usable on high-end GPUs like the GTX 960 and up); my GT 740 slows down or stalls the whole OS X GUI when running 256K bodies.

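The FPS figures above can be sanity-checked with a rough Python sketch. It assumes the benchmark is an all-pairs simulation (N x N interactions per step), counts 20 floating-point operations per interaction, and draws one frame per simulation step; none of these three assumptions is stated in the thread, and the function name is made up for illustration.

```python
# Assumption: flop count per body-body interaction (single precision).
FLOPS_PER_INTERACTION = 20


def estimated_fps(gflops: float, num_bodies: int) -> float:
    """Steps per second if the GPU sustains `gflops` on an all-pairs step."""
    interactions = num_bodies ** 2  # every body interacts with every body
    return gflops * 1e9 / (interactions * FLOPS_PER_INTERACTION)


# 332 GFLOPS is the GT 740 figure quoted in the post above.
for k in (32, 64, 128, 256):
    n = k * 1024
    print(f"{k:>3}K = {n:>6} bodies -> about {estimated_fps(332, n):.1f} steps/s")
```

Under these assumptions the sketch gives about 1.0 steps/s at 128K and about 0.2 at 256K, in line with the FPS reported for the GT 740. Doubling the body count quadruples the work per step, which is why FPS drops so quickly once GFLOPS stop rising.
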
gils83, posted November 21, 2015:

Yes, for a midrange GPU (GTX 950/960), 64K = 10 FPS.

Fljagd, posted November 23, 2015:

Very interesting.

gils83, posted November 23, 2015:

Test CUDA-Z.

Fljagd, posted November 23, 2015:

Quoting gils83: "Test CUDA-Z."

The problem for me with CUDA-Z is that I can only test one card at a time.

gils83, posted November 23, 2015:

Quoting Fljagd: "The problem for me with CUDA-Z is that I can only test one card at a time."

Why? Post a screenshot.

Fljagd, posted November 23, 2015:

Quoting gils83: "Why? Post a screenshot."

It is one card or the other, not both at the same time.

mitch_de (author), posted November 23, 2015:

Yep, nbody CUDA can use more than one GPU if you add the numdevices= parameter (2, 3, 4, ...). Great to see the first nbody result computed on 2+ GPUs, reaching 2400 GFLOPS.

Try -benchmark to compare GFLOPS without any CPU/GPU work spent on OpenGL rendering. Very fast GPUs don't show much difference, at least with 64K+ bodies. Low-end GPUs or low-end CPUs will show a difference, because the combined OpenGL and compute load lowers the GFLOPS available for GPU computing.

Also, older high-end GPUs (Fermi, Kepler), which can even be faster in OpenGL than newer midrange Kepler (vs. Fermi) or Maxwell (vs. Fermi, Kepler) GPUs, are often much slower in GPU computing (OpenCL, CUDA). For example, my GT 740 (Kepler, DDR3) is only 5-10% faster in OpenGL than my older GT 440 (Fermi, DDR5), but much faster in CUDA and OpenCL: up to 2 times faster, 30% faster on average.

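A small Python sketch of how the parameters mentioned above could be combined into one command line. The binary path `./nbody` and the helper name are assumptions for illustration; the flag spellings (`-numbodies=`, `-numdevices=`, `-benchmark`) follow the names used in this thread and should be checked against the help output of your own build.

```python
def nbody_command(num_bodies, num_devices=1, benchmark=True, binary="./nbody"):
    """Build an argument list for the nbody sample (hypothetical binary path)."""
    args = [binary, f"-numbodies={num_bodies}"]
    if num_devices > 1:
        # Only needed when more than one CUDA device should share the work.
        args.append(f"-numdevices={num_devices}")
    if benchmark:
        # Skips the OpenGL window so that only compute is measured.
        args.append("-benchmark")
    return args


# 64K bodies on two GPUs, benchmark mode.
print(" ".join(nbody_command(64 * 1024, num_devices=2)))
```

The list form can be passed directly to `subprocess.run` if you want to script several runs with different body counts.
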
mitch_de (author), posted November 23, 2015:

Use at least 64K bodies. Otherwise, for example with 16K or 8K on 2 CUDA devices, the GPUs will not be fully loaded: 16K gives 1900 GFLOPS vs. 2400 for 64K, even with OpenGL running. In -benchmark mode, 64K (or 128K) may give the same or slightly higher GFLOPS than the 2400 GFLOPS of the 64K non-benchmark (OpenGL window) run.

65536 = 64K
131072 = 128K

Fewer than 64K bodies (32K down to 8K) can only push low-end GPUs to their limit! On a midrange or better GPU, fewer than 64K bodies is more an OpenGL benchmark than a GPU compute benchmark.

The lower FPS caused by more bodies (in non-benchmark, OpenGL runs) doesn't matter: nbody CUDA is a GPU compute benchmark. Running very few bodies, like 2K or 8K, is 90% a CPU + OpenGL benchmark (GFLOPS only 1/3 to 1/2 of the maximum); 64K+ is 90% a GPU compute benchmark, and 95% in -benchmark mode. The focus is only on the GFLOPS, not the OpenGL FPS.

gils83, posted November 23, 2015:

Quoting Fljagd: "It is one card or the other, not both at the same time."

OK, click on "Performance" for the GTX 960.

Fljagd, posted November 23, 2015:

Quoting gils83: "OK, click on "Performance" for the GTX 960."

mitch_de (author), posted November 23, 2015:

Great: now, using 128K bodies, you get 2529 GFLOPS (using both GPUs) in -benchmark mode. I think that's the maximum for those GPUs: 1400 + 1100 GFLOPS (running each alone).

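The comparison above is simple arithmetic; a short Python sketch makes it explicit. The function name is made up for illustration, and the single-GPU figures are the rounded values from the post, so the result is only approximate.

```python
def scaling_efficiency(combined_gflops, single_gflops):
    """Ratio of the multi-GPU result to the sum of the single-GPU results."""
    return combined_gflops / sum(single_gflops)


combined = 2529          # both GPUs, 128K bodies, -benchmark mode
singles = [1400, 1100]   # each GPU running alone (rounded figures)

eff = scaling_efficiency(combined, singles)
print(f"sum of single-GPU runs: {sum(singles)} GFLOPS")
print(f"combined run: {combined} GFLOPS ({eff:.1%} of the sum)")
```

A value close to 100% means the two cards add up almost perfectly; landing slightly above 100% here most likely just reflects the rounding of the single-GPU numbers.
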
Fljagd, posted November 23, 2015:

Quoting mitch_de: "Great: now, using 128K bodies, you get 2529 GFLOPS (using both GPUs) in -benchmark mode. I think that's the maximum for those GPUs: 1400 + 1100 GFLOPS (running each alone)." (attachment: Bildschirmfoto 2015-11-23 um 13.44.14.jpg)

It's powerful.

mitch_de (author), posted November 23, 2015:

Yep, and don't worry about the different GFLOPS shown in CUDA-Z (open source) vs. nbody CUDA (by Nvidia). Different compute code (nbody is much more complex), different GFLOPS. It seems that nbody CUDA benefits more from the modern Maxwell GPU vs. the Kepler GPU than CUDA-Z does:

Nbody: 1446 / 1150 GFLOPS = Maxwell GTX 960 is about 1.26 times faster than Kepler GTX 660 Ti
CUDA-Z: 2709 / 2312 GFLOPS = Maxwell GTX 960 is "only" about 1.17 times faster than Kepler GTX 660 Ti

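The two ratios can be recomputed from the GFLOPS figures in the post with a few lines of Python. The dictionary layout and function name are just one way to organize the numbers.

```python
# (GTX 960, GTX 660 Ti) GFLOPS as reported by each tool in the post above.
results = {
    "nbody CUDA": (1446, 1150),
    "CUDA-Z": (2709, 2312),
}


def speedup(newer_gflops, older_gflops):
    """How many times faster the newer card is than the older one."""
    return newer_gflops / older_gflops


for tool, (gtx960, gtx660ti) in results.items():
    print(f"{tool}: GTX 960 is {speedup(gtx960, gtx660ti):.2f}x the GTX 660 Ti")
```

Comparing ratios rather than raw GFLOPS is what makes the two tools comparable at all, since each counts operations in its own way.
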
gils83, posted November 23, 2015:

Good job for Adobe Premiere Pro "Mercury".

Fljagd, posted November 23, 2015:

Quoting gils83: "Good job for Adobe Premiere Pro "Mercury"."

You just have to declare your cards in Adobe Premiere Pro so that they are supported.

gils83, posted November 23, 2015:

Quoting Fljagd: "You just have to declare your cards in Adobe Premiere Pro so that they are supported."

Yes, for you.

Fljagd, posted November 23, 2015:

Quoting gils83: "Yes, for you."

Yes, because they are not in the list.

gils83, posted November 23, 2015:

http://best-mac-tips.com/2014/08/21/enable-cuda-hardware-rendering-adobe-premier/

Fljagd, posted November 23, 2015:

Quoting gils83: http://best-mac-tips.com/2014/08/21/enable-cuda-hardware-rendering-adobe-premier/

Thank you, but it's already done, fortunately.

Fred

Micky1979, posted November 23, 2015:

Intel HD 4000, 2097152 particles at 47 FPS. Is that OK?

mitch_de (author), posted November 23, 2015:

Looks good/normal. Only some AMD GPUs have problems with Metal particles. Someone contacted the developer of Metal particles about changing some code for discrete GPU usage; I don't know whether that succeeded.

MattsCreative, posted November 26, 2015:

https://twitter.com/TechnezReview/status/669948158504386561

No issues with any test on an AMD Radeon 290X.
