Asgorath Posted November 15, 2015
I made some changes to the particle demo, would you like me to send you the diffs?
- Use MTLStorageModeManaged for all buffers.
- Render to a texture directly, instead of the drawable.
- Every N frames (I used 32), copy the texture to the drawable so the results are visible.
- Every M frames (I used 8), update the FPS display.
I bumped the particle count up to 16M and as you can see, my GeForce GTX 980 can handle that without breaking a sweat (i.e. around 180 FPS). By rendering to a texture instead of the current drawable, the animation isn't blocked by the CoreAnimation 60 FPS limit.
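The frame scheduling described in that list can be sketched roughly as follows. This is not Metal code and not Asgorath's actual patch: `frame_actions` and the action names are hypothetical placeholders, and only the intervals (32 and 8) come from the post. It only shows how a frame counter can decide when the off-screen texture gets copied to the drawable and when the FPS display is refreshed.

```python
def frame_actions(frame_index, present_every=32, fps_every=8):
    """Return the tasks to run on a given frame.

    Every frame renders into the off-screen texture. The copy to the
    drawable and the FPS refresh only happen on every Nth / Mth frame.
    The action names are placeholders, not real Metal calls.
    """
    actions = ["render_to_texture"]
    if frame_index % present_every == 0:
        actions.append("copy_texture_to_drawable")
    if frame_index % fps_every == 0:
        actions.append("update_fps_display")
    return actions


# Print only the frames on which something besides plain rendering happens.
for frame in range(1, 65):
    extra = frame_actions(frame)[1:]
    if extra:
        print(frame, extra)
```

With these intervals, most frames never touch the drawable, which is presumably why the 60 FPS display limit stops holding back the simulation rate.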
MattsCreative Posted November 15, 2015
60 FPS on all tests with a 290X. Do you have any more?
Asgorath Posted November 16, 2015
Mid 2014 Retina MBP with GeForce GTX 750M, using my modified particles demo. 2M particles gets 123 FPS when all the buffers are in framebuffer memory and the vsync cap is removed.
mitch_de (Author) Posted November 16, 2015
Diff to what source? I used this source: https://github.com/FlexMonkey/MetalKit-Particles for my small changes and tests.
Asgorath Posted November 16, 2015
Yes, that's the source I modified. Are you not the author of that project?
Edit: If not, I'll contact the author directly and see if they will merge my changes in.
mitch_de (Author) Posted November 16, 2015
Yep, I was not the author.
gils83 Posted November 18, 2015
Yes HD 7950
blacksheep Posted November 18, 2015
Is it available somewhere for download?
mitch_de (Author) Posted November 18, 2015
AMD seems to have problems with this Metal demo too.
MattsCreative Posted November 18, 2015
I've not had any issues with Metal.
Ciro82 Posted November 18, 2015
AMD HD7970:
mitch_de (Author) Posted November 18, 2015
Yep, to be more exact, I meant that the AMD cards (as shown by Ciro82) have problems with both of those Metal demos, particles and Nbody. That doesn't mean that Metal in general has problems with AMD. The author of both source codes has only an Nvidia MacBook GPU, so perhaps there are some bugs for AMD GPUs in the code. At least Nvidia works with both demos without showing weird colours (particles) or very few stars (Nbody).
Asgorath Posted November 18, 2015
Yeah, let me clean some things up and I'll package it up ASAP. I'm going to reach out to the original author and see if he'll merge my changes into the original version as well.
Slice Posted November 18, 2015
AMD 6670
Process: OSXMetalParticles [796]
Path: /Users/USER/Downloads/OSXMetalParticles_2Mill_final.app/Contents/MacOS/OSXMetalParticles
Identifier: uk.co.flexmonkey.OSXMetalParticles
Version: 1.0 (1)
Code Type: X86-64 (Native)
Parent Process: ??? [1]
Responsible: OSXMetalParticles [796]
User ID: 501
Date/Time: 2015-11-18 22:42:09.613 +0300
OS Version: Mac OS X 10.11.2 (15C47a)
Report Version: 11
Anonymous UUID: 43065937-BBF0-8F2F-C339-5635BC71CE03
Time Awake Since Boot: 4200 seconds
System Integrity Protection: disabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 uk.co.flexmonkey.OSXMetalParticles 0x0000000101395c68 _TFC17OSXMetalParticles11ParticleLabcfMS0_FT5widthSu6heightSu12numParticlesOS_13ParticleCount_S0_ + 1880
1 uk.co.flexmonkey.OSXMetalParticles 0x00000001013923d8 _TFC17OSXMetalParticles18GameViewController11viewDidLoadfS0_FT_T_ + 120
2 uk.co.flexmonkey.OSXMetalParticles 0x0000000101392776 _TToFC17OSXMetalParticles18GameViewController11viewDidLoadfS0_FT_T_ + 22
3 com.apple.AppKit 0x00007fff8aee45bc -[NSViewController _sendViewDidLoad] + 97
4 com.apple.CoreFoundation 0x00007fff889cb33f -[NSSet makeObjectsPerformSelector:] + 223
5 com.apple.AppKit 0x00007fff8ad93eb2 -[NSIBObjectData nibInstantiateWithOwner:options:topLevelObjects:] + 1142
MattsCreative Posted November 18, 2015
My 290X had no issues with any of the tests, plus I work with Metal in the UE4 engine all the time.
mitch_de (Author) Posted November 19, 2015
Then the AMD problem with those demos is the same as the OpenCL problems: it depends on the GPU subtype. Some work, some don't. And as I said, that doesn't mean Metal has a general problem with AMD. As with OpenCL, some GPUs may fail on more complex code even though OpenCL works in general.
blacksheep Posted November 19, 2015
That would be great. In the meantime, here is Nbody on an R9 280X (a crappy MSI one) in an original, old Mac Pro 2006, with all available body counts.
mitch_de (Author) Posted November 19, 2015
AMD takes pole position with 1.5 TFLOPS in Nbody. The same in OpenCL would be a bit faster.
mitch_de (Author) Posted November 20, 2015
Nbody (by NVIDIA), CUDA only. Very interesting functions (command-line parameters)!
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance) - bench results (GFLOPS) are shown without displaying the nbody window, which is more valid than the windowed run with its heavy OpenGL load and higher CPU usage alongside the CUDA compute tasks
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use) - if you have more than one CUDA device you can select the benched device
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation) - if you have more than one CUDA device you can use ALL CUDA devices together
-cpu (run n-body simulation on the CPU) - CPU GFLOPS
nbody -numbodies=32768 -benchmark
> Compute 3.0 CUDA device: [GeForce GT 740]
number of bodies = 32768
32768 bodies, total time for 10 iterations: 645.157 ms
= 16.643 billion interactions per second
= 332.862 single-precision GFLOP/s at 20 flops per interaction
With window, which has no 60 FPS limit (GTX 6xx/9xx and/or 2 CUDA GPUs using -numdevices=2 will show that):
nbody -numbodies=32768
DL: Nbody_CUDA_only.zip
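For anyone wondering where the two throughput lines in that output come from, here is a small sketch of the arithmetic. It assumes the benchmark counts N x N interactions per iteration, which is an inference that reproduces the printed numbers and is not taken from the nbody source. The function name `nbody_throughput` is made up for illustration.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Convert an nbody benchmark timing into throughput figures.

    Assumes every body interacts with every body (N * N pairs) in each
    iteration. Returns (billion interactions per second, GFLOP/s).
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_second = interactions / seconds
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return interactions_per_second / 1e9, gflops


# Numbers from the GeForce GT 740 run above.
billions, gflops = nbody_throughput(32768, 10, 645.157)
print(f"{billions:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The same function can be fed the body count and timing from any other `-benchmark` run to compare cards.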
gils83 Posted November 20, 2015
Hello, does this work on Yosemite? Thanks.
JahStories Posted November 20, 2015
> Compute 3.5 CUDA device: [GeForce GTX 780]
number of bodies = 32768
32768 bodies, total time for 10 iterations: 120.707 ms
= 88.954 billion interactions per second
= 1779.087 single-precision GFLOP/s at 20 flops per interaction
gils83 Posted November 20, 2015
Well then... with cudaZ: Sans titre.html
gils83 Posted November 20, 2015
Uh? Limited to 6144 bodies? No screen.
Last login: Fri Nov 20 23:22:36 on ttys000
Mac-Pro-de-gils:~ gils$ /Users/gils/Downloads/Nbody\ CUDA\ only nbody -numbodies=32768
-bash: /Users/gils/Downloads/Nbody CUDA only: is a directory
Mac-Pro-de-gils:~ gils$ /Users/gils/Downloads/Nbody\ CUDA\ only/nbody nbody -numbodies=32768 -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Graphics Device" with compute capability 5.2
> Compute 5.2 CUDA device: [Graphics Device]
number of bodies = 32768
32768 bodies, total time for 10 iterations: 234.380 ms
= 45.812 billion interactions per second
= 916.239 single-precision GFLOP/s at 20 flops per interaction
Mac-Pro-de-gils:~ gils$
916 GFLOP/s ??
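The "is a directory" error in the first command comes from the space between the folder path and `nbody`, which makes the shell try to execute the folder itself. A minimal sketch of building the command so the folder and binary name are joined with a slash and the spaces are quoted safely is shown below. It assumes the `nbody` binary sits directly inside that folder, as the second command in the session suggests, and `nbody_command` is a made-up helper name.

```python
import shlex
from pathlib import PurePosixPath


def nbody_command(folder, num_bodies, benchmark=True):
    """Build the argument list for launching nbody from `folder`.

    Joining with "/" (rather than a space) makes the first argument the
    binary itself instead of the directory that contains it.
    """
    binary = PurePosixPath(folder) / "nbody"
    argv = [str(binary), f"-numbodies={num_bodies}"]
    if benchmark:
        argv.append("-benchmark")
    return argv


argv = nbody_command("/Users/gils/Downloads/Nbody CUDA only", 32768)
# shlex.join quotes the path so it can be pasted into a terminal as-is.
print(shlex.join(argv))
```

The printed line can be pasted into Terminal, or the list can be handed to `subprocess.run(argv)` without any manual escaping.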
mitch_de (Author) Posted November 20, 2015
Difference in GFLOPS between -benchmark (no OpenGL tasks for GPU and CPU) and the windowed run: doing OpenGL lowers the GFLOPS because the GPU has a lot of OpenGL work to do besides the GPU computing. Nbody's main task is benchmarking the compute power of the GPU, so -benchmark (no OpenGL output) shows the compute performance much better.
"GPU Device 0: "Graphics Device" with compute capability 5.2": is that your GTX 950?
gils83 Posted November 21, 2015
Yes, GTX 950 (Yosemite). I don't understand.
Yes, now the GTX 950 with 65536 bodies.