Metal Particles (as demo /bench) new Nbody-Metal (demo/bench)

Asgorath · November 15, 2015

I made some changes to the particle demo, would you like me to send you the diffs?

- Use MTLStorageModeManaged for all buffers.

- Render to a texture directly, instead of the drawable.

- Every N frames (I used 32), copy the texture to the drawable so the results are visible.

- Every M frames (I used 8), update the FPS display.

I bumped the particle count up to 16M and as you can see, my GeForce GTX 980 can handle that without breaking a sweat (i.e. around 180 FPS). By rendering to a texture instead of the current drawable, the animation isn't blocked by the CoreAnimation 60 FPS limit.
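For anyone who wants to picture the frame schedule described above, here is a minimal sketch of just the cadence logic, in Python rather than the demo's Swift. The function name `frame_actions` and the action strings are hypothetical and are not taken from the modified demo; only the values N = 32 and M = 8 come from the post.

```python
def frame_actions(frame_index, copy_every=32, fps_every=8):
    """Return the list of steps to perform for a given frame.

    Hypothetical model of the schedule: every frame renders into an
    offscreen texture; only every `copy_every`-th frame is copied to the
    drawable, and only every `fps_every`-th frame refreshes the FPS label.
    """
    actions = ["render_to_texture"]
    if frame_index % copy_every == 0:
        actions.append("copy_to_drawable")
    if frame_index % fps_every == 0:
        actions.append("update_fps")
    return actions


if __name__ == "__main__":
    # Count how often each step runs over 64 simulated frames.
    counts = {}
    for i in range(64):
        for action in frame_actions(i):
            counts[action] = counts.get(action, 0) + 1
    print(counts)
```

The point of the design is that the simulation and rendering run every frame, while the step that touches the display happens only occasionally, so the display refresh rate no longer paces the benchmark.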

MattsCreative · November 15, 2015

60 FPS on all tests with my 290X. Do you have any more?

Asgorath · November 16, 2015

Mid 2014 Retina MBP with GeForce GTX 750M, using my modified particles demo. 2M particles gets 123 FPS when all the buffers are in framebuffer memory and the vsync cap is removed.

mitch_de · November 16, 2015

Diff to what source?

I used this source : https://github.com/FlexMonkey/MetalKit-Particles for my little changes / tests.

Asgorath · November 16, 2015

Yes, that's the source I modified. Are you not the author of that project?

Edit: If not, I'll contact the author directly and see if they will merge my changes in.

mitch_de · November 16, 2015

Yep, I am not the author.

gils83 · November 18, 2015

Yes

HD 7950

blacksheep · November 18, 2015

Asgorath, is your modified particle demo available somewhere for download?

mitch_de · November 18, 2015

AMD seems to have problems with this Metal demo as well.

MattsCreative · November 18, 2015

I have not had any issues with Metal.

Ciro82 · November 18, 2015

AMD HD7970:

mitch_de · November 18, 2015

Yep, to be more exact, I meant that the AMD cards (as shown by Ciro82) have problems with both of these Metal demos, Particles and Nbody.

That doesn't mean that Metal in general has problems with AMD.

The author of both source codes only has an NVIDIA MacBook GPU, so perhaps there are some bugs for AMD GPUs in the code.

At least NVIDIA works with both demos without showing weird colours (Particles) or far too few stars (Nbody).

Asgorath · November 18, 2015

Is it available somewhere for DL?

Yeah let me clean some things up and I'll package it up ASAP. I'm going to reach out to the original author and see if he'll merge my changes into the original version as well.

Slice · November 18, 2015

AMD 6670

Process:               OSXMetalParticles [796]
Path:                  /Users/USER/Downloads/OSXMetalParticles_2Mill_final.app/Contents/MacOS/OSXMetalParticles
Identifier:            uk.co.flexmonkey.OSXMetalParticles
Version:               1.0 (1)
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           OSXMetalParticles [796]
User ID:               501

Date/Time:             2015-11-18 22:42:09.613 +0300
OS Version:            Mac OS X 10.11.2 (15C47a)
Report Version:        11
Anonymous UUID:        43065937-BBF0-8F2F-C339-5635BC71CE03


Time Awake Since Boot: 4200 seconds

System Integrity Protection: disabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_INSTRUCTION (SIGILL)
Exception Codes:       0x0000000000000001, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   uk.co.flexmonkey.OSXMetalParticles	0x0000000101395c68 _TFC17OSXMetalParticles11ParticleLabcfMS0_FT5widthSu6heightSu12numParticlesOS_13ParticleCount_S0_ + 1880
1   uk.co.flexmonkey.OSXMetalParticles	0x00000001013923d8 _TFC17OSXMetalParticles18GameViewController11viewDidLoadfS0_FT_T_ + 120
2   uk.co.flexmonkey.OSXMetalParticles	0x0000000101392776 _TToFC17OSXMetalParticles18GameViewController11viewDidLoadfS0_FT_T_ + 22
3   com.apple.AppKit              	0x00007fff8aee45bc -[NSViewController _sendViewDidLoad] + 97
4   com.apple.CoreFoundation      	0x00007fff889cb33f -[NSSet makeObjectsPerformSelector:] + 223
5   com.apple.AppKit              	0x00007fff8ad93eb2 -[NSIBObjectData nibInstantiateWithOwner:options:topLevelObjects:] + 1142

MattsCreative · November 18, 2015

My 290X had no issues with any of the tests, and I work with Metal in the UE4 engine all the time.

mitch_de · November 19, 2015

Then the AMD problem with those demos is the same as the OpenCL problems: it depends on the GPU subtype. Some work, some don't.

And as I said, that doesn't mean Metal has a general problem with AMD. As with OpenCL, more complex code may fail on some GPUs even though OpenCL works in general.

blacksheep · November 19, 2015

Yeah let me clean some things up and I'll package it up ASAP. I'm going to reach out to the original author and see if he'll merge my changes into the original version as well.

Would be great.

In the meantime, here is Nbody on an R9 280X (a crappy MSI one) in an original, old 2006 Mac Pro, with all available body counts.

mitch_de · November 19, 2015

AMD takes pole position with 1.5 TFLOPS in Nbody.

The same in OpenCL would be a bit faster.

mitch_de · November 20, 2015

Nbody (by NVIDIA) - CUDA only.

Very interesting options (command-line parameters)!

-fp64 (use double precision floating point values for simulation)

-hostmem (stores simulation data in host memory)

-benchmark (run benchmark to measure performance) - the benchmark results (GFLOPS) are shown without opening the nbody window, which makes them more valid than the windowed run, where OpenGL work and higher CPU usage compete with the CUDA compute tasks

-numbodies=<N> (number of bodies (>= 1) to run in simulation)

-device=<d> (where d=0,1,2.... for the CUDA device to use) - if you have more than one CUDA device you can select the benched device

-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation) - if you have more than one CUDA device you can use ALL CUDA devices together

-cpu (run n-body simulation on the CPU) - reports CPU GFLOPS

nbody -numbodies=32768 -benchmark

> Compute 3.0 CUDA device: [GeForce GT 740]

number of bodies = 32768

32768 bodies, total time for 10 iterations: 645.157 ms

= 16.643 billion interactions per second

= 332.862 single-precision GFLOP/s at 20 flops per interaction
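The three result lines above can be checked by hand. A minimal sketch of the arithmetic, assuming the benchmark counts all N x N pairwise interactions per iteration (this matches the printed figures, but it is inferred from the output, not read from the NVIDIA source); the function name `nbody_gflops` is just for illustration:

```python
def nbody_gflops(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Recompute the nbody benchmark figures from its printed timing.

    Assumes every body interacts with every body (N * N pairs) in each
    iteration. Returns (billion interactions per second, GFLOP/s).
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    giga_interactions_per_s = interactions / seconds / 1e9
    return giga_interactions_per_s, giga_interactions_per_s * flops_per_interaction


if __name__ == "__main__":
    # Timing from the GeForce GT 740 run above.
    rate, gflops = nbody_gflops(32768, 10, 645.157)
    print(f"{rate:.3f} billion interactions per second")
    print(f"{gflops:.3f} single-precision GFLOP/s")
```

So the GFLOP/s number is simply the interaction rate multiplied by the fixed 20 flops per interaction; a shorter total time for the same body count gives a proportionally higher score.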

Run it with the window, which has no 60 FPS limit; GTX 6xx/9xx cards and/or two CUDA GPUs (using -numdevices=2) will show that:

nbody -numbodies=32768

DL: Nbody_CUDA_only.zip

gils83 · November 20, 2015

Hello,

does it work on Yosemite?

Thanks

JahStories · November 20, 2015

> Compute 3.5 CUDA device: [GeForce GTX 780]

number of bodies = 32768

32768 bodies, total time for 10 iterations: 120.707 ms

= 88.954 billion interactions per second

= 1779.087 single-precision GFLOP/s at 20 flops per interaction

gils83 · November 20, 2015

Well then... with CUDA-Z

Sans titre.html

gils83 · November 20, 2015

Uh?

Limited to 6144 bodies?

no screen

Last login: Fri Nov 20 23:22:36 on ttys000

Mac-Pro-de-gils:~ gils$ /Users/gils/Downloads/Nbody\ CUDA\ only nbody -numbodies=32768

-bash: /Users/gils/Downloads/Nbody CUDA only: is a directory

Mac-Pro-de-gils:~ gils$ /Users/gils/Downloads/Nbody\ CUDA\ only/nbody nbody -numbodies=32768 -benchmark

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.

-fullscreen (run n-body simulation in fullscreen mode)

-fp64 (use double precision floating point values for simulation)

-hostmem (stores simulation data in host memory)

-benchmark (run benchmark to measure performance)

-numbodies=<N> (number of bodies (>= 1) to run in simulation)

-device=<d> (where d=0,1,2.... for the CUDA device to use)

-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)

-compare (compares simulation results running once on the default GPU and once on the CPU)

-cpu (run n-body simulation on the CPU)

-tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode

> Simulation data stored in video memory

> Single precision floating point simulation

> 1 Devices used for simulation

GPU Device 0: "Graphics Device" with compute capability 5.2

> Compute 5.2 CUDA device: [Graphics Device]

number of bodies = 32768

32768 bodies, total time for 10 iterations: 234.380 ms

= 45.812 billion interactions per second

= 916.239 single-precision GFLOP/s at 20 flops per interaction

Mac-Pro-de-gils:~ gils$

916 GFLOP/s ??

mitch_de · November 20, 2015

The difference in GFLOPS between -benchmark (no OpenGL tasks for GPU and CPU) and the windowed run: doing OpenGL lowers the GFLOPS because the GPU has a lot of OpenGL work to do besides the GPU computing.

Nbody's main task is benchmarking the compute power of the GPU, so -benchmark (no OpenGL output) shows the compute performance much better.

"GPU Device 0: "Graphics Device" with compute capability 5.2"

Your GTX 950?

gils83 · November 21, 2015

Yes GTX 950 (Yosemite)

I don't understand

Yes now

GTX 950

with 65536 bodies
