mitch_de Posted August 30, 2009

All OpenCL GPU FPS Benches:

NEW OpenCL Raytracing Benchmark (updated in the first post): SmallLuxGPU

Does raytracing on the GPU, on GPU+CPU, or on the CPU only. It is a very complex (real-life) computation, so weak GPUs gain less advantage here than in the more low-level OpenCL demos. Its hybrid (CPU+GPU) mode works much better than the one in Galaxies. It uses ALL OpenCL GPUs it finds (up to 4) in parallel. It also works with ATI 48XX GPUs.

Update to V170 (always the same link): http://www.macupdate.com/info.php/id/33632/smallluxgpu
Major update with a console tab (it shows the information the GUI also shows, plus more details and errors).
Happy benching (times: type 0 no changes, type 1 maybe a little faster)

************

Older stuff, mostly not as real-world as SmallLuxGPU:
Galaxies 32K V2 + Galaxies 8K V2 + Grass + Displacement + AO (raytracing CPU/GPU) + Transpose Bandwidth

Snow Leopard + Intel Macs ONLY!

ATI OpenCL GPUs (4850 & 4870) are not really working! I am in contact with ATI devs. There are problems with the OpenCL drivers/framework, which must and will be fixed with 10.6.1 or an ATI driver update.

HOT NEWS - always updated here

- New Galaxies OpenCL Bench V2 build: Apple updated/fixed some OpenCL API usage (may help ATI), which gives a small speedup (10% on my 9600 GT). I now build a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs. If the GPU is the limit, there will be no difference in GPU Gigaflops. But on very fast GPUs, 32K may give much higher GPU Gigaflops: more GPU load means less time wasted on OpenCL overhead. DL links are in the first post.
- OS X 10.6.1 updated the ATI/Intel/Nvidia OpenGL drivers, but the OpenCL framework stays the same. So I am sure ATI will also fail with 10.6.1 when running the Apple OpenCL demos listed here.
- Apple is rewriting the GALAXIES OpenCL demo/bench. The Nvidia GTX 285 will rise from 280 Gigaflops to around 400 Gigaflops. The CPU Gigaflops stay the same, around 28 Gigaflops (C2D) / 100 Gigaflops (Mac Pro 2009).
This new version will be compiled and shared here like the last one. Slower OpenCL GPUs, like my 9600 GT, should not expect such a big Gigaflop boost from the new Galaxies (N-body) Apple demo.

OPENCL - Good to know:

- OpenCL is an API for universal GPU (and CPU) computing.
- The main difference to CUDA / ATI Stream: both of those only work with their "own" GPUs. A CUDA (Nvidia) app like Badaboom (H.264 on the GPU) can't work on an ATI GPU, and vice versa.
- OpenCL is universal across GPU vendors. That means Xcode/GCC compiles code which includes the source for the GPU program (in C, as a string). Unlike with CUDA / ATI Stream, that C source is compiled later by the OpenCL framework, at runtime! So the same app can run on completely different GPUs and also, with little or no code change, on the CPU if no OpenCL-capable (newer) GPU is found.

The source (example below) for the GPU program is really compiled at runtime, not just interpreted. So small differences between runs of my bench may come from that compile on the fly.

NEWS: Information from the ATI OS X OpenCL division dev team:

"Thank you for the quick response and I hope you extend the benchmark application since it’s a really good idea. Regarding the sample applications posted on the developer.apple.com website (eg. Galaxies, Qjulia, etc), we are aware that some fail (or even crash) on AMD hardware and working to track down all these issues. We suspect that most of these issues will be resolved for the next graphics driver update in Snow Leopard. BTW, I ran the demos on a iMac with a Radeon 4850 and I get the following results:"

OpenCL Benches:

GALAXIES - a CPU vs GPU Gigaflop Bench

UPDATE: Following a hint from ATI, I increased the count of things to compute from 4K to 16K. The new 16K version is available and shows 16K in the legend. Apple reduces that count for GPUs without discrete VRAM (9400M / 8600M), so there it is set to 4K even though the legend shows 16K.
You can't compare the 9400M / 8600M 1:1 with the others, which run the 16K count.

Using GALAXIES: Start Galaxy
key s = switch compute modes: >CPU> Single/Multi, CPU-Vector/SSE Single/Multi >GPU> GPU+CPU> (bold = start mode)
key SPACE = pause / continue
key 6 = reset scene
key Q = quit

DOWNLOADs (6 MB each):
NEW: http://rapidshare.com/files/286234291/Galaxies_32K_V2.zip
NEW: http://rapidshare.com/files/286235157/Galaxies_8K_V2.zip

mitch (C2D 3GHZ, NV 9600 GT, 1280x1024) V1 / V2
24 Gigaflops : CPU (SIM: Vector Multi-Core CPU mode)
112 Gigaflops : NV 9600 GT 16K
V2 142 Gigaflops : NV 9600 GT 32K

Users' results (new 32K V2 version):
CPU 18 G, Nvidia 9600gt GPU 149 GigaFlops, 32k v2.0 at 1680x1050
Gigabyte Ga-eg45m-ud2r, Intel e6750
Mac Pro Nehalem 8 core 2.93GHz: http://www.barefeats.com/index.html All tests at 2500x1600!
Nvidia GTX 285 ...... soon!!
Mac Pro Nehalem 8 core CPU = .....

New OpenCL Transpose Bandwidth - measures the bandwidth of a matrix transpose
DL link at the end of the posting (very small; runs like all the other terminal OpenCL bench apps)
mitch: Nvidia 9600GT: around 39 Gigabyte/sec
Mac Pro (1,1) 2.66Ghz 4GB RAM, 4870 1GB Sapphire: Performing Matrix Transpose [256 x 4096]... Bandwidth Achieved = 3.160816 GB/sec
MacPro 2009, NVidia GTX 285 Mac: Bandwidth Achieved = around 80 GB/sec

UPDATE: New OpenCL AO Bench (512*512 instead of the 256*256 Barefeats used = Barefeats results / 2)
DL link at the end (very small, SSE4 optimized, 512*512 window)
mitch NV 9600 GT: 8 FPS (512*512)
C2D 3 GHZ: 0.8 FPS

So ATI users may try the newly compiled Procedural Geometric Displacement FPS bench.
Download for ATI + Nvidia USERS: OpenCL_Displacement_Bench.zip - with step-by-step HOW TO RUN readme
- Update: A newly compiled Displacement (the app only), built with GCC 4.2 and much less aggressive compiler optimization settings, seems to run more stably/reliably on the ATI 4870. If you have such problems with Displacement, I attached the small download to the first post as displacement_ATI; use it to overwrite the app in the whole (normal) 7 MB download.
QJulia1024 Results (qJulia with a 1024x1024 window)
9600 GT: around 13 FPS static, 8 - 16 FPS when animating. Please let the bench show static FPS first (wait a few seconds) before you switch to animate (SPACE).
Rob, GTX 285 Mac, 1024x1024 = 44 fps
eVGA GTX-285 1024MB: 46.70 fps
8800 GT: 22.46 FPS

qJulia Results (800x800 window)
9600GT: around 29 FPS static, 16 - 60 FPS when animating (key SPACE)
eVGA GTX-285 1024MB: 98.79 fps
Rob, GTX 285 Mac, 800x800 = 93 fps
2.4GHz C2D, 4GB RAM, 8600M GT (256MB): static shows 10 - 11 FPS, animated shows 9 - 11 FPS
MacBook Pro 13", GeForce 9400M: 6.25 fps (6 - 6.50)

OpenCL Displacement FPS results (ATI should work!)
mitch 9600 GT: 80 FPS first (white background + shadow) / 102 FPS second (with texture in background)
ATI 4870: both around 90 FPS, but only 1/3 of the bench starts were successful, so this bench doesn't work 100% well either. Wait for OS X 10.6.1.
GeForce GTX 285 Mac: around 220 FPS, second shader test
Radeon 4870 1GB (Sapphire), Mac Pro (1,1) 2.66Ghz quad core, 4GB RAM: both scenes near 90 FPS
Most 4870s are close together, between 84 and 90 FPS, but some tests fail and some show bad graphics in the result window.

GRASS simulates a scene of grass sticks moving in the wind
Grass Results: 4 million triangles + 170,000 sticks to compute - a big scene! (1024x1024 window size)
9600 GT: around 53 FPS
2.4GHz C2D, 4GB RAM, 8600M GT (256MB): 27 - 29 FPS
8800 GT: 56.97 fps
i920 overclocked to 4.4Ghz, 1760Mhz DDR3, PCIE 100Mhz, eVGA GTX-285 1024MB: 95.50 fps
Rob (Barefeats! Test mule is a Mac Pro Nehalem 2.93 Octo):
GeForce GTX 285 Mac = 88 - 91 fps
Quadro FX 4800 = 77 fps steady
GeForce 8800 GT = 54 fps steady
GeForce GT 120 = 35 fps steady

DL for qJulia + Grass (has GUI) is at the end. Read the readme: you will get a "file not found" error (the app loads the qjulia.cl OpenCL source) if you didn't change the terminal directory to the app folder before running the command-line app.
For all GLUT apps (terminal apps: Transpose + qJulia + AO), check in the app preferences that SYNC is OFF (screenshot: OpenCL AO preferences).

OpenCL_Transpose__Bandwidh.zip
displacement_ATI.zip
opencl_aobench.zip
Grass.zip
qJulia1024.zip
OpenCL_Qjulia_GPU.zip
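A rough way to read the Transpose Bandwidth figures in the first post: a transpose reads every element once and writes it once, so the bytes moved are twice the matrix size. The sketch below assumes single-precision (4-byte) elements, 1 GB = 1e9 bytes, and that all three results used the same [256 x 4096] matrix that the 4870 log shows. None of that is confirmed by the bench itself, and the function names are made up for illustration.

```python
# Sketch only: how a transpose bandwidth figure relates to time per transpose.
# Assumptions (not confirmed by the bench): 4-byte float elements,
# 1 GB = 1e9 bytes, one read plus one write per element.
BYTES_PER_FLOAT = 4


def bytes_moved(rows, cols, bytes_per_element=BYTES_PER_FLOAT):
    """Bytes transferred by one transpose: each element is read and written once."""
    return 2 * rows * cols * bytes_per_element


def bandwidth_gb_per_sec(rows, cols, seconds):
    """Bandwidth in GB/sec for one transpose that took `seconds`."""
    return bytes_moved(rows, cols) / seconds / 1e9


def seconds_per_transpose(rows, cols, gb_per_sec):
    """Inverse of bandwidth_gb_per_sec: time implied by a posted bandwidth."""
    return bytes_moved(rows, cols) / (gb_per_sec * 1e9)


# Bandwidth figures as posted in this thread; the matrix size is only
# stated for the 4870 run and assumed for the other two.
posted = [("Nvidia 9600 GT", 39.0), ("Radeon 4870", 3.160816), ("GTX 285", 80.0)]

for label, gbps in posted:
    ms = seconds_per_transpose(256, 4096, gbps) * 1e3
    print(f"{label}: {gbps} GB/sec -> about {ms:.3f} ms per 256 x 4096 transpose")
```

Under these assumptions the matrix is only 4 MiB, so each transpose takes well under a few milliseconds even on the slowest result.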
GLXOZ Posted August 30, 2009

"GALAXY OpenCL-Speed Bench-Demo! Snow Leopard + Intel Macs ONLY! Compiled with the LLVM GCC 4.2 compiler = highly optimized code. Using: Start Galaxy"

Your link is dead %(
GLXOZ Posted August 30, 2009

"??? What exactly happens ??? I can download that RS link - tested it right now. I added a second DL link."

With your second link all is OK, thank you - great work!
mitch_de Posted August 30, 2009

"...- great work !"

That was Apple's work - I only compiled it!
GLXOZ Posted August 30, 2009

"Was Apple Work - i only compiled it !"

No matter, you compiled it - you did work.
Ruben-P Posted August 30, 2009

1600x1200
20 Gigaflops / around 59 U/sec : CPU (SIM: Vector Multi-Core CPU mode), E6600 @ 3GHz
170 Gigaflops / around 505 U/sec : Nvidia 9800 GTX+
Ai Haibara Posted August 30, 2009

1280x800
20 Gigaflops - around 60 U/sec (CPU: Core2Duo 2.53 GHz, SIM Vector Multi-core CPU)
10 Gigaflops - around 30 U/sec (GPU: SIM GeForce 9400M)
5 Gigaflops - around 60 U/sec (CPU Multi-core + GPU)

Testing on the latest MacBook Pro 13".

Sherry Haibara
blackosx Posted August 30, 2009

Mitch, this is lovely - Thanks

blackosx (C2D 2.66GHZ, NV 8800 GT, 1680x1050)
19 Gigaflops / around 57 U/sec : CPU (SIM: Vector Multi-Core CPU mode)
60 Gigaflops / around 178 U/sec : Nvidia 8800 GT
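The Gigaflops and U/sec (updates per second) pairs posted so far look tied together by a fixed factor. A minimal sketch of that relationship, assuming the 4K version simulates 4096 bodies, that every update computes all body-to-body pairs, and that each pair is counted as 20 flops (a common convention for N-body demos). These three numbers are assumptions inferred from the posted results, not taken from the Galaxies source, and `gigaflops` is a made-up helper name.

```python
# Sketch only: reproduce posted Gigaflops from posted updates per second.
# Assumptions (inferred, not from the Galaxies source code):
#   - "4K" means 4096 bodies
#   - every update evaluates all bodies x bodies interactions
#   - one interaction is counted as 20 floating point operations
BODIES_4K = 4096
FLOPS_PER_INTERACTION = 20


def gigaflops(updates_per_sec, bodies=BODIES_4K,
              flops_per_interaction=FLOPS_PER_INTERACTION):
    """Gigaflops implied by an update rate under the assumptions above."""
    interactions = bodies * bodies
    return updates_per_sec * interactions * flops_per_interaction / 1e9


# (label, posted U/sec, posted Gigaflops) taken from the posts above
posted = [
    ("E6600 @ 3GHz, CPU", 59, 20),
    ("Nvidia 9800 GTX+", 505, 170),
    ("C2D 2.66GHz, CPU", 57, 19),
    ("Nvidia 8800 GT", 178, 60),
]

for label, ups, posted_gf in posted:
    print(f"{label}: {ups} U/sec -> {gigaflops(ups):.1f} Gigaflops "
          f"(posted: {posted_gf})")
```

If the assumptions hold, the two numbers carry the same information, so comparing either one between machines gives the same ranking.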
Chaz_UK Posted August 30, 2009

I just did a little trial on my 20 inch iMac, and it seems the ATi 2x00 cards lack OpenCL support, which is disappointing. Thanks for sharing it! I'm not sure if I ran it right, but here is a screengrab:
AppleIIGuy Posted August 31, 2009

9800GT, Core i7 920 stock speed
290 Updates/sec, 98 Gigaflops
1900x1200
Sim: GeForce 9800 GT
radov4n Posted August 31, 2009

Great tool

1600x1200, i7
http://i28.tinypic.com/ml2i6o.jpg
http://i31.tinypic.com/35hel9h.jpg
http://i30.tinypic.com/2m5av6w.jpg
http://i30.tinypic.com/nd68a1.jpg
johan Posted August 31, 2009

Why do the results go up and down in the same session? I go up and down from 80 to 135 Gflops with an 8800 GTX. I expected results to be stable within a session.
mitch_de Posted August 31, 2009

"why do result go up and down. in same session? i go up and down from 80 to 135 gflop with 8800 gtx i expected results to be stable within a session"

Because it's a simulation of stars. They move by gravity, so the amount of things to compute changes a lot over time (fewer stars / more stars). That makes the Gigaflops change as well - it's a dynamic simulation, not a static one.
Cyberdog ! Posted August 31, 2009

Doesn't work on my iMac Core Duo 1.83 GHz + ATI X1600, 10.6 32-bit.
proengin Posted September 1, 2009

W3520 overclocked to 4.1Ghz Turbo, PCIE 102Mhz, 1280x1024x75hz, standard scene
GTX-285 - 469 updates/sec, 157 Gigaflops
Vector Multi core - 262 updates/sec, 87 Gigaflops
mitch_de Posted September 1, 2009

"W3520 overclocked to 4.1Ghz Turbo, PCIE 102Mhz, 1280x1024x75hz, standard scene GTX-285 - 469 updates/sec, 157 Gigaflops Vector Multi core - 262 updates/sec, 87 Gigaflops"

Thanks. What exactly is a W3520 (Xeon 4 core, Xeon i7 4 core + 4 core)?

157 Gigaflops looks a bit low compared to the 170 Gigaflops someone else posted with his 9800 GTX+. Can you run the app again and compare results? But maybe the 9800 GTX values aren't "real".

HINT: Also try the new qJulia OpenCL FPS bench - high GPU usage with complex OpenCL code.
http://www.insanelymac.com/forum/index.php?showtopic=183237
Maybe you will see more difference to other GPUs there. My 9600 GT gets around 30 FPS with the qJulia OpenCL FPS bench.

Galaxy with my 9600 GT was around 70 Gigaflops at 1280x1024 (60 Gigaflops at 1600x1200). So in Galaxy your GTX 285 is 2.2 times faster (GPU) than my 9600 GT. I am very curious whether qJulia scales on your GTX 285. Perhaps in qJulia the GTX 285 is more than 2.2 times faster: my 9600 GT gets around 30 FPS there in the 800x800 window.
rob-ART Posted September 1, 2009

I will be posting some results on BareFeats.com today showing various GPUs running Galaxy and two other OpenCL benchmarks. When I ran Galaxy, I noticed that the rates dropped after running for a while. So I'm posting the "ending" numbers on updates per second and gigaflops.

For some reason, with the Radeon HD 4870 installed, Galaxy hung on startup. I've sent the dump to mitch_de and to ATI engineering to see if there's an easy fix.
JBeed Posted September 1, 2009

I'm a bit worried here. My MacBook Pro (2.4GHz C2D, 4GB RAM, 8600M GT) is getting only 10 Gflops (30 Updates/s) when using the GPU. If I'm using the Vector Multi-core setting, I get about 20 Gflops from the CPU.

I've tried to shut down all the windows, and I've tried with the machine newly restarted. I've also tried installing CUDA (and the CUDA files are there, checked), but there is no difference at all. It's stuck at exactly 10 Gflops and 30 Updates/s, no change at all.

The weird thing, though, is that even if I'm running a game in the background (World of Warcraft), I get the exact same performance in this application; isn't the performance supposed to be lower if there's another application using the GPU?

Anyone else here with a MacBook Pro running an 8600M GT who can post their results, so I have something to compare against?
mitch_de Posted September 1, 2009

Your results are normal - don't worry. The 8600M GT isn't a good OpenCL GPU. OpenCL uses the programmable features of the GPU (OpenCL works a bit like shaders in games, but is much more flexible and universal), and these are "undersized" in GPUs from before 2009. Even desktop GPUs built in 2008 normally aren't good OpenCL GPUs.

So don't worry about it: OpenCL usage will start in 2010, not before, because most people won't have a GPU that can do things much faster than the CPU. On low-end OpenCL GPUs, the only result of using OpenCL may be lower CPU load.

Mobile GPUs (MacBook, Mac mini, and iMac too!) may have to wait longer than 2010, because they are not built for high-end number-crunching jobs. Mobile GPUs that are both powerful (for OpenCL) and low-power will not be there until the end of 2010! It's simply impossible to build GTX 285 / 9800 GTX+ speed into a mobile Mac or iMac: it would overheat (iMac), or the battery would last less than 10 minutes?!
Ruben-P Posted September 3, 2009

All my results are with the CUDA drivers.
JBeed Posted September 5, 2009

Well, a small update. As of the new version, running SSE4 and no VSync, I'm getting:
70 Updates/s and 23 Gflops
mitch_de Posted September 7, 2009

UPDATE: The ATI devs gave me a hint to increase the count from 4K to 16K. CPU Gigaflops will stay the same. But the GPU can show more performance, because with "only" 4K (the count of things to compute) faster GPUs like the GTX 285 are not at their limit! Even my 9600 GT goes from 97 Gigaflops (4K) to 112 Gigaflops (16K). The CPU can't compute more (white flag), so 16K CPU Gigaflops = 4K CPU Gigaflops!!

The new version shows 16K in the result legends and is also built with no vsync and SSE4 optimization. DL link in the first post.
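One way to picture why a larger count helps the GPU is a toy model: each update pays a fixed overhead (kernel launch, synchronization and so on) plus compute time at the GPU's peak rate. The sketch below fits that model to the two 9600 GT figures in this post (97 Gigaflops at 4K, 112 Gigaflops at 16K). It assumes 4K = 4096 and 16K = 16384 bodies, all-pairs interactions, and 20 flops per interaction. The model and its function names are illustrative guesses, not how Galaxies or OpenCL actually account for time.

```python
# Toy model (an assumption, not measured): time per update = t0 + work / peak
#   t0   = fixed per-update overhead in seconds
#   peak = Gigaflops the GPU would reach with zero overhead
# Assumed workload: bodies x bodies interactions at 20 flops each.
FLOPS_PER_INTERACTION = 20


def work_gflop(bodies):
    """Floating point work of one update, in units of 1e9 operations."""
    return bodies * bodies * FLOPS_PER_INTERACTION / 1e9


def fit_overhead_model(n1, g1, n2, g2):
    """Solve 1/g = t0/work + 1/peak from two (bodies, Gigaflops) measurements."""
    w1, w2 = work_gflop(n1), work_gflop(n2)
    t0 = (1 / g1 - 1 / g2) / (1 / w1 - 1 / w2)
    peak = 1 / (1 / g2 - t0 / w2)
    return t0, peak


def effective_gigaflops(bodies, t0, peak):
    """Gigaflops the model predicts for a given body count."""
    w = work_gflop(bodies)
    return w / (t0 + w / peak)


# The two 9600 GT results from this post: 97 at 4K, 112 at 16K.
t0, peak = fit_overhead_model(4096, 97, 16384, 112)
print(f"fitted overhead per update: {t0 * 1e3:.3f} ms, fitted peak: {peak:.1f} Gigaflops")

for bodies in (4096, 16384):
    w = work_gflop(bodies)
    overhead_share = t0 / (t0 + w / peak)
    print(f"{bodies} bodies: {effective_gigaflops(bodies, t0, peak):.1f} Gigaflops, "
          f"{overhead_share:.1%} of each update spent on overhead")
```

With only two data points the fit is exact by construction, so it explains nothing by itself. It just shows that a small fixed cost per update is enough to produce the kind of gap seen between 4K and 16K.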
proengin Posted September 10, 2009

Here are my OpenCL_GALAXIES_16K_SSE4_VSYNC_OFF benchmarks for 1280 x 1024 pixels:
1. i920 overclocked to 4.4GHZ - 87 Gigaflops
2. eVGA GTX-285 (100MHz pcie) - 306 Gigaflops
macguitarm Posted September 16, 2009

Not sure if this is the correct topic / thread.

Former Apple Final Cut Pro engineer, very interested in OpenCL and Final Cut Studio 3, Compressor specifically.

I have done a bunch of tests on Compressor 3.5 / Qmaster with Leopard 10.5.8 and the new Mac Pro Nehalems. I tested 14 instances (cores) in Qmaster and Compressor 3.5, and submitted a 40 minute DVCPRO HD clip to be batch / parallel converted to H.264 in 5 separate queues. It took only 1 hour to do the 40 minute clip, which is pretty good for outputting 5 separate clips.

Now Snow Leopard of course has Grand Central to make this even better, and I will eventually test that. My main interest is to test OpenCL, and specifically OpenCL with Compressor / FCP. I am still wondering whether Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL.

My Mac Pro is a 2006 Mac Pro, so I cannot test it. I suppose I could do a Barefeats deal and get an ATI Radeon 4870 and test it. Or I will have to get to my colleague's new Nehalem Mac Pro with dual nVidia GT 120s, although it seems from Barefeats that the GT 120 is the weakest OpenCL card.

Very interested in developing this thread / conversation along these lines of Compressor 3.5, Final Cut Pro 7 and OpenCL. It could be awesome stuff, saving a ton of time.

Thanks in advance
mitch_de Posted September 16, 2009

The ATI Apple dev told me that they didn't make the deadline for the 10.6.1 update to fix OpenCL on ATI GPUs. They will fix that as soon as possible, but it may not come before 10.6.2 (some weeks to wait). OpenCL only works with Nvidia GPUs today.

Also, OpenCL only comes into play if an application uses the OpenCL framework, so only newly developed apps will use OpenCL. "Old" apps, meaning your already installed apps, will not.