mitch_de Posted March 31, 2010

EDIT: The download link for the newest SLG version is always at macupdate.com.
EDIT 30.07: PerFinal V171_3: http://rapidshare.com/files/410151278/smallluxGPU171_V3.zip
http://www.macupdate.com/info.php/id/33632/smallluxgpu

Needed: any NVIDIA card >= 8800.

Select the luxball (standard) scene and the Benchmark GPU-only modes with 2, 3 and 4 GPU threads, and post your kSamples/sec in those GPU-only modes. My GPU-only results (8800GTX) are shown in the screenshot. A GTX 260 or better will perform much faster, a 9400M much slower.

EDIT: After a while I found that the GRASS OpenCL demo is also a good OpenCL benchmark. I get 54 FPS with a 9600GT. (Attached: Grass_OPENCL.zip)
machinist Posted April 1, 2010

Running 10.6.3. Intriguingly, this test divides the workload across the cores of the 9800 GX2 and uses both G92 chips in concert. The Cinebench 11.5 OpenGL test yields 26.17 fps in 10.6.3 and 34.32 fps in Win7 (64-bit). OpenGL Viewer produced lower scores in 10.6.3 than the ~3200+ fps scores with 10.6.2; it reports it is only using 16 compute units. I would note that OpenGL 3.0 support was reported at only 65% with 10.6.2, while it's at 91% with 10.6.3.

(Attached: two_threads.tiff, three_threads.tiff, four_threads.tiff)
mitch_de Posted April 6, 2010

Thanks! Can you please try the new 1.5.2 version, which reports a more comparable "xy sec" time as the speed result in the new benchmark GPU mode? The 8800GTX needs 28 sec, the 9400M 156 sec.
machinist Posted April 7, 2010

> Can you please try the new 1.5.2 version ... 8800GTX needs 28 sec, 9400M 156 sec

And the 9800 GX2 needs 17.8 seconds.

Small matters: the "title bar" and pull-down menu are in German (Deutsch), and I had to guess that I should go to macupdate to download the program, as you neglected to link to it here. That aside, this is becoming an interesting little utility.
wetzel Posted April 12, 2010

MSI GTX260 (192 cores) on 10.6.3 using NVenabler. With 2 threads I had 668K/sec, with 3 threads 678K/sec average after 128 samples. I used version 1.5.3: "benchmark midrange CPU" resulted in 16.9 seconds, the highend benchmark in 31.2 seconds. Hope this helps with whatever you're doing.
mitch_de Posted April 15, 2010

Thanks! Perhaps a GTX 285 or 2× GTX 260 user can get closer to the ATI 4850 (High Benchmark 17 sec) or ATI 4870 (15 sec)? The GTX 260, at around 29 sec in High (my 8800GTX = 59 sec, 9600GT = 80 sec), is the fastest GTX GPU so far, but it is far from the shader-unit speed of the 48xx. The shader clock may also give a small speed boost: some GTX 260s showed 1348 MHz, others 1408 MHz in the benchmark-mode result window!

Thanks for the multi-GPU 9800 GX2 test! Can you perhaps use the newer SLG 1.5.4 (in High Benchmark mode)? It needs about twice as many seconds: High mode does exactly double the work, so that the fixed OpenCL overhead (always about 0.5-1.0 sec, CPU-dependent, for compiling OpenCL on the fly) is a smaller percentage of the measured time.
http://www.macupdate.com/info.php/id/33632/smallluxgpu

It would also be interesting if you ran a GPU-only task with the Sponza scene, which is new and puts a huge load on the GPU. I get an average of 16 kSamples/sec GPU only, 3 threads, Sponza with my 8800GTX. Your two GPUs, shown in the help screen, should reach at least 29 kSamples/sec. Let the Sponza scene run a while, at least until the sample count goes from 0 to 16 or 32, to get a stable average result.

EDIT: I got results from an iMac 27" ATI 4850M: 21 sec in High Benchmark mode. Slower than the 4870 (15 sec) but still faster than the GTX 260. The shader speed (lots of units) of the ATI 48xx can't be beaten by older GeForces. But Fermi will, I am sure. Of course, overall gaming speed isn't nearly as different as OpenCL speed: an ATI 4870 is not 4 times faster than an 8800GTX when running a game!
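To see why doubling the work in High mode helps, here is a small Python sketch of the percentage arithmetic. It assumes, as the post says, a fixed OpenCL compile overhead that does not grow with the workload; the 14 sec render time and 1.0 sec overhead are made-up illustration values, not measured SLG results.

```python
def overhead_share(render_s: float, overhead_s: float) -> float:
    """Fraction of the total measured time spent on the fixed overhead."""
    return overhead_s / (render_s + overhead_s)


# Hypothetical example: 14 s of rendering work plus a fixed 1.0 s OpenCL compile overhead.
base_render = 14.0
overhead = 1.0

single = overhead_share(base_render, overhead)      # original workload
double = overhead_share(2 * base_render, overhead)  # High mode: exactly double the work

print(f"overhead share, 1x work: {single:.1%}")  # 6.7%
print(f"overhead share, 2x work: {double:.1%}")  # 3.4%
```

The fixed part stays the same while the render part doubles, so its share of the result roughly halves, which makes times from different GPUs easier to compare.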
mitch_de Posted April 16, 2010

Updated to 1.5.5. Added an Ultra highend benchmark mode!
8800 GTX = 101 sec
GTX 285 (Mac) = 44.7 sec
As before, the ATI 48xx cards (even the mobile iMac 4850M) will outperform that.
machinist Posted April 17, 2010

> Can you perhaps use newer SLG 1.5.4 (in High Benchmark mode)? ... It would also be interesting if you performed a GPU-only task with the Sponza scene ...

Newer SLG in High Benchmark mode = 36.7 sec.
Ultra highend Benchmark mode = 53.7 sec.
Sponza scene with 48 samples, 3 threads, GPU only = 35k samples/sec.
(Using version 1.5.5)
mitch_de Posted April 17, 2010

Thanks! Barefeats (Rob) now uses SmallLuxGPU as a benchmark alongside Geekbench and Cinebench 11.5: http://www.barefeats.com/mbpp18.html
cfhuk Posted April 22, 2010

Cheers. I benched a single GTX 260 on its own before I eventually work out how to fit the second one.
Midrange GPU: 16 seconds
High End GPU: 25 seconds
Ultra Hybrid Sponza: 22 seconds
mitch_de Posted April 23, 2010

Thanks! Could you also compare High Hybrid vs. High CPU only and Ultra Hybrid vs. Ultra CPU only (both in the middle section of the screen, not the CPU only on the right; the newest V1.5.7 is needed)?
http://www.macupdate.com/info.php/id/33632/smallluxgpu

High Hybrid vs. High CPU only on my 8800GTX = 16 sec vs. 31 sec: the GPU gives a good boost, almost twice as fast (roughly half the time). With a faster CPU and the same GPU, the time saving in % is smaller.
Ultra Hybrid vs. Ultra CPU only = much less GPU boost ("only" 20% time saving), because C2D CPUs are already overloaded, or near full load, with the CPU tasks and can't feed the GPU with data fast enough. So CPUs with 4 or more real (not virtual) cores will get a higher boost % in Ultra Hybrid too, but still not as big a boost as in High Hybrid.
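The hybrid vs. CPU-only numbers can be expressed two ways, as a speedup factor or as the percentage of time saved, and the two are easy to mix up. A small Python sketch using the 8800GTX High figures from the post (31 sec CPU only vs. 16 sec hybrid); the function names are just illustrative.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """How many times faster the new run is than the baseline (>1 means faster)."""
    return baseline_s / new_s


def time_saving(baseline_s: float, new_s: float) -> float:
    """Fraction of the baseline time that is saved (0.5 means half the time)."""
    return 1.0 - new_s / baseline_s


# 8800GTX figures from the post: High CPU only = 31 s, High Hybrid = 16 s.
cpu_only, hybrid = 31.0, 16.0
print(f"speedup: {speedup(cpu_only, hybrid):.2f}x")        # 1.94x
print(f"time saving: {time_saving(cpu_only, hybrid):.0%}")  # 48%
```

So "twice as fast" corresponds to about a 50% time saving; a 100% time saving would mean the benchmark took no time at all. Likewise, the 20% time saving quoted for Ultra Hybrid is a speedup of 1 / (1 - 0.2) = 1.25x.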
mitch_de Posted May 5, 2010

SLG updated to 1.5.8! Benchmark result times can't be compared with old versions: some benchmarks have significantly different settings, and therefore different times, than in the old version.
mitch_de Posted May 7, 2010

Ultra High GPU only had a bug. Now 1.6.0 is available! I added OpenCL pixel filter benchmarks and cleaned up the GUI. All GPU-only benchmarks now sit beside the CPU-only and hybrid ones and use the same settings; before, the GPU-only benchmarks had their own settings compared with hybrid and CPU only. Now it's clearer and should be bug-free. Ready to collect reference results again (they should hold for the next versions).

Attached: pixel filter MegaSamples/sec of the 8800GTX and Ultra GPU only (the 4870 will perform much faster, but no longer at 1.6 sec).
mitch_de Posted May 12, 2010

Wow, the ATI 4870 gets really fast MegaSamples/sec in the new pixel filter benchmark! Are there any GTX 2xx users here who can get a bit closer than my old 8800GTX?!
machinist Posted May 15, 2010

I may be getting anomalous results with the OpenCL benchmark test using version 1.6.2. The 9800GX2 is only processing at two-thirds the speed of your 8800GTX, yet is a third faster in the Ultrahigh GPU only benchmark?
mitch_de Posted May 15, 2010

The 8800GTX is much faster than the 8800GT. Against an 8800GT the 9800GX2 would look better; a 9800GX2 can't get near 2× 8800GTX. Also, the CPU may be too slow to feed both OpenCL GPUs fast enough. Try High end CPU only vs. hybrid; you may get a better advantage over my 8800GTX High end values.

I also got GT120 results (Mac Pro 2009): Ultra GPU only 280 sec, so don't worry about the 9800GX2. You can even see here that OpenCL with a very fast CPU (Mac Pro 2009) and a slow GPU is the worst case: hybrid is even slower than CPU only. The OpenCL overhead in hybrid mode makes a slow GPU useless alongside a very fast CPU (4+ cores). But most of us will NOT have such a combination of 2× Xeon + GT120, I hope.

PS: I also got ATI 5870 (Windows) OpenCL pixel filter values!
AddSample[FILTER_NONE] Benchmark [CypressPixel][Samples/sec 1669.42M]
AddSample[FILTER_PREVIEW] Benchmark [CypressPixel][Samples/sec 369.56M]
AddSample[FILTER_GAUSSIAN] Benchmark [CypressPixel][Samples/sec 217.81M]
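Why can hybrid end up slower than CPU only when a very fast CPU is paired with a slow GPU? A toy Python model, under loudly stated assumptions: work is counted in arbitrary units, feeding the GPU costs the CPU some of its own throughput (the hypothetical `feed_cost`), and OpenCL adds a fixed setup time. All rates are made-up illustration values, not measurements, and this is not how SLG itself schedules work.

```python
def cpu_only_time(work: float, cpu_rate: float) -> float:
    """Time when only the CPU renders (toy model)."""
    return work / cpu_rate


def hybrid_time(work: float, cpu_rate: float, gpu_rate: float,
                feed_cost: float, overhead_s: float) -> float:
    """Time when CPU and GPU render together (toy model).

    feed_cost: CPU throughput lost to preparing/feeding GPU work (assumption).
    overhead_s: fixed OpenCL setup time, e.g. kernel compilation (assumption).
    """
    effective_rate = (cpu_rate - feed_cost) + gpu_rate
    return overhead_s + work / effective_rate


WORK = 1000.0  # hypothetical amount of work

# Fast CPU + slow GPU: the GPU adds less throughput than feeding it costs.
fast_cpu_alone = cpu_only_time(WORK, cpu_rate=50.0)
fast_cpu_hybrid = hybrid_time(WORK, cpu_rate=50.0, gpu_rate=5.0, feed_cost=10.0, overhead_s=1.0)

# Slower CPU + fast GPU: hybrid wins clearly.
slow_cpu_alone = cpu_only_time(WORK, cpu_rate=20.0)
slow_cpu_hybrid = hybrid_time(WORK, cpu_rate=20.0, gpu_rate=40.0, feed_cost=5.0, overhead_s=1.0)

print(f"fast CPU: {fast_cpu_alone:.1f} s alone vs {fast_cpu_hybrid:.1f} s hybrid")
print(f"slow CPU: {slow_cpu_alone:.1f} s alone vs {slow_cpu_hybrid:.1f} s hybrid")
```

In this model hybrid only pays off when `gpu_rate` exceeds `feed_cost` by enough to also cover the fixed overhead.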
machinist Posted May 15, 2010

> 8800GTX is much faster than 8800GT...

Mitch: Thanks for the reply, but I guess I wasn't quite clear. It's the OpenCL Pixelfilter test that produces results which appear inconsistent or anomalous. In all the other tests the 9800GX2 predictably "bests" the 8800GTX. In the Pixelfilter run the 9800GX2 processes only two-thirds of the information in the 30 secs that the 8800GTX processes in the same time. It is as if the Pixelfilter test does not use both cores of the 9800GX2. Could this be a bug?
mitch_de Posted May 17, 2010

Ah, now I understand. I will ask the benchpixel devs whether it also uses all GPUs. But benchpixel certainly uses VRAM much more heavily and more often than the raytracing benchmarks. I don't know whether older dual-GPU cards suffer a slowdown from concurrent VRAM access (read/write) that reduces the overall VRAM speed of a 2-GPU card vs. a 1-GPU card.

For a closer look, start benchpixel in Terminal and post the output; there we can see how many GPU devices are used. Compare the device info with mine.

8800GTX: Device 0, 1 = CPU cores; Device 2 = GPU (single 8800GTX)

    mitch:~ ami$ /Users/ami/Desktop/benchpixel
    LuxRays Simple PixelDevice Benchmark v0.1alpha7dev
    Usage (easy mode): /Users/ami/Desktop/benchpixel
    OpenCL Platform 0: Apple
    Device 0 NativeThread name: NativeThread-000
    Device 1 NativeThread name: NativeThread-001
    Device 2 OpenCL name: GeForce 8800 GTX
    Device 2 OpenCL type: GPU
    Device 2 OpenCL units: 16
    Device 2 OpenCL max allocable memory: 192MBytes
    Device 3 OpenCL name: Intel® Core2 Duo CPU E7300 @ 2.66GHz
    Device 3 OpenCL type: CPU
    Device 3 OpenCL units: 2
    Device 3 OpenCL max allocable memory: 1024MBytes
    Selected pixel device: GeForce 8800 GTX
    Creating 1 pixel device(s)
    Allocating pixel device 0: GeForce 8800 GTX (Type = OPENCL)

(Attached: benchpixel.zip)
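To check a pasted benchpixel log quickly, here is a small Python sketch that counts the GPU devices OpenCL lists and reads how many pixel devices benchpixel actually created. It relies only on the line formats visible in the 8800GTX output posted above ("Device N OpenCL type: GPU" and "Creating N pixel device(s)"); other benchpixel versions may print something different, and the `summarize` helper is hypothetical.

```python
import re

# Excerpt of the 8800GTX benchpixel output posted above.
LOG = """\
OpenCL Platform 0: Apple
Device 0 NativeThread name: NativeThread-000
Device 1 NativeThread name: NativeThread-001
Device 2 OpenCL name: GeForce 8800 GTX
Device 2 OpenCL type: GPU
Device 2 OpenCL units: 16
Device 3 OpenCL name: Intel® Core2 Duo CPU E7300 @ 2.66GHz
Device 3 OpenCL type: CPU
Device 3 OpenCL units: 2
Selected pixel device: GeForce 8800 GTX
Creating 1 pixel device(s)
"""


def summarize(log: str) -> dict:
    """Count listed OpenCL GPUs and the number of pixel devices actually created."""
    gpus = re.findall(r"^Device (\d+) OpenCL type: GPU$", log, flags=re.MULTILINE)
    created = re.search(r"^Creating (\d+) pixel device\(s\)", log, flags=re.MULTILINE)
    return {
        "gpu_devices_listed": len(gpus),
        "pixel_devices_created": int(created.group(1)) if created else None,
    }


print(summarize(LOG))
```

A card showing two GPU devices listed but "Creating 1 pixel device(s)" would mean both GPUs are visible, yet only one is used for filtering.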
machinist Posted May 17, 2010

> Ah, now I understand. I will ask the benchpixel devs whether it also uses all GPUs...

It appears the test is using both GPUs and all memory. The 9800GX2 does better than the 8800GTX in every other test. It may be a quirk of the card design with just this test, or it could be a bug in the test. In Windows I've run many tests on the 9800GX2 while considering overclocking it via its BIOS. Watching processor temperatures and GPU usage, I have noticed that some benchmark and stress programs do not actually use both GPUs, though they see both. Has this test been run on other dual-GPU cards or multi-card setups? Let me know how it goes; I am curious.

(Attached: terminal_pixel.rtf)
mitch_de Posted May 17, 2010

Yep, benchpixel uses both GPUs. Maybe, because it also runs 4 threads on the CPU instead of 2 (quad CPU vs. C2D), the problem is that the CPU can't feed the GPU fast enough, or it is an L2 cache difference! My C2D has 3 MB L2 = 1.5 MB per core. Does your CPU have 4 MB or 6 MB for 4 cores (1 MB or 1.5 MB per core)? Because pixel filtering does a lot of RAM transfers, the L2 may be heavily used as well: the more L2, the better.
machinist Posted May 17, 2010

> ... Does your CPU have 4 MB or 6 MB for 4 cores (1 MB or 1.5 MB per core)? ...

As you can see, each C2D of the Q has 1 MB more L2 available than your C2D. (Disregard the bus speed indicated; CPU-X just reports what it is told. The Q6600 runs at 9×360.)
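The per-core L2 arithmetic from the last two posts, as a tiny Python sketch. It assumes each pair of cores on a die shares one L2 and splits it evenly; that is an idealization, since on these chips one core can use more than half of the shared L2 when the other is idle.

```python
def l2_per_core_mb(shared_l2_mb: float, cores_sharing: int) -> float:
    """L2 cache per core when cores_sharing cores split one shared L2 evenly."""
    return shared_l2_mb / cores_sharing


# E7300 (C2D): 3 MB L2 shared by 2 cores, as stated in the thread.
e7300 = l2_per_core_mb(3, 2)
# Q6600: two C2D-style dies; per the reply, each die has 1 MB more than the E7300,
# i.e. 4 MB shared by its 2 cores.
q6600 = l2_per_core_mb(3 + 1, 2)
print(e7300, q6600)  # 1.5 2.0
```

On this simple split the Q6600 has more L2 per core, not less, so L2 size alone does not seem to explain a slower pixel filter result.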
mitch_de Posted May 19, 2010

I got an answer from the dev team: benchpixel filtering uses only one GPU, while SLG (the raytracing) uses all GPUs. So it's clear that, relative to a single-GPU card, dual-GPU results are lower in benchpixel than in SLG.

I got some Mac Pro 2009 ATI 4870 / GTX 285 results (SLG 1.6.2). The GTX 285 performs better, as I guessed!

Bench UltraHigh GPU only
Radeon HD 4870 = 54 sec
GeForce GTX 285 = 32 sec!! (GT120 = 280 sec!!!!, 8800GTX = 100 sec)

Bench UltraHigh Hybrid
Radeon HD 4870 = 27 sec
GeForce GTX 285 = 25 sec

Bench GPU with OpenCL pixel filtering
none: Radeon HD 4870 = 1072 MSamples/s, GeForce GTX 285 = 945 MSamples/s
preview: Radeon HD 4870 = 219 MSamples/s, GeForce GTX 285 = 298 MSamples/s
gaussian: Radeon HD 4870 = 96 MSamples/s, GeForce GTX 285 = 167 MSamples/s
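When comparing these numbers, note that benchmark times are "lower is better" while the pixel filter rates are "higher is better", so the ratio has to be taken in opposite directions. A small Python sketch using the Mac Pro 2009 figures from this post; the helper names are just illustrative.

```python
def times_faster(time_a_s: float, time_b_s: float) -> float:
    """For benchmark times (lower is better): how many times faster B is than A."""
    return time_a_s / time_b_s


def throughput_ratio(rate_a: float, rate_b: float) -> float:
    """For throughputs like MSamples/s (higher is better): B relative to A."""
    return rate_b / rate_a


# Mac Pro 2009 figures from the post: Radeon HD 4870 (A) vs. GeForce GTX 285 (B).
print(f"UltraHigh GPU only: GTX 285 is {times_faster(54, 32):.1f}x as fast")
print(f"UltraHigh Hybrid:   GTX 285 is {times_faster(27, 25):.2f}x as fast")

FILTERS = [("none", 1072, 945), ("preview", 219, 298), ("gaussian", 96, 167)]
for name, hd4870, gtx285 in FILTERS:
    print(f"filter {name:8s}: GTX 285 = {throughput_ratio(hd4870, gtx285):.2f}x HD 4870")
```

A ratio below 1.0 (the "none" filter) means the HD 4870 is ahead there, while the GTX 285 leads in preview and gaussian filtering and in both UltraHigh benchmarks.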
Pencs Posted May 27, 2010

Core i7 920 @ 2.66 GHz + GTX275
Ultrahigh GPU only = 36.2 sec
Highend GPU only = 17.6 sec
Midrange GPU only = 10.4 sec
Ultrahigh Hybrid = 29.9 sec
Highend Hybrid = 15.8 sec
Midrange Hybrid = 6.4 sec
Ultrahigh CPU only = 52.9 sec
Highend CPU only = 54.9 sec (?)
Midrange CPU only = 27.9 sec
OpenCL filtering
None = 333.30M/sec
Preview = 216.22M/sec
Gaussian = 140.13M/sec

Hope that's helpful at all. Let me know if there's anything else you want me to bench.
Tewan Posted May 28, 2010

Using a GTX280 with a Core i5-750 @ 2.66 GHz (2 GB single-channel memory... yeah, I know, I'm getting another stick soon).
CPU only Midrange: 41.3 sec
CPU only Highend: 83.4 sec
CPU only Ultra: 74.4 sec
Hybrid Midrange: 7.0 sec
Hybrid Highend: 14.7 sec
Hybrid Ultra: 40.4 sec
GPU only Midrange: 11.2 sec
GPU only Highend: 16.4 sec
GPU only Ultra: 39.7 sec
FILTER NONE: 875.99M
FILTER PREVIEW: 272.68M
FILTER GAUSSIAN: 142.42M

Man... my {censored} is all over the place.
mitch_de Posted May 28, 2010

Yep, the GTX 280 has clear advantages over the other NVIDIA cards when running OpenCL.

> Ultrahigh CPU only = 52.9 sec
> Highend CPU only = 54.9 sec (?)
> Midrange CPU only = 27.9 sec

In the UltraHigh CPU-only (and Hybrid) benchmarks, more CPU cores are used by running more threads than in the Mid and Highend benchmarks. So on 4-core CPUs, UltraHigh profits from the extra CPU power and may run even faster than Highend CPU only. On C2D CPUs, UltraHigh CPU only runs much slower.