
Nvidia Fermi GTX 4xx, GTX2xx (+ others) Users for Benchmark WANTED


mitch_de

62 posts in this topic


EDIT :

The download link for the newest SLG version is always at macupdate.com.

EDIT 30.07. PerFinal V171_3

http://rapidshare.com/files/410151278/smallluxGPU171_V3.zip

 

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

Needed: all Nvidia cards from the 8800 upward.

Select the luxball (standard scene), run the Benchmark GPU only modes with 2, 3 and 4 GPU threads, and post your kSamples/sec for each of those modes.

My GPU only results (8800GTX) are shown in the screenshot.

A GTX 260 or better will perform much faster; a 9400M much slower.

 

 

EDIT: After a while I found that the GRASS OpenCL demo is also a good OpenCL benchmark.

I get 54 FPS with a 9600GT.

Bildschirmfoto_2010_03_21_um_21.44.16.jpg

Grass_OPENCL.zip

Bildschirmfoto_2010_11_17_um_11.38.53.jpg

Link to comment
Share on other sites

Running 10.6.3

 

Intriguingly this test divides the workload across the cores of the 9800 GX2, and uses both G92 chips in concert.

 

Cinebench 11.5 opengl test yields 26.17 fps in 10.6.3 and 34.32 with Win7(64).

 

Openglviewer produced lower scores in 10.6.3 than the ~3200+ fps scores with 10.6.2. It reports it is only using 16 compute units.

 

I would note opengl 3.0 was only at 65% with 10.6.2 while it's at 91% with 10.6.3.

two_threads.tiff

three_threads.tiff

four_threads.tiff

Link to comment
Share on other sites

Thanks !

Can you please try the new 1.5.2 version? Its new GPU benchmark mode reports the speed as elapsed seconds, which is easier to compare.

8800GTX needs 28 sec, 9400M 156 sec

And the 9800 GX2 needs 17.8 seconds.
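
For anyone comparing these timings, here is a minimal Python sketch of the arithmetic. It assumes every card rendered the same fixed workload, so relative speed is simply the ratio of elapsed times. The function name `relative_speed` is made up for illustration; the timings are the ones posted above.

```python
# Elapsed seconds in the 1.5.2 GPU benchmark mode, as posted above.
times_sec = {"8800GTX": 28.0, "9400M": 156.0, "9800 GX2": 17.8}


def relative_speed(times, baseline):
    """Return how many times faster each card is than the baseline card.

    Assumes all cards rendered the same fixed workload, so speed is
    inversely proportional to elapsed time.
    """
    base = times[baseline]
    return {card: base / t for card, t in times.items()}


if __name__ == "__main__":
    for card, factor in relative_speed(times_sec, "8800GTX").items():
        print(f"{card}: {factor:.2f}x the 8800GTX")
```

On these numbers the 9800 GX2 comes out at roughly 1.6 times the 8800GTX, and the 9400M at a bit under a fifth of it.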

 

Small matters: the "title bar" and pull-down menu are in German; I guessed I should go to macupdate to download the program, as you neglected to link to it here.

 

That aside, this is becoming an interesting little utility.

post-249157-1270599622_thumb.png

Link to comment
Share on other sites

MSI GTX260 192 core on 10.6.3 using NVenabler.

 

with 2 threads I had 668K/sec, 3 threads 678K/sec average after 128 samples.

 

I used version 1.5.3 and "benchmark midrange CPU" resulted in 16.9 seconds, highend benchmark in 31.2 seconds.

 

Hope this helps with whatever you're doing.

Link to comment
Share on other sites

Thanks !

Perhaps a GTX 285 or 2x GTX 260 user can get closer to the ATI 4850 (High Benchmark 17 sec) or ATI 4870 (15 sec)?

At around 29 sec in High, the GTX 260 is the fastest Nvidia GPU tested so far (my 8800GTX = 59 sec, 9600GT = 80 sec), but it is still far from the shader-unit speed of the 48xx.

The shader clock may also give a small speed boost: some GTX 260 cards showed 1348 MHz and some 1408 MHz in the benchmark result window!

 

Thanks for the multi-GPU card (9800 GX2) test!

Could you try the newer SLG 1.5.4 in High Benchmark mode? It takes about twice as long, because High mode now does exactly double the work. The reason is to shrink the share of OpenCL overhead in the measured time: compiling the OpenCL code on the fly always costs about 0.5 to 1.0 sec, depending on the CPU.
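
To see why doubling the work helps, here is a rough Python sketch. It assumes the measured time is a fixed OpenCL compile overhead plus a render time that scales with the workload. The 0.5 to 1.0 sec overhead comes from the post above; the 8 sec render time and the function name are only illustrative assumptions.

```python
def overhead_share(render_sec, overhead_sec):
    """Fraction of the measured time that is fixed overhead."""
    return overhead_sec / (render_sec + overhead_sec)


if __name__ == "__main__":
    render_sec = 8.0  # hypothetical pure render time for the old workload
    for overhead_sec in (0.5, 1.0):
        single = overhead_share(render_sec, overhead_sec)
        double = overhead_share(2 * render_sec, overhead_sec)
        print(f"overhead {overhead_sec:.1f} s: {single:.1%} -> {double:.1%} of measured time")
```

Under these assumptions, doubling the render work roughly halves the share of the result that is overhead rather than GPU speed.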

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

It would also be interesting if you ran a GPU only task with the sponza scene, which is new and puts a huge load on the GPU.

I get an average of 16 kSamples/sec (GPU only, 3 threads, sponza) with my 8800GTX. Your two GPUs, as shown in the help screen, should reach at least 29 kSamples/sec.

Let the sponza scene run for a while, at least until the sample count goes from 0 to 16 or 32, to get a stable average.

 

EDIT: I got results from an iMac 27" ATI 4850M: 21 sec in High Benchmark mode. Slower than the 4870 (15 sec) but still faster than the GTX 260.

The shader speed (lots of units) of the ATI 48xx cannot be beaten by older GeForces.

But Fermi will do it, I am sure.

 

Of course, overall gaming speed does not differ nearly as much as OpenCL speed!

An ATI 4870 is not 4 times faster than an 8800GTX when running a game!

sponzagpu.jpg

Link to comment
Share on other sites

Updated to 1.5.5.

Added an Ultra high-end benchmark mode!

8800 GTX = 101 sec

GTX 285 (Mac) = 44.7 sec

As before, the ATI 48xx cards (even the mobile iMac 4850M) will outperform that :)

Link to comment
Share on other sites

"Thanks for the multi GPU card 9800X2 test !

Could you try the newer SLG 1.5.4 in High Benchmark mode? It takes about twice as long, because High mode now does exactly double the work. The reason is to shrink the share of OpenCL overhead in the measured time: compiling the OpenCL code on the fly always costs about 0.5 to 1.0 sec, depending on the CPU.

 

It would also be interesting if you ran a GPU only task with the sponza scene, which is new and puts a huge load on the GPU.

I get an average of 16 kSamples/sec (GPU only, 3 threads, sponza) with my 8800GTX. Your two GPUs, as shown in the help screen, should reach at least 29 kSamples/sec.

Let the sponza scene run for a while, at least until the sample count goes from 0 to 16 or 32, to get a stable average."

 

 

 

Newer slg in High Benchmark Mode = 36.7 secs.

 

Ultrahighend Benchmark Mode = 53.7 secs.

 

Sponza scene with 48 samples, 3 threads, GPU only = 35k samples/sec.

 

(Using version 1.5.5)

Link to comment
Share on other sites

Cheers.

 

Benched GTX 260 on its own before I eventually work out how to stick the second one in.

 

Midrange GPU - 16 seconds

High End GPU - 25 seconds

UltraHybrid Sponza - 22 seconds

Link to comment
Share on other sites

Thanks !

Could you also compare High Hybrid vs High CPU only and Ultra Hybrid vs Ultra CPU only? (Both are in the middle section of the screen, not the CPU only on the right; the newest version 1.5.7 is needed.)

 

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

High hybrid vs High CPU only on my 8800GTX = 16 sec vs 31 sec. The GPU gives a good boost here: hybrid is almost twice as fast (about 100% more speed, or roughly 48% less time). With a faster CPU and the same GPU, the gain would be smaller.

Ultra hybrid vs Ultra CPU only shows a much smaller GPU boost ("only" about 20% time saving),

because C2D CPUs are already at or near full load with the CPU tasks and cannot feed the GPU with data fast enough.

So CPUs with 4 or more real (not virtual) cores will also get a higher boost in Ultra hybrid, though not as big a boost as in High hybrid.
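
Time saving and speedup are two different ways to state the same result, and they are easy to mix up. Here is a short Python sketch using the 31 sec (CPU only) and 16 sec (hybrid) figures from above; the function names are just illustrative.

```python
def time_saving(cpu_only_sec, hybrid_sec):
    """Fraction of the CPU-only time that hybrid mode saves."""
    return 1.0 - hybrid_sec / cpu_only_sec


def speedup(cpu_only_sec, hybrid_sec):
    """How many times faster hybrid mode finishes the same work."""
    return cpu_only_sec / hybrid_sec


if __name__ == "__main__":
    cpu_only, hybrid = 31.0, 16.0  # High benchmark on the 8800GTX system
    print(f"time saved: {time_saving(cpu_only, hybrid):.0%}")
    print(f"speedup:    {speedup(cpu_only, hybrid):.2f}x")
```

A 100% time saving would mean the job takes no time at all. Finishing in about half the time is a speedup of roughly 2x.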

Link to comment
Share on other sites

  • 2 weeks later...

SLG updated to 1.5.8!

Benchmark times cannot be compared with old versions: some benches have significantly different settings, and therefore different times.

Link to comment
Share on other sites

Ultra High GPU only was a bug.

Now 1.6.0 is available!

I added OpenCL pixel filter benches and cleaned up the GUI.

Now all GPU only benches sit beside CPU only and hybrid and use the same settings. Before, the GPU only benches had their own settings, different from hybrid and CPU only.

Now it is clearer and should be bug-free.

Ready to collect reference results again (I will hold back further versions).

 

Attached: pixel filter MSamples/sec of the 8800GTX and Ultra GPU only (the 4870 will perform much faster, but no longer at 1.6 sec :rolleyes: )

pixelfilt.jpg

Bildschirmfoto_2010_05_07_um_11.48.01.jpg

Link to comment
Share on other sites

I may be getting anomalous results with the Open CL Benchmark test using version 1.6.2

 

The 9800GX2 is only processing at two-thirds the speed of your 8800GTX, yet is a third faster in the Ultrahigh GPU only Benchmark?

 

post-249157-1273918522_thumb.png

Link to comment
Share on other sites

The 8800GTX is much faster than the 8800GT. In an 8800GT vs 9800 GX2 comparison, the GX2 would look better :)

The 9800 GX2 cannot get near 2x 8800GTX.

Also, the CPU may be "too slow" to feed both OpenCL GPUs fast enough.

Try High end CPU only vs hybrid; you may see a bigger advantage over my 8800GTX high end values.

 

I also got GT120 results (Mac Pro 2009)

Ulta_GT120.jpg

 

Ultra GPU only 280 sec, so don't worry about the 9800 GX2 ;)

You can even see here that OpenCL with very fast CPUs (Mac Pro 2009) and a slow GPU is the worst case: hybrid is even slower than CPU only.

The OpenCL overhead in hybrid mode makes slow GPUs useless alongside very fast CPUs (4+ cores).

But most of us will NOT have such a combination of 2x Xeon + GT120, I hope ;)

 

PS: I also got ATI 5870 (Win) OpenCL Pixelfilter values !

 

AddSample[FILTER_NONE] Benchmark

[CypressPixel][Samples/sec 1669.42M]

 

AddSample[FILTER_PREVIEW] Benchmark

[CypressPixel][Samples/sec 369.56M]

 

AddSample[FILTER_GAUSSIAN] Benchmark

[CypressPixel][Samples/sec 217.81M]

Link to comment
Share on other sites

8800GTX is much faster than 8800GT...

 

Mitch:

 

Thanks for the reply, but I guess I wasn't quite clear. It's the Open CL Pixelfilter test which produces results that appear inconsistent or anomalous. In all the other tests the 9800GX2 predictably "bests" the 8800GTX. In the Pixelfilter run the 9800GX2 only processes two thirds the information in the 30 secs that the 8800GTX does in the same time. It is as if the Pixelfilter test does not use both cores of the 9800GX2. This may be a bug?

Link to comment
Share on other sites

Ah, now I understand. I will ask the benchpixel devs whether it also uses all GPUs.

But certainly benchpixel uses the VRAM much more heavily and more often than the raytracing benches. I don't know whether older dual-GPU cards can slow down under concurrent VRAM access (read/write), which would reduce the overall VRAM speed of a dual-GPU card compared with a single-GPU card.

For a closer look, start benchpixel in Terminal and post the output; there we can see how many GPU devices are used. Compare the device info with mine.

 

8800GTX

Device 0,1 = cpu cores

Device 2 = GPU (single 8800GTX)

 

 

mitch:~ ami$ /Users/ami/Desktop/benchpixel

LuxRays Simple PixelDevice Benchmark v0.1alpha7dev

Usage (easy mode): /Users/ami/Desktop/benchpixel

OpenCL Platform 0: Apple

Device 0 NativeThread name: NativeThread-000

Device 1 NativeThread name: NativeThread-001

Device 2 OpenCL name: GeForce 8800 GTX

Device 2 OpenCL type: GPU

Device 2 OpenCL units: 16

Device 2 OpenCL max allocable memory: 192MBytes

Device 3 OpenCL name: Intel® Core™2 Duo CPU E7300 @ 2.66GHz

Device 3 OpenCL type: CPU

Device 3 OpenCL units: 2

Device 3 OpenCL max allocable memory: 1024MBytes

Selected pixel device: GeForce 8800 GTXCreating 1 pixel device(s)

Allocating pixel device 0: GeForce 8800 GTX (Type = OPENCL)
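
For checking a pasted log like the one above, here is a rough Python sketch. It assumes the line formats shown in this output ("Device N OpenCL type: X" and "Creating N pixel device(s)"). The regexes and function names are my own and not part of benchpixel or LuxRays.

```python
import re

# Patterns assumed from the log lines shown above.
DEVICE_TYPE = re.compile(r"^Device (\d+) OpenCL type: (\w+)$")
CREATING = re.compile(r"Creating (\d+) pixel device")


def opencl_device_types(log_text):
    """Map device index -> OpenCL type for every 'OpenCL type' line."""
    types = {}
    for line in log_text.splitlines():
        match = DEVICE_TYPE.match(line.strip())
        if match:
            types[int(match.group(1))] = match.group(2)
    return types


def pixel_devices_created(log_text):
    """Number of pixel devices the log says it created, or None if absent."""
    match = CREATING.search(log_text)
    return int(match.group(1)) if match else None


SAMPLE = """\
Device 0 NativeThread name: NativeThread-000
Device 2 OpenCL name: GeForce 8800 GTX
Device 2 OpenCL type: GPU
Device 3 OpenCL type: CPU
Selected pixel device: GeForce 8800 GTXCreating 1 pixel device(s)
"""

if __name__ == "__main__":
    types = opencl_device_types(SAMPLE)
    gpus = [index for index, kind in types.items() if kind == "GPU"]
    print(f"GPU devices listed: {gpus}")
    print(f"pixel devices created: {pixel_devices_created(SAMPLE)}")
```

The first function shows how many GPUs are visible; the second shows how many pixel devices were actually created, which is the number that matters for a dual-GPU card.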

benchpixel.zip

Link to comment
Share on other sites

Ah, i now understand. I will ask the benchpixel devs if that is also using all gpus...

 

It appears the test is using both GPUs and all memory. The 9800 GX2 does better than the 8800GTX in every other test. It may be a bug in the card design that shows up with just this test, or it could be a bug in the test. In WinWorld I've run many tests on the 9800 GX2 while considering overclocking its BIOS. Watching processor temps and GPU usage, I have noticed that some benchmark and stress programs do not actually use both GPUs, though they see both. Has this test been run on other dual-GPU cards or multiple-card setups?

 

Let me know how it goes. I am curious.

 

 

terminal_pixel.rtf

Link to comment
Share on other sites

Yep. benchpixel uses both gpus.

Maybe because it also uses 4 threads on the CPU instead of 2 (Quad vs C2D), the problem may be that the CPU cannot feed the GPU fast enough, or it may be an L2 cache difference! My C2D has 3 MB L2 = 1.5 MB per core.

Does your CPU have 4 MB or 6 MB for 4 cores (1 MB or 1.5 MB per core)?

Because of the heavy RAM transfers (picture filtering!), the L2 cache may also be used heavily; the more L2 the better.

Link to comment
Share on other sites

"Yep. benchpixel uses both gpus.

Maybe because it also uses 4 threads on the CPU instead of 2 (Quad vs C2D), the problem may be that the CPU cannot feed the GPU fast enough, or it may be an L2 cache difference! My C2D has 3 MB L2 = 1.5 MB per core.

Does your CPU have 4 MB or 6 MB for 4 cores (1 MB or 1.5 MB per core)?

Because of the heavy RAM transfers (picture filtering!), the L2 cache may also be used heavily; the more L2 the better."

 

As you can see, each C2D of the Q has 1 MB more L2 available than your C2D.

 

post-249157-1274104206_thumb.png

 

(Disregard the bus speed indicated. CPU-X just reports what it is told. The Q6600 runs at 9x360.)

Link to comment
Share on other sites

I got an answer from the dev team: benchpixel filtering uses only one GPU.

SLG (the raytracing) uses all GPUs.

So it is clear why a dual-GPU card scores lower in benchpixel than in SLG when compared with a single-GPU card.

 

 

I got some MacPro 2009 ATI 4870 / GTX 285 results (slg 1.6.2)

The GTX 285 performs better than I guessed!

 

Bench UltraHigh GPU Only

Radeon HD 4870 = 54 sec

GeForce GTX 285 = 32 sec!! (for comparison: GT120 = 280 sec, 8800GTX = 100 sec)

Bench UltraHigh Hybrid

Radeon HD 4870 = 27 sec

GeForce GTX 285 = 25 sec

 

Bench GPU with OpenCL pixel filtering

none

Radeon HD 4870 = 1072Ms/s

GeForce GTX 285 = 945Ms/s

preview

Radeon HD 4870 = 219Ms/s

GeForce GTX 285 = 298Ms/s

gaussian

Radeon HD 4870 = 96Ms/s

GeForce GTX 285 = 167Ms/s
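
To put the pixel filter numbers side by side, here is a small Python sketch that computes the GTX 285 to 4870 throughput ratio per filter. The values are the ones posted above (taken as MSamples/sec); the dictionary layout and function name are only illustrative.

```python
# Pixel filter throughput in MSamples/sec (slg 1.6.2), as posted above.
results = {
    "none": {"Radeon HD 4870": 1072, "GeForce GTX 285": 945},
    "preview": {"Radeon HD 4870": 219, "GeForce GTX 285": 298},
    "gaussian": {"Radeon HD 4870": 96, "GeForce GTX 285": 167},
}


def gtx_vs_radeon(table):
    """GTX 285 throughput divided by 4870 throughput, per filter."""
    return {
        name: row["GeForce GTX 285"] / row["Radeon HD 4870"]
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, ratio in gtx_vs_radeon(results).items():
        print(f"{name}: GTX 285 is {ratio:.2f}x the 4870")
```

A ratio below 1 means the 4870 is ahead (the unfiltered case); the GTX 285 pulls ahead as the filter gets heavier.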

Link to comment
Share on other sites

Core i7 920 @ 2.66Ghz + GTX275

 

Ultrahigh GPU only = 36.2 sec

Highend GPU only = 17.6 sec

Midrange GPU only = 10.4 sec

 

Ultrahigh Hybrid = 29.9 sec

Highend Hybrid = 15.8 sec

Midrange Hybrid = 6.4 sec

 

Ultrahigh CPU only = 52.9 sec

Highend CPU only = 54.9 sec (?)

Midrange CPU only = 27.9 sec

 

Open CL Filtering

None = 333.30M/sec

Preview = 216.22M/sec

Gaussian = 140.13M/sec

 

Hope that's helpful at all. Let me know if there's anything else you want me to bench. ;)

Link to comment
Share on other sites

Using a GTX280 with a Core i5-750 2.66ghz (2gb single channel memory.... yeah I know, I'm getting another stick soon).

 

CPU Only Midrange: 41.3sec

CPU Only Highend: 83.4sec

CPU Only Ultra: 74.4sec

 

Hybrid Midrange: 7.0sec

Hybrid Highend: 14.7sec

Hybrid Ultra: 40.4sec

 

GPU Only Midrange: 11.2sec

GPU Only Highend: 16.4sec

GPU Only Ultra: 39.7sec

 

FILTER NONE: 875.99M

FILTER PREVIEW: 272.68M

FILTER GAUSSIAN: 142.42M

 

Man.. my {censored} is all over the place.

post-269528-1275028742_thumb.jpg

Link to comment
Share on other sites

Yep, the GTX 280 has a big advantage over the other Nvidia cards when running OpenCL.

 

"Ultrahigh CPU only = 52.9 sec

Highend CPU only = 54.9 sec (?)

Midrange CPU only = 27.9 sec

"

In the UltraHigh CPU only (and Hybrid) benches, more CPU cores are used, because they run more threads than the Midrange and Highend benches.

So on 4-core CPUs, UltraHigh benefits from the extra CPU power and may run even faster than Highend CPU only.

On C2D CPUs, UltraHigh CPU only runs much slower.

Link to comment
Share on other sites
