
OpenCL Oceanwave Bench and (new) CompuBench CL


mitch_de

367 posts in this topic


*First Kepler score*

 

892.2 FPS

 

Brand new Mountain Lion 10.8.2 install

Nvidia 304.00.05f02 driver, CUDA 5.0.37

OpenCL unpatched

No AGPM edits

GraphicsEnabler=n - no injection!

MacPro3,1 system definition

EVGA Geforce GTX 660 2GB

Core 2 Duo E8500 @ 3.16GHz

Asus P5Q-E (P45 Express/ICH10R) 4GB RAM

 

And why do I get only 485 fps with my GTX 680? :wallbash:


"And why do I get only 485 fps with my GTX 680?"

Hmmm, does the OpenCL bandwidth test show some bottleneck compared to similar CPU/GPU systems?

But I heard that the LuxMark (OpenCL) speed of some GTX 6xx card (I don't remember which) was also much worse than that of the older GTX 5xx cards.

Maybe it was the GTX 680, which is a fast OpenGL card but much slower than it could be in CUDA/OpenCL because of some internal design changes made to get more FPS in OpenGL at the cost of much lower OpenCL/shader speed?

Attached: LuxMark DB results (fastest, medium scene). The GTX 680 has fewer compute units than other high-end GPUs.

 

Bildschirmfoto 2013-02-04 um 11.54.57.jpg


...and now for something completely different...

 

/Users/leslie/Downloads/oclBandwidthTest Starting...

 

Running on...

 

GeForce GT 430

 

Quick Mode

 

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 161.5

 

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 202.9

 

Device to Device Bandwidth, 1 Device(s)

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 12265.7

 

[oclBandwidthTest] test results...

PASSED

 

> exiting in 3 seconds: 3...2...1...done!

 

logout


Oops, really low (never seen before!) bandwidth speeds for your GT 430 card. Test it again: don't run any other app alongside it, and don't move the mouse while it runs.

It isn't a burner, but the CPU<->GPU MB/s should be many times faster. Even the lowest-end GPUs like the GT 210/220 should transfer data to/from the GPU at least 3-4 times faster over the PCIe slot + CPU + bus. Such PCIe speeds look like AGP transfer speeds (the old GPU slot type).

Perhaps some interrupt problems?

We should collect some bandwidth results from similar GPUs for that user.

Background: low bandwidth does NOT necessarily mean low OpenGL/OpenCL speed as well, but it will have a negative effect: on texture transfers for OpenGL, and on data transfers for OpenCL/CUDA.


Running on...

 

GeForce GT 430

 

Quick Mode

 

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 162.4

 

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 203.1

 

Device to Device Bandwidth, 1 Device(s)

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 11875.8

 

[oclBandwidthTest] test results...

PASSED

 

> exiting in 3 seconds: 3...2...1...done!

 

System Profiler says my x16 PCIe card (128-bit) is running at x1. Examining the card, I found that two or more of the gold connector pins appear damaged (not full length like the others). Windows 8 also reports x1 lane width, and it's still faster than my GT 520 (which is only 64-bit). :worried_anim:


rolled back my drivers... new results

 

/Users/leslie/Downloads/oclBandwidthTest Starting...

 

Running on...

 

GeForce GT 430

 

Quick Mode

 

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 10129.9

 

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 31610.5

 

Device to Device Bandwidth, 1 Device(s)

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 6642.8

 

[oclBandwidthTest] test results...

PASSED

 

> exiting in 3 seconds: 3...2...1...done!

 

 

EDIT: got it working... http://www.insanelymac.com/forum/topic/286133-asus-gt430-running-pci-e-lane-width-x1/ :smoke:


  • 2 weeks later...

Updated the tool to V1.4. Added bandwidth measurement at program start.

Bandwidths:

VRAM speed / CPU speed / GPU speed = device to device MB/s

PCIe mode (lanes x1, x8, x16) / CPU / chipset / GPU speed = host > device & device > host MB/s

If someone gets much less than 1000 MB/s (1 GB/s) for the host > device and/or device > host values, then something is wrong with the PCIe speed (only 1 lane used instead of 8 or 16 lanes). CPU speed and GPU speed don't matter in this case.

The highest possible values here will be about 8000-9000 MB/s. Bad values are far below 1000 MB/s.

VRAM speed can be seen in the device to device MB/s. If the VRAM is clocked low or, much more importantly, has only a 64- or 128-bit bus, you will get worse MB/s here. 256/384/512-bit VRAM shows much faster MB/s.

The highest possible value here will be around 90000 MB/s. Bad (indicating slow 64/128-bit VRAM) is below 15000 MB/s.

Bildschirmfoto 2013-02-25 um 11.12.31.jpg
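
The rule of thumb from this post can be written down as a small check. This is only a rough sketch: the two thresholds (1000 MB/s and 15000 MB/s) come from the post, while the function name, the labels and the strict "less than" comparison are assumptions made for illustration.

```python
# Thresholds taken from the post above; everything else here is illustrative.
PCIE_BAD_BELOW_MBS = 1000    # host<->device: below this, suspect an x1 PCIe link
VRAM_BAD_BELOW_MBS = 15000   # device->device: below this, suspect 64/128-bit VRAM


def rate_bandwidth(host_to_device, device_to_host, device_to_device):
    """Return a verdict ("ok" or "suspicious") for each measured value in MB/s."""
    def rate(value, bad_below):
        return "suspicious" if value < bad_below else "ok"

    return {
        "host>device": rate(host_to_device, PCIE_BAD_BELOW_MBS),
        "device>host": rate(device_to_host, PCIE_BAD_BELOW_MBS),
        "device>device": rate(device_to_device, VRAM_BAD_BELOW_MBS),
    }


if __name__ == "__main__":
    # First GT 430 run posted earlier in the thread (card stuck at x1).
    print(rate_bandwidth(161.5, 202.9, 12265.7))
```

With the first GT 430 numbers from this thread, all three values come out as "suspicious", which fits the x1 lane problem found later.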


Wow got fast bandwidth results:

HACKINTOSH OS X 10.8.3 Intel® Core™ i5-3570K CPU @ 3.40GHz 3400 MHz

GPU ATI Radeon HD Pitcairn XT Prototype Compute Engine 1000 MHz 444.9 fps

 

Bandwidths:

device > host: 12002.8 MB/s

host > device: 10074.9 MB/s

device > device (VRAM): 83085.6 MB/s

What kind of AMD 6/7xxx GPU is it?

Most users will be limited by PCIe 2.0 with max. 8000 MB/s.


OpenCL OceanWave & bandwidth Benchmark V1.4.jpg

 

In other news, my Mac Developer account just expired 2 mins ago and I don't have the $ to renew right now :( Hopefully yesterday's beta was the last! :)


PS: The two PCIe transfer speeds don't matter much for gaming/OpenGL when comparing 2000 vs 4000 vs 10000 MB/s. Only if very bad (like AGP performance

Some GPU magazine tested that by switching from x16 lanes (up to 8000 MB/s) down to x1 lane (up to 500 MB/s) through PCIe slot pin manipulation. x16 > x8 or x4 made only a few % difference in FPS, but x1 (up to 500 MB/s) gave 30% less FPS.

PCIe speed makes much more difference for data-hungry GPU compute tasks (CUDA, AMD Stream or OpenCL), where much larger and constant data transfers move over the PCIe bus.

The 3rd value, GPU/VRAM, has a direct effect on gaming speed, besides GPU performance.
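
The x16 (8000 MB/s) and x1 (500 MB/s) figures quoted above follow from simple per-lane arithmetic. Here is a minimal sketch of it; the per-lane rates are the usual theoretical one-direction peaks after encoding overhead, real transfers land below them, and the function and table names are made up for illustration.

```python
# Theoretical per-lane, per-direction rates in MB/s after encoding overhead.
# PCIe 1.x and 2.0 use 8b/10b encoding, PCIe 3.0 uses 128b/130b.
PER_LANE_MBS = {
    "1.x": 250.0,                    # 2.5 GT/s * 8/10 / 8 bits per byte
    "2.0": 500.0,                    # 5.0 GT/s * 8/10 / 8 bits per byte
    "3.0": 8000 * 128 / 130 / 8,     # 8.0 GT/s * 128/130 / 8 bits per byte
}


def pcie_peak_mbs(generation, lanes):
    """Theoretical one-direction peak for a link of the given width."""
    return PER_LANE_MBS[generation] * lanes


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"PCIe 2.0 x{lanes}: {pcie_peak_mbs('2.0', lanes):.0f} MB/s")
```

By this arithmetic, a host > device or device > host reading above 8000 MB/s would only fit a PCIe 3.0 link, or else point at a timing problem in the measurement.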


@k3nny- If 90000 MB/s is the max possible for 512-bit VRAM, I don't think device > device (VRAM): 83085.6 MB/s can be possible with 256-bit GDDR5 memory on the 7870? Since you're using Clover to boot, be sure there are no values entered for CPU speed or Turbo in config.plist, as this can slow down the OS system clock and cause the OS to think things are going faster than they really are.
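
One way to sanity-check a device > device number is to compare it with the theoretical VRAM peak, which is bus width in bytes times the effective data rate per pin. The sketch below assumes 4.8 Gbps per pin for the 7870 (the reference-card figure as far as I know, not something measured in this thread), and the function name is made up for illustration. How close a copy benchmark gets to this peak also depends on how the tool counts the bytes it moves.

```python
def vram_peak_mbs(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical VRAM peak in MB/s: bytes moved per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_gbps_per_pin * 1000


if __name__ == "__main__":
    # ASSUMPTION: 256-bit bus at 4.8 Gbps effective per pin (reference 7870 spec).
    peak = vram_peak_mbs(256, 4.8)
    measured = 83085.6  # device > device value posted above
    print(f"theoretical peak: {peak:.0f} MB/s")
    print(f"measured / peak:  {measured / peak:.0%}")
```

Under that assumption the posted 83085.6 MB/s sits well below the theoretical peak, so the bus width alone would not rule it out.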


mitch_de: Is the 3rd value 100% GPU dependent, so that the same card should give the same results in any system it's installed in?

Yes, it should be 100% GPU dependent. Only if the AGPM of the other PC / OS X system is set up differently or works wrongly, so that the GPU + VRAM clocks differ a lot, will the results (all of them!) also be very different, even with the same GPU.

Interesting that one user gets much more FPS running Oceanwave in fullscreen (= much higher resolution) than windowed at 500x500. My 9600 GT is much slower in fullscreen at 1400x900 than windowed at 500x500.


Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4647.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6425.0
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 143930.1
[oclBandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!

Using the command-line version, I get much different results?
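
To compare command-line runs with the numbers the tool shows, the three values can be pulled out of the oclBandwidthTest text. This is a minimal sketch that assumes the output layout pasted above (a section header, a column header, then one "size bandwidth" row); the function name and the sample variable are made up for illustration.

```python
import re

SECTION_RE = re.compile(r"^(Host to Device|Device to Host|Device to Device) Bandwidth")
VALUE_RE = re.compile(r"^\s*(\d+)\s+([\d.]+)\s*$")

SAMPLE = """Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4647.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6425.0
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 143930.1
"""


def parse_bandwidth(text):
    """Map each section name to its bandwidth in MB/s (first data row per section)."""
    results = {}
    section = None
    for line in text.splitlines():
        header = SECTION_RE.match(line.strip())
        if header:
            section = header.group(1)
            continue
        row = VALUE_RE.match(line)
        if row and section:
            results[section] = float(row.group(2))
            section = None  # ignore anything else until the next section header
    return results


if __name__ == "__main__":
    for name, mbs in parse_bandwidth(SAMPLE).items():
        print(f"{name}: {mbs} MB/s")
```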

 

Also, in the logs of the bench I see lots of

<program source>:226:26: warning: double precision constant requires cl_khr_fp64, casting to single precision

And it's using OpenCL 1.1 with driver version 1.0, giving it far fewer extensions to utilize, and also showing no double-precision support, which is a feature of this card but requires OpenCL 1.2

[Device 0]
Name: ATI Radeon HD Tahiti XT Prototype Compute Engine
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 32
Work Group Size: 1024
Clock: 1050 MHz
Global Memory: 1536 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store

vs CPU

[Device 1]
Name: Intel(R) Core(TM) i7 CPU		 920 @ 2.67GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.2
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3800 MHz
Global Memory (Total): 24576 MB
Global Memory (Host): 24576 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_3d_image_writes
cl_APPLE_fp64_basic_ops
cl_APPLE_fixed_alpha_channel_orders
cl_APPLE_biased_fixed_point_image_formats

Since this info comes from the same oclinfo tool I won't post its output, but LuxMark correctly shows OpenCL 1.2, and of course benches very well with this GPU


