eep357 Posted February 4, 2013
I went to New Jersey and all I got was this stupid shirt... Oh, and a GTX 660!
gothic860 Posted February 4, 2013
*First Kepler score* 892.2 FPS
Brand new Mountain Lion 10.8.2 install
Nvidia 304.00.05f02 driver, CUDA 5.0.37
OpenCL unpatched
No AGPM edits
GraphicsEnabler=n - no injection!
MacPro3,1 system definition
EVGA GeForce GTX 660 2GB
Core 2 Duo E8500 @ 3.16GHz
Asus P5Q-E (P45 Express/ICH10R)
4GB RAM

And why do I get only 485 fps with my GTX 680?
TH3L4UGH1NGM4N Posted February 4, 2013
That's the part that's blowing my brain cells since I saw your 680 fps and it was half that of a 660? (~_^)
mitch_de (Author) Posted February 4, 2013
Quoting gothic860: "And why i get only 485fps with my GTX680"
Hmm, does the OpenCL bandwidth test show a bottleneck compared to similar CPU/GPU systems? I have also heard that the LuxMark (OpenCL) speed of some GTX 6xx card (I don't remember which) was much worse than that of the older GTX 5xx cards. Maybe it was the GTX 680, which is a fast OpenGL card but much slower than it could be in CUDA/OpenCL, because of internal design changes that gain OpenGL FPS at the cost of much lower OpenCL/shader speed?
Attached: LuxMark DB results (fastest, medium scene). The GTX 680 has fewer compute units than other high-end GPUs.
Regi Yassin Posted February 4, 2013
Here are my results. HW details are in my sig.
Nvidia official driver for 10.8.2 + CUDA 5.0.37 + AGPM edit
iMac 12,2
mitch_de (Author) Posted February 4, 2013
Great to see GTX 650 Ti works well & fast.
Gringo Vermelho Posted February 4, 2013
*lol* guys!!
RobertX Posted February 4, 2013
...and now for something completely different...

/Users/leslie/Downloads/oclBandwidthTest Starting...

Running on...

GeForce GT 430

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                161.5

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                202.9

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                12265.7

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

logout
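For anyone comparing several runs like the one above, the three numbers can be pulled out of the pasted text with a short script. This is only a minimal sketch in Python, assuming the output keeps the section headings shown in this post; `parse_bandwidth` is a made-up helper name and is not part of oclBandwidthTest.

```python
import re

# Sample text copied from the GeForce GT 430 run posted above.
SAMPLE = """\
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                161.5
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                202.9
Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                12265.7
"""

# Section heading, then (lazily) anything up to the first
# "<transfer size> <bandwidth>" pair of numbers.
PATTERN = re.compile(
    r"(Host to Device|Device to Host|Device to Device) Bandwidth"
    r".*?(\d+)\s+(\d+(?:\.\d+)?)",
    re.DOTALL,
)


def parse_bandwidth(text):
    """Return {direction: MB/s} for each section found in the text."""
    return {m.group(1): float(m.group(3)) for m in PATTERN.finditer(text)}


if __name__ == "__main__":
    for direction, mb_s in parse_bandwidth(SAMPLE).items():
        print(f"{direction}: {mb_s} MB/s")
```

Because the pattern does not depend on line breaks, it should also cope with output that was pasted as one long line.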
mitch_de (Author) Posted February 5, 2013
Oops, really low bandwidth for your GT 430 card (I have never seen values like that!). Test it again: don't run any other app alongside it, and don't move the mouse while it runs. The card isn't a burner, but the CPU<->GPU MB/s should be many times higher. Even the lowest-end GPUs like the GT 210/220 should move data to/from the GPU over the PCIe slot + CPU + bus at least 3-4 times faster. Such PCIe speeds look like AGP transfer speeds (the old GPU slot type). Perhaps some interrupt problem? We should collect bandwidth results from similar GPUs for that user.
Background: low bandwidth does NOT necessarily mean low OpenGL/OpenCL speed as well, but it will have a negative effect: on OpenGL for texture transfers, on OpenCL/CUDA for data transfers.
RobertX Posted February 5, 2013
Running on...

GeForce GT 430

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                162.4

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                203.1

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                11875.8

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

System Profiler says my x16 PCIe card (128-bit) is running at x1... Examining the card, I've found that two or more of the gold connector pins appear damaged (not full length like the others). Windows 8 also reports x1 lane width, and it's still faster than my GT 520 (which is only 64-bit).
RobertX Posted February 6, 2013
Rolled back my drivers... new results:

/Users/leslie/Downloads/oclBandwidthTest Starting...

Running on...

GeForce GT 430

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                10129.9

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                31610.5

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                6642.8

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

EDIT: got it working... http://www.insanelymac.com/forum/topic/286133-asus-gt430-running-pci-e-lane-width-x1/
RobertX Posted February 11, 2013
...finally, a somewhat "Happy Hack"
mitch_de (Author) Posted February 25, 2013
Updated the tool to V1.4: added bandwidth measurement at program start.

Bandwidths:
VRAM speed / CPU speed / GPU speed = device to device MB/s
PCIe mode (lanes x1, x8, x16) / CPU / chipset / GPU speed = host > device & device > host MB/s

If someone gets much less than 1000 MB/sec (1 GB/sec) in the host > device and/or device > host values, then something is wrong with the PCIe speed (only 1 lane used instead of 8 or 16). CPU speed and GPU speed don't matter in this case. The highest possible values here will be about 8000-9000 MB/sec. Bad values are much below 1000 MB/sec.

VRAM speed shows in the device to device MB/sec. If the VRAM is clocked low or, much more importantly, is only 64 or 128 bits wide, you will get worse MB/sec here. 256/384/512-bit VRAM shows much faster MB/sec. The highest possible value here will be around 90000 MB/sec. Bad (indicates slow 64/128-bit VRAM) is below 15000 MB/sec.
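The rule of thumb in this post can be written down as a small check. This is a rough sketch in Python and not the tool's actual logic: the two limits (1000 MB/s and 15000 MB/s) come from the post, while the function name, the labels and the choice to treat the limits as hard cut-offs are assumptions made for illustration (the post says "much less than", which is looser).

```python
# Limits taken from the post above, in MB/s.
PCIE_BAD_BELOW = 1000    # host<->device: below this, suspect an x1 link
VRAM_BAD_BELOW = 15000   # device->device: below this, slow 64/128-bit VRAM


def rate_results(host_to_device, device_to_host, device_to_device):
    """Label one set of bandwidth results using the rule-of-thumb limits."""
    pcie_ok = min(host_to_device, device_to_host) >= PCIE_BAD_BELOW
    vram_ok = device_to_device >= VRAM_BAD_BELOW
    return {
        "pcie": "ok" if pcie_ok else "check PCIe link width",
        "vram": "ok" if vram_ok else "slow VRAM (64/128-bit?)",
    }


if __name__ == "__main__":
    # GT 430 values posted earlier in this thread.
    print(rate_results(161.5, 202.9, 12265.7))
```

With the GT 430 numbers posted earlier in the thread, both checks are flagged, which matches what was found there (card running at x1, 128-bit memory).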
RobertX Posted February 26, 2013
...just passin' through
mitch_de (Author) Posted March 1, 2013
Update to V1.5: UI changes for the bandwidth test results.
mitch_de (Author) Posted March 1, 2013
Wow, fast bandwidth results:

HACKINTOSH OS X 10.8.3
Intel® Core™ i5-3570K CPU @ 3.40GHz 3400 MHz
GPU ATI Radeon HD Pitcairn XT Prototype Compute Engine 1000 MHz
444.9 fps
Bandwidths:
device > host: 12002.8 MB/s
host > device: 10074.9 MB/s
device > device (VRAM): 83085.6 MB/s

What kind of AMD 6/7xxx GPU is that? Most users will be limited by PCIe 2.0 with max. 8000 MB/sec.
k3nny Posted March 1, 2013
It is an XFX Radeon HD 7870 DD with the hardware in my signature. Would be interesting to see a comparison to another 7xxx card. eep357, where are you?
eep357 Posted March 1, 2013
In other news, my Mac Developer account just expired 2 mins ago and I don't have the $ to renew right now. Hopefully yesterday's beta was the last!
mitch_de (Author) Posted March 2, 2013
PS: The two PCIe transfer speeds don't matter much for gaming / OpenGL when comparing 2000 vs 4000 vs 10000 MB/s. They only matter if they are very bad (like AGP performance). A GPU magazine tested that by switching from x16 lanes (up to 8000 MB/s) down to x1 lane (up to 500 MB/s) through PCIe slot pin manipulation. x16 > x8 or x4 made only a few % difference in FPS, but x1 (up to 500 MB/s) gave 30% less FPS.
PCIe speed makes much more of a difference for data-hungry GPU compute tasks (CUDA, AMD Stream or OpenCL), where much larger and constant data transfers move over the PCIe bus.
The 3rd value, GPU/VRAM, has a direct effect on gaming speed, besides GPU performance.
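The "up to 500 MB/s" for x1 and "up to 8000 MB/s" for x16 quoted above match the theoretical one-direction peak of PCIe 2.0. A short sketch of that arithmetic in Python: the 5000 MT/s per-lane rate and the 8b/10b line encoding are PCIe 2.0 figures that are not stated in the post, and `pcie_peak_mb_s` is a hypothetical helper name.

```python
def pcie_peak_mb_s(lanes, mega_transfers_per_s=5000, payload_bits=8, line_bits=10):
    """Theoretical one-direction peak of a PCIe link in MB/s.

    Defaults are the PCIe 2.0 figures: 5000 MT/s per lane with
    8b/10b encoding (8 payload bits for every 10 bits on the wire).
    """
    mbit_per_lane = mega_transfers_per_s * payload_bits / line_bits
    return mbit_per_lane / 8 * lanes


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"x{lanes}: {pcie_peak_mb_s(lanes):.0f} MB/s")
```

Measured values stay below these peaks because of protocol overhead, so a healthy link lands somewhat under the theoretical number.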
eep357 Posted March 2, 2013
mitch_de: Is the 3rd value 100% GPU dependent, such that the same card should give the same results in any system it's installed in?
Wayang-NT Posted March 2, 2013
12D74 ... windowed FS
eep357 Posted March 2, 2013
@k3nny - If 90000 MB/s is the max possible for 512-bit VRAM, I don't think device > device (VRAM): 83085.6 MB/s can be possible with the 256-bit GDDR5 memory on the 7870?
Since you're using Clover to boot, be sure there are no values entered for CPU speed or Turbo in config.plist, as this can slow down the OS system clock and cause the OS to think things are going faster than they really are.
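One way to sanity-check a device > device figure is to compare it with the theoretical VRAM peak, which is bus width times per-pin data rate. A minimal sketch in Python: the 256-bit bus width is from the post above, but the 4800 Mbit/s per pin used for the 7870's GDDR5 is an assumption (it is not given anywhere in this thread), and `vram_peak_mb_s` is a made-up helper name.

```python
def vram_peak_mb_s(bus_width_bits, effective_mbit_per_pin):
    """Theoretical peak VRAM bandwidth in MB/s.

    Each of the bus_width_bits data pins moves effective_mbit_per_pin
    megabits per second; dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * effective_mbit_per_pin / 8


if __name__ == "__main__":
    # ASSUMPTION: 4800 Mbit/s per pin for the HD 7870's GDDR5.
    print(vram_peak_mb_s(256, 4800), "MB/s")
```

Under that assumption the ceiling for a 256-bit card would be about 153600 MB/s, so 83085.6 MB/s would not be out of range on bus width alone. How the benchmark counts the read and write halves of a device > device copy also affects the reported number.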
k3nny Posted March 2, 2013
I left these settings for Clover to decide. I neither have CPU Speed, nor Turbo in my config file. I don't get the big difference either.
mitch_de (Author) Posted March 2, 2013
Quoting eep357: "mitch_de: Is 3rd value 100% GPU dependent to where same card should have same results in any system it's installed?"
Yes, it should be 100% GPU dependent. Only if the AGPM of the other PC / OS X system is set up differently or works incorrectly, so that the GPU + VRAM clocks end up very different, will the results (all of them!) also differ a lot, even with the same GPU.
Interesting that one user gets much more FPS running Oceanwave in fullscreen (= much higher resolution) than windowed at 500x500. My 9600 GT is much slower in fullscreen 1400x900 than in 500x500 windowed.
eep357 Posted March 2, 2013
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                4647.1

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                6425.0

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)   Bandwidth(MB/s)
   33554432                143930.1

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

Using the command line version, much different results? Also, in the logs of the bench I see lots of:

<program source>:226:26: warning: double precision constant requires cl_khr_fp64, casting to single precision

And it's using OpenCL 1.1, driver version 1.0, giving it far fewer extensions to utilize, and also showing no double precision support, which is a feature of this card but requires OpenCL 1.2:

[Device 0]
Name: ATI Radeon HD Tahiti XT Prototype Compute Engine
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 32
Work Group Size: 1024
Clock: 1050 MHz
Global Memory: 1536 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store

vs CPU

[Device 1]
Name: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.2
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3800 MHz
Global Memory (Total): 24576 MB
Global Memory (Host): 24576 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats

Since this info comes from the same oclinfo tool I won't post its output, but LuxMark shows it correctly as OpenCL 1.2, and of course benches very well with this GPU.
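The double-precision warning lines up with the extension lists in this post: `cl_khr_fp64` appears for the CPU device but not for the GPU device. A small sketch in Python of that check on the pasted strings; `supports_fp64` is a hypothetical helper name, and the CPU list below is only an excerpt of the one posted. In a real OpenCL program the string would come from `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`.

```python
# Extension strings copied from the device info above.
GPU_EXTENSIONS = (
    "cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions "
    "cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing "
    "cl_khr_gl_event cl_khr_global_int32_base_atomics "
    "cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics "
    "cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store"
)

# Excerpt only: the CPU device lists more extensions than shown here.
CPU_EXTENSIONS = (
    "cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 "
    "cl_khr_global_int32_base_atomics cl_khr_int64_base_atomics"
)


def supports_fp64(extensions):
    """True if the space-separated extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


if __name__ == "__main__":
    print("GPU fp64:", supports_fp64(GPU_EXTENSIONS))
    print("CPU fp64:", supports_fp64(CPU_EXTENSIONS))
```

Splitting on whitespace before testing avoids a false match on a longer name such as `cl_APPLE_fp64_basic_ops`.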