cmf Posted August 24, 2011 (edited)

Note: This still applies to 10.7.4 and 10.8! It is no longer needed for 10.9!

good news everyone

After I bought a GTX 560 Ti, I noticed a few odd things about the OpenCL support of this card. It tells you that it's capable of all these things, but it actually isn't, and it will produce compile errors like "requires .target sm_12 or higher" even though it's an sm_21 capable card. So I started digging, and from the looks of it, Apple's OpenCL compiler only (directly) supports cards up to sm_20 (Quadro 4000, GTX 480/470/580/570). If the card is higher than this, it will fall back to sm_10 or sm_11.

The solution: let's just pretend we have a 2.0 card

So, open up a hex editor of your liking and do this:

open /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib (as root or with sudo)

on 10.7.x and <=10.8.2:
    find: 8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
    replace by: 31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90

on 10.8.3+ (as mentioned here):
    find: 8B 81 1C 0C 00 00 EB 06 8B 81 20 0C 00 00
    replace by: B8 02 00 00 00 90 EB 06 B8 00 00 00 00 90

save

reboot is not required, but recommended

What this basically does is replace the dynamic cc device info in clhDeviceComputeCapability with a hardcoded 2.0 "info". Note that this is x64 only for the moment (which most people are certainly using since 10.7). I will add x86 support at a later point.

Also, if you have another non-sm2.0 capable nvidia card installed, this will (probably) break OpenCL support for it.

Now, everything that did work before should still be working ...

[Device 0]
Name: GeForce GTX 560 Ti
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.1
Driver Version: CLH 1.0
Compute Units: 16
Work Group Size: 1024
Clock: 0 MHz
Global Memory: 1024 MB
Local Memory: 48 KB
Cache Size: 0 Bytes
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions (12): cl_APPLE_ContextLoggingFunctions cl_APPLE_SetMemObjectDestructor cl_APPLE_clut cl_APPLE_fp64_basic_ops cl_APPLE_gl_sharing cl_APPLE_query_kernel_names cl_khr_byte_addressable_store cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

... but programs that are using some advanced OpenCL features (e.g. LuxMark) should work now too.

Edited September 17, 2013 by cmf
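For anyone who would rather script the hex edit than do it by hand, here is a minimal Python sketch of the 10.7.x to 10.8.2 find/replace from the post above. The byte patterns are copied from the post; the function names `patch_bytes` and `patch_file` are made up for illustration, and refusing to patch unless the pattern occurs exactly once is a cautious assumption of this sketch, not something the post states. It writes to a separate output path so the original file stays untouched.

```python
from pathlib import Path

# Byte patterns copied from the post above (10.7.x up to 10.8.2, x64 only).
FIND = bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02")
REPLACE = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")


def patch_bytes(data: bytes, find: bytes = FIND, replace: bytes = REPLACE) -> bytes:
    """Return data with the single occurrence of find swapped for replace."""
    if len(find) != len(replace):
        raise ValueError("patterns must have equal length so file offsets stay intact")
    count = data.count(find)
    if count != 1:
        # Assumption of this sketch: anything other than one match is suspicious.
        raise ValueError(f"expected exactly one match, found {count}")
    return data.replace(find, replace)


def patch_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, patch it in memory, write the result to dst."""
    Path(dst).write_bytes(patch_bytes(Path(src).read_bytes()))
```

Usage would be something like `patch_file("libclh.dylib", "libclh.patched.dylib")` on a copy of the file, followed by copying the result back as root. For 10.8.3+ the other pair of patterns from the post could be passed as `find` and `replace`.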
riprod Posted August 24, 2011

cmf wrote:
> The solution: let's just pretend we have a 2.0 card
> So, open up a hex editor of your liking and do this:
> open /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib (as root or with sudo)
> find: 8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
> replace by: 31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90
> save
> reboot is not required, but recommended

I have a 460 GTX and I've done the netkas opencl fix. I tried your method above, and when running Luxmark I get this error:

2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.
cmf (Author) Posted August 24, 2011

riprod wrote:
> I have a 460 GTX and I've done the netkas opencl fix. I tried your method above, and when running Luxmark I get this error:
> 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.

I haven't checked this on 10.7.0 or 10.7.1 yet, but I just checked the 10.7.0 nvidia drivers on 10.7.2 and it works there too (there aren't any relevant ptx/nvidia changes in OpenCL.framework, so this shouldn't matter). It's either a GTX 460 issue or another issue altogether. I would guess the latter, since the error message you get from luxmark w/o this fix is a different one ("- OpenCL ERROR: clBuildProgram(-11)"). Could you check if you have OpenCL support at all (click)?
riprod Posted August 25, 2011

I ran oclinfo and it indicates there is 1 OpenCL device found. This is what it says:

1 OpenCL device found!

[Device 0]
Name: Intel® Core2 Quad CPU Q8400 @ 2.66GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.1
Driver Version: 1.1
Compute Units: 4
Work Group Size: 1024
Clock: 3000 MHz
Global Memory (Total): 8192 MB
Global Memory (Host): 8192 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 2097152 Bytes
Available: Yes
Double-Precision: Yes
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats

Any ideas?
montiniz Posted August 25, 2011

works for me
cmf (Author) Posted August 25, 2011

riprod wrote:
> I ran oclinfo and it indicates there is 1 OpenCL device found. This is what it says:
> Any ideas?

which os x version? have you really applied the initial opencl fix? http://netkas.org/?p=794 (for 10.7.0 and 10.7.1) or http://netkas.org/?p=794#comment-173693 (for 10.7.2)

if this didn't work, try this on the console:

    echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

this will at least make it work partially (but not luxmark, which seems to do some other weird stuff ...).
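One small catch with the `echo ... >> ~/.profile` line: running it twice appends the export twice. A rough Python sketch of an idempotent version is below. The variable name and value come from the post; the helper name `ensure_export` is hypothetical, and whether a given app actually picks the variable up from `~/.profile` is not something this sketch can guarantee.

```python
from pathlib import Path

# Export line taken from the post above.
EXPORT_LINE = "export CL_ENABLE_SM2_DEVICE=1"


def ensure_export(profile: Path, line: str = EXPORT_LINE) -> bool:
    """Append line to profile unless it is already there. Returns True if added."""
    text = profile.read_text() if profile.exists() else ""
    if line in (entry.strip() for entry in text.splitlines()):
        return False
    # Start on a fresh line in case the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with profile.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    return True
```

Calling `ensure_export(Path.home() / ".profile")` would then be safe to repeat after every reinstall.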
riprod Posted August 25, 2011

cmf wrote:
> which os x version? have you really applied the initial opencl fix?

I applied the opencl fix from netkas, but I think it was when I was on 10.7.0. Now my OS X version is 10.7.1. I will try to reapply the netkas opencl fix and see if that changes anything. I'll post back with any differences.
riprod Posted August 25, 2011

I took a look at my /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver. I found that the netkas edits I had done were not saved or had been overwritten in the update from 10.7.0 to 10.7.1. I reapplied the netkas hex edits and tried luxmark v1.0 and it worked! Thank you.
kostya82 Posted August 26, 2011

cmf, big thanks for the advice on how to enable full OpenCL on the 560 Ti
sbl03 Posted August 27, 2011

riprod wrote:
> I have a 460 GTX and I've done the netkas opencl fix. I tried your method above, and when running Luxmark I get this error:
> 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.

Grr, I get the same error with the same card after doing BOTH netkas and this, and restarting. I checked that the changes were still there after the restart, and they were. What's wrong?

NVM! Somehow, my edits were wrong. Works now. Thanks.
Florian U. Posted September 6, 2011

Hey there, I have a little issue here. The error is the exact same one, although I am using a GT440 card, which is a little different.

Bigger issue: I also have a GTX285 installed (Mac Pro). When changing that value to a fixed value, both cards get changed, and the GTX285 is a bit older and supports only sm1.3, I think. Not sure about the GT440, seems like it only supports sm1.0?!

Any chance I can use the patch to get them both recognized as sm1.3 or sm1.0 so that they will both work?

I tried Final Cut Pro X with a few rendering tests, and it seems like GT440 + GTX285 is slower than just the GTX285 o.O

Any advice is appreciated. THANK YOU

florian
cmf (Author) Posted September 7, 2011

sm 1.3: 31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90
sm 1.2: 31 C0 FF C0 89 06 FF C0 89 02 90 90 90 90 90 90
sm 1.1: 31 C0 FF C0 89 06 89 02 90 90 90 90 90 90 90 90

untested, but it should work
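To see why these three sequences differ only by a few `FF C0` bytes, it helps to read them as x86-64 instructions: `31 C0` is `xor eax, eax`, `FF C0` is `inc eax`, `89 06` stores eax to `[rsi]`, `89 02` stores eax to `[rdx]`, and `90` is a nop. The sketch below is a toy interpreter for just those five encodings. That the first store carries the major version and the second the minor is an inference from the 2.0 pattern in the first post, not something stated in the thread, and `decode_cc_patch` is a made-up name.

```python
def decode_cc_patch(hex_bytes: str) -> tuple[int, int]:
    """Interpret a replacement sequence and return the (major, minor) it stores."""
    code = bytes.fromhex(hex_bytes)
    eax = 0
    stored = {}
    i = 0
    while i < len(code):
        if code[i] == 0x90:            # nop (padding)
            i += 1
            continue
        pair = code[i:i + 2]
        if pair == b"\x31\xC0":        # xor eax, eax
            eax = 0
        elif pair == b"\xFF\xC0":      # inc eax
            eax += 1
        elif pair == b"\x89\x06":      # mov [rsi], eax (assumed: major)
            stored["major"] = eax
        elif pair == b"\x89\x02":      # mov [rdx], eax (assumed: minor)
            stored["minor"] = eax
        else:
            raise ValueError(f"unsupported bytes at offset {i}: {pair.hex()}")
        i += 2
    return stored["major"], stored["minor"]
```

Feeding it the sm 1.3 line gives `(1, 3)`, and the original 2.0 replacement gives `(2, 0)`. It does not handle the 10.8.3+ pattern, which uses `mov eax, imm32` and a jump instead.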
Florian U. Posted September 7, 2011

cmf wrote:
> sm 1.3: 31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90

Confirmed, is working. Thank you so much. Got my GT440 and GTX285 now working in my MacPro4,1 (W3520) on Lion with ATY_init, the OpenCL patch and cmf's patch. Still need to check if Final Cut Pro X will now render faster than with a single GTX285.
rominator Posted September 14, 2011

I can confirm this works. Tried on GTX460, full OpenCL on Mac Pro 4,1.
mitch_de Posted September 20, 2011

If it's working, you are welcome to test your patch in the real world with the OceanWave OpenCL Benchmark. Until now, no Fermi GPU has run that benchmark (from Apple) successfully.

http://www.insanelymac.com/forum/index.php?showtopic=268209

PS: Even if OCLINFO runs well (after patching), it may happen that not all OpenCL apps work. You may get runtime or compiler errors (OpenCL compiles on the fly).
cmf (Author) Posted September 21, 2011

mitch_de wrote:
> If it's working, you are welcome to test your patch in the real world with the OceanWave OpenCL Benchmark. Until now, no Fermi GPU has run that benchmark (from Apple) successfully.
> http://www.insanelymac.com/forum/index.php?showtopic=268209
> PS: Even if OCLINFO runs well (after patching), it may happen that not all OpenCL apps work. You may get runtime or compiler errors (OpenCL compiles on the fly).

your binary is kinda broken, so i compiled it myself and it does work.
rominator Posted September 21, 2011

cmf wrote:
> your binary is kinda broken, so i compiled it myself and it does work.

Could someone post the fixed version or explain how to compile this? Guess we know now why we have been getting 16 fps....
Florian U. Posted September 21, 2011

rominator wrote:
> Could someone post the fixed version or explain how to compile this? Guess we know now why we have been getting 16 fps....

i don't think it is broken, you just need to use terminal, cd into that directory and launch the application from there, otherwise it'll report files missing
SuperHack Posted September 21, 2011

Can someone upload the edited file please for the OpenCL fix?
cmf (Author) Posted September 21, 2011

Florian U. wrote:
> i don't think it is broken, you just need to use terminal, cd into that directory and launch the application from there, otherwise it'll report files missing

that's what i did and it just segfaulted.

rominator wrote:
> Could someone post the fixed version or explain how to compile this? Guess we know now why we have been getting 16 fps....

use 10.7 sdk, compile, add #include <OpenGL/OpenGL.h> and #include <OpenGL/gl.h> in the two files you get compile errors in, compile again, possibly run successfully. you'll still get lots of compile warnings in xcode and when compiling the opencl kernel though. i think this already tells you enough about the quality of this sample ...

SuperHack wrote:
> Can someone upload the edited file please for the OpenCL fix?

sry, no, not using 10.7.1 any more. but if apple continues to push out a beta every 5 - 7 days and i get annoyed enough, i'll probably write a program ;P
Thireus Posted September 29, 2011

cmf wrote:
> sm 1.3: 31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90
> untested, but it should work

This is working for my GTX480 under 10.7.2 (11C62)

Thank you cmf!
tayshun12 Posted October 4, 2011

cmf wrote:
> good news everyone
> After I bought a GTX 560 Ti, I noticed a few odd things about the OpenCL support of this card. It tells you that it's capable of all these things, but it actually isn't, and it will produce compile errors like "requires .target sm_12 or higher" even though it's an sm_21 capable card. So I started digging, and from the looks of it, Apple's OpenCL compiler only (directly) supports cards up to sm_20 (Quadro 4000, GTX 480/470/580/570). If the card is higher than this, it will fall back to sm_10 or sm_11.
> The solution: let's just pretend we have a 2.0 card
> So, open up a hex editor of your liking and do this:
> open /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib (as root or with sudo)
> find: 8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
> replace by: 31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90
> save
> reboot is not required, but recommended
> What this basically does is replace the dynamic cc device info in clhDeviceComputeCapability with a hardcoded 2.0 "info". Note that this is x64 only for the moment (which most people are certainly using since 10.7). I will add x86 support at a later point.
> Also, if you have another non-sm2.0 capable nvidia card installed, this will (probably) break OpenCL support for it.

Hey cmf, I am about to build my new comp in the next few weeks, after I buy all my parts. I am going to install following the asus P8P67 guide in the install forums, but I was wondering: should I do this directly after install? Or should I, as I read earlier, install the netkas opencl fix that people were trying and then do this?

Thank you for the find! I was about to switch my card of choice (gtx 560 ti) to the 6850 until I decided to take a quick look over at the hardware forums LOL
cmf (Author) Posted October 5, 2011

tayshun12 wrote:
> should I do this directly after install? Or should I, as I read earlier, install the netkas opencl fix that people were trying and then do this?

you have to redo it every single time the file is overwritten by an update, so yes, after the install and after each 10.7.x update. and you need both opencl fixes on non-gf100/gf110 cards.

Thireus wrote:
> This is working for my GTX480

huh? this isn't required for gtx 480.
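Since updates silently overwrite the file, a quick way to check whether the patch is still in place can save some head scratching. Below is a rough Python sketch that only looks for the two 10.7.x byte patterns from the first post; `patch_state` is a hypothetical helper, and "unknown" simply means neither pattern was found (for example on 10.8.3+ or with one of the sm 1.x variants).

```python
# Byte patterns copied from the first post (10.7.x up to 10.8.2, x64 only).
UNPATCHED = bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02")
PATCHED_SM20 = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")


def patch_state(data: bytes) -> str:
    """Classify file contents as 'patched', 'unpatched' or 'unknown'."""
    if PATCHED_SM20 in data:
        return "patched"
    if UNPATCHED in data:
        return "unpatched"
    return "unknown"
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `patch_state` after each update would show whether the edit has to be redone.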
Thireus Posted October 5, 2011

cmf wrote:
> huh? this isn't required for gtx 480.

For my config it is, without this patch I don't have OpenCL working for my GTX 480. It might be related to the fact I have two GPUs on my HackinTosh: ATI HD6870 + NVidia GTX 480

pyrit benchmark
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (61279.2 PMKs/s)... -

Computed 61279.17 PMKs/s total.
#1: 'CUDA-Device #1 'GeForce GTX 480'': 25619.8 PMKs/s (RTT 2.9)
#2: 'OpenCL-Device 'ATI Radeon Barts XT Prototype'': 31135.4 PMKs/s (RTT 2.8)
#3: 'OpenCL-Device 'GeForce GTX 480'': 6744.2 PMKs/s (RTT 3.2)
#4: 'CPU-Core (SSE2)': 643.6 PMKs/s (RTT 3.0)
#5: 'CPU-Core (SSE2)': 625.2 PMKs/s (RTT 3.1)
#6: 'CPU-Core (SSE2)': 634.1 PMKs/s (RTT 3.0)
#7: 'CPU-Core (SSE2)': 619.5 PMKs/s (RTT 3.0)
#8: 'CPU-Core (SSE2)': 654.0 PMKs/s (RTT 3.0)

\o/ Without your patch, only CUDA is available for my GTX 480

I have combo upgraded to 11C73 today, let's see if your hack still works :

EDIT: Before Hack, no more OpenCL for my GTX 480 :

imac-de-thireus:Desktop thireus$ pyrit list_cores
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1: 'CUDA-Device #1 'GeForce GTX 480''
#2: 'OpenCL-Device 'ATI Radeon Barts XT Prototype''
#3: 'CPU-Core (SSE2)'
#4: 'CPU-Core (SSE2)'
#5: 'CPU-Core (SSE2)'
#6: 'CPU-Core (SSE2)'
#7: 'CPU-Core (SSE2)'
#8: 'CPU-Core (SSE2)'

[OpenCL-only Context]
2 OpenCL devices found!

[Device 0]
Name: Intel® Core(tm) i7-2600K CPU @ 3.40GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.1
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3411 MHz
Global Memory (Total): 8192 MB
Global Memory (Host): 8192 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats

[Device 1]
Name: ATI Radeon Barts XT Prototype
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 14
Work Group Size: 1024
Clock: 970 MHz
Global Memory: 512 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes

Let's hack this stuff...

EDIT : Back after patching

imac-de-thireus:~ thireus$ pyrit list_cores
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1: 'CUDA-Device #1 'GeForce GTX 480''
#2: 'OpenCL-Device 'ATI Radeon Barts XT Prototype''
#3: 'OpenCL-Device 'GeForce GTX 480''
#4: 'CPU-Core (SSE2)'
#5: 'CPU-Core (SSE2)'
#6: 'CPU-Core (SSE2)'
#7: 'CPU-Core (SSE2)'
#8: 'CPU-Core (SSE2)'

[OpenCL-only Context]
3 OpenCL devices found!

[Device 0]
Name: Intel® Core(tm) i7-2600K CPU @ 3.40GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.1
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3411 MHz
Global Memory (Total): 8192 MB
Global Memory (Host): 8192 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats

[Device 1]
Name: ATI Radeon Barts XT Prototype
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 14
Work Group Size: 1024
Clock: 970 MHz
Global Memory: 512 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes

[Device 2]
Name: GeForce GTX 480
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.0
Driver Version: CLH 1.0
Compute Units: 60
Work Group Size: 1024
Clock: 0 MHz
Global Memory: 1536 MB
Local Memory: 48 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops

So do you have an explanation why I need your patch? Also, can you tell me what's the latest version of OpenCL that should be detected for both GPUs? I don't understand what "sm1.3" stands for... SM = ? And I don't understand why the sm2.0 patch doesn't work for my GTX 480

Little video about Galaxies benchmark: http://thireus.dareyourmind.net/OpenCL_GAL...4_VSYNC_OFF.zip
cmf (Author) Posted October 5, 2011

Thireus wrote:
> For my config it is, without this patch I don't have OpenCL working for my GTX 480. It might be related to the fact I have two GPUs on my HackinTosh: ATI HD6870 + NVidia GTX 480
> So do you have an explanation why I need your patch? Also, can you tell me what's the latest version of OpenCL that should be detected for both GPUs? I don't understand what "sm1.3" stands for... SM = ? And I don't understand why the sm2.0 patch doesn't work for my GTX 480

k, this is weird and interesting. but yes, it is probably because you have an ati card installed as your primary card. two things you could try:

1) swap the cards, so the nvidia card is your primary card (and then try again with and without the sm 2.0 fix)
2) as i mentioned in an earlier post, type this in terminal:

    echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

concerning sm/cc: http://developer.nvidia.com/cuda-gpus aka "what your gpu is capable of" (e.g. double precision fp, local memory atomics, unified addressing)

sm/cc 1.x will give opencl device version 1.0, 2.x will give you opencl 1.1.
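The last sentence is effectively a lookup table, which matches the dumps earlier in the thread: the GTX 560 Ti patched to 2.0 reports OpenCL 1.1, while the GTX 480 patched as sm 1.3 reports OpenCL 1.0. As a tiny Python sketch (the function name is made up, and only the two cases cmf mentions are covered):

```python
def opencl_device_version(cc_major: int) -> str:
    """Map the sm/cc major version to the OpenCL device version cmf describes."""
    versions = {1: "OpenCL 1.0", 2: "OpenCL 1.1"}
    if cc_major not in versions:
        # The thread only covers sm/cc 1.x and 2.x.
        raise ValueError(f"no mapping given in the thread for sm/cc {cc_major}.x")
    return versions[cc_major]
```

So a card hard-coded to sm 1.3 would be expected to show up as an OpenCL 1.0 device even if the hardware could do more.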