wesux, posted September 17, 2009:
I am still wondering whether Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL. My Mac Pro is a 2006 model, so I cannot test it; I suppose I could do a Barefeats-style test: get an ATI Radeon 4870 and try it.
The 4870 isn't well supported for OpenCL. I've got a 2006 Mac Pro, and with the 4870 OpenCL performs almost the same as my CPU, and most of the time it crashes. As Mitch mentioned, we'll have to wait for the Apple ATI dev team to fix these problems. As for FCS, none of its applications go out of their way to support OpenCL or GCD, so don't expect any speed gains in them. GPU video encoding is still at a relatively primitive stage: most encoders don't support two-pass/multi-pass encoding, motion prediction, or many of the other features that make video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't meet the deadline with an acceptable OpenCL-compliant encoder. If this were patched for current users within months, that would be fantastic, but in my opinion (take it with a grain of salt) it is highly unlikely.
sch8mid, posted September 23, 2009:
Quote: "Now on the topic of FCS, none of the applications go out of the way to support OpenCL or GCD so don't expect any speed gains using these applications. …"
Right, isn't FC still 32-bit, using only 2 GB of memory max??? Come on, Randy Ubillos and the FC team...
But some interesting news: AMD/ATI seems to be really dedicated to OpenCL. Today AMD released a press statement that the company is awaiting OpenCL certification from the Khronos Working Group. On August 8, ATI released a beta SDK for x86-based CPUs (certified by Khronos on September 3), and ATI Stream SDK v2.0 will be ready this year (project book + = source forge). As of today (09/23) we will see the new DirectX 11 cards (RV870) with DirectCompute support. From a technical point of view, these new 40 nm cards seem to be light-years ahead of their Nvidia counterparts. We all know that historically this has often been the case in recent years, but we also very often had to deal with very weak ATI driver support. Let's hope Apple becomes aware of this new situation soon and gives us some alternatives to the green camp. As an HTPC user, only red cards will find their way into my rig.
Best as
mitch_de (thread author), posted September 29, 2009:
New Galaxies OpenCL Bench V2:
- Apple updated/fixed some OpenCL API usage (may help ATI)
- Small speedup (10% on my 9600 GT)
I have now built a 32K and an 8K version: use 32K for fast/high-end GPUs and CPUs, and 8K for low-end CPUs and GPUs. If the GPU is the limit, there will be no difference in GPU Gflops between the two. But on very fast GPUs, 32K may give much higher GPU Gflops: more GPU load means less time wasted on OpenCL overhead. Download links are in the first post.
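A rough way to see why 32K can report higher GPU Gflops than 8K on a fast card: treat each simulation step as a fixed OpenCL overhead (launch, sync, PCIe transfers) plus all-pairs work that grows with N². This is a toy model, not how Galaxies actually measures anything; the 20 flops per interaction, the 300 Gflops device rate and the 5 ms overhead are all assumed numbers for illustration.

```python
# Toy model (assumed numbers, not measurements from Galaxies):
# each step = fixed OpenCL overhead + all-pairs N-body work.
FLOPS_PER_INTERACTION = 20  # assumption: a common N-body flop-counting convention


def effective_gflops(n_bodies, device_gflops, overhead_s):
    """Gflops a benchmark would report if a fixed per-step overhead is counted."""
    flops = FLOPS_PER_INTERACTION * n_bodies ** 2
    compute_s = flops / (device_gflops * 1e9)
    return flops / (compute_s + overhead_s) / 1e9


if __name__ == "__main__":
    for n in (8 * 1024, 32 * 1024):
        # hypothetical fast GPU (300 Gflops) with 5 ms of overhead per step
        print(n, round(effective_gflops(n, 300, 0.005)))
```

With these made-up figures the 8K run reports about 142 Gflops and the 32K run about 280, because the overhead is a much smaller share of the larger step. With a slow device (say 20 Gflops) the two sizes land much closer together (roughly 18.6 vs 19.9), in line with the "if the GPU is the limit, no difference" remark.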
tinush, posted September 29, 2009:
Quoting mitch_de: "New Galaxies OpenCL Bench V2: …"
Very nice, mitch_de! Results, 32K v2.0 at 1680x1050:
- Sim Vector single-core CPU: 10 G
- Multi-core CPU: 18 G
- 9600 GT GPU: 149 G
- Hybrid (multi-core CPU + GPU): 34 G
Thanks, T.
mitch_de (thread author), posted September 29, 2009:
Thanks! Have you also tried the 8K version? The 32K star computation is very heavy work for all Core 2 Duo (C2D) CPUs, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version gives lower GPU Gflops but shows extremely fast star movement compared with CPU mode.
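The gap between the two versions follows from the cost of an all-pairs N-body step: every star is evaluated against every other star, so per-step work grows with the square of the star count. Assuming Galaxies uses a plain all-pairs force calculation (not confirmed in this thread), going from 8K to 32K stars multiplies the work per step by 16, which is why CPU-mode updates drop so sharply.

```python
def pair_interactions(n_bodies):
    """All-pairs N-body: every body is evaluated against every body each step."""
    return n_bodies * n_bodies


def work_ratio(n_small, n_large):
    """How many times more work per step the larger simulation needs."""
    return pair_interactions(n_large) / pair_interactions(n_small)


if __name__ == "__main__":
    print(work_ratio(8 * 1024, 32 * 1024))  # 16.0
```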
lamer0, posted September 30, 2009:
Mirror for Galaxies (I hate Rapidshare with a passion): http://victori.uploadbooth.com/osx86/galaxies-32k-v2.zip (32K version).
- Q8200 (quad-core)/8600 GT: 32 Gflops
- 8600 GT: 48 Gflops
Okay? How is the hybrid approach slower?
mitch_de (thread author), posted September 30, 2009:
Quoting lamer0: "How is the hybrid approach slower?"
Hybrid is slower than GPU only (and sometimes also slower than CPU only) because it needs much more synchronization and data transfer between CPU and GPU than either one alone. The OpenCL bottleneck is PCIe data transfer, which is very slow compared with the CPU's main-memory transfer speed: 2-5 GB/s over PCIe versus up to 50 GB/s between CPU, L2/L3 and memory. The GPU itself also has very fast memory access (up to 160 GB/s), but getting the data to the GPU and reading it back is the problem (on fast GPUs). So PCIe bandwidth limits the overall performance benefit of OpenCL (and CUDA). Some tests show that transfers to and from the GPU can take 80% of the overall time! The GPU computes very fast, but moving data to and from it can be the bottleneck. For example, a 2009 Mac Pro may get higher Gflops CPU-only than with a GT 120 GPU, because the GPU is too slow and the PCIe transfers add time. The same GPU in a low-end Core 2 Duo system is much faster than the C2D CPU alone.
PCIe transfer speed is also a problem for Core Image. That is one reason why, when CI was first used on AGP Macs, it performed badly and got a bit "lost". AGP bandwidth is very, very poor in the GPU-to-CPU direction (less than 250 MB/s); the other direction, CPU to GPU (the normal gaming path), reaches up to 1 GB/s. PCIe was much better, but I think that with the very fast 5870 and GT 300 arriving in the next two years, PCIe will need to be upgraded to a faster speed again.
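The transfer argument can be put into a back-of-the-envelope model: offloaded time = bus transfer time + GPU compute time, compared against CPU-only compute time. Every number in the example is hypothetical (a 10 GFLOP job, 400 MB each way over a 4 GB/s link, a 100 Gflops GPU versus a 40 Gflops CPU), chosen only to show how a faster GPU can still lose once transfers dominate.

```python
def offload_time(gflop, gpu_gflops, bytes_up, bytes_down, up_gb_s, down_gb_s):
    """Return (total seconds, share of time spent on bus transfers) for a GPU offload."""
    transfer_s = bytes_up / (up_gb_s * 1e9) + bytes_down / (down_gb_s * 1e9)
    compute_s = gflop / gpu_gflops
    total_s = transfer_s + compute_s
    return total_s, transfer_s / total_s


def cpu_time(gflop, cpu_gflops):
    """CPU-only time: no bus transfers, data already in main memory."""
    return gflop / cpu_gflops


if __name__ == "__main__":
    # Hypothetical job and hardware figures, not measurements.
    gpu_total, transfer_share = offload_time(10, 100, 400e6, 400e6, 4, 4)
    cpu_total = cpu_time(10, 40)
    print(f"GPU {gpu_total:.2f} s ({transfer_share:.0%} transfer), CPU {cpu_total:.2f} s")
```

This prints "GPU 0.30 s (67% transfer), CPU 0.25 s": the GPU is 2.5 times faster at the arithmetic, yet the CPU finishes first. Plugging in the AGP figures from the post (up_gb_s=1, down_gb_s=0.25) raises the transfer share to about 95%.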
tinush, posted September 30, 2009:
Quoting mitch_de: "Have you also tried the 8K version?"
Not yet, will test it later. Just finished a perfect retail, 100% working Snow Leopard install (including auto-sleep and keyboard/mouse wake). T.
shoarthing, posted October 6, 2009:
Quoting mitch_de: "Have you also tried the 8K version?"
The Rapidshare link for the 8K_V2 doesn't work.
Edit: Sorry, the link works fine; my ISP has just now started to block Rapidshare.
osssua, posted October 6, 2009:
Quoting shoarthing: "The Rapidshare link for the 8K_V2 doesn't work (tried 2 browsers and JDownloader; my ISP doesn't block Rapidshare). I am trying to bench an Atom 330 ION MCP79/7A motherboard, so only this version is likely to run at all. I would appreciate it if some kind soul would mirror this version to a working link. TIA"
Galaxies now works on the ATI 4870 with the 10.6.2 seed: http://netkas.org/?p=240
shoarthing, posted October 7, 2009:
The Zotac ION-ITX's integrated 9400M alone managed 20 Gflops / 15 updates using the 8K V2 version (obviously the same Gflops with the 32K version, but very few updates). CPU (Atom 330) + GPU: 3-4 updates and 4-5 Gflops.
I'm very interested in how this MCP7A with DDR2 compares with a current MCP79 with DDR3 in a 9400M MacBook/Mac mini.
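The 9400M figures allow a quick sanity check on how Galaxies converts updates into Gflops. Assuming one update means one all-pairs step over 8,192 stars (an assumption about the benchmark, not something stated in the thread), dividing the reported rate by the interactions per second gives the flops counted per star-pair interaction:

```python
def flops_per_interaction(gflops, updates_per_s, n_bodies):
    """Back out flops per pair interaction from a reported Gflops and update rate."""
    interactions_per_s = updates_per_s * n_bodies ** 2
    return gflops * 1e9 / interactions_per_s


if __name__ == "__main__":
    # 9400M, Galaxies 8K V2: 20 Gflops at 15 updates (figures from this post)
    print(round(flops_per_interaction(20, 15, 8 * 1024), 1))  # 19.9
```

That is close to the 20-flop count commonly used for this kind of kernel (for example in the GPU Gems 3 N-body chapter), which suggests, but does not prove, a similar convention here. At the same Gflops rate, the 32K version would then run about 16 times fewer updates, consistent with the "very few updates" remark.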
Schenkenberg, posted October 7, 2009:
Awesome post! Galaxies 32K running at 1900x1200 in SL 10.6.1, i7 920 @ 3.4 GHz, GTX 260:
- Vector single-core CPU: 14
- Vector multi-core: 57 (that's what I call proper multi-core scaling!)
- GPU: 275! Up from 180 with the 8K benchmark!
- CPU+GPU: 95
I really love the 260's performance. At that price point (got it for €140) it really shines.
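One way to put a number on "proper multi-core scaling" is speedup (multi-core score divided by single-core score) and parallel efficiency (speedup divided by core count). Whether to divide by the i7 920's 4 physical cores or its 8 Hyper-Threading logical CPUs is a judgment call; this sketch assumes physical cores.

```python
def scaling(single_core, multi_core, cores):
    """Return (speedup, parallel efficiency) from single- and multi-core scores."""
    speedup = multi_core / single_core
    return speedup, speedup / cores


if __name__ == "__main__":
    # Scores from this post: i7 920, Galaxies 32K, 14 single-core vs 57 multi-core
    speedup, efficiency = scaling(14, 57, cores=4)
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

This gives a 4.07x speedup, about 102% per physical core; an efficiency slightly above 100% is plausible when Hyper-Threading adds throughput. Counted against 8 logical CPUs it would be about 51%.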
n00b32, posted October 9, 2009:
Hi, I added a table for easier comparison of the OpenCL benchmarks: http://wiki.osx86project.org/wiki/index.php/OpenCL
What would be the best benchmark for evaluating OpenCL performance? Galaxies?
@mitch_de: could you make this standard benchmark show a build number while it runs (for easier comparison)?
Thanks, Jason
nvidia2008, posted October 10, 2009:
Quoting shoarthing: "I'm very interested in how this MCP7A with DDR2 compares with a current MCP79 with DDR3 in a 9400M MacBook/Mac mini."
MacBook 2.0 GHz Aluminium, 4 GB DDR3 RAM, Mac OS X 10.6.1, Galaxies 8K V2:
- GPU (9400M) mode: 20 Gflops / 15 updates
- Hybrid mode (CPU+GPU): 5 Gflops / 60 updates (it crashed quite a lot, but I took this measurement while it was working)
shoarthing, posted October 10, 2009:
Quoting nvidia2008: "GPU (9400M) mode: 20 Gflops / 15 updates"
Thank you *very* much for posting this. I knew the GPU and shader clocks were supposed to be the same on the MCP7x variants, but it's nice to have it confirmed. I'm surprised the MacBook's DDR3 didn't make a solid difference, though.
cwestpha, posted October 12, 2009:
Hmm, on my 2008 Mac Pro 2.8 GHz 8-core with a GTX 285, running the 32K Galaxies V2 under 10.6.1:
- CPU: 11
- Multi-core: 88
- GPU: 329
- Hybrid: 123
n00b32, posted October 13, 2009:
Hi, how come CUDA runs on Mac OS X? Where did you get these drivers from? All my results are with CUDA drivers.
shoarthing, posted October 13, 2009:
Quoting n00b32: "Where did you get these drivers from?"
Nvidia downloads. NB: v2.3x (the current one) is 32-bit only, AFAIK. To get an idea of where this is at, see the relevant section of the NV forums.
n00b32, posted October 13, 2009:
Thanks, didn't know that. Right now it's only the relevant apps that aren't there yet ;-)
MarceloDub, posted October 30, 2009:
good
byronrock, posted November 1, 2009:
Is there a way to run the other benchmarks on a Hackintosh? I can only run Galaxies (by the way, I get 30 G with my Athlon X4 720). I want to run Displacement, but it says "bad cpu type in executable" (followed by "logout"). I have a 9400 GT. Am I doing something wrong?
@ROBASEFR, posted November 11, 2009:
Hi, I've had success with an ATI HD 4850 (Gainward GS 512) under OS X 10.6.2 in my Hackintosh. All tests on a 1920x1080x32 @ 60 Hz LCD HD monitor:
- Displacement: 43 fps
- Galaxies 32K V2.0 and 8K V2.0 work! When I toggle with the S key I get: 2; 13; 47; 69; 52 Gflops
OpenCL Bench V 0.20 by mitch:
CL_DEVICE_NAME: Intel® Core i7 CPU 920 @ 2.67GHz
CL_DEVICE_VENDOR: Intel
CL_DEVICE_MAX_CLOCK_FREQUENCY: 3096 MHz
CL_DEVICE_MAX_COMPUTE_UNITS: 8
Now computing - please be patient....
time used: 9.933120
Number of elements computed: 2097152
CL_DEVICE_NAME: Radeon HD 4870
CL_DEVICE_VENDOR: AMD
CL_DEVICE_MAX_CLOCK_FREQUENCY: 750 MHz
CL_DEVICE_MAX_COMPUTE_UNITS: 10
Now computing - please be patient....
time used: 16.656227
Number of elements computed: 2097152
Now checking if results are valid - please be patient....
Validate results test passed - GPU=CPU
logout
And the transpose bandwidth test:
Tests/Open\ CL/OpenCL\ Tranpose\ Bandwidhttest/transpose
Performing Matrix Transpose [256 x 4096]...
Bandwidth Achieved = 2.755923 GB/sec
Results Validated!
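The transpose figure is presumably bytes moved divided by elapsed time. A common convention for a transpose counts every element read once and written once; assuming 4-byte float elements and 1 GB = 1e9 bytes (the sample's element type and timing method are not shown here), one pass over the 256 x 4096 matrix moves 8 MiB, and the reported bandwidth can be turned back into an implied time per pass:

```python
def transpose_bytes(rows, cols, bytes_per_elem=4):
    """Bytes moved by one transpose: each element read once and written once."""
    return 2 * rows * cols * bytes_per_elem


def implied_seconds(bytes_moved, gb_per_s):
    """Time per pass implied by a bandwidth figure (assuming 1 GB = 1e9 bytes)."""
    return bytes_moved / (gb_per_s * 1e9)


if __name__ == "__main__":
    moved = transpose_bytes(256, 4096)
    print(moved)  # 8388608
    print(f"{implied_seconds(moved, 2.755923) * 1e3:.2f} ms per pass")
```

Under those assumptions each pass takes about 3.04 ms, which is in the same low-GB/s range as the PCIe figures discussed earlier in the thread.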
dudelolchris, posted November 13, 2009:
All the OpenCL demos crash on my brand-new Late 2009 iMac with ATI 4670 graphics. This makes me sad.
computergek80, posted November 16, 2009:
They crash for me too, on a 27" iMac with a Radeon HD 4670. Anyone know what's up?
mitch_de (thread author), posted November 18, 2009:
Be patient. Apple will surely fix these OpenCL problems by spring 2010. Even after 3+ months of 10.6, there is NO application out that needs or uses OpenCL. Apple hasn't used OpenCL in any of its own apps either (of course, it would have been a failure if they had). Upcoming (spring 2010 and later) versions of iTunes, iMovie, iDVD, FCP, Logic, ... will have OpenCL speedups! So these problems don't really hurt if only demos and benches fail on your GPU. I will update the benches soon with newer versions (updated Apple OpenCL demos).