RehabMan Posted May 21, 2013

Quote: "Because they copy packets to/from a DMA buffer and let the network stack do most of the work... Mieze"

It would not surprise me if Apple's SMB implementation is fundamentally flawed. I'm sure the engineers tasked with implementing it went into it kicking and screaming (NIH syndrome).

If I had a recent Mac here... (I'm actually considering it)... I'd certainly test with it. I need a light, small (13") laptop to travel with, but I'm still waiting to see what interesting laptops/convertibles Haswell may bring before I make a decision.
Mieze (Author) Posted May 21, 2013

Quote: "It would not surprise me if Apple's SMB implementation is fundamentally flawed. I'm sure the engineers tasked with implementing it went into it kicking and screaming (NIH syndrome)."

Remember how much time the SAMBA project needed to develop an SMB implementation that is suitable for serious deployment. It would be a miracle if Apple managed to get the job done within a fraction of this time.

Mieze
Mieze (Author) Posted May 22, 2013

Hello RehabMan,

I've got another idea. The problem might be related to the interrupt mitigation feature. In order to test, you'll have to change in file RealtekRTL8111.cpp the line

WriteReg16(IntrMitigate, 0x5f51);

into

WriteReg16(IntrMitigate, 0x0);

Let's see if it works! I had this idea while peeking into the chip's configuration of the Win7 driver, and it uses 0.

Mieze
RehabMan Posted May 23, 2013

Quote: "Hello RehabMan, I've got another idea. The problem might be related to the interrupt mitigation feature. In order to test, you'll have to change in file RealtekRTL8111.cpp the line WriteReg16(IntrMitigate, 0x5f51); into WriteReg16(IntrMitigate, 0x0); Let's see if it works! I had this idea while peeking into the chip's configuration of the Win7 driver, and it uses 0. Mieze"

I tried it. If anything, it makes things marginally slower.

BTW, if I force link speed to 100mbit/sec with your driver, I get 7-8MB/sec. Does that give you any ideas?
luke70 Posted May 23, 2013

@Mieze Sorry for my bad English. Your kext works fine, but I would like to ask about my card reader, because it is on the same PCI bridge and doesn't work (Ethernet Realtek 8186, card reader). Are they linked?
Mieze (Author) Posted May 23, 2013

Quote: "@Mieze Sorry for my bad English. Your kext works fine, but I would like to ask about my card reader, because it is on the same PCI bridge and doesn't work (Ethernet Realtek 8186, card reader). Are they linked?"

There is a version of the chip that includes a card reader, but as these are logically separate devices, each of them needs a driver of its own.

Mieze
Mieze (Author) Posted May 23, 2013

Quote: "I tried it. If anything, it makes things marginally slower. BTW, if I force link speed to 100mbit/sec with your driver, I get 7-8MB/sec. Does that give you any ideas?"

Not really, but it would support the timing hypothesis, because Fast Ethernet lacks support for some advanced features like EEE, and it's easy to reconcile with my results that putting additional load on the connection speeds up the transfer. It's a paradox, but slowing things down seems to increase throughput, as if the driver/NIC were too fast for the protocol stack.

Combining all the information we have collected so far, it looks like many factors play a role in this scenario, which makes it very hard to find a solution, in particular because two crucial parts, Apple's SMB stack and the NIC itself, are black boxes. For the time being, I'm completely at a loss.

Last night I also disassembled Realtek's AppleRTL8169Ethernet.kext and found out that there are five possible values for the interrupt mitigation register, 0x0000, 0x5050, 0x5151, 0xaf62 or 0xa462, but I was unable to determine which value is used for a particular chipset. Maybe I'll play with this when I find some time for experiments.

By the way, please send me a complete set of kernel messages, as this might give me some hints.

Mieze
Mieze (Author) Posted May 27, 2013

Hello RehabMan,

please try the attached version. As the r8169 reveals some information about the mysterious interrupt mitigate feature, I decided to play with this value a little bit.

/*
 * Undocumented corner. Supposedly:
 * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
 */

Because of the high CPU load of smbd, I changed interrupt mitigate to 0xaf62, one of the values used by Realtek's own driver, and this solved the problem for me. Now I'm getting good performance values with SMB in both directions, read (> 70MB/sec) and write (30-60MB/sec). AFP performance stays unchanged at ~110MB/sec in both directions.

Good luck!

Mieze

Attachment: RealtekRTL8111-RehabMan3.zip
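To make the quoted r8169 comment concrete, here is a minimal sketch that splits a 16-bit interrupt mitigate value into the four 4-bit fields and packs them again. It relies entirely on the layout the comment itself calls "supposedly" correct, so treat it as an assumption about the hardware. The names IntrMitigateFields, decodeIntrMitigate and encodeIntrMitigate are made up for illustration and are not part of the driver.

```cpp
#include <cstdint>

// Hypothetical helper type; the field layout follows the r8169 comment
// "(TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets",
// which is undocumented guesswork, not a datasheet fact.
struct IntrMitigateFields {
    uint8_t txTimer;
    uint8_t txPackets;
    uint8_t rxTimer;
    uint8_t rxPackets;
};

// Split a 16-bit register value into its four assumed 4-bit fields.
constexpr IntrMitigateFields decodeIntrMitigate(uint16_t value) {
    return IntrMitigateFields{
        static_cast<uint8_t>((value >> 12) & 0xF),
        static_cast<uint8_t>((value >> 8) & 0xF),
        static_cast<uint8_t>((value >> 4) & 0xF),
        static_cast<uint8_t>(value & 0xF)};
}

// Pack four 4-bit fields back into a register value (extra bits are masked off).
constexpr uint16_t encodeIntrMitigate(IntrMitigateFields f) {
    return static_cast<uint16_t>(((f.txTimer & 0xF) << 12) |
                                 ((f.txPackets & 0xF) << 8) |
                                 ((f.rxTimer & 0xF) << 4) |
                                 (f.rxPackets & 0xF));
}
```

Under that assumed layout, 0xaf62 would decode to TxTimer 10, TxPackets 15, RxTimer 6 and RxPackets 2. What those numbers mean in real time units is not known from this thread.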
wastez Posted May 28, 2013

What about Wake On Lan? Did anybody test it too?
beta992 Posted May 28, 2013

Quote: "Hello RehabMan, please try the attached version. As the r8169 reveals some information about the mysterious interrupt mitigate feature, I decided to play with this value a little bit. /* * Undocumented corner. Supposedly: * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets */ Because of the high CPU load of smbd, I changed interrupt mitigate to 0xaf62, one of the values used by Realtek's own driver, and this solved the problem for me. Now I'm getting good performance values with SMB in both directions, read (> 70MB/sec) and write (30-60MB/sec). AFP performance stays unchanged at ~110MB/sec in both directions. Good luck! Mieze"

Hi Mieze,

I'm also using SMB here (Ubuntu Server > OS X). Would this driver give better results when moving files from and to the server? I would also like to know if there's a tool to check transfer speeds on OS X, using 1Gb/s with a (terrible) Gigabit switch.

EDIT: The reason I ask is that I (may) have a different Ethernet controller (RTL8168E/8111E).

Thanks!
Mieze (Author) Posted May 28, 2013

Quote: "Would this driver give better results when moving files from and to the server? I would also like to know if there's a tool to check transfer speeds on OS X, using 1Gb/s with a (terrible) Gigabit switch. EDIT: The reason I ask is that I (may) have a different Ethernet controller (RTL8168E/8111E)."

It's hard to tell if you'll experience any performance problems at all. Certain configurations seem to work perfectly while others are extremely slow. As far as I can tell, Apple's SMB implementation is quite sensitive to timing, so it's impossible to give a general rule. I'm currently experimenting with the interrupt mitigation value in order to find an optimal combination of throughput, system load and packet round-trip time.

For performance tests I use Blackmagic Disk Speed Test over an AFP or SMB connection, and top to keep an eye on system load.

Mieze
beta992 Posted May 28, 2013

Quote: "It's hard to tell if you'll experience any performance problems at all. Certain configurations seem to work perfectly while others are extremely slow. As far as I can tell, Apple's SMB implementation is quite sensitive to timing, so it's impossible to give a general rule. I'm currently experimenting with the interrupt mitigation value in order to find an optimal combination of throughput, system load and packet round-trip time. For performance tests I use Blackmagic Disk Speed Test over an AFP or SMB connection, and top to keep an eye on system load. Mieze"

Thanks for your answer. I'm going to try the kext that you posted for RehabMan on my MB. Thanks for your work.
Mieze (Author) Posted May 28, 2013

I also tried 0xaf73 and 0xaf74 for interrupt mitigate, and it seems that there is no single value that is optimal for both protocols. While AFP performance is unaffected by the value (~110MB/sec), SMB performance and the CPU load are very sensitive to changes.

With 0xaf73, CPU load reaches its minimum while using AFP (< 20%). SMB performance is acceptable (30-55MB/sec write, 70-75MB/sec read).

With 0xaf74, CPU load of AFP is slightly worse (< 30%), but SMB performance reaches its maximum with 65-75MB/sec in both directions.

It seems that you'll have to choose between the two values depending on your usage profile. I'm quite sure that I'll add a configuration parameter to adjust the interrupt mitigate value, but I still don't know if it will give the user full control or just leave him the choice between different preconfigured settings.

Mieze
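The "preconfigured settings" option mentioned above could look roughly like the sketch below. It is not taken from the driver: the enum MitigationProfile and the function intrMitigateFor are hypothetical names, and the only facts used are the two register values and their measured trade-off from this post.

```cpp
#include <cstdint>

// Hypothetical usage profiles; not an actual driver option.
enum class MitigationProfile {
    LowCpuLoad,     // 0xaf73: lowest CPU load with AFP in the tests above
    SmbThroughput   // 0xaf74: best SMB throughput in the tests above
};

// Map a profile to the interrupt mitigate value reported for it in this post.
constexpr uint16_t intrMitigateFor(MitigationProfile profile) {
    return profile == MitigationProfile::SmbThroughput ? 0xaf74 : 0xaf73;
}
```

A fixed set of profiles like this keeps users away from untested register values, at the cost of the flexibility that a raw numeric parameter would give.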
beta992 Posted May 28, 2013

Quote: "I also tried 0xaf73 and 0xaf74 for interrupt mitigate, and it seems that there is no single value that is optimal for both protocols. While AFP performance is unaffected by the value (~110MB/sec), SMB performance and the CPU load are very sensitive to changes. With 0xaf73, CPU load reaches its minimum while using AFP (< 20%). SMB performance is acceptable (30-55MB/sec write, 70-75MB/sec read). With 0xaf74, CPU load of AFP is slightly worse (< 30%), but SMB performance reaches its maximum with 65-75MB/sec in both directions. It seems that you'll have to choose between the two values depending on your usage profile. I'm quite sure that I'll add a configuration parameter to adjust the interrupt mitigate value, but I still don't know if it will give the user full control or just leave him the choice between different preconfigured settings. Mieze"

I don't know how the SMB protocol behaves on Linux and/or Windows. If they show the same CPU usage, then this could mean that SMB is just not that well designed. Maybe I'm just not that neutral, because I do 'hate' most solutions that Microsoft created.

If you want, I can do tests here.
Mieze (Author) Posted May 28, 2013

Quote: "I don't know how the SMB protocol behaves on Linux and/or Windows. If they show the same CPU usage, then this could mean that SMB is just not that well designed. Maybe I'm just not that neutral, because I do 'hate' most solutions that Microsoft created."

I don't think that it's a general weakness of the SMB protocol, but who knows. Microsoft always tries to create a jack-of-all-trades device, losing sight of the key objectives. Well, it's more likely that Apple's home-brewed implementation was put together under pressure.

Quote: "If you want, I can do tests here."

Yes, please! I need as much feedback as I can get. You might also want to play with the interrupt mitigation value as I did.

Mieze
Mieze (Author) Posted May 28, 2013

My latest test results indicate that the best value for the interrupt mitigate setting is 0xaf83. Compared with 0xaf74, CPU load with AFP is lower and SMB performance went up to > 70MB/sec in both directions, which is on a par with Apple's Broadcom NIC. You are encouraged to use this value for test runs.

Mieze

Quote: "What about Wake On Lan? Did anybody test it too?"

I know that WoL is working with most chipsets and I successfully tested it myself with chipset 16, but the 8111C might be affected by a WoL quirk that prevents it from recognizing the magic packet. I will look into this issue as soon as the SMB performance problem has been resolved.

In case you are also using Linux, you could get a copy of Realtek's r8168 driver and check if WoL is working on your system with Linux. This would help me to narrow down the cause of the issue. The source is available here: http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false

Mieze
RehabMan Posted May 28, 2013

Quote: "Hello RehabMan, please try the attached version. As the r8169 reveals some information about the mysterious interrupt mitigate feature, I decided to play with this value a little bit. /* * Undocumented corner. Supposedly: * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets */ Because of the high CPU load of smbd, I changed interrupt mitigate to 0xaf62, one of the values used by Realtek's own driver, and this solved the problem for me. Now I'm getting good performance values with SMB in both directions, read (> 70MB/sec) and write (30-60MB/sec). AFP performance stays unchanged at ~110MB/sec in both directions. Good luck! Mieze"

Just thought I'd let you know I tried this and there wasn't any significant performance improvement. I'm going to do some tests with a fresh install of ML, both with your last 'released' driver and this one, just to see if I see any aggregate performance difference so far. I will also do some tests with BlackMagic, as up to now all my tests have been simple file copies from the server.

As far as the bit definition of this 16-bit register goes, do you have a better description for TxTimer/TxPackets/RxTimer/RxPackets? If I had to guess, these are parameters that determine when an interrupt is generated:

- RxTimer: max time to wait since the last packet was received before generating an interrupt? What are the units of this 4-bit value?
- RxPackets: maximum number of packets to buffer before generating an interrupt?
- TxTimer/TxPackets: not sure here.... why would the hardware wait on a tx interrupt (which tells the system when transmitting is done, right?)

I think if I understood these params better, I might be able to experiment with different values...
Mieze (Author) Posted May 28, 2013 (edited)

Quote: "As far as the bit definition of this 16-bit register goes, do you have a better description for TxTimer/TxPackets/RxTimer/RxPackets? If I had to guess, these are parameters that determine when an interrupt is generated: RxTimer: max time to wait since the last packet was received before generating an interrupt? What are the units of this 4-bit value? RxPackets: maximum number of packets to buffer before generating an interrupt? TxTimer/TxPackets: not sure here.... why would the hardware wait on a tx interrupt (which tells the system when transmitting is done, right?) I think if I understood these params better, I might be able to experiment with different values..."

No, I don't have any additional information about this register. All my thoughts are based on the idea that it won't be completely different from those of Intel and Broadcom NICs, which are fully documented. Unless you have a very good contact with someone inside Realtek's development team, I guess they won't tell you more.

My theory is that the completed transmission/reception of a packet triggers the timer and counter instead of generating an interrupt. The actual generation of the interrupt is delayed until one of the following conditions is met:

- The programmed delay time expires.
- The programmed number of additional packets has been transmitted/received.

For example, 0x5151 would mean for the receiver that the interrupt is delayed until 5 time units have expired or the number of additional packets specified by 1 (maybe 1 packet?) has been received, whichever happens first.

Mieze

Edit: It would be helpful to get some results/experiences from other users with different setups, as we cannot rule out that half of the problem is located on the other communication endpoint, which we don't control.

Edit 2: Attached you'll find the latest sources which I'm using right now.

Attachment: RealtekRTL8111-RehabMan4.zip

Edited May 29, 2013 by Mieze
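The theory in this post can be written down as a small model. The sketch below is only an illustration of that theory, not of documented Realtek behaviour: the first completed packet arms a timer and a counter, and the interrupt fires when the timer runs out or the configured number of additional packets has completed, whichever comes first. The function name interruptTick and the abstract "ticks" are assumptions, since the real time unit of the register is unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of the coalescing theory from the post above (an assumption, not a
// hardware specification). `arrivals` holds packet completion times in
// abstract ticks, sorted ascending and non-empty; arrivals[0] arms the timer.
// Returns the tick at which the delayed interrupt would fire.
uint64_t interruptTick(const std::vector<uint64_t>& arrivals,
                       unsigned timerUnits, unsigned extraPackets) {
    assert(!arrivals.empty());
    const uint64_t deadline = arrivals.front() + timerUnits;
    // Packet-count condition: the extraPackets-th packet after the first one.
    if (arrivals.size() > extraPackets && arrivals[extraPackets] < deadline) {
        return arrivals[extraPackets];
    }
    // Otherwise the timer condition is met first (or no more packets arrive).
    return deadline;
}
```

With the receive half of 0x5151 read as 5 time units and 1 additional packet, packets completing at ticks 100 and 102 would raise the interrupt at tick 102, while a lone packet at tick 100 would raise it at tick 105. A value of 0 for both fields gives an immediate interrupt in this model.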
RehabMan Posted May 31, 2013

Quote: "No, I don't have any additional information about this register. All my thoughts are based on the idea that it won't be completely different from those of Intel and Broadcom NICs, which are fully documented. Unless you have a very good contact with someone inside Realtek's development team, I guess they won't tell you more. My theory is that the completed transmission/reception of a packet triggers the timer and counter instead of generating an interrupt. The actual generation of the interrupt is delayed until one of the following conditions is met: The programmed delay time expires. The programmed number of additional packets has been transmitted/received. For example, 0x5151 would mean for the receiver that the interrupt is delayed until 5 time units have expired or the number of additional packets specified by 1 (maybe 1 packet?) has been received, whichever happens first."

That's what I thought/said. I still don't understand the tx side of it, but...

Quote: "Mieze Edit: It would be helpful to get some results/experiences from other users with different setups, as we cannot rule out that half of the problem is located on the other communication endpoint, which we don't control. Edit 2: Attached you'll find the latest sources which I'm using right now."

I've done some testing with this version and the results are inconclusive. There is a lot of randomness in the performance. Sometimes I will get 20-30MB/sec writes, but always poor reads, usually somewhere around 4-5MB/sec (which is better than the original driver, but not the order of magnitude+ we are looking for).

I've played with different values for this register and haven't found any pattern yet. Sometimes I get the better performance and sometimes not. It does seem that when I boot and get the better (write) performance, it is there to stay for that session, but if I reboot, the next session might be the lower 4-5MB/sec writes. Like I said, there is no discernible pattern -- it seems random.

I'm going to look at whether there is a way to change the value on the fly from the command line (I already do this with my PS2 drivers, so I'm aware of the mechanism for user-mode -> kernel-mode ioreg property transfer).

Will post back when I have more time to test and provide data.
beta992 Posted May 31, 2013

Hi all,

A question: is it normal to have 'slow' file browsing in Finder (with SMB)? I Googled it and I'm seeing that a lot of users have this problem, but my question is whether anyone here also has this or knows a solution.

For the record: I will try some solutions myself this weekend and will report the result.

Thanks!
Mieze (Author) Posted May 31, 2013

Quote: "Hi all, A question: is it normal to have 'slow' file browsing in Finder (with SMB)? I Googled it and I'm seeing that a lot of users have this problem, but my question is whether anyone here also has this or knows a solution. For the record: I will try some solutions myself this weekend and will report the result. Thanks!"

Can you please describe your setup in detail? As there are 3 different SMB implementations (Microsoft, SAMBA and Apple) and numerous hardware platforms, it's crucial to be able to relate problems to a particular setup.

The user reports about SMB performance and my tests, which are mostly based on Macs talking to each other via SMB, seem to speak a common language: Apple's SMB implementation is far from optimal.

Mieze
beta992 Posted May 31, 2013

Quote: "Can you please describe your setup in detail? As there are 3 different SMB implementations (Microsoft, SAMBA and Apple) and numerous hardware platforms, it's crucial to be able to relate problems to a particular setup. The user reports about SMB performance and my tests, which are mostly based on Macs talking to each other via SMB, seem to speak a common language: Apple's SMB implementation is far from optimal. Mieze"

NAS:
- Ubuntu Server 13.04 x64
- 8GB DDR3 RAM
- Asus C60M1-I
- Latest Samba package (3.*)

Desk: See signature

When browsing on my tablet (Android 4.2) or laptop (Arch Linux), browsing seems to be going fine (almost no delay). If I open a share with a lot of folders, it takes more than a minute on OS X. I haven't tested Windows yet, but somehow I'm thinking this should be related to Finder. To be sure, I need to run some speed tests. However, I don't have any delay or buffering when opening (HD) movies on an SMB share with VLC.
Mieze (Author) Posted May 31, 2013 (edited)

Quote: "NAS: Ubuntu Server 13.04 x64, 8GB DDR3 RAM, Asus C60M1-I, Latest Samba package (3.*). Desk: See signature. When browsing on my tablet (Android 4.2) or laptop (Arch Linux), browsing seems to be going fine (almost no delay). If I open a share with a lot of folders, it takes more than a minute on OS X. I haven't tested Windows yet, but somehow I'm thinking this should be related to Finder. To be sure, I need to run some speed tests. However, I don't have any delay or buffering when opening (HD) movies on an SMB share with VLC."

I guess there isn't any delay when you list SMB shares in Terminal? Have you checked the log files for error messages?

By the way, why don't you install Netatalk on your server? I have some experience with it, because I contributed a patch for Posix ACL support to the project two years ago, and I can confirm that speed and stability are quite good.

Mieze

Edited May 31, 2013 by Mieze
Mieze (Author) Posted May 31, 2013

Quote: "That's what I thought/said. I still don't understand the tx side of it, but..."

I don't think the tx side is worth fiddling with, because there's not much work to do in txInterrupt(). It only frees the mbufs associated with the packets which have been successfully transmitted. The higher the values, the better for system load, but they won't have much influence on the timing, in particular when TSO is enabled.

Quote: "I've done some testing with this version and the results are inconclusive. There is a lot of randomness in the performance. Sometimes I will get 20-30MB/sec writes, but always poor reads, usually somewhere around 4-5MB/sec (which is better than the original driver, but not the order of magnitude+ we are looking for). I've played with different values for this register and haven't found any pattern yet. Sometimes I get the better performance and sometimes not. It does seem that when I boot and get the better (write) performance, it is there to stay for that session, but if I reboot, the next session might be the lower 4-5MB/sec writes. Like I said, there is no discernible pattern -- it seems random."

This might be a symptom of a memory management issue, which could have a devastating influence on performance while the system otherwise keeps running stably. Are there dropped packets due to resource shortages or any other error messages in the kernel log? What does netstat -m say? As kernel memory is always physical memory, its amount is strictly limited even on systems with a lot of RAM, so it might easily become exhausted. I still haven't seen any kernel logs from you.

Mieze

Here are some test results I got accessing my home server, the machine described in the signature with the Realtek NIC (RTL8111E-VL) using my driver, from an Atom D425 (1.8GHz, 2 GB RAM, Realtek RTL8111D) running Win7 (64bit). For the test I copied a 2GB file from the client to the server and back and measured the time to complete the operation:

- Write (to the server): 60sec to complete -> ~34MB/sec
- Read (from the server): 48sec to complete -> ~42MB/sec

Mieze
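The throughput figures above follow directly from file size divided by copy time. A minimal sketch of that arithmetic, assuming the 2GB file is counted as 2048MB (the post does not say whether binary or decimal units were used), with throughputMBps as an illustrative name:

```cpp
// Average throughput in MB/sec for a copy of `megabytes` MB that took `seconds` seconds.
constexpr double throughputMBps(double megabytes, double seconds) {
    return megabytes / seconds;
}

// Assumption: 2GB == 2048MB.
constexpr double kWriteMBps = throughputMBps(2048.0, 60.0);  // roughly 34.1
constexpr double kReadMBps  = throughputMBps(2048.0, 48.0);  // roughly 42.7
```

Both results are in line with the rounded ~34MB/sec and ~42MB/sec quoted in the post.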
beta992 Posted June 1, 2013

Quote: "I guess there isn't any delay when you list SMB shares in Terminal? Have you checked the log files for error messages? By the way, why don't you install Netatalk on your server? I have some experience with it, because I contributed a patch for Posix ACL support to the project two years ago, and I can confirm that speed and stability are quite good. Mieze"

Hi Mieze,

Thanks for the tips. I really need to do more investigation to know what the problem might be.

The main reason I'm choosing SMB instead of all other protocols (Netatalk/NFS, etc.) is that it works on all devices in my (home) network. It is also the protocol I already know. (That doesn't mean I don't want to learn about others.) I'm still thinking of adding other sharing protocols to my NAS, but I don't want the network to be 'loaded' with unneeded traffic/broadcasts. I also don't know what impact it would have on my NAS, because, well, it is not the most powerful machine.

Thanks again for your help. I will try to give you my results before the end of this week.