New Driver for Realtek RTL8111


Mieze
1,593 posts in this topic

It is an OpenSolaris variant (Nexenta) with an Intel NIC. It is running full-duplex from Windows (120MB/s both ways) and real Macs. I have (RehabMan) comparable speed readings with iMacs/MBP/mini (Marvell Yukon Gigabit Adapter 88E8053).

 

Yeah, the Intel NICs seem to be problematic with SMB in some configurations. I guess it's some kind of NAS? Why do you use SMB instead of AFP, as Netatalk is available for this platform and virtually any good NAS comes with it?

 

Mieze

I use NFS on Macs and SMB on Windows. I tested the driver you provided to see if your latest research makes any difference in SMB performance. Intel NICs are the reference type in my view, although on Macs it may be a different story.

Intel NICs have a very rigid interrupt mitigation logic which does not get along with Apple's SMB implementation.

 

Mieze

Hey RehabMan,

 

please try this! While doing some research on the interrupt rate, I had an idea how to improve performance without causing much additional CPU load.

 

Mieze

I'll try to look at this later this week and provide some feedback.

 

Interesting though (I looked at the diff)... processing rxInterrupt even though related rx status bits in IntrStatus are not set?!

Basically it boils down to reverting the effect of interrupt mitigation while operating under moderate load, as there are many more transmitter than receiver interrupts. If the rx interrupt bit isn't set, this doesn't necessarily mean that no packets have been received yet, and checking the receiver ring once is cheap as we are already in the interrupt routine. Instead of periodically calculating the average interrupt rate, I'm now working with a modified algorithm that measures the time interval between interrupts, with kFastIntrTreshhold set to 200µs.

void RTL8111::interruptOccurred(OSObject *client, IOInterruptEventSource *src, int count)
{
    UInt64 time, abstime;
    UInt16 status;
    UInt16 rxMask;

    WriteReg16(IntrMask, 0x0000);
    status = ReadReg16(IntrStatus);

    /* hotplug/major error/no more work/shared irq */
    if ((status == 0xFFFF) || !status)
        goto done;

    /* Calculate time since last interrupt. */
    clock_get_uptime(&abstime);
    absolutetime_to_nanoseconds(abstime, &time);
    rxMask = ((time - lastIntrTime) < kFastIntrTreshhold)
        ? (RxOK | RxDescUnavail | RxFIFOOver)
        : (RxOK | RxDescUnavail | RxFIFOOver | TxOK);
    lastIntrTime = time;

    if (status & SYSErr)
        pciErrorInterrupt();

    /* Rx interrupt */
    if (status & rxMask)
        rxInterrupt();

    /* Tx interrupt */
    if (status & (TxOK | TxErr | TxDescUnavail))
        txInterrupt();

    if (status & LinkChg)
        checkLinkStatus();

    /* Check if a statistics dump has been completed. */
    if (needsUpdate && !(ReadReg32(CounterAddrLow) & CounterDump))
        updateStatitics();

done:
    WriteReg16(IntrStatus, status);
    WriteReg16(IntrMask, intrMask);
}
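
To experiment with the interval test outside the kernel, the mask selection in the routine above can be isolated as a pure function. This is only a sketch under assumptions: the bit values are placeholders rather than the driver's register definitions, the threshold is assumed to be 200µs expressed in nanoseconds, and selectRxMask and shouldCheckRxRing are hypothetical names that do not appear in the driver.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder status bits for illustration only (assumption); the real
// driver takes these from its own register header.
constexpr uint16_t RxOK          = 0x0001;
constexpr uint16_t TxOK          = 0x0004;
constexpr uint16_t RxDescUnavail = 0x0010;
constexpr uint16_t RxFIFOOver    = 0x0040;

// Assumed threshold: 200 microseconds, in nanoseconds.
constexpr uint64_t kFastIntrThresholdNs = 200000;

// Pick the set of status bits that should trigger a look at the rx ring.
// Interrupts arriving in quick succession suggest load, so only genuine
// rx bits count and the hardware mitigation stays effective. After a
// longer gap, a tx completion also triggers one rx ring check.
inline uint16_t selectRxMask(uint64_t nowNs, uint64_t lastIntrNs) {
    assert(nowNs >= lastIntrNs);
    const uint16_t rxBits = RxOK | RxDescUnavail | RxFIFOOver;
    return (nowNs - lastIntrNs < kFastIntrThresholdNs)
        ? rxBits
        : static_cast<uint16_t>(rxBits | TxOK);
}

// True if the rx ring should be inspected for this interrupt.
inline bool shouldCheckRxRing(uint16_t status, uint64_t nowNs, uint64_t lastIntrNs) {
    return (status & selectRxMask(nowNs, lastIntrNs)) != 0;
}
```

With these placeholder values, a tx-only interrupt arriving 1 ms after the previous one leads to an rx ring check, while the same interrupt arriving after 100µs does not.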

Maybe this will open up a new perspective on adaptive receiver interrupt mitigation, with low values like 0x51 for moderate load and 0x58 for heavy load. I don't know if this is possible but it might be worth a try. I was able to get good performance with this version while communicating with the 2006 MacBook Pro (Marvell Yukon) running Win XP without changing its maximum interrupt rate from 5000 to 10000, as I had to do with version 1.1.1 to get reasonable performance.

 

By the way the inspiration for the change comes from the documentation of the Broadcom BCM57785 which has a quite sophisticated interrupt mitigation logic.

 

Mieze

>> If the rx interrupt bit isn't set this doesn't necessarily mean that no packets have been received yet and checking the receiver ring once is cheap as we are already in the interrupt routine.

 

Yes, I was thinking why not just check for rx packets for every interrupt...  It might be cheaper than trying to "calculate whether it should be checked"

 

I'll play with it when I get a free afternoon...

As long as there are only a few thousand packets per second this is no problem; it even makes the system more responsive. But under heavy load you might get more than 80000 packets per second. Feeding them into the network stack in small groups of 2 or 3 has a devastating impact on CPU load. You have to handle them in batches in order to keep CPU usage under control.
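
As a rough illustration of that arithmetic, the number of times per second that packets are handed to the network stack is the packet rate divided by the group size. This is a sketch only: handoffsPerSecond is a hypothetical helper, and the batch size of 8 mentioned below is just an example value, not a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Number of hand-offs to the network stack per second when packets are
// delivered in groups of packetsPerBatch (rounded up to whole hand-offs).
inline uint64_t handoffsPerSecond(uint64_t packetsPerSecond, uint64_t packetsPerBatch) {
    assert(packetsPerBatch > 0);
    return (packetsPerSecond + packetsPerBatch - 1) / packetsPerBatch;
}
```

At 80000 packets per second, groups of 2 or 3 mean roughly 40000 or 26667 hand-offs per second, while batches of 8 would bring that down to 10000.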

 

My results indicate that the units of the interrupt mitigate value are not the number of packets but the number of descriptors processed. While receive buffers are contiguous, which means that there is exactly one descriptor per packet, transmit buffers are fragmented, so that each packet usually requires 2 or more descriptors, and more than 30 for large TSO operations. On my machine this results in more than 15000 transmitter interrupts per second when it is sending at full speed, even with the maximum interrupt mitigate value of 0xff. On the other hand, the receiver interrupt rate never exceeds 10000, which is an acceptable value. Despite the context switch, the transmitter interrupts have a surprisingly low influence on the CPU load.

 

Unfortunately, receiver interrupt handling is not that cheap when there are actually packets to handle, because they might trigger a series of processes. As long as you only have to handle 1000 or 2000 packets per second, handling them individually instead of in batches causes no significant increase in CPU usage and even makes the system react faster. A good example of this scenario is listing a large share with ls -lR in Terminal. Transferring a large file with a throughput of 110MB/s is a completely different situation: with batched receives the CPU usage stays down at 20-30%, instead of more than 50% without receiver interrupt mitigation.

 

Mieze

Hello RehabMan,

 

here are my latest sources. I ran some SMB tests this afternoon with the 2006 MacBook Pro (WinXP SP3, Marvell Yukon) and was able to copy a 2GB file to/from the OS X server with the Realtek NIC in less than 45 seconds (throughput ~45MB/sec). Although this might not seem impressive, it's exactly what was to be expected, taking into account that the Boot Camp partition, which is located at the end of the hard disk, is not able to deliver a higher data rate. Therefore it may well be that the disk and not the network is the limiting factor.

 

Comparing these results with the initial test results I got when I did the first SMB benchmarks with this configuration 6 weeks ago, SMB performance has improved dramatically. It went up from ~5MB/s to 45MB/s stable. I'm not sure if these results are reproducible with an Intel NIC on the Windows side but I'm quite confident that we are on the right track.

 

Besides the small modification of the interrupt service routine I changed the interrupt mitigate value to 0xff68 which reduced the interrupt rate under heavy load by 25% and, as a bonus, results in a few percent less CPU usage.

 

Mieze

RealtekRTL8111-RehabMan6.zip

Hi Mieze,

 

After some testing, I found out that Finder is the real problem. It's very slow when browsing through shares.

I'm now using Path Finder (as a trial) and see a lot of improvements. Browsing through folders/files is really 10 times faster.

 

Just search on "Finder slow smb" and you will see a lot of others suffering from the same issue, especially on ML.

 

Last question: what version should I use? The last one you posted above or the one from the download center? :)

Hello beta992,

 

please use the attached version which has been optimized not only to speed up SMB in certain configurations but also to deliver a better browsing speed. I tested it recursively listing large shares and browsing directories with dozens of pictures.

 

The trick is a small change in the interrupt service routine with regard to the strategy of handling received packets. When more than 200µs have passed since the routine was last called, it assumes that there is not much load and handles the packet with less delay (but also less efficiently), which turned out to be better in this situation. I also changed the interrupt mitigate value to 0xcf68.

 

As I will make the attached code the official version 1.1.2, provided that nobody finds any serious bugs in it, everybody is encouraged to test it.

 

Mieze

RealtekRTL8111-V1.1.2-RC1.zip

  • 2 weeks later...

i have been refreshing this thread for almost a week for 1.1.2 binary .... Can't wait ^^

 

Before pushing a new version to github and updating the binary, the code usually has to pass a 24/7 real world test on my home server for at least 1-2 weeks. As of now I haven't received any bug reports nor did I experience any problems so that I will officially release version 1.1.2 next week.

 

Mieze

I finally had a chance to test this version.  I'm afraid the original performance problems on my ProBook are still present.

 

With your version (1.1.2):

Reads: getting 7-10MB/sec for large file copy from server to laptop.

Writes: getting 1-7MB/sec for large file copy from laptop to server.

 

With my version:

Reads: getting 60MB/sec for large file copy from server to laptop.

Writes: getting 25-40MB/sec for large file copy from laptop to server.

 

Nice try with the idea, however...

Nevertheless thanks for the test run.

 

Mieze

Strange that this new driver appeared just when my Realtek RTL8111 stopped working for strange reasons. I can attest that it's not the port itself or something else that has the problem, because I'm posting this using the same system but running Win7.

  • 2 weeks later...

I managed to adapt the driver to the new driver model introduced with 10.8 which supports QoS (packet scheduling) where the driver actively pulls packets from the transmit queue in order to send them out. Although I'm not sure if I got everything right, it seems to work. As of now I haven't discovered any problems. Performance is on a par with the traditional model, although the overhead of packet scheduling seems to make transmission a more CPU intensive task.
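
For readers unfamiliar with the pull model, here is a toy sketch of the idea in plain C++. It is not the IONetworkingFamily API: OutputQueue, PullDriver, start and completeAll are hypothetical names made up for illustration, and the real 10.8 interface is considerably more involved. The only point shown is that the stack fills a queue and the driver takes packets from it only while it has free transmit descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Packet {
    int id;
};

// Stand-in for the output queue owned by the network stack (hypothetical).
class OutputQueue {
public:
    void enqueue(const Packet &p) { packets_.push_back(p); }

    // Removes the oldest packet; returns false if the queue is empty.
    bool dequeue(Packet &out) {
        if (packets_.empty())
            return false;
        out = packets_.front();
        packets_.pop_front();
        return true;
    }

    std::size_t size() const { return packets_.size(); }

private:
    std::deque<Packet> packets_;
};

// Stand-in for the transmit side of a driver with a fixed-size ring.
class PullDriver {
public:
    explicit PullDriver(std::size_t ringSize) : ringSize_(ringSize), inFlight_(0) {}

    // Called when the stack signals pending output: pull packets while
    // the ring has room, and leave the rest queued in the stack.
    std::size_t start(OutputQueue &queue) {
        std::size_t pulled = 0;
        Packet p{};
        while (inFlight_ < ringSize_ && queue.dequeue(p)) {
            ++inFlight_;  // a real driver would set up tx descriptors here
            ++pulled;
        }
        assert(inFlight_ <= ringSize_);
        return pulled;
    }

    // Models the tx completion interrupt freeing every descriptor.
    void completeAll() { inFlight_ = 0; }

private:
    std::size_t ringSize_;
    std::size_t inFlight_;
};
```

In this model, a driver with 3 free descriptors facing 5 queued packets pulls 3, leaves 2 in the queue, and picks those up on the next start() after the completion has freed the ring.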

 

Polled receive works too but as there is a serious drop in network performance when polling is active I decided to disable this mode of operation in the attached code.

 

This is experimental code I'm posting as a proof of concept and an inspiration to other driver programmers. In case you like to experiment with your system you might want to try it, but if you want to have a reliable machine, do not upgrade.

 

After installing the driver run ifconfig -v in Terminal and you'll see the difference. As always, feedback is welcome in particular with regard to the influence of packet scheduling on performance.

 

Good luck!

 

Mieze

 

RealtekRTL8111-NewArch.zip

Thanks for your great work Mieze. I'm definitely going to try the new driver.

 

I also found more info about SMB. For some reason OS X ML has its own SMB/Samba implementation. This means that Apple doesn't have a license (anymore) for the 'official' SMB protocol that Microsoft created.

There are some problems with Apple's implementation:

- Doesn't support all share flags (force user/force group isn't working, files being created as group 'staff')

- Performance issues

- Not able to mount a SMB-share

etc.

 

There are a lot of complaints about SMB in ML. Luckily you can install Samba yourself with brew. At the moment I'm testing this setup.

I also notice that OSX 10.9 will bring SMB improvements, hopefully some issues can be resolved.

 

I'm happy that currently I don't have any *mac that is running ML, but still 'good old' Lion in a company where SMB-shares are used. :)

 

The reason for using SMB is that it works on almost every OS, whereas AFP doesn't.

 

Well, I'm going to keep you up to date. If you/others find more or other info, please let me know. :)

ML Server uses a trick in order to improve SMB performance. I found this snippet in /etc/rc.server.firewall:

#
# Set TCP to ack every other packet. (RFC-compliant "compatibility" mode.)
# This should increase server performance, especially when connected
# to Windows clients.
#
sysctl -w net.inet.tcp.delayed_ack=2

The client version of ML uses 3 as the default value for delayed_ack. If you are experiencing bad performance with SMB you should give it a try.

 

Mieze

 

Read this for more info.

Hi Mieze,

 

I'm a little bit confused, is WOL working in combination with OS X 10.8?

 

On Windows 8 I can wake-up the PC without any problems, I just need to start and shutdown from Windows first.

When I try the same thing in OS X, nothing happens.

 

Do you have an idea?

 

Thanks so far. :)

Yes, it's working, but not with all chipsets. At least with one version of the RTL8111C WoL isn't working, as wastez reported some time ago. Unfortunately he didn't tell me which chipset he is using and I can't reproduce the behavior with my RTL8111E (chipset 16).

 

Please take a look at the kernel messages and post the chipset. Do you have a Linux installation on your Hackintosh too? In case you have, could you please download Realtek's r8168 driver from their homepage and test if WoL is working with it under Linux? You can get the source code of the r8168 driver here: http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false

 

Mieze
