New Driver for Realtek RTL8111


Mieze
Hello dmazar,

 

two of your register dumps from the non-working case show that a link change has already occurred, because the corresponding bit in the Interrupt Status Register is set. As the registers are dumped at a very early stage, in the start() routine, and enable() clears the register before enabling the interrupt, it looks like there is a race condition between the establishment of the connection and the activation of interrupts.

 

Did you try calling checkLinkStatus() at the end of the enable() routine?
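To make the suspected race easier to picture, here is a minimal toy model in plain C++ (no IOKit). It is only a sketch under assumptions: FakeNic, FakeDriver and the bit value used for kLinkChg are hypothetical stand-ins, not the driver's real code. It shows a link change that is latched before enable() runs, gets wiped when enable() clears the status register, and is only noticed if the link is polled once at the end of enable().

```cpp
#include <cassert>
#include <cstdint>

// Toy model only. The bit position of the link-change flag is an
// assumption made for illustration.
constexpr uint16_t kLinkChg = 0x0020;

struct FakeNic {
    uint16_t intrStatus = 0;
    uint16_t intrMask = 0;
    bool linkUp = false;

    // The PHY finishes negotiation: latch the link-change event.
    void linkComesUp() { linkUp = true; intrStatus |= kLinkChg; }

    // An interrupt is delivered only for pending bits that are unmasked.
    bool interruptPending() const { return (intrStatus & intrMask) != 0; }
};

class FakeDriver {
public:
    explicit FakeDriver(FakeNic &nic) : nic_(nic) {}

    // Polls the hardware and records what would be reported upstream.
    void checkLinkStatus() { reportedLinkUp_ = nic_.linkUp; }

    // enable() clears the status register before unmasking, so a link
    // change latched earlier is lost unless we poll once at the end.
    void enable(bool pollAtEnd) {
        nic_.intrStatus = 0;
        nic_.intrMask = kLinkChg;
        if (pollAtEnd) checkLinkStatus();
    }

    bool reportedLinkUp() const { return reportedLinkUp_; }

private:
    FakeNic &nic_;
    bool reportedLinkUp_ = false;
};
```

In this model, calling enable(false) after linkComesUp() leaves no interrupt pending and the link is never reported, while enable(true) picks the link up. It only covers the case where the link is already up when enable() runs.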

 

Mieze

 

Yes sure, with the Lnx2Mac kext it is working without problems.

Also tried to disable EEE without success.

 

It might be an inherited weakness of Realtek's Linux driver, which the code is based on. I checked the WoL code again and found nothing special, but as WoL is known to work on newer versions of the 8111 with my driver, I'm quite sure that there is no logical error in the implementation.

 

Mieze

Link to comment
Share on other sites

Mieze, sorry for not responding - I was spending all my free time comparing your and Slice's driver sources, since Slice's one is working fine on the ProBook now. It seems that the following line is causing me trouble on the ProBook:

bool RTL8111::start(IOService *provider)
{
    ...
    setLinkStatus(kIONetworkLinkValid);
    ...
}

With this line removed, it seems all works fine now. But I have to test it for some more time to be 100% sure.

 

One question: you are using IOInterruptEventSource to trigger the interruptOccurred() handler:

void RTL8111::interruptOccurred(OSObject *client, IOInterruptEventSource *src, int count)
{
    ...
    WriteReg16(IntrMask, 0x0000);
    status = ReadReg16(IntrStatus);
    ...
    WriteReg16(IntrStatus, status);
    WriteReg16(IntrMask, intrMask);
}

If I understood correctly, this IOInterruptEventSource is used to run a kind of "second level" interrupt routine. The kernel's main interrupt routine gets the real interrupt and then just puts it into the IOInterruptEventSource queue, which delivers it to interruptOccurred() as part of the workLoop. Meaning: handling of the interrupt will occur quickly, but not in "real time" ... handling will happen with some possible delay. Does it make sense to mask interrupts when doing it that way? It sounds reasonable to mask them if interruptOccurred() is called directly from the main interrupt routine to handle the interrupt right away when it happens. I guess this is how it works in Linux. But if using IOInterruptEventSource ... ?
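For readers who want to step through the mask / read / acknowledge / unmask order from the snippet above, here is a small stand-alone C++ model. It is a sketch, not the driver: FakeRegs and handleInterrupt are hypothetical names, and the model assumes that status bits latch even while masked and that writing 1 to a status bit clears it. What real hardware does after the mask is restored is chip-specific and not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Toy register model. Assumptions (not taken from the driver or a datasheet):
// status bits latch even while masked, and writing 1 to a bit clears it.
struct FakeRegs {
    uint16_t intrStatus = 0;
    uint16_t intrMask = 0;
    int irqsRaised = 0;  // how often the "hardware" would signal the CPU

    void raise(uint16_t bits) {
        intrStatus |= bits;                 // always latched
        if (bits & intrMask) ++irqsRaised;  // signalled only when unmasked
    }

    // Write-1-to-clear acknowledge.
    void ackStatus(uint16_t bits) {
        intrStatus &= static_cast<uint16_t>(~bits);
    }
};

// Mirrors the order of operations in the quoted interruptOccurred():
// mask, read status, (handle), acknowledge, unmask.
// `duringHandling` injects an event that fires while the handler runs.
inline uint16_t handleInterrupt(FakeRegs &r, uint16_t savedMask,
                                uint16_t duringHandling) {
    r.intrMask = 0x0000;
    uint16_t status = r.intrStatus;
    r.raise(duringHandling);  // arrives while everything is masked
    r.ackStatus(status);      // only the bits we actually read are cleared
    r.intrMask = savedMask;
    return status;
}
```

In this model an event that arrives during handling does not raise a second interrupt, but its bit is still set in intrStatus once the mask is restored, because only the bits that were read get acknowledged.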

Link to comment
Share on other sites

Tested your new version quickly:

- 3 times in a row (3 restarts) and it was working fine

- and then stopped working and I got the same symptoms as before - no interrupts and no link detected

 

This is in line with my previous tests. I already tried putting checkLinkStatus() in various places and it did not help. My link comes up approx. 2 seconds after the enable() method, and calling checkLinkStatus() before the link is up does not unblock the "blocked interrupts". Calling checkLinkStatus() helps only if called when the link is up - and that's why I had to put it into that watchdog timer action previously.

 

Anyway, if I

- remove setLinkStatus(kIONetworkLinkValid) from start() - that one does the trick here

- and remove checkLinkStatus(false) from enable() - because it's not needed

from your v 1.0.4, then it works fine.

 

I need to test this removal of setLinkStatus() more, but it looks to me that this does the trick. This one still did not fail so far. Worked on every start/restart.

Link to comment
Share on other sites

Tested your new version quickly:

- 3 times in a row (3 restarts) and it was working fine

- and then stopped working and I got the same symptoms as before - no interrupts and no link detected

 

This is in line with my previous tests. I already tried putting checkLinkStatus() in various places and it did not help. My link comes up approx. 2 seconds after the enable() method, and calling checkLinkStatus() before the link is up does not unblock the "blocked interrupts". Calling checkLinkStatus() helps only if called when the link is up - and that's why I had to put it into that watchdog timer action previously.

 

What happens when you pull the cable and replug it if the link isn't detected? Does it also happen when you boot with the cable disconnected? And on wakeup? I'm sometimes experiencing similar problems with my 2011 iMac (Broadcom NIC) on wakeup, so I'm not sure if this is a driver problem or something else. If RehabMan could confirm the issue things would be easy, but as of now it might also be a bad NIC or anything else. :unsure:

 

For OS X shutdown, reboot and sleep are all the same with regard to network drivers. The NIC is put to sleep and will be reinitialized on boot or wakeup. There has been a small window during initialization where interrupts might have been missed but this is closed now by calling checkLinkStatus(false) at the end of enable(). Another point is that I don't see any connection between checkLinkStatus() and the fact that interrupts work after it has been called successfully one time.

 

Anyway, if I

- remove setLinkStatus(kIONetworkLinkValid) from start() - that one does the trick here

- and remove checkLinkStatus(false) from enable() - because it's not needed

from your v 1.0.4, then it works fine.

 

I need to test this removal of setLinkStatus() more, but it looks to me that this does the trick. This one still did not fail so far. Worked on every start/restart.

 

This is no solution because it will make the network stack think that the link is up, even if the cable is disconnected resulting in a bunch of dropped packets and confused users. :(

 

Mieze

Link to comment
Share on other sites

Unplugging/replugging the cable does not help. About wakeup: I cannot tell, since in this particular mode (when interrupts are not coming) sleep results in an immediate restart. So, no sleep/wake at all.

 

Maybe it's a bad NIC, but it's working fine in Windows, Linux and with Lnx2Mac's and Slice's drivers. OK, I'm not sure whether they report an unplugged cable if it is not connected.

 

A quick test: moving setLinkStatus(kIONetworkLinkValid) from start() to enable(). Does this make sense? It seems to be working fine here. 6 restarts and the net is working (interrupts are received). And if the cable is not connected, then it is reported as such.

Link to comment
Share on other sites

Unplugging/replugging the cable does not help. About wakeup: I cannot tell, since in this particular mode (when interrupts are not coming) sleep results in an immediate restart. So, no sleep/wake at all.

 

This suggests that there might be a general interrupt handling problem.

 

Maybe it's a bad NIC, but it's working fine in Windows, Linux and with Lnx2Mac's and Slice's drivers. OK, I'm not sure whether they report an unplugged cable if it is not connected.

 

Linux and Lnx2Mac don't rely on the interrupt, because the Linux driver uses a timer routine to periodically check for link status changes (needed for the B revision). Lnx2Mac waits for some seconds for the link to come up and then never checks again. The Windows driver is closed source, and with regard to Slice it looks like he is doing it my way.
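As an illustration of the timer-based approach described above, here is a minimal polling sketch in plain C++. It is not taken from the Linux driver or from Lnx2Mac: LinkPoller and its callbacks are hypothetical names, and the periodic timer that would call tick() is left out. The idea is simply to read the link state on every tick and to notify the upper layer only when it changes, so no link-change interrupt is needed.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical poller; the names are illustrative and not taken from
// any of the drivers discussed in this thread.
class LinkPoller {
public:
    using ReadLink = std::function<bool()>;    // reads the PHY link bit
    using Notify = std::function<void(bool)>;  // reports a change upstream

    LinkPoller(ReadLink readLink, Notify notify)
        : readLink_(std::move(readLink)), notify_(std::move(notify)) {}

    // Intended to be called from a periodic timer (e.g. once per second).
    void tick() {
        const bool up = readLink_();
        if (up != lastUp_) {  // report transitions only, not every tick
            lastUp_ = up;
            notify_(up);
        }
    }

private:
    ReadLink readLink_;
    Notify notify_;
    bool lastUp_ = false;  // link is assumed down until the first poll
};
```

The trade-off is latency: a link change is noticed at the next tick at the earliest, whereas an interrupt-driven driver reacts immediately when the interrupt is actually delivered.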

 

A quick test: moving setLinkStatus(kIONetworkLinkValid) from start() to enable(). Does this make sense? It seems to be working fine here. 6 restarts and the net is working (interrupts are received). And if the cable is not connected, then it is reported as such.

 

No, it makes no sense because setLinkStatus(kIONetworkLinkValid) doesn't affect the hardware at all. Its only purpose is to inform upper layers about the current link state, i.e. in this case that the link is down. If it's working, then that is because checkLinkStatus() does the job. I put it into start() just to make sure that you get correct link status information even if ifconfig up never gets called on the interface. What if you leave it in start() and put another one into enable()?

 

Mieze

Link to comment
Share on other sites

OK, tested again with setLinkStatus() in both start() and enable() - it does not work.

 

From my various tests so far I can conclude that as soon as setLinkStatus() is called in start(), this results in some error which manifests itself in interrupts not being delivered. If setLinkStatus() is called only in enable(), then all is working fine. Why this happens on my ProBook and not on a desktop, and not on RehabMan's ProBook, I do not know. Maybe my ProBook is too slow and the net stack is not configured properly yet during start()? I even tried to call setLinkStatus() in start() through a commandGate, but this did not help either.

Is there any sense in calling setLinkStatus() while the whole net stack is not connected and configured properly? Is enable() the only place where it is safe to assume that the net interface is configured and attached properly to the rest of the net layer?

 

Few more observations:

I took a quick look at Apple's sample AppleUSBCDCEEM driver - this one has setLinkStatus() in enable() and not in start(). Although, I know that this does not have to mean anything.

 

Quick look at IONetworkingFamily sources: setLinkStatus() uses some locking: MEDIUM_LOCK -> IOTakeLock -> IOLockLock: Lock the mutex. If the lock is held by any thread, block waiting for its unlock. This function may block and so should not be called from interrupt level or while a spin lock is held. Locking the mutex recursively from one thread will result in deadlock.

Meaning: calling setLinkStatus() from an interrupt routine should be done with caution, and probably avoided (by calling through a commandGate?).

 

All in all - I found the code that works for me:

- v 1.0.4

- commented out setLinkStatus() from start()

- changed checkLinkStatus():

} else {
    if (interrupt) {
        /* Stop watchdog and statistics updates. */
        timerSource->cancelTimeout();
        setLinkDown();

        if (tp->mcfg == CFG_METHOD_23) {
            WriteReg32(ERIDR, 0x00000001);
            WriteReg32(ERIAR, 0x8042f108);
        }
    } else {
        /* Called from enable() - just notify that the link is down. */
        setLinkStatus(kIONetworkLinkValid);
    }
}

 

Since you made the code available, it's not a problem for me to modify it as above for my personal use and compile it here.

 

EDIT: just to mention it once again: I do not need checkLinkStatus() to be called in enable() at all. All I need there is setLinkStatus(kIONetworkLinkValid) - and all works fine.

Link to comment
Share on other sites

EDIT: just to mention it once again: I do not need checkLinkStatus() to be called in enable() at all. All I need there is setLinkStatus(kIONetworkLinkValid) - and all works fine.

 

Ok, I will remove setLinkStatus(kIONetworkLinkValid) from start() and add it to enable(). This is the easiest solution and it doesn't require any dirty tricks.

 

Mieze

Link to comment
Share on other sites

I just pushed version 1.0.4 with the change suggested by dmazar to github. The binaries will be updated this evening too.

 

By the way: I'm currently trying to figure out how tx checksum offload works in combination with IPv6. As there is no documentation available on that topic, I totally depend on forensic tools to find out how the Win7 driver is doing it. So far I have managed to implement TCP checksum offload under IPv6 and successfully ran a first test with iperf, which shows that I'm on the right track, but there is still a long way to go and I could use some help with testing.

 

Is there anybody out there who is seriously using IPv6 and is willing to run some tests?

 

Mieze

Link to comment
Share on other sites

Hi Mieze,

 

I have a Realtek 8111E so your driver should work OK.

 

But what is the difference between yours and the one from Slice? Are you both using the same (Realtek?) sources and modifying/improving them?

 

Thanks for your work. :)

 

PS. Why do all women love cats? They 'run away' and you will not see them for days..

Link to comment
Share on other sites

But what is the difference between yours and the one from Slice? Are you both using the same (Realtek?) sources and modifying/improving them?

 

Slice took the R1000 driver as a starting point while I started from scratch.

 

PS. Why do all women love cats? They 'run away' and you will not see them for days..

 

Basically it's the same thing what men usually do. The only difference is that they call it business trip. :lol:

 

Mieze

Link to comment
Share on other sites

Mieze, the connection on my RTL8111E gets dropped when I'm trying to copy/move a file in Parallels:

06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.
06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Restart stalled queue!
06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.

 

I'm using v1.0.4 compiled a few days ago.

Link to comment
Share on other sites

Mieze, the connection on my RTL8111E gets dropped when I'm trying to copy/move a file in Parallels:

06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.
06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Restart stalled queue!
06/05/13 11:23:42,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.

 

I'm using v1.0.4 compiled a few days ago.

 

Please report back with the complete set of kernel messages.

 

Mieze

Link to comment
Share on other sites

This is all I could find related to the ethernet (using the debug version):

06/05/13 11:44:07,000 kernel[0]: getFeatures() ===>
06/05/13 11:44:07,000 kernel[0]: getFeatures() <===
06/05/13 11:44:07,000 kernel[0]: createWorkLoop() ===>
06/05/13 11:44:07,000 kernel[0]: createWorkLoop() <===
06/05/13 11:44:07,000 kernel[0]: getWorkLoop() ===>
06/05/13 11:44:07,000 kernel[0]: getWorkLoop() <===
06/05/13 11:44:07,000 kernel[0]: createOutputQueue() ===>
06/05/13 11:44:07,000 kernel[0]: createOutputQueue() <===
06/05/13 11:44:07,000 kernel[0]: getPacketBufferConstraints() ===>
06/05/13 11:44:07,000 kernel[0]: getPacketBufferConstraints() <===
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: PCI power management capabilities: 0xffc3.
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: PME# from D3 (cold) supported.
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: PCIe link capabilities: 0x00077c11, link control: 0x0000.
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: EEE support enabled
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: RTL8168E-VL/8111E-VL: (Chipset 16) at 0xffffff80f4f16000, xx:xx:xx:xx:xx:xx
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: MSI interrupt index: 1
06/05/13 11:44:07,000 kernel[0]: newVendorString() ===>
06/05/13 11:44:07,000 kernel[0]: newVendorString() <===
06/05/13 11:44:07,000 kernel[0]: newModelString() ===>
06/05/13 11:44:07,000 kernel[0]: newModelString() <===
06/05/13 11:44:07,000 kernel[0]: getFeatures() ===>
06/05/13 11:44:07,000 kernel[0]: getFeatures() <===
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:07,000 kernel[0]: getHardwareAddress() ===>
06/05/13 11:44:07,000 kernel[0]: getHardwareAddress() <===
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: kIOEthernetWakeOnMagicPacket added to filters.
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:07,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:07,000 kernel[0]: registerWithPolicyMaker() ===>
06/05/13 11:44:07,000 kernel[0]: registerWithPolicyMaker() <===
06/05/13 11:44:07,000 kernel[0]: setPowerState() ===>
06/05/13 11:44:07,000 kernel[0]: Ethernet [RealtekRTL8111]: Already in power state 1.
06/05/13 11:44:07,000 kernel[0]: setPowerState() <===
06/05/13 11:44:07,000 kernel[0]: configureInterface() ===>
06/05/13 11:44:07,000 kernel[0]: configureInterface() <===
06/05/13 11:44:07,000 kernel[0]: RTL8111: Ethernet address xx:xx:xx:xx:xx:xx
06/05/13 11:44:07,000 kernel[0]: getFeatures() ===>
06/05/13 11:44:07,000 kernel[0]: getFeatures() <===
06/05/13 11:44:07,000 kernel[0]: getChecksumSupport() ===>
06/05/13 11:44:07,000 kernel[0]: getChecksumSupport() <===
06/05/13 11:44:07,000 kernel[0]: getChecksumSupport() ===>
06/05/13 11:44:07,000 kernel[0]: getChecksumSupport() <===
06/05/13 11:44:39,424 configd[18]: network changed: v4(en1:192.168.0.100) DNS Proxy SMB
06/05/13 11:44:39,000 kernel[0]: enable() ===>
06/05/13 11:44:39,000 kernel[0]: Ethernet [RealtekRTL8111]: No medium selected. Falling back to autonegotiation.
06/05/13 11:44:39,000 kernel[0]: selectMedium() ===>
06/05/13 11:44:39,000 kernel[0]: selectMedium() <===
06/05/13 11:44:39,000 kernel[0]: setOffset79() ===>
06/05/13 11:44:39,000 kernel[0]: setOffset79() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() <===
06/05/13 11:44:39,000 kernel[0]: enable() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() <===
06/05/13 11:44:39,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:39,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:39,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:39,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastMode() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:39,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:39,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:40,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:40,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:41,000 kernel[0]: Ethernet [RealtekRTL8111]: Link up on en0, 100-Megabit, Full-duplex, flow-control
06/05/13 11:44:41,000 kernel[0]: getPacketFilters() ===>
06/05/13 11:44:41,000 kernel[0]: getPacketFilters() <===
06/05/13 11:44:41,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:41,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:43,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:43,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:43,750 configd[18]: network changed: v4(en1:192.168.0.100, en0+:192.168.2.100) DNS* Proxy SMB
06/05/13 11:44:46,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:46,000 kernel[0]: setMulticastList() <===
06/05/13 11:44:46,000 kernel[0]: setMulticastList() ===>
06/05/13 11:44:46,000 kernel[0]: setMulticastList() <===
06/05/13 11:46:15,000 kernel[0]: setMulticastList() ===>
06/05/13 11:46:15,000 kernel[0]: setMulticastList() <===
06/05/13 11:46:19,000 kernel[0]: setPromiscuousMode() ===>
06/05/13 11:46:19,000 kernel[0]: Ethernet [RealtekRTL8111]: Promiscuous mode enabled.
06/05/13 11:46:19,000 kernel[0]: setPromiscuousMode() <===
06/05/13 11:46:19,000 kernel[0]: en0: promiscuous mode enable succeeded
06/05/13 11:48:42,000 kernel[0]: Ethernet [RealtekRTL8111]: checksums applied: 0x3, checksums valid: 0x0
06/05/13 11:49:42,000 kernel[0]: Ethernet [RealtekRTL8111]: checksums applied: 0x3, checksums valid: 0x0
06/05/13 11:50:05,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.
06/05/13 11:50:05,000 kernel[0]: Ethernet [RealtekRTL8111]: Restart stalled queue!
06/05/13 11:50:05,000 kernel[0]: Ethernet [RealtekRTL8111]: Not enough descriptors. Stalling.

Link to comment
Share on other sites

Which OS do you use in Parallels? Can you please give me a detailed description of the scenario which leads to this situation? What happens when you pull the plug and replug it when you get this situation?

 

Mieze

Link to comment
Share on other sites

It's Windows 7 64-bit. Let me explain a bit: I get my internet connection through my wifi, and my LAN port is connected to a router with a Windows 7 share. I can mount the shared drive in Finder and it works great, no disconnection problems so far. But the moment I try to copy or move a file in Parallels, the connection is dropped instantly in both OSs. Then if I replug the cable it comes back up, but it's dropped again the moment I try anything with files in Windows. I can navigate folders just fine, but the problem occurs when I move, edit or copy a file.

 

EDIT: oh, I'm using the "Bridge" mode in Parallels but it happens in "shared" mode too.

EDIT2: the same happens with Slice's driver, the only one fully working is Lnx2Mac's so far

Link to comment
Share on other sites

EDIT2: the same happens with Slice's driver, the only one fully working is Lnx2Mac's so far

 

If Slice's driver is affected too, this rules out checksum and segmentation offload related issues. According to the log file the Windows side seems to flood the driver with packets it doesn't like. Although there is no deadlock, the NIC doesn't seem to do any useful work once you have started the transfer. Maybe you should use Wireshark to find out what's going on? As Lnx2Mac is working there are two possible explanations for the behavior:

  • A firmware issue, which would be really bad luck because there is no documentation at all (Lnx2Mac uses older firmware than Slice and I do).
  • Jumbo frames, because neither Slice nor I support this feature. I would suggest checking this first.

Mieze

Link to comment
Share on other sites

Mieze,

Works well here on RTL8168E/8111E (Chipset 13). I will do some more testing using sleep/WOL etc. and will report back.

Awesome job!

Thanks!

 

Update:

Sleep/AutoSleep/WOL work

WOD not tested

 

My full specs are here for anyone who is interested.

Link to comment
Share on other sites

TCP/IPv6 Checksum Offload Support

 

This is an experimental version which includes support for TCP/UDP checksum offload over IPv6. It has been successfully tested on an RTL8111E-VL (chipset 16), but there still might be bugs and it's far from being ready for use on production systems. Nevertheless any feedback is highly appreciated. In particular I'm interested in

 

Ethernet [RealtekRTL8111]: L4 header offset

 

messages. Here's a short explanation why. The IPv6 header has a fixed size and, unlike IPv4, it doesn't have a header length field anymore, but there can be extension headers, making it hard to find the TCP/UDP protocol header in a packet. That's why the NIC has to be supplied with the offset of the TCP/UDP header from the start of the packet when offloading TCP/UDP checksum calculation for IPv6 packets. If there are no extension headers, the value 54 (14 bytes of Ethernet header plus 40 bytes of IPv6 header) will make every packet happy, but in case there are, you'll have to scan them all in order to find the layer 4 protocol header because, unlike Windows and Linux, OS X doesn't provide this information to the driver.
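For anyone who wants to see what such a scan can look like, here is a minimal stand-alone C++ sketch. It is not the code used in the driver: findL4Offset and the constants are hypothetical names, the frame is assumed to be untagged Ethernet II, and only the hop-by-hop, routing and destination options headers are walked. Anything else, including fragment and IPsec headers, makes it give up and return 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kEthHdrLen = 14;   // untagged Ethernet II assumed
constexpr size_t kIPv6HdrLen = 40;  // fixed IPv6 header size

// Returns the offset of the TCP/UDP header from the start of the frame,
// or 0 if it cannot be determined (truncated frame or a header type this
// sketch does not handle).
inline size_t findL4Offset(const uint8_t *frame, size_t len) {
    assert(frame != nullptr);
    if (len < kEthHdrLen + kIPv6HdrLen) return 0;

    uint8_t next = frame[kEthHdrLen + 6];  // IPv6 "Next Header" field
    size_t off = kEthHdrLen + kIPv6HdrLen;

    for (;;) {
        if (off >= len) return 0;  // ran off the end of the frame
        switch (next) {
        case 6:   // TCP
        case 17:  // UDP
            return off;
        case 0:   // hop-by-hop options
        case 43:  // routing header
        case 60:  // destination options
            if (off + 2 > len) return 0;
            next = frame[off];  // first byte: type of the following header
            // Second byte: length in 8-byte units, not counting the first 8.
            off += (static_cast<size_t>(frame[off + 1]) + 1) * 8;
            break;
        default:  // fragment, IPsec, "no next header", unknown
            return 0;
        }
    }
}
```

Without extension headers this returns 54, matching the value mentioned above; with a single 8-byte hop-by-hop header in front of TCP it returns 62. The loop always terminates because the offset grows by at least 8 bytes per extension header and is checked against the frame length.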

 

As the network stack doesn't offload checksum calculation for IPsec packets, this makes the job a little easier, but as of now I haven't seen any packet with an extension header, so I don't know if the algorithm works as expected. If you rely on IPv6 in your network, you might be the man for the job.

 

Good luck!

 

Mieze

 

PS: Is there really no one who is willing to try it?

RealtekRTL8111-CSO6-experimental.zip

Edited by Mieze
Link to comment
Share on other sites
