IntelMausiEthernet.kext for Intel onboard LAN


Mieze
1,015 posts in this topic

Just as I write this, I managed to reproduce the problem (still doing backups and testing simultaneously ... it must have found some unique data to send).

 

This time it happened with my DSDT patched to get rid of the GBES declarations/tests as you described.

 

So, as I predicted, the DSDT is not the problem. I'll keep looking, but given that it only happens in a relatively rare heavy traffic scenario, I'm not as worried about it.

 

The patch I developed was originally made for 9 series mainboards, whose DSDT also contained a code sequence writing to the PCI power management register (see below). I'm therefore not surprised that it didn't help you, as in your case there is no write access to the register.

        Device (GLAN)
        {
            Name (_ADR, 0x00190000)
            OperationRegion (GLBA, PCI_Config, Zero, 0x0100)
            Field (GLBA, AnyAcc, NoLock, Preserve)
            {
                DVID,   16, 
                        Offset (0xCC), 
                        Offset (0xCD), 
                PMEE,   1, 
                    ,   6, 
                PMES,   1
            }

            Method (_PRW, 0, NotSerialized)
            {
                Return (GPRW (0x0D, 0x04))
            }

            Method (_DSW, 3, NotSerialized)
            {
                Store (Arg0, PMEE)
            }

            Method (GPEH, 0, NotSerialized)
            {
                If (LEqual (DVID, 0xFFFF))
                {
                    Return (Zero)
                }

                If (LAnd (PMEE, PMES))
                {
                    Store (One, PWST)
                    Store (One, PMES)
                    Notify (GLAN, 0x02)
                }
            }
        } 

I've logged and analyzed dozens of these incidents and found nothing. I checked the NIC's config registers, the descriptors and the packets without finding any hint. There is no indication of a systematic error and no common ground for these transmitter deadlocks. Most of the logged packets were TSO operations, which is no wonder when you are moving large amounts of data with TCP, but I also found small ACKs and UDP datagrams. The packets as well as the descriptors were correct and matched each other. The fact that it only happens under load is just a consequence of the design: you can't have a transmitter deadlock without transmitter activity, and you need transmitter activity in order to detect it.
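To illustrate why a stall can only be noticed under load, here is a minimal sketch of a transmit watchdog. It is not the driver's actual code: the struct, its members and the threshold are hypothetical, and it only assumes that a periodic check can see the index of the oldest unreclaimed descriptor and the number of descriptors still pending.

```cpp
#include <cstdint>

// Hypothetical transmit-stall watchdog; names are illustrative only and do
// not correspond to members of the real driver.
struct TxWatchdog {
    uint32_t lastDirtyIndex = 0;  // oldest unreclaimed descriptor seen at the last check
    uint32_t stalledTicks = 0;    // consecutive checks without any progress
    uint32_t threshold;           // checks without progress before a stall is declared

    explicit TxWatchdog(uint32_t t) : threshold(t) {}

    // Call periodically (e.g. from a timer).
    // dirtyIndex: index of the oldest descriptor not yet completed by the NIC.
    // pending:    number of descriptors handed to the NIC but not yet completed.
    // Returns true when the transmitter looks deadlocked.
    bool check(uint32_t dirtyIndex, uint32_t pending) {
        if (pending == 0 || dirtyIndex != lastDirtyIndex) {
            // Either nothing is queued (a stall is undetectable) or the NIC
            // made progress since the last check.
            lastDirtyIndex = dirtyIndex;
            stalledTicks = 0;
            return false;
        }
        // Work is pending but the oldest descriptor has not moved.
        return ++stalledTicks >= threshold;
    }
};
```

With `pending == 0` the check can never fire, which matches the observation that the deadlock only shows up while data is being sent.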

 

The reason why I'm quite sure that there is something interfering is the fact that users reported the same issue with the Realtek and the Atheros drivers too. For example, see http://www.insanelymac.com/forum/topic/300056-solution-for-qualcomm-atheros-ar816x-ar817x-and-killer-e220x/?p=2128204. In those cases where the issue was resolved, it turned out to be related to power management or a wrong BIOS setting, i.e. external interference.

 

Mieze

Link to comment
Share on other sites

It would be nice if there was a way to fix the stalled transmitter without bringing down the link.

 

Possible?

Link to comment
Share on other sites

In order to recover from this condition you need to reset the NIC, and that will be difficult to achieve without losing the link. Of course you don't need to tell the network stack about it, but I doubt that this is a good idea because of the side effects. Trying to find the cause is a more promising approach from my point of view.

 

Mieze

Link to comment
Share on other sites

I thought of the same idea (delay reporting the link down condition, until it is clear it is not coming back up).

 

But I didn't even bother because of the rarity of this problem and because doing so would require a relatively long delay (10 sec, maybe more). That said, it would be possible to restrict this delay to the case of a forced reset due to a deadlocked transmitter, which makes it a bit more acceptable. I think if the link didn't go down, the system would recover better from the problem.

 

But I understand your reluctance to hack around this problem...
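For reference, the idea of delaying the link-down report could be sketched roughly as follows. This is only an illustration under stated assumptions: the type and method names are hypothetical, the grace period is a parameter (the post suggests something like 10 seconds), and the side effects on the network stack mentioned above are not modeled at all.

```cpp
#include <cstdint>

// What, if anything, should be reported to the network stack.
enum class LinkReport { None, Up, Down };

// Hypothetical helper: hides a link loss caused by a forced NIC reset for up
// to graceMs milliseconds, and reports it only if the link stays down.
struct DeferredLinkReporter {
    uint32_t graceMs;            // assumed grace period, e.g. 10000
    bool hiding = false;         // true while a link loss is being concealed
    uint32_t hiddenSinceMs = 0;  // time at which concealment started

    explicit DeferredLinkReporter(uint32_t g) : graceMs(g) {}

    // forcedReset is true only when the driver reset the NIC itself because
    // of a deadlocked transmitter; any other link loss is reported at once.
    LinkReport onLinkDown(uint32_t nowMs, bool forcedReset) {
        if (forcedReset) {
            hiding = true;
            hiddenSinceMs = nowMs;
            return LinkReport::None;
        }
        hiding = false;
        return LinkReport::Down;
    }

    // If the link returns while still hidden, the stack never saw it go down.
    LinkReport onLinkUp() {
        bool wasHiding = hiding;
        hiding = false;
        return wasHiding ? LinkReport::None : LinkReport::Up;
    }

    // Periodic check: give up hiding once the grace period has expired.
    LinkReport onTimer(uint32_t nowMs) {
        if (hiding && nowMs - hiddenSinceMs >= graceMs) {
            hiding = false;
            return LinkReport::Down;
        }
        return LinkReport::None;
    }
};
```

Restricting the delay to forced resets keeps ordinary cable pulls reported immediately, which is the trade-off described above.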

Link to comment
Share on other sites


As long as you don't know exactly what went wrong, the only reliable method to restore full operation is a complete reset, but once you have located the cause of the problem it's usually easier to eliminate it than to create a workaround.

 

Mieze

Link to comment
Share on other sites

It appears I am having the same problem as RehabMan. Same error after extended high speed transfers to NAS storage device:

kernel[0]: Ethernet [IntelMausi]: Tx stalled? Resetting chipset. txDirtyDescIndex=796, STATUS=0x40080083, TCTL=0x3103f0fa.
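The register values in this log line can be decoded by hand. The sketch below does so with bit masks that follow the definitions in the Linux e1000e headers; they are reproduced here as an assumption and should be checked against the datasheet. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Bit masks as defined in the Linux e1000e headers (assumption: verify
// against the datasheet before relying on them).
constexpr uint32_t STATUS_FD         = 0x00000001;  // full duplex
constexpr uint32_t STATUS_LU         = 0x00000002;  // link up
constexpr uint32_t STATUS_SPEED_MASK = 0x000000C0;  // speed field
constexpr uint32_t STATUS_SPEED_1000 = 0x00000080;  // 1000 Mbit/s
constexpr uint32_t TCTL_EN   = 0x00000002;  // transmitter enabled
constexpr uint32_t TCTL_PSP  = 0x00000008;  // pad short packets
constexpr uint32_t TCTL_CT   = 0x00000ff0;  // collision threshold field
constexpr uint32_t TCTL_COLD = 0x003ff000;  // collision distance field
constexpr uint32_t TCTL_RTLC = 0x01000000;  // retransmit on late collision

// Hypothetical container for the decoded fields.
struct TxState {
    bool fullDuplex;
    bool linkUp;
    bool gigabit;
    bool txEnabled;
    bool padShortPackets;
    bool retransmitLateCollision;
    uint32_t collisionThreshold;
    uint32_t collisionDistance;
};

// Decode the STATUS and TCTL values printed in the "Tx stalled?" message.
inline TxState decodeTxState(uint32_t status, uint32_t tctl) {
    TxState s;
    s.fullDuplex = (status & STATUS_FD) != 0;
    s.linkUp = (status & STATUS_LU) != 0;
    s.gigabit = (status & STATUS_SPEED_MASK) == STATUS_SPEED_1000;
    s.txEnabled = (tctl & TCTL_EN) != 0;
    s.padShortPackets = (tctl & TCTL_PSP) != 0;
    s.retransmitLateCollision = (tctl & TCTL_RTLC) != 0;
    s.collisionThreshold = (tctl & TCTL_CT) >> 4;
    s.collisionDistance = (tctl & TCTL_COLD) >> 12;
    return s;
}
```

Under these definitions, `decodeTxState(0x40080083, 0x3103f0fa)` describes a link that is up at gigabit speed in full duplex with the transmitter still enabled, i.e. the registers themselves show nothing unusual at the time of the stall.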

This is the device I am using:

Intel 82579V PCI Express Gigabit Ethernet:

  Name:	Intel Ethernet Controller
  Type:	Ethernet Controller
  Bus:	PCI
  Slot:	Built In
  Vendor ID:	0x8086
  Device ID:	0x1503
  Subsystem Vendor ID:	0x1458
  Subsystem ID:	0xe000
  Revision ID:	0x0004
  BSD name:	en0
  Kext name:	IntelMausiEthernet.kext
  Location:	/System/Library/Extensions/FakeSMC.kext/Contents/PlugIns/IntelMausiEthernet.kext
  Version:	2.0.0

As seen in the iStat screenshot, the transfers can go for quite a while before failing, whereas other times it will fail in just a short period of time. I'm using rsync to copy over a lot of files, with an until loop to retry the command after 10 seconds until it exits cleanly. Directory-intensive transfers lower the overall throughput, to the effect that it rarely stalls, whereas large movie transfers see stalling more frequently. Another thing I have noticed is that if I create more "noise" in the traffic, for example browsing a lot of directories on the NAS share or opening up quick-look previews while the large transfer is going, it seems to crash more often.

Attached is my DSDT (which does not appear to contain any special power-management calls like those in your previous post), a system log of all IntelMausi debug errors over a long period of time, and one log of all the system events surrounding one crash. There are no specific power management features that I can toggle in the BIOS, and I get crashes whether WOL is enabled or not. The Ethernet configuration is set to basic full-duplex as recommended; however, the crashes persist even with EEE and flow-control, although flow-control does not appear to be enabled/supported on my hardware.

en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500 index 4
	eflags=8c0<ACCEPT_RTADV,TXSTART,ARPLL>
	options=6b<RXCSUM,TXCSUM,VLAN_HWTAGGING,TSO4,TSO6>
	ether 90:2b:34:XX:XX:XX 
	inet6 fe80::922b:34ff:XXXX:XXXX%en0 prefixlen 64 scopeid 0x4 
	inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
	inet6 2002:4b6f:e5a4::922b:34ff:XXXX:XXXX prefixlen 64 autoconf 
	inet6 2002:4b6f:e5a4::a85c:e70a:223a:2590 prefixlen 64 autoconf temporary 
	nd6 options=1<PERFORMNUD>
	media: autoselect (1000baseT <full-duplex>)
	status: active
	type: Ethernet
	link quality: 100 (good)
	scheduler: QFQ 
	link rate: 1.00 Gbps

I appreciate all the work you have done on this and thank you for looking into this problem.

Logs.zip

post-199409-0-99609800-1439109125_thumb.png

DSDT.aml.zip

Link to comment
Share on other sites

@RehabMan:  Intel's datasheets of the 82579 and the I217 contain the following advice with regard to the transmit descriptor handling policy.

[Attached screenshot: datasheet excerpt on the transmit descriptor handling policy]

The strange thing is that neither the Windows nor the Linux driver follows this advice, but on my I217, which was also affected by random tx deadlocks in early development versions of the driver, setting TXDCTL=0 eliminated the problem. That's why I added this workaround in version 2.0.0d2, but as I don't have an 82579 to test on, I haven't been able to verify it on this NIC.

/**
 * intelConfigureTx - Configure Transmit Unit after Reset
 * @adapter: board private structure
 *
 * Configure the Tx unit of the MAC after a reset.
 **/
void IntelMausi::intelConfigureTx(struct e1000_adapter *adapter)
{
    struct e1000_hw *hw = &adapter->hw;
    UInt32 tctl, tarc;
    UInt32 txdctl;
    
    /* Setup the HW Tx Head and Tail descriptor pointers */
    intelInitTxRing();
    
    /* Set the Tx Interrupt Delay register */
    intelWriteMem32(E1000_TIDV, adapter->tx_int_delay);
    /* Tx irq moderation */
    intelWriteMem32(E1000_TADV, adapter->tx_abs_int_delay);
    
    txdctl = intelReadMem32(E1000_TXDCTL(0));

    if (chipType == board_pch_lpt) {
        txdctl = 0;
        intelWriteMem32(E1000_TXDCTL(0), txdctl);
    }
    /* erratum work around: set txdctl the same for both queues */
    intelWriteMem32(E1000_TXDCTL(1), txdctl);

    /* Program the Transmit Control Register */
    tctl = intelReadMem32(E1000_TCTL);
    tctl &= ~E1000_TCTL_CT;
    tctl |= E1000_TCTL_PSP | E1000_TCTL_RTLC | (E1000_COLLISION_THRESHOLD << E1000_CT_SHIFT);
    
    /* errata: program both queues to unweighted RR */
    if (adapter->flags & FLAG_TARC_SET_BIT_ZERO) {
        tarc = intelReadMem32(E1000_TARC(0));
        tarc |= 1;
        intelWriteMem32(E1000_TARC(0), tarc);
        tarc = intelReadMem32(E1000_TARC(1));
        tarc |= 1;
        intelWriteMem32(E1000_TARC(1), tarc);
    }
    intelWriteMem32(E1000_TCTL, tctl);

    hw->mac.ops.config_collision_dist(hw);
}

EDIT: Checking the source code again I discovered that the workaround I described above is only applied to the I217 and I218 while the 82579 still uses the default transmit descriptor handling policy. Please change

if (chipType == board_pch_lpt) {  

into

if ((chipType == board_pch_lpt) || (chipType == board_pch2lan)) { 

in order to apply it to the 82579 too. Please report back. In case of a positive result I will include the workaround in the next update. Good luck!
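As a standalone illustration of the changed condition, the check can be modeled as below. Only the names `board_pch_lpt` and `board_pch2lan` come from the driver source quoted above; the enum itself, its third member and both helper functions are hypothetical and exist only to make the sketch self-contained.

```cpp
#include <cstdint>

// Hypothetical stand-in for the driver's board type enumeration. Only
// board_pch2lan (82579) and board_pch_lpt (I217/I218) appear in the post.
enum BoardType {
    board_pch2lan,
    board_pch_lpt,
    board_other  // hypothetical placeholder for any other chip
};

// True for the chips that should get the TXDCTL=0 workaround after the
// suggested change.
inline bool needsTxdctlWorkaround(BoardType chipType) {
    return (chipType == board_pch_lpt) || (chipType == board_pch2lan);
}

// Value to program into TXDCTL(0) and TXDCTL(1): zero for affected chips,
// otherwise whatever was read back from the hardware.
inline uint32_t txdctlToProgram(BoardType chipType, uint32_t hwValue) {
    return needsTxdctlWorkaround(chipType) ? 0u : hwValue;
}
```

Before the change only `board_pch_lpt` would have taken the zero branch, which is why the 82579 kept the default descriptor handling policy.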

 

Mieze

Link to comment
Share on other sites

Thanks... I'll give it a try. No time to test for a while (due to the intermittent nature of the problem, it is very time consuming), but I'll let you know when I do.

Link to comment
Share on other sites

 

What do you mean by: 

  1. Call "Archive" from the menu "Product" and save the built driver.

 

The first clause should be clear. In order to save the driver select it in the Organizer, which is opened automatically, and click "Export". Now select "Save Built Products", click "Next" and select a directory where the products should be saved, for example the desktop, and confirm with "Export". Finally open the saved folder in Finder and you'll find a subdirectory hierarchy called "System/Library/Extensions" in it. There you will find your driver.

 

Mieze

Link to comment
Share on other sites

Xcode has seriously brain damaged defaults when it comes to build products...

Not sure what they were thinking...

 

I set Xcode->Preferences->Locations->Advanced->Custom "Relative to Workspace".

This way you can find your build results in ./Build relative to the project instead of some far off place with a random garbage name.

Link to comment
Share on other sites

Woohoo! I compiled with the changes you suggested above and it doesn't seem to be crashing! I copied a large file from one network share to another (maximally stressing both upload and download) while also generating random I/O on the network. Usually this guarantees a crash, but it's been holding up this time!

However, the console appears to be flooded with 

Ethernet [IntelMausi]: replaceOrCopyPacket() failed.

and 

Ethernet [IntelMausi]: Not enough descriptors. Stalling.
Ethernet [IntelMausi]: Restart stalled queue!

Other than that it is--at least initially--working. I'll report back if I get any disconnects, but thanks for the fix!

 

I attached the binary for 10.10 for anyone who can't compile themselves.

Console.log.zip

IntelMausiEthernet.kext.zip

Link to comment
Share on other sites

Oops, you are right. I don't use Xcode much, but I poked around a bit and got it to recompile as 2.0.0. And now... nooooo! Still failed like before, although it looks like it put up with more of a beating than the original V2 version used to take. Also, I might have posted prematurely on the v1.0.0 post, but I tested it hard for a good 30 mins without failure, whereas this one failed after about 10 minutes.

Console.log.zip

IntelMausiEthernetV2.kext.zip

Link to comment
Share on other sites

@The Edge3000: This is a hardware issue which is independent of the driver version. It affects version 1.0.x as well as 2.0.x. Please send me a complete ACPI dump of your machine (DSDT and SSDTs). 

 

Mieze

Link to comment
Share on other sites

 @The Edge3000: Looks ok. I found nothing which might interfere. Let's see what results RehabMan will have.  :unsure:

 

EDIT: Are you sure you have disabled all of these items in the UEFI setup?

  • Network stack
    Disables or enables booting from the network to install a GPT format OS, such as installing the OS from the Windows Deployment Services server. (Default: Disable Link)

  • IPv6 PXE Boot Support
    Enables or disables IPv6 PXE Support. This item is configurable only when Network stack is enabled.

  • IPv4 PXE Boot Support
    Enables or disables IPv4 PXE Support. This item is configurable only when Network stack is enabled.

  • LAN PXE Boot Option ROM
    Allows you to decide whether to activate the boot ROM integrated with the onboard LAN chip. (Default: Disabled)

 

Besides that, are you overclocking?

 

Mieze

Link to comment
Share on other sites

Hey Mieze, sorry for the late response. Work got in the way, etc. 
 
To answer your question, both of those options are disabled. I am overclocking, but only by raising the CPU multiplier. I am not messing with any voltages or overclocking the BCLK.
 
I don't know if this would have any effect or not, but my BIOS mod (GA-Z77X-UD5H BIOS F16 mod11) per TweakTown forums includes the change "Intel GigabitLanX64 6.0.24 to 6.3.27"

Link to comment
Share on other sites

Frankly, I don't know. What does the change do?

 

Mieze

Link to comment
Share on other sites

I don't know either. It's a ROM in the BIOS. It probably has more to do with the boot-over-network logic than anything else, and I really don't think it could affect a driver.

 

Both? There are 4 options you have to disable!

Since the other two are just subsets of network stack, there are two options set to disabled in my BIOS. They only even show up as options with network stack enabled.

Link to comment
Share on other sites

No, this isn't enough, as the link I attached suggests that you should disable the sub-options too.

 

Mieze

Link to comment
Share on other sites
