Pene Posted October 26, 2018 (edited October 26, 2018)

Hello again vit9696,

How are you? I come to you with another issue related to AptioMemoryFix (latest rev). When it is used on boards with the Z390 chipset (and, according to some other reports, apparently also with H370/B370/H310 chipsets), problems appear:

- It results in an OSX kernel panic on reboot/shutdown (you can see some reports, for example, in this topic).
- There also seems to be a problem with native NVRAM from within OSX: a test variable saved from OSX with the 'nvram' command does NOT exist after reboot (Clover's last chosen OS is being saved, though).

I just replaced my board with an AsRock Z390 ITX/ac (before that I had an AsRock Z370 ITX/ac, a really similar board), and these issues started appearing (everything was fine with AptioMemoryFix and Z370).

In my case, OSX starts fine with AptioMemoryFix, but when I reboot/shutdown, I don't even get to see the console screen with the panic information, and the system ends up either hanging or (uncleanly) rebooting (I use -v, and I do see the console screen with AptioMemoryFix when OSX loads).

When I add EmuVariableUefi (together with AptioMemoryFix, although I know this is not the intended use), shutdown/reboot work properly, and I also see the -v output when shutting down or rebooting.

I will gladly help with any further information or tests you may need. Thanks.
Slice Posted October 26, 2018

Hello Pene, This is a known issue, see
vit9696 (Author) Posted October 26, 2018 (edited October 26, 2018)

Hi Pene,

I am aware of this issue, but as you can imagine I do not have the hardware to research it. What happened here is another change in APTIO. In former versions there were 3 drivers implementing NVRAM: NvramDxe, NvramSmm, NvramSmi. On newer boards NvramSmi and NvramSmm became one driver. I believe we can try to research this to a certain level with a considerable amount of help from your side.

Frankly, I am not sure hardware NVRAM works on these boards in any OS at all, so the first step would be to:

1) Test NVRAM support in UEFI Linux; I believe they have a tool for this.
2) Test NVRAM support in UEFI Windows; you will have to write some tool that elevates the privileges and utilises the GetFirmwareEnvironmentVariableA/SetFirmwareEnvironmentVariableA APIs, see this tool for an example.

The "test" we want here is fairly simple. Firstly, we want to try to write some variable to the Apple Boot Variable GUID: 7C436110-AB2A-4BBB-A880-FE41995C9F82, then read it, reboot, and read it again. If that all fails, we could try writing to the EFI Global Variable GUID: 8BE4DF61-93CA-11D2-AA0D-00E098032B8C, and check whether at least some GUIDs are available for writing.

Next, the kernel panic on reboot is what really worries me. We know that macOS writes stuff to NVRAM during the reboot, and I would like to know if disabling SetVariable/GetVariable/GetNextVariableName "fixes" the problem with the reboot. Just like we debugged it previously, write jmp ASM_PFX(RtShimsReturnInvalidParameter) to the relevant shims here for a test: https://github.com/acidanthera/AptioFixPkg/blob/master/Platform/AptioMemoryFix/X64/AsmRtShims.nasm

Thirdly, regarding the panics on reboot, I would like to get as much information as possible. Please apply the kernel patch mentioned here: https://applelife.ru/posts/686953, which disables kext list printing in the panic log, and try capturing the kernel panic in -v keepsyms=1 debug=0x100 mode. We should hopefully see more data to start working with.

@Slice, unfortunately this one is new. These guys never learn, and they managed to bork things once again. So it is neither the issue we researched a year ago, nor the whitelist. It is just another extension of the borked code.

Vit
Pene Posted October 26, 2018 (edited October 27, 2018)

vit9696 said:
    What happened here is another change in APTIO. In former versions there were 3 drivers to implement NVRAM: NvramDxe, NvramSmm, NvramSmi. In newer boards NvramSmi and NvramSmm became one driver. I believe we can try to research this to a certain level with a considerable amount of help from your side.

Hi again,

Thank you for your reply. Here are some results:

NVRAM is writable from Windows. I have used a Windows tool for writing to NVRAM before (UEFIVAR), so I issued this command from Windows:

    uefivar -G:"7C436110-AB2A-4BBB-A880-FE41995C9F82" -N:"TestVarWin" -WHEX:"01020304" -A:"NV"

EDIT: NVRAM is also writable from Linux:

    # sudo -s
    # printf "\x07\x00\x00\x00\x01\x02\x03\x04" > /sys/firmware/efi/efivars/TestVarLin-7c436110-ab2a-4bbb-a880-fe41995c9f82

And then, under OSX:

    $ nvram -p | grep TestVar
    TestVarWin %01%02%03%04
    TestVarLin %01%02%03%04

For the second test, disabling only SetVariable in AptioMemoryFix solved the panic, and the system shuts down properly.

For the third test, I cannot post any result because, as I mentioned, I do not get the console window on shutdown/restart when the panic occurs. With the steps you mentioned, I just get a black screen forever, until I press the power button to turn off the PC.

More tests are welcome.

P.S. Hi Slice. But apparently, as Vit mentioned, we are "lucky" with a new Aptio issue. Hopefully solvable.
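For readers repeating the Linux test: the printf payload follows the efivarfs file layout, in which the first four bytes are the little-endian attribute mask and the remaining bytes are the variable data. The value 0x07 is NON_VOLATILE | BOOTSERVICE_ACCESS | RUNTIME_ACCESS. The following is only a minimal sketch of how that payload and file name are put together; the helper names efivarfs_build_payload and efivarfs_build_path are hypothetical and not part of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* UEFI variable attribute bits (values from the UEFI specification). */
#define EFI_VARIABLE_NON_VOLATILE       0x00000001u
#define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x00000002u
#define EFI_VARIABLE_RUNTIME_ACCESS     0x00000004u

/* Hypothetical helper: build the bytes written to an efivarfs file,
 * i.e. a 4-byte little-endian attribute mask followed by the data.
 * Returns the payload size, or 0 if the arguments are unusable. */
size_t efivarfs_build_payload(uint32_t attrs, const void *data, size_t data_len,
                              uint8_t *out, size_t out_cap)
{
    if (out == NULL || out_cap < 4 || data_len > out_cap - 4)
        return 0;
    if (data == NULL && data_len != 0)
        return 0;

    out[0] = (uint8_t)(attrs & 0xFFu);
    out[1] = (uint8_t)((attrs >> 8) & 0xFFu);
    out[2] = (uint8_t)((attrs >> 16) & 0xFFu);
    out[3] = (uint8_t)((attrs >> 24) & 0xFFu);
    if (data_len != 0)
        memcpy(out + 4, data, data_len);
    return 4 + data_len;
}

/* Hypothetical helper: build the efivarfs path "<Name>-<guid>".
 * Returns 0 on success, -1 if the path does not fit into out. */
int efivarfs_build_path(const char *name, const char *guid,
                        char *out, size_t out_cap)
{
    assert(name != NULL && guid != NULL && out != NULL);
    int n = snprintf(out, out_cap, "/sys/firmware/efi/efivars/%s-%s", name, guid);
    return (n > 0 && (size_t)n < out_cap) ? 0 : -1;
}
```

With attributes 0x07 and data 01 02 03 04 this yields the same eight bytes as the printf command above.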
vit9696 (Author) Posted October 26, 2018

Well, with the random reboots there is indeed no separate issue; it is just NVRAM variable saving corrupting our memory once again. I believe that if you try to write a lot of variables from macOS, the operating system will eventually panic. That could have been somewhat useful if you were able to get the log dumps, but otherwise it is probably not beneficial.

Basically, what needs to be done here now is reverse-engineering NvramSmm/NvramDxe and comparing them against the previous versions, which we fortunately have sources for. I do not think I will have much time for that any time soon, so if you have time and are able to use IDA Pro/Hex-Rays, help will be welcome here.
Pene Posted October 26, 2018

vit9696 said:
    I believe, if you try to write a lot of variables from macOS the operating system will eventually panic. That could have been somewhat usable if you were able to get the log dumps, but otherwise it is probably not beneficial.

Well, I didn't manage to make macOS panic with a lot of variable writes. But then again... does OSX save these variables to EFI at runtime, or does it store them internally and then call SetVariable only on reboot/shutdown? In any case, these added variables only live until the computer reboots. After reboot they are all gone.
vit9696 (Author) Posted October 26, 2018

It depends on the GUID you write to, from what I remember. If you write to the Apple GUID, the variables are applied later on; otherwise they are applied immediately. That is how I understood their NVRAM driver.
Pene Posted October 26, 2018 (edited October 26, 2018)

Mmmm... the kernel panic did not occur when writing to any GUID (with SetVariable enabled, of course). I only get the panic on reboot/shutdown. And when SetVariable is disabled, it seems I can still "write" to any GUID. All the info is present while in OSX, but gone after reboot.

As a side note, NVRAM writes can work with AptioMemoryFix when not in OSX. So, for example, if we return InvalidParameter only when we have a virtualized pointer, Clover's last booted OS is saved to NVRAM, and OSX won't panic on reboot/shutdown:

    global ASM_PFX(RtShimSetVariable)
    ASM_PFX(RtShimSetVariable):
        ; For performance and simplicity do initial validation ourselves.
        test rcx, rcx
        jz ASM_PFX(RtShimsReturnInvalidParameter) ; VariableName is NULL
        test rdx, rdx
        jz ASM_PFX(RtShimsReturnInvalidParameter) ; VendorGuid is NULL
    .INITIAL_VALIDATION_OVER:
        ; Once boot.efi virtualizes the pointers we should protect read-only
        ; variables from writing.
        mov rax, qword [ASM_PFX(gGetVariableOverride)]
        test rax, rax
        jnz .SKIP_ACCESS_CHECK
        ; We have a virtualized pointer, so we also need to protect write-only
        ; variables from reading. Compare VendorGuid against gReadOnlyVariableGuid
        ; and return EFI_SECURITY_VIOLATION on equals.
        mov rax, qword [rdx]
        cmp qword [ASM_PFX(gReadOnlyVariableGuid)], rax
        ;jnz .SKIP_ACCESS_CHECK
        jnz ASM_PFX(RtShimsReturnInvalidParameter)
        mov rax, qword [rdx+8]
        cmp qword [ASM_PFX(gReadOnlyVariableGuid)+8], rax
        jz ASM_PFX(RtShimsReturnSecurityViolation)
    .SKIP_ACCESS_CHECK:
        mov rax, qword [ASM_PFX(gSetVariable)]
        jmp short FiveArgsShim
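The behaviour described in the prose (let SetVariable through while the bootloader is running, reject it once the OS has virtualized the runtime pointers) can be modelled in plain C. This is a simplified sketch, not a line-by-line translation of the NASM shim: the types are stand-ins for the UEFI ones, and shim_install, shim_mark_virtualized, shim_set_variable and the mock firmware function are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for UEFI types and status codes (64-bit values). */
typedef uint64_t EFI_STATUS;
#define EFI_SUCCESS            0ULL
#define EFI_INVALID_PARAMETER  (0x8000000000000000ULL | 2ULL)
#define EFI_UNSUPPORTED        (0x8000000000000000ULL | 3ULL)

typedef EFI_STATUS (*SET_VARIABLE_FN)(const uint16_t *name, const void *guid,
                                      uint32_t attrs, size_t size,
                                      const void *data);

SET_VARIABLE_FN g_firmware_set_variable; /* original firmware service */
int g_pointers_virtualized;              /* set once the OS takes over */

/* Hypothetical setup helpers for the model. */
void shim_install(SET_VARIABLE_FN fw) { g_firmware_set_variable = fw; }
void shim_mark_virtualized(void)      { g_pointers_virtualized = 1; }

/* Model of the shim: validate arguments, forward to the firmware before
 * virtualization, and refuse every write afterwards. */
EFI_STATUS shim_set_variable(const uint16_t *name, const void *guid,
                             uint32_t attrs, size_t size, const void *data)
{
    if (name == NULL || guid == NULL)
        return EFI_INVALID_PARAMETER;
    if (g_pointers_virtualized)
        return EFI_INVALID_PARAMETER;   /* OS runtime: never reach firmware */
    if (g_firmware_set_variable == NULL)
        return EFI_UNSUPPORTED;
    return g_firmware_set_variable(name, guid, attrs, size, data);
}

/* Mock firmware service that only counts how often it is reached. */
int g_mock_calls;
EFI_STATUS mock_firmware_set_variable(const uint16_t *name, const void *guid,
                                      uint32_t attrs, size_t size,
                                      const void *data)
{
    (void)name; (void)guid; (void)attrs; (void)size; (void)data;
    g_mock_calls++;
    return EFI_SUCCESS;
}
```

In this model the bootloader's write reaches the firmware, while any write issued after shim_mark_virtualized() returns EFI_INVALID_PARAMETER without touching it, which matches the observation that Clover's last booted OS is saved but OSX no longer panics.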
Slice Posted October 27, 2018

NVRAM usually works at Clover time without any effort. The problem was at macOS time.
Pene Posted October 27, 2018 (edited October 27, 2018)

Yes, of course. I just mentioned it to show that the code in AptioMemoryFix basically works on the new Aptio, since with SetVariable completely disabled, Clover won't update its last booted OS when booting OSX.
LockDown Posted October 27, 2018

Hi @Pene, did you combine the rc scripts with EmuVariable?
Pene Posted October 27, 2018 (edited October 27, 2018)

ellaosx, the information is just for debugging purposes. Emuvariable is not being used, nor rc scripts. We don't have a proper solution yet.
LockDown Posted October 27, 2018

@Pene I asked because my 300-series board needs EmuVariable too, otherwise there are issues on restart/shutdown. I was wondering if I need the rc scripts with it.
Pene Posted October 27, 2018

vit9696, I figured out that if I shut down the system using 'sudo shutdown -h now' I do get to see the panic. So here is a photo:
vit9696 (Author) Posted October 27, 2018

Hmmm, a timeout… There could be some infinite loop in their code. But let's try to disable the timeout first. Add slto_us=0 to boot-args and tell me whether anything changed.
Pene Posted October 27, 2018

OSX won't start at all with slto_us=0. It panics on startup.
vit9696 (Author) Posted October 27, 2018

Well, try gradually lowering the value then. Start from slto_us=250000 and keep dividing by 2.
Pene Posted October 27, 2018

10000 is about the lowest value that loads, and it still panics. As for the panic details, I guess it was mere luck that I managed to see them, as once again it freezes before I get to see anything.

By the way, did you notice in the previous screenshot the multiple vm_map_delete: .... nothing at <address>? Is that normal?
vit9696 (Author) Posted October 27, 2018

Well, the guard prints at vm_map_delete are most likely work in progress for improving the detection of unexpected vm_map situations. It is very unlikely that they have anything to do with us. I believe that without debug=0x100 you should be able to see the panic, but I suppose it could be the same.
Erroruser Posted October 27, 2018

I had a panic (first pic), then I removed AptioMemoryFix and added AptioFixDrv + EmuVariable, and that seems to have gotten rid of it. I'm on a 370 board with an i9 9900K. I added the terminal output and the report.rtf in the hope that they may be helpful.

Attachments: Terminal-Saved-Output.zip, report.rtf.zip
vit9696 (Author) Posted October 27, 2018

If you want to help us with the panic, please redo it properly as suggested above.
Pene Posted October 27, 2018 (edited October 27, 2018)

I found out that what allowed me to view the panic before was actually the nv_disable=1 argument. So I now have a reliable method to see panics. Here are some panics when using slto_us=10000:
vit9696 (Author) Posted October 28, 2018 (edited October 28, 2018)

Good to have. Well, the second picture makes it very clear. The XNU kernel invokes the APTIO RuntimeServices SetVariable code, and then this code never returns. What we have in SetVariable is the following code coming from NvramDxe. I can tell that it has not changed at all since the source leak, and the version in the source leak is known to work.

    UINTN GetVariableNameSize(IN CONST CHAR16 *String, IN UINTN MaxSize){
        CHAR16 *Str, *EndOfStr;
        ASSERT(String!=NULL);
        if (String==NULL) return 0;
        EndOfStr = (CHAR16*)((UINT8*)String + MaxSize);
        for(Str = String; Str < EndOfStr; Str++)
            if (!*Str) return (Str - String + 1)*sizeof(CHAR16);
        return MaxSize+1;
    }

    EFI_STATUS Communicate (UINTN MessageLength){
        UINTN CommSize;
        UINT64 Control;
        EFI_STATUS Status;

        if (SmmCommProtocol==NULL) return EFI_UNSUPPORTED;
        if (   NvramSmmCommunicationBuffer == NULL
            || NvramSmmCommunicationBufferPhysicalAddress == NULL
        ) return EFI_OUT_OF_RESOURCES;
        if (MessageLength > MaxMessageLength) return EFI_OUT_OF_RESOURCES;
        Control = NvramSmmCommunicationBuffer->Control;
        NvramSmmCommunicationBuffer->MessageLength = MessageLength;
        CommSize = CommunicationHeaderSize + MessageLength;
        Status = SmmCommProtocol->Communicate (SmmCommProtocol, NvramSmmCommunicationBufferPhysicalAddress, &CommSize);
        if (EFI_ERROR(Status)) return Status;
        if (NvramSmmCommunicationBuffer->Control == Control) return EFI_NO_RESPONSE;
        if ((NvramSmmCommunicationBuffer->Control & NVRAM_SMM_ERROR_BIT)!=0)
            Status = NVRAM_SMM_STATUS_TO_EFI_STATUS(NvramSmmCommunicationBuffer->Control);
        return Status;
    }

    EFI_STATUS DxeSetVariableSmmWrapper (
        IN CHAR16 *VariableName, IN EFI_GUID *VendorGuid,
        IN UINT32 Attributes, IN UINTN DataSize, IN VOID *Data
    )
    {
        EFI_STATUS Status;
        UINTN AvailableBufferSize, VariableNameSize;
        SMI_SET_VARIABLE_BUFFER *SetBuffer;

        if (NvramSmmCommunicationBuffer == NULL) return EFI_UNSUPPORTED;
        if (!VariableName || !VendorGuid || (DataSize && !Data))
            return EFI_INVALID_PARAMETER;
        AvailableBufferSize = MaxMessageLength - sizeof(SMI_SET_VARIABLE_BUFFER);
        VariableNameSize = GetVariableNameSize(VariableName, AvailableBufferSize);
        // If variable name or data is too large to fit into our buffer, it is also too large to fit
        // into NVRAM store.
        if (AvailableBufferSize < VariableNameSize) return EFI_OUT_OF_RESOURCES;
        AvailableBufferSize -= VariableNameSize;
        if (AvailableBufferSize < DataSize) return EFI_OUT_OF_RESOURCES;
        SetBuffer = (SMI_SET_VARIABLE_BUFFER *)&NvramSmmCommunicationBuffer->Control;
        SetBuffer->Control = NVRAM_SMM_COMMAND_SET_VARIABLE;
        SetBuffer->Attributes = Attributes;
        SetBuffer->DataSize = DataSize;
        SetBuffer->Guid = *VendorGuid;
        SetBuffer->VariableNameSize = VariableNameSize;
        MemCpy(SetBuffer+1, VariableName, VariableNameSize);
        MemCpy((UINT8*)(SetBuffer+1)+VariableNameSize, Data, DataSize);
        Status = Communicate( sizeof(SMI_SET_VARIABLE_BUFFER) + VariableNameSize + DataSize );
        return Status;
    }

    EFI_STATUS DxeSetVariableSafe(
        IN CHAR16 *VariableName, IN EFI_GUID *VendorGuid,
        IN UINT32 Attributes, IN UINTN DataSize, IN VOID *Data
    )
    {
        EFI_STATUS Status;
        BEGIN_CRITICAL_SECTION(NvramCs);
        if (NvramSmmIsActive)
            Status = DxeSetVariableSmmWrapper(
                VariableName,VendorGuid,Attributes,DataSize,Data
            );
        else
            Status = DxeSetVariableWrapper(
                VariableName,VendorGuid,Attributes,DataSize,Data
            );
        END_CRITICAL_SECTION(NvramCs);
        return Status;
    }

The code relevant to SMM switching looks the same too, and the EFI_SMM_COMMUNICATION_PROTOCOL implementation is provided by EDK2. They still allocate the SMM communication buffer as EfiRuntimeServicesData, and still pass its address via the NvramMailbox NVRAM variable, so it should be guarded by AptioMemoryFix. As a result, I believe that the infinite loop happens somewhere on the way to NvramSmm (which now represents the former Smi and Smm code glued together).
However, a brief check of the binary against the source shows that the Smi handler (NvramSmmCommunicationHandler, SetVariableSmmHandler) is pretty much the same too. This leaves us in an uneasy situation, where we do not know where to look for the problem.

What I could suggest is writing an EFI runtime driver (by ripping off the known APTIO V source) that will reimplement communication with SMM:

1. Allocate a new communication buffer.
2. Check & overwrite the address of the old communication buffer in the MailBox variable.
3. Overwrite the EFI_RUNTIME_SERVICES Variable functions with the APTIO code, but using the new communication buffer.

The above will put the complete path prior to the SMM code under our control. Afterwards we should be able to get this code fully functional on some working APTIO V system (e.g. Skylake or Kaby Lake), and try it on the new problematic system. By changing the logic via the return codes it should be easy to determine where the issue is: the DXE or the SMM driver. Besides, it may even help us to understand whether the SMI handler exists at all.

If it is SMM, I would probably try replacing NvramSmm with NvramSmm & NvramSmi from some Z370 BIOS first and reflashing the firmware. Then… perhaps reverse-engineer/reimplement NvramSmm with the new changes and try to debug it too.

If you like the idea, I can share the APTIO src and let you proceed.
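To illustrate how return codes could separate the DXE side from the SMM side, here is a small host-side model of the Control check in Communicate: an unchanged Control word after the call means nobody answered (the EFI_NO_RESPONSE case), while a set error bit means the handler ran and failed. This is a sketch under stated assumptions only. The enum, struct, MODEL_ERROR_BIT value and the transport functions are hypothetical stand-ins, not the APTIO definitions, and a real hang inside SMM would of course never return at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Outcomes the probing driver could report through its return code. */
typedef enum {
    PROBE_OK,               /* handler ran and reported success          */
    PROBE_TRANSPORT_ERROR,  /* the communicate call itself failed        */
    PROBE_NO_RESPONSE,      /* Control untouched: no SMI handler reacted */
    PROBE_HANDLER_ERROR     /* handler ran and set the error bit         */
} probe_result;

/* Stand-in for NVRAM_SMM_ERROR_BIT; the real value is not given here. */
#define MODEL_ERROR_BIT 0x80000000u

/* Minimal model of the communication buffer: only the Control word. */
typedef struct { uint32_t control; } comm_buffer;

/* Models SmmCommProtocol->Communicate: returns 0 when the call succeeds. */
typedef int (*smm_transport_fn)(comm_buffer *buf);

probe_result probe_communicate(comm_buffer *buf, uint32_t command,
                               smm_transport_fn transport)
{
    assert(buf != NULL && transport != NULL);

    buf->control = command;          /* request goes into Control      */
    uint32_t before = buf->control;  /* remember it to detect a reply  */

    if (transport(buf) != 0)
        return PROBE_TRANSPORT_ERROR;
    if (buf->control == before)
        return PROBE_NO_RESPONSE;
    if ((buf->control & MODEL_ERROR_BIT) != 0)
        return PROBE_HANDLER_ERROR;
    return PROBE_OK;
}

/* Mock transports covering the cases of interest. */
int transport_handler_absent(comm_buffer *buf) { (void)buf; return 0; }
int transport_handler_ok(comm_buffer *buf)     { buf->control = 0; return 0; }
int transport_handler_error(comm_buffer *buf)  { buf->control = MODEL_ERROR_BIT | 2u; return 0; }
int transport_failed(comm_buffer *buf)         { (void)buf; return -1; }
```

On a working board such a probe would be expected to report the handler's answer; a "no response" result on the problematic board would point at a missing or unreachable SMI handler rather than at the DXE wrapper.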
Pene Posted October 29, 2018 (edited October 29, 2018)

Hi vit9696,

Yes, an uneasy situation indeed... Meanwhile I ran a test, with a rather strange result. I set a simple override for SetVariable, something like:

    EFI_STATUS
    EFIAPI
    OvrSetVariable(
        IN CHAR16 *VariableName,
        IN EFI_GUID *VendorGuid,
        IN UINT32 Attributes,
        IN UINTN DataSize,
        IN VOID *Data
    )
    {
        EFI_GUID gEfiAppleBootGuid = {0x7C436110, 0xAB2A, 0x4BBB, {0xA8, 0x80, 0xFE, 0x41, 0x99, 0x5C, 0x9F, 0x82}};
        // Ignore the caller's arguments and always write the same test variable.
        return gOrgRS.SetVariable(L"TestVar", &gEfiAppleBootGuid, EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, 4, "1234");
    }

... which for some reason didn't panic on restart (obviously "TestVar" was updated first by Clover's call to SetVariable, so I don't think it actually tried to write anything to NVRAM on restart).

But then I changed only the preset Apple boot GUID to the caller-supplied VendorGuid:

    return gOrgRS.SetVariable(L"TestVar", VendorGuid, EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, 4, "1234");

... which again resulted in a panic on restart. I can't really explain why. Perhaps you have an idea.

EDIT: Maybe it panics only when it actually has data to change? Otherwise it probably just returns EFI_SUCCESS and exits.

Regarding your suggestion, it sounds like a good plan, but it's a bit too big for me at the moment, considering my limited experience with runtime EFI drivers combined with the limited free time that I currently have. But if someone else is up for this task, I'll be more than willing to test it on the Z390.
Pene Posted October 30, 2018

On 10/28/2018 at 1:54 PM, vit9696 said:
    They still allocate the SMM communication buffer as EfiRuntimeServicesData, and still pass its address via NvramMailbox NVRAM variable, so it should be guarded by AptioMemoryFix.

By the way, is there any chance it is not guarded? How can we check this?