
Help installing Mojave on Xeon W-2175 and Asus WS C422 mobo


obus

852 posts in this topic


8 hours ago, obus said:

Ok.

In the attached extracted *.bin file I found all three settings: CFG Lock, MSR Lock Control and Package Power Limit MSR Lock.

The first one (CFG Lock) is Enabled/locked (0x1).

The second one (MSR Lock Control) is 0x15, which is unfamiliar to me. This is the same setting you changed in your firmware, if I understand you correctly. But you changed it from 0x1 --> 0x0 and not from 0x15?

The third one (Package Power Limit MSR Lock) is enabled/locked (0x1) like the CFG Lock. 

Now to the "one hundred thousand dollar question": which one(s) should I change?

 

Any ideas?

3003.text

 

So, your BIOS is rather different from mine and that makes me a bit nervous about modifying this. If you'll notice, there are two different forms titled "CPU Configuration". There's FormId 0x513, which I believe is the version you're seeing in the BIOS UI. Note how all its variables are stored into VarStore 0x15.

 

Next let's take a look at CPU Configuration #2, FormId 0x272F. All the variables are stored into VarStore 0x1. It's a different table, yet it contains a lot of the same configuration, like hyperthreading. Hyperthreading is stored at VarStore 0x1, offset 0x4FC here, yet at VarStore 0x15, offset 0x5 in the other form. This makes me very suspicious, and I'm inclined to assume that only the VarStore 0x15 values are actually used (assuming FormId 0x513 is what you're seeing in the BIOS UI).

 

In FormID 0x272F, we have a reference to its power configuration page, "CPU - Power Management Control" (FormId 0x2732) which is where that CFG Lock option lives. Which means that it's only settable on VarStore 0x1. And it's unclear if that will actually do anything. Same goes for Package Power Limit MSR Lock, which is also in VarStore 0x1. This would also explain why modifying those values in the NVRAM directly would have zero effect if they're not actually used. Showing them won't make any difference because they're modifying variables that the firmware isn't looking at.
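One way to check which VarStore each setting lives in is to scan the extracted IFR text instead of reading it by eye. This is a minimal sketch, assuming the lines follow the "..., VarStoreInfo (VarOffset/VarName): 0x.., VarStore: 0x.., QuestionId: 0x.." shape of the disassembly quoted in this thread; the function name is made up for illustration and is not part of any tool.

```python
import re
from collections import defaultdict

# Field layout assumed from the extracted IFR text quoted in this thread, e.g.
# "Ref: CPU Configuration, VarStoreInfo (VarOffset/VarName): 0xFFFF, VarStore: 0x0, QuestionId: 0x359, ..."
LINE_RE = re.compile(
    r"(?P<kind>[A-Za-z ]+): (?P<name>.*?), VarStoreInfo \(VarOffset/VarName\): "
    r"(?P<offset>0x[0-9A-Fa-f]+), VarStore: (?P<store>0x[0-9A-Fa-f]+), "
    r"QuestionId: (?P<qid>0x[0-9A-Fa-f]+)"
)


def group_by_varstore(ifr_text):
    """Map each VarStore id to a list of (setting name, offset, QuestionId)."""
    groups = defaultdict(list)
    for line in ifr_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a line that references a VarStore
        store = int(match["store"], 16)
        groups[store].append(
            (match["name"].strip(), int(match["offset"], 16), int(match["qid"], 16))
        )
    return dict(groups)
```

Feeding it the whole extracted text and looking for a setting name that appears under both VarStore 0x1 and VarStore 0x15 would show the kind of duplication described above.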

 

The only one of the three options that's in VarStore 0x15 is MSR Lock Control. It looks like it should be visible, though. Is that not showing up on your existing CPU Configuration page in the bios, underneath the link to CPU Power Management Configuration?

  • Like 1

47 minutes ago, yapan4 said:

Good job, @obus. But I think you have confused lock/unlock with hide/unhide, no?

 

No, not really. I want to try to unhide the MSR Lock Control settings in the firmware so we can set them manually in the BIOS settings. The problem is that I'm not sure how to achieve it yet. I need more help from @eritius regarding which hex values I should use and how to find/edit them in a hex editor.

 

  

Edited by obus

40 minutes ago, eritius said:

The only one of the three options that's in VarStore 0x15 is MSR Lock Control. It looks like it should be visible, though. Is that not showing up on your existing CPU Configuration page in the bios, underneath the link to CPU Power Management Configuration?

Right, this setting is visible in the BIOS, but it only covers MSR 3Ah, MSR 0E2h and CSR 80h.

 

" This would also explain why modifying those values in the NVRAM directly would have zero effect if they're not actually used. Showing them won't make any difference because they're modifying variables that the firmware isn't looking at"

 

That sounds plausible too because the MSR 0xE2 register was still locked after trying to modify the values in NVRAM.

 

 

This is unfortunately way above my head, but I'm prepared to test whatever I need to. I can always use the flashback utility if I brick the board with a faulty flash (hopefully).

So if you have any suggestions I will try. 

Edited by obus

49 minutes ago, obus said:

Right, this setting is visible in the BIOS, but it only covers MSR 3Ah, MSR 0E2h and CSR 80h.

 

" This would also explain why modifying those values in the NVRAM directly would have zero effect if they're not actually used. Showing them won't make any difference because they're modifying variables that the firmware isn't looking at"

 

That sounds plausible too because the MSR 0xE2 register was still locked after trying to modify the values in NVRAM.

 

 

This is unfortunately way above my head, but I'm prepared to test whatever I need to. I can always use the flashback utility if I brick the board with a faulty flash (hopefully).

So if you have any suggestions I will try. 

 

 

Well, if you're comfortable playing with fire, here's what I would try first. Here's where the BIOS decides which version of CPU Configuration to show to you, the more advanced one or the stock one:

0x659A2 		Suppress If {0A 82}
0x659A4 			QuestionId: 0xEFB equals value 0x1 {12 06 FB 0E 01 00}
0x659AA 			Suppress If {0A 82}
0x659AC 				QuestionId: 0xEEC equals value 0x7 {12 86 EC 0E 07 00}
0x659B2 					QuestionId: 0xEEC equals value 0x8 {12 06 EC 0E 08 00}
0x659B8 					Or {16 02}
0x659BA 				End {29 02}
0x659BC 				Ref: CPU Configuration, VarStoreInfo (VarOffset/VarName): 0xFFFF, VarStore: 0x0, QuestionId: 0x359, FormId: 0x272F {0F 0F EF 00 F0 00 59 03 00 00 FF FF 00 2F 27}
0x659CB 			End If {29 02}
0x659CD 			Suppress If {0A 82}
0x659CF 				True {46 02}
0x659D1 				Ref: Power & Performance, VarStoreInfo (VarOffset/VarName): 0xFFFF, VarStore: 0x0, QuestionId: 0x35A, FormId: 0x2731 {0F 0F 29 01 2A 01 5A 03 00 00 FF FF 00 31 27}
0x659E0 			End If {29 02}
0x659E2 		End If {29 02}
0x659E4 		Suppress If {0A 82}
0x659E6 			QuestionId: 0xEFB equals value 0x0 {12 06 FB 0E 00 00}
0x659EC 			Ref: CPU Configuration, VarStoreInfo (VarOffset/VarName): 0xFFFF, VarStore: 0x0, QuestionId: 0x35B, FormId: 0x513 {0F 0F EF 00 F0 00 5B 03 00 00 FF FF 00 13 05}
0x659FB 		End If {29 02}

It's choosing based on QuestionId 0xEFB which is a variable that is completely unsettable anywhere in the UI. It's used in a lot of places to choose whether to show additional or more complicated options. If I had to guess a name for it, it's probably something like UserMode or ReleaseMode. Now, it's possible that none of this will work unless we set QuestionId 0xEFB to 0x0. Let's start with something less drastic, though, and instead modify the SuppressIf conditions to show the advanced CPU Configuration page.
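To make the byte strings in braces easier to read, here is a small decoding sketch for the "QuestionId equals value" lines. It assumes the layout that the listing itself suggests: one opcode byte (0x12), one byte holding the length in its low 7 bits and a scope flag in the high bit, then a little-endian 16-bit QuestionId and a 16-bit value. The function name is made up for illustration.

```python
import struct

EQ_ID_VAL_OPCODE = 0x12  # first byte of every "QuestionId ... equals value" line above


def decode_eq_id_val(raw):
    """Decode a 6-byte 'QuestionId equals value' opcode into its fields."""
    opcode, len_scope, question_id, value = struct.unpack("<BBHH", raw)
    if opcode != EQ_ID_VAL_OPCODE:
        raise ValueError(f"not an equals-value opcode: 0x{opcode:02X}")
    return {
        "length": len_scope & 0x7F,        # low 7 bits: opcode length in bytes
        "scope": bool(len_scope & 0x80),   # high bit: this opcode opens a scope
        "question_id": question_id,
        "value": value,
    }


first = decode_eq_id_val(bytes.fromhex("12 06 FB 0E 01 00"))
print(hex(first["question_id"]), hex(first["value"]))
```

Read this way, `FB 0E` is QuestionId 0xEFB and the following `01 00` is the compared value 0x1, which is the byte the edit below flips. The `86` in the 0xEEC line is the same length of 6 with the scope bit set.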

 

In your hex editor, open the binary, jump to 0x659A4 (the offset for the QuestionId line). The next 6 bytes shown at that offset should be 12 06 FB 0E 01 00. Change the 01 to 00, so you should have 12 06 FB 0E 00 00.

 

I'm not sure what QuestionId 0xEEC is but let's just bypass it. At offset 0x659B2, you should see the 6 bytes 12 06 EC 0E 08 00. Replace those with 12 06 FB 0E 00 00 in your hex editor.
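If you'd rather not do the two edits by hand, they can be scripted so that nothing is written unless the bytes at each offset match what is expected. This is only a sketch: the offsets and byte values are the ones given above for this specific extracted binary, and the helper names are made up.

```python
def patch_bytes(image, offset, expected, replacement):
    """Return a patched copy of image; refuse if the bytes at offset differ from expected."""
    if len(expected) != len(replacement):
        raise ValueError("expected and replacement must be the same length")
    current = bytes(image[offset:offset + len(expected)])
    if current != expected:
        raise ValueError(
            f"unexpected bytes at 0x{offset:X}: {current.hex(' ')} (wrong file or BIOS version?)"
        )
    patched = bytearray(image)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


# (offset, bytes that must be there now, bytes to write) for the two edits described above
PATCHES = [
    (0x659A4, bytes.fromhex("12 06 FB 0E 01 00"), bytes.fromhex("12 06 FB 0E 00 00")),
    (0x659B2, bytes.fromhex("12 06 EC 0E 08 00"), bytes.fromhex("12 06 FB 0E 00 00")),
]


def apply_patches(image):
    """Apply both edits in order and return the patched image."""
    for offset, expected, replacement in PATCHES:
        image = patch_bytes(image, offset, expected, replacement)
    return image
```

To use it, read the extracted binary with `open(path, "rb").read()`, pass the bytes to `apply_patches`, and write the result to a new file so the original stays untouched. A different BIOS version will almost certainly have different offsets, in which case the check raises instead of patching.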

 

Save, re-extract, and confirm that it now looks like the following:

0x659A2 		Suppress If {0A 82}
0x659A4 			QuestionId: 0xEFB equals value 0x0 {12 06 FB 0E 00 00}
0x659AA 			Suppress If {0A 82}
0x659AC 				QuestionId: 0xEEC equals value 0x7 {12 86 EC 0E 07 00}
0x659B2 					QuestionId: 0xEFB equals value 0x0 {12 06 FB 0E 00 00}
0x659B8 					Or {16 02}
0x659BA 				End {29 02}

 

Since 0xEFB is set to 1, this should allow the other CPU Configuration page to show.
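The effect of the two edits can be modelled in a few lines. This is a toy model of the Suppress If nesting in the listing, not of how the firmware actually evaluates it, and the value of 0xEEC is unknown, so it is passed in as a parameter.

```python
def visible_cpu_config_forms(efb, eec, patched):
    """Return the FormIds of the CPU Configuration links that are not suppressed."""
    shown = []

    # Outer "Suppress If" at 0x659A2: original compares 0xEFB with 0x1, patched with 0x0.
    outer_suppressed = efb == (0x0 if patched else 0x1)
    if not outer_suppressed:
        # Inner "Suppress If" at 0x659AA: two comparisons joined by Or.
        if patched:
            inner_suppressed = (eec == 0x7) or (efb == 0x0)
        else:
            inner_suppressed = (eec == 0x7) or (eec == 0x8)
        if not inner_suppressed:
            shown.append(0x272F)  # advanced CPU Configuration

    # Separate "Suppress If" at 0x659E4, left untouched by the edits.
    if not (efb == 0x0):
        shown.append(0x513)  # stock CPU Configuration

    return shown


# Assuming 0xEFB is 1 and 0xEEC is anything other than 0x7:
print([hex(f) for f in visible_cpu_config_forms(1, 0, patched=False)])
print([hex(f) for f in visible_cpu_config_forms(1, 0, patched=True)])
```

With 0xEFB at 1, the model shows only FormId 0x513 before the edit and both 0x272F and 0x513 after it, with 0x272F listed first. If 0xEEC happens to be 0x7, the advanced page stays hidden even after patching.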

 

Replace the binary body in UEFITool, save as, flash, reboot, and you should see two options for CPU Configuration. The first option is the new one. That should give you access to the CFG Lock option.

Edited by eritius
  • Like 4

2 hours ago, eritius said:

Well, if you're comfortable playing with fire, here's what I would try first.

Ok. Now I have a new entry in the BIOS: CFG Lock, set to disabled by default (described in the BIOS as "Configure MSR 0xE2 (15), CFG Lock bit").

This is good progress, but unfortunately I still can't boot with that firmware.

If I run VerifyMsrE2.efi from the shell with the debug version of OC, the result is still a locked MSR 0xE2 register, regardless of whether CFG Lock in the BIOS is set to disabled.
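For reference, "locked" here comes down to a single bit. A minimal sketch of the check, taking the bit position from the BIOS help text quoted above ("MSR 0xE2 (15), CFG Lock bit"); the sample register values are invented and the function name is not from any tool.

```python
CFG_LOCK_BIT = 15  # bit position named in the BIOS help text for MSR 0xE2


def is_cfg_locked(msr_e2_value):
    """Return True if the CFG Lock bit is set in a raw MSR 0xE2 value."""
    return bool((msr_e2_value >> CFG_LOCK_BIT) & 1)


# Invented example values, for illustration only
print(is_cfg_locked(0x8000))  # bit 15 set
print(is_cfg_locked(0x0403))  # bit 15 clear
```

The BIOS option can read "disabled" while the register still has this bit set, which is what the test result above describes.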

 

Any further ideas?

 

Setup.text

Setup.bin

Edited by obus
Setup.text and Setup.bin
  • Thanks 1

14 hours ago, yapan4 said:

I made the same changes as above, but this time on the 1202 BIOS.

The result was the same: a new CFG Lock button (enabled/disabled) under CPU Power Management in the BIOS. Whether I set that button to enabled or disabled doesn't matter, because my rig boots with both settings, so that setting seems to change nothing.

 

 

 

  • Like 1

Well, a negative result is also a result.

 

ASUS has announced BIOS updates for the X299 series here:

https://www.asus.com/News/g027af4nenss0hsi

 

And this motherboard has already got a new BIOS:

https://www.asus.com/Motherboards/Pro-WS-C422-ACE/HelpDesk_BIOS/

 

Also, my question to ASUS Support is still open.

 

So hopefully the new BIOS will come out and be fixed...

Edited by yapan4

On 12/20/2019 at 3:10 PM, obus said:

Ok. Now I have a new entry in the BIOS: CFG Lock, set to disabled by default (described in the BIOS as "Configure MSR 0xE2 (15), CFG Lock bit").

This is good progress, but unfortunately I still can't boot with that firmware.

If I run VerifyMsrE2.efi from the shell with the debug version of OC, the result is still a locked MSR 0xE2 register, regardless of whether CFG Lock in the BIOS is set to disabled.

 

Any further ideas?

 

Setup.text

Setup.bin

Again, this doesn't surprise me. As I said, it's setting a variable in VarStore 0x1 while the normal CPU Configuration page is only using variables in VarStore 0x15. The only other thing I can think to try would be to revert to the original firmware and modify the variable QuestionId 0xEFB. It's currently set to 1, which I'm guessing is something like ReleaseMode or UserMode, but it's impossible to say. You could hijack an unused option, change its QuestionId to 0xEFB, and then turn it off in the BIOS. That should swap the CPU Configuration pages as we were doing before (assuming 0xEEC is set to something reasonable, but who knows).

 

Before you flash it, see what the disassembled text looks like for the "One Of" you hijacked. The text following the colon is looked up by QuestionId so it might provide some more insight into what the hell that variable actually is intended for.

 

That said, I really can't say what the consequences of this will be and I really wouldn't recommend it. At this point, we're straying pretty far from toggling single values towards trying to enable testing-only factory settings. And that's a best case scenario. Even then, we're doing it blindly with no good sense of what it affects. As far as I know, it could put it into a half-baked state and cause permanent hardware damage to the motherboard or something connected to it.

  • Like 2

3 hours ago, eritius said:

Again, this doesn't surprise me. As I said, it's setting a variable in VarStore 0x1 while the normal CPU Configuration page is only using variables in VarStore 0x15. The only other thing I can think to try would be to revert to the original firmware and modify the variable QuestionId 0xEFB. It's currently set to 1, which I'm guessing is something like ReleaseMode or UserMode, but it's impossible to say. You could hijack an unused option, change its QuestionId to 0xEFB, and then turn it off in the BIOS. That should swap the CPU Configuration pages as we were doing before (assuming 0xEEC is set to something reasonable, but who knows).

 

Before you flash it, see what the disassembled text looks like for the "One Of" you hijacked. The text following the colon is looked up by QuestionId so it might provide some more insight into what the hell that variable actually is intended for.

 

That said, I really can't say what the consequences of this will be and I really wouldn't recommend it. At this point, we're straying pretty far from toggling single values towards trying to enable testing-only factory settings. And that's a best case scenario. Even then, we're doing it blindly with no good sense of what it affects. As far as I know, it could put it into a half-baked state and cause permanent hardware damage to the motherboard or something connected to it.

Thanks for your input. I will see if I dare to do the testing you are suggesting. As you say, it could do a lot of harm and in the worst case "brick" my mobo.

Anyway, I've learned a lot from this testing, which is good for my confidence. Now I will wait and see if Asus comes back to me, hopefully with a new unlocked firmware.

  • Like 1

There is another crazy idea: cut the CPU microcode out of the Apple iMacPro1,1 firmware and insert it into the ASUS BIOS... it's a joke...

:)

 

Ok, now it is known that v.3003 does not bring any improvements (quite the opposite), so there is no need to upgrade to it. Once a Xeon W-22xx is in hand, we will think again...

 

Edited by yapan4
  • Haha 1

  • 3 weeks later...
On 8/5/2019 at 10:18 AM, eritius said:

 

Me too! I'm getting a setup with a C621 chipset and dual Xeon Golds up and running at the moment. I've encountered a lot of the issues mentioned here. Unfortunately, the same CPUID patch (0x0506E3) causes all sorts of issues with my setup. For some reason, it causes random memory corruption in the kernel which results in kernel panics in all sorts of extensions and messages like:


"Zone cache element was used after free!"

and


"a freed zone element has been modified"

So far, the only workaround I've found is to use a less compatible CPUID 0x040674. In the handful of cases where I got lucky and the memory scribbling didn't land on anything critical, the dual processors showed up as Xeon Ws in About This Mac. Geekbench scores were not where I would expect them to be, though, so there are some issues to sort out even if that becomes stable.

 

As well, as of Catalina beta 5, the C621 chipset doesn't appear to be entirely supported. The USB ports all work out of the box, which is great, but there's no driver for the X722 ethernet ports, or at least not one that comes up automatically for the 1Gb ports on this motherboard.

 

If anyone has any suggestions for things to try, I'm all ears!

Hello, I have a build with the C621 chipset, and it beats everything except the 2019 Mac Pro.

But the X722 Ethernet does not work.

OpenCore is the better choice if you cannot disable CFG Lock.

 

 

Edited by MacXZ
  • Like 1

  • 4 weeks later...

@eritius  Unfortunately it’s pretty common for a vendor to use basically a template for their BIOS menus, and they will simply hide a lot of stuff instead of removing it.  
 

What this means is that the existence of a hidden option in the BIOS menus does not in any way imply that said functionality is actually present in the BIOS.  Sometimes it still is, but often it isn’t.  Supermicro is especially bad about this.  You can find entire overclocking submenus in the X11DAi BIOS, even though it has no ability to overclock.

 

 

@eritius, I wanted to ask you something.  I have the same motherboard as you, and I am trying to get a similar rig up and running, the main difference being that I’m using 2nd generation Xeon scalable/Cascade Lake-SP CPUs.  Specifically, dual Xeon Platinum 8260m CPUs.  
 

Thanks to your work on Skylake-SP and OpenCore, I’m able to almost boot.  I have all the necessary XCPM-related quirks turned on, but I can’t make it past the loading of the ACPI tables.
 

The tables load fine, but then the boot process just... stops.  I’ve made the boot process as verbose as possible with every debug feature I know to turn on, and I can’t find anything that would indicate the problem.  There is no error, no panic text, nothing.  The last line is typically one about the ACPI tables being loaded successfully.  This happens in both Mojave and Catalina 10.15.3.

 

I’ve tried both iMacPro and MacPro7,1 profiles, I’ve tried a fake cpuid of 50654 as well as 40476.  
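For anyone comparing fake CPUID values, it helps to split them into family, model and stepping. Below is a sketch using the standard CPUID leaf 1 EAX field layout; it assumes the values quoted in this thread are hexadecimal signatures (for example 50654 read as 0x50654), and the function name is made up.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only prepended for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


for signature in (0x50654, 0x0506E3, 0x040674):
    family, model, stepping = decode_cpuid_signature(signature)
    print(f"0x{signature:06X}: family {family}, model 0x{model:02X}, stepping {stepping}")
```

Decoded this way, 0x50654 is family 6, model 0x55, while the 0x0506E3 and 0x040674 values mentioned earlier in the thread are models 0x5E and 0x47 respectively.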

I am using the latest BIOS from Supermicro.  I wonder if that introduced a change in the ACPI tables that macOS is not cool with?
 

Another theory is it is CPU related though that seems unlikely if Skylake-SP is working.  I just say that because this is also the point where AppleACPICPU would begin to load.  
 

Would you be willing to do me a huge favor and upload a dump of your motherboard’s DSDT and SSDT tables so I can diff them against mine?  I don’t think the earlier BIOS (which I can’t find regardless) supports Cascade Lake, so I’m kind of stuck on the more recent BIOS.

 

 

 

  • Like 1

1 hour ago, yapan4 said:

Hi, @obus

As there is no activity from ASUS, I suggest we continue searching for a solution for BIOS v.3003. Or maybe OpenCore already has a solution?

Probably @metacollin has the same or similar issue...

 

 


I believe my issue is different unfortunately.  @eritius actually uploaded his open core config a while ago in this thread, and I have the exact same motherboard as him (X11DAi-N) and using that config, I still can’t make it past the ACPI tables loading (which do load successfully).  Every boot attempt regardless of any changes or BIOS options or cleaned NVRAM/cleared CMOS... it always ends the same way.  Like this:

 

 

6DBC6CD8-E915-4FF8-94A3-CECC6AA6F750.thumb.jpeg.67f0e371fc028832d80e254158996ff5.jpeg

 

(Ignore the NVMe alloc error, that was just me messing around, it happens without it)

 

The only meaningful differences between my config and eritius’ are that I’m using Platinum 2nd generation scalable Xeons vs Gold first generation, and that I am using a newer BIOS version than he is.  My suspicion is that the kernel may be hanging upon the execution of the AML code, which presumably it would begin executing shortly (or even immediately) after successfully loading said tables.  Considering how close Cascade Lake-SP is to Skylake-SP, I highly doubt it’s the CPUs.
 

Anyway, I’m going to try some DSDT modifications aimed at what I suspect might be the cause... but I am totally expecting it to end up being just another dead end at this point.  I’m running out of ideas of what could even be wrong.

 

But yeah, as far as I know, OpenCore has all the necessary patches to prevent macOS from writing to the problem MSRs. If you enable the AppleXcpmCfgLock and AppleXcpmExtraMsrs quirks in OpenCore, that should get most motherboards booting (assuming there are not any other problems).  These quirks basically implement the various XCPM MSR patches from pikeralpha, pmheart et al., but in a more robust, dynamic way (automatically patching the kernel even if the addresses etc. change).  AppleXcpmExtraMsrs implements additional patches, specific to scalable Xeons I think, on top of the widely known extra XCPM patches; unfortunately I don’t think these exist in Clover-friendly form.
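For anyone who wants to flip those two quirks without opening a plist editor, here is a minimal sketch using Python's standard plistlib. It assumes the usual OpenCore layout where both booleans live under Kernel -> Quirks, with the second key spelled AppleXcpmExtraMsrs; the function name is made up.

```python
import plistlib


def enable_xcpm_quirks(plist_bytes):
    """Return config.plist content with the two XCPM MSR quirks switched on."""
    config = plistlib.loads(plist_bytes)
    # Create the Kernel -> Quirks dictionaries if the file does not have them yet.
    quirks = config.setdefault("Kernel", {}).setdefault("Quirks", {})
    quirks["AppleXcpmCfgLock"] = True
    quirks["AppleXcpmExtraMsrs"] = True
    return plistlib.dumps(config)
```

Read the existing config.plist in binary mode, pass its bytes through the function, and write the result to a copy first. Every other key in the file is preserved, although the output is re-serialised, so formatting and key order may differ from the original.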

 

I think that is probably the best solution at least for now.  It may prove extremely difficult to modify any motherboard’s BIOS to unlock the MSRs needed (especially for the Xeon scalable ones which write to additional often locked registers, not just 0xE2).  All of the past methods of modding BIOSes to unlock said registers relied upon locating an actual bios menu item (hidden or otherwise) to find the right addresses to change.  It seems that the menu option that would normally unlock the right MSRs doesn’t actually do anything in most (all?) C62x and C422 based mobos, so it could prove very difficult indeed to mod any of these BIOSes unfortunately. 

  • Like 1

OK, I think I may have discovered the problem.  I know this is affecting zero other people right now, but if anyone else ever tries to hackintosh a X11DAi-N with the latest firmware, they'll (hopefully) find this post helpful. 

 

So it looks like Supermicro is using the LoadTable() function to selectively load several additional SSDTs based on BIOS options (specifically, those relating to power management for the CPUs). 

 

LoadTable(), according to the ACPI 6.x spec, is supposed to load an additional table containing a Definition Block (compiled .aml) into the ACPI namespace.  Of course, to allow these tables to be selectively loaded, they cannot be declared as DSDT or SSDT types, as the OS would load them automatically, removing any ability to load only certain ones.  These tables are, in fact, SSDT tables, but they're named with nonstandard table types, specifically OEM1, OEM2, OEM3 and OEM4.  Any OS will see those types and simply ignore the tables, unless one of them is forced to load by a DSDT or SSDT table invoking LoadTable().

 

Therein lies the problem:  LoadTable() is an AML function, and inside AML code.  That means it happens at run time, when the OS is actually executing the AML code contained in the DSDT and SSDTs.  It is important to remember that AML is a combination of hardware/device descriptions, but also executable code that lets the OS interact with said hardware.  

 

macOS appears to perform the table loading and AML code execution separately, as based on my BIOS settings when I took that screenshot in my last post, tables OEM2 and OEM4 should have been loaded.  Yet, at the bottom, it says 5 tables were successfully loaded.  There is the DSDT of course, plus 4 SSDTs, which is 5.  If tables OEM2 and OEM4 (remembering that these are secretly just SSDTs) were being loaded by macOS, that count would be 7. 
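The counting argument can be checked against a table dump. A sketch, assuming each dumped table is available as raw bytes beginning with the standard ACPI header (a 4-byte ASCII signature followed by a little-endian 32-bit length); the helper names are made up for illustration.

```python
import struct

AUTO_LOADED = {b"DSDT", b"SSDT"}  # signatures the OS loads on its own


def table_signature(blob):
    """Return (signature, length) from the start of a raw ACPI table."""
    signature, length = struct.unpack_from("<4sI", blob, 0)
    return signature, length


def split_tables(blobs):
    """Separate definition-block tables the OS loads itself from everything else."""
    auto, other = [], []
    for blob in blobs:
        signature, _length = table_signature(blob)
        target = auto if signature in AUTO_LOADED else other
        target.append(signature.decode("ascii"))
    return auto, other
```

On a dump matching the description above (one DSDT, four SSDTs, plus OEM-typed tables), the first list would have 5 entries, which is the count macOS reported, and the OEM tables would land in the second list.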

 

The specific point where LoadTable() is called is upon executing various standard ACPI functions on the CPU device descriptors.  Supermicro's intention was to selectively load things like HWP SSDT tables or legacy P-state tables depending on certain BIOS switches (ACPI variables that it checks with an if statement).  

 

I think macOS must only support static loading and not dynamic table loading. Then, when it tries to execute the loaded DSDT and SSDTs, it reaches a LoadTable() call, can't find the requested table (since, being of a type other than DSDT or SSDT, it was never loaded upon initialization), and just hangs.

 

I'll hopefully test this theory later today, I have to rework the Supermicro SSDTs a bit.  I'd rather not drop the table entirely if at all possible. 

Edited by metacollin
  • Like 1
  • Thanks 1

Wow, @metacollin! With this knowledge, you can help yourself and anyone on this forum :lol:

I hope you are heading in the right direction.

Maybe you can help us too:worried_anim:

 

For comparison: old SkyLake and new CascadeLake ACPI from ASUS WS C422Pro/SE (dumped by Clover).

 

I'm sure the non-bootable macOS on the Cascade Lake BIOS v.3003 is somehow related to the ACPI tables. The locked MSR is important but not critical, and is temporarily ignored.

WS C422Pro:SE OEM ACPI Tables.zip

Edited by yapan4

17 hours ago, yapan4 said:

I'm sure the non-bootable macOS on the Cascade Lake BIOS v.3003 is somehow related to the ACPI tables. The locked MSR is important but not critical, and is temporarily ignored.

WS C422Pro:SE OEM ACPI Tables.zip

 

Indeed, you are right.  The ACPI table issue was still a problem for my motherboard in the sense that it would prevent macOS from loading certain SSDTs (which may or may not even matter), but unfortunately it had nothing to do with the hang/crash.  Working around all instances of LoadTable() did nothing, though now all the SSDTs I want to load are loading.  

 

Oh, and are you saying that other people are having issues booting cascade lake?  I thought it was just me.  I should reread this thread more carefully.  :blush:

 

Anyway, thank you for those ACPI tables.  I'll take a look, though I agree that it probably isn't anything to do with ACPI tables.  I mistakenly thought other people were able to boot macOS using cascade lake (possibly using a fake cpuid for skylake-x) and that the issue must be due to the only difference between eritius' motherboard and mine - the BIOS version (and thus likely some problem with the ACPI tables).  

 

But if there is generally an issue with cascade lake and macOS... well, then obviously the problem is that I am using cascade lake CPUs and it has nothing to do with my BIOS or anything else.  I wish I could get some sort of debug information out of the macOS kernel but it just seems to lockup. 
 

I was able to get a CPU register dump via IPMI after it had locked up trying to boot macOS. I haven’t really looked at it yet, but here it is just in case:

 

save_config.txt

Edited by metacollin
  • Like 2

Well, the register dump is apparently just the machine check exception registers, which are generally for hardware problems.  And they all just show that there are no exceptions.  I thought the dump would have more in it.

 

But, I have great news! I'm an idiot :D!  This entire time, the answer was at my very hands.  Literally.  I have a backlit keyboard, and it is currently configured to always be on - if it has power, it lights up.  And back when I was working on just getting OpenCore to even load the kernel etc., I knew there was a problem whenever the keyboard stayed lit.  However, if OpenCore managed to start the kernel booting, shortly before the boot began, my keyboard's backlight would always turn off.  I learned to recognize my keyboard going dark as a sign that macOS was successfully starting to boot.  

 

But that means power to that USB port is being turned off completely.   It's a pretty dumb keyboard, it doesn't know or care if it has been enumerated or anything else.  If it has power, it lights up.  It can't not light up unless it is unpowered.  And some quick research seems to show that USB 3.0 is a complete mess on X299 (and presumably, C621) chipsets, with only USB 2.0 working without a ton of mucking about.  Which is unfortunate, as the X11DAi-N doesn't actually have any USB 2.0 ports, or any EHCI controller to speak of.  

 

Anyway, if macOS is cutting power to all my USB ports the moment it starts booting... I feel like this probably presents a problem if both my EFI partition and Catalina install partition are on a USB thumb drive.  I feel like this probably might result in the boot process simply hanging.  It's like pulling the drive out of the port the moment you see any verbose boot text.  

 

So uh, I'm 99% sure my problem has everything to do with me not dealing with USB properly (or just avoiding it entirely) and nothing to do with Cascade Lake.  

 

Once I sort that out, hopefully I can make some progress on Cascade Lake + C621 (which should translate over to C422).  

 

Ugh, I can't believe how long it took me to make the connection between keyboard turning off and USB crapping out.


  • Like 1