78 posts in this topic

1 hour ago, rafale77 said:

 

Sorry but I respectfully disagree. I don't think you fully understood what we are trying to do here. It isn't about resolving an issue but rather pursuing performance optimizations.

 

 

 


Optimization is not the right term. You are looking for extremes; the optimum (best performance per watt) is probably even slightly lower than 65 W.

@WhenMusicAttacks In this experiment, it is premature to declare what's right and what's wrong and killing ideas will only thwart creativity.  You are absolutely correct in your statements about efficiency and exponentially increasing power dissipation and we all understand the need to be aware of the dangers of "cooking" our hardware.  I happen to agree with you about the synthetic benchmarks being only "bragging rights" and not true indicators of real-world performance.  A Visual Studio / Xamarin C# build on my HP EliteDesk 800 G4 Mini with i7-8700 takes 4 minutes, while the same build completes in 3 minutes on my HP EliteDesk 800 G5 Mini with i9-9900, so the performance gains in the "real world" are impressive without overclocking or defeating mfg limits.  As you said, real-world applications typically allow component cool-down time between compute intensive tasks.

 

We're just trying to have some fun with our hacks, since fun with hacking was the primary driver for doing it in the first place.  No reason for the fun to end yet.

1 hour ago, WhenMusicAttacks said:


Optimization is not the right term. You are looking for extremes; the optimum (best performance per watt) is probably even slightly lower than 65 W.

 

It's your definition of it. Not mine. We never talked about performance per watt or power efficiency. I am optimizing for best performance per dollar (already spent) and performance per footprint.... 😁

I hit the PL2 limit on a regular basis with applications I run. What is an extreme to you can be daily tasks for others. If we were looking for performance per watt, we would have bought a 35W TDP CPU... 

The fun is in uncovering how to do it.

 

Thank you @theroadw. It is a bit scary, though. I would rather change these power limits than take them out altogether... Is there a way to do that? Or do you think that the MSR settings would then become active? On my unit, I found PL1 and PL2 override flags which I tried to disable, but they seem to get reset at each reboot:

 

[Attached image: PL1 and PL2 override flags]

11 hours ago, rafale77 said:

 

It's your definition of it. Not mine. We never talked about performance per watt or power efficiency. I am optimizing for best performance per dollar (already spent) and performance per footprint.... 😁

I hit the PL2 limit on a regular basis with applications I run. What is an extreme to you can be daily tasks for others. If we were looking for performance per watt, we would have bought a 35W TDP CPU... 

The fun is in uncovering how to do it.

 


I am not questioning the fun; I agree 100% on that. I have been telling you since the beginning that undervolting is the most reasonable, efficient, and long-term solution to any performance problem with prebuilt systems. If your app hits the PL2 limit for one millisecond, that is just a benchmark number with no real-world impact. If the power limit is giving you a noticeable hit, it means the CPU is running a long, uninterrupted 100% load, and then the limit is going to be somewhere else: the machine is designed to maximize profit, so every part is tailored with that CPU power limit in mind.
 

12 hours ago, deeveedee said:

@WhenMusicAttacks In this experiment, it is premature to declare what's right and what's wrong and killing ideas will only thwart creativity.  You are absolutely correct in your statements about efficiency and exponentially increasing power dissipation and we all understand the need to be aware of the dangers of "cooking" our hardware.  I happen to agree with you about the synthetic benchmarks being only "bragging rights" and not true indicators of real-world performance.  A Visual Studio / Xamarin C# build on my HP EliteDesk 800 G4 Mini with i7-8700 takes 4 minutes, while the same build completes in 3 minutes on my HP EliteDesk 800 G5 Mini with i9-9900, so the performance gains in the "real world" are impressive without overclocking or defeating mfg limits.  As you said, real-world applications typically allow component cool-down time between compute intensive tasks.

 

We're just trying to have some fun with our hacks, since fun with hacking was the primary driver for doing it in the first place.  No reason for the fun to end yet.


I'm not killing ideas, but I stated in the beginning that you needed to look into undervolting first. It is even more fun than PL, as you need to tune and test every parameter.
Your gain from 4 to 3 minutes is in line with the core count increase (6 cores on the i7-8700 versus 8 on the i9-9900). You will also see gains with undervolting for sure. In most real tasks, which are single-core, you will not notice much difference between these CPUs, and what difference there is comes mostly from the increase in cache size and speed :D
 

11 hours ago, rafale77 said:

Thank you @theroadw. It is a bit scary, though. I would rather change these power limits than take them out altogether... Is there a way to do that? Or do you think that the MSR settings would then become active? On my unit, I found PL1 and PL2 override flags which I tried to disable, but they seem to get reset at each reboot:

 

My SSDT removes the limit and locks the register so the OS can't override it. The power limit will revert on wake from sleep or on reboot.

Just as @WhenMusicAttacks mentions, undervolting is much preferred, but in my case with my ZBook G5, HP locked the ability to undervolt in a BIOS update that cannot be reversed. Performance in general took a big hit from a lazy fan profile, so I fixed the thermal side using liquid metal and then found that there was also a power limit in place that wasn't there before. Using ThrottleStop I tested removing it in Windows, and after making sure it was safe, I made this SSDT for OSX. My version also hooks on WAK so the power limiter is removed all the time.

Thermal PROCHOT is still enabled and the other platform thermal limits are still in place, so it should be safe. In theory it may be possible to damage the battery or the CPU voltage regulators by running benchmarks 24/7, but that's not my use case, and if undervolting is still an option for you, it would make it even "safer".

Edited by theroadw

@theroadw Would you mind explaining your SSDT and how you found the addresses and values to modify?  Looks like very interesting work.  Nice job!

11 hours ago, deeveedee said:

@theroadw Would you mind explaining your SSDT and how you found the addresses and values to modify?  Looks like very interesting work.  Nice job!

The offsets and values were shared by the creator of ThrottleStop (uncleweb). I just used the old ACPI trick of mapping those offsets to variables and then changing the values, which changes the underlying memory content.

 

 

DefinitionBlock ("", "SSDT", 1, "HP", "PWLOFF", 0x00001000)
{
    Method (XMXX, 0, NotSerialized)
    {
        LKZ1 = Zero          // Clear the low 32 bits: removes the CPU power limit
        LKZ2 = 0x80000000    // Clear the high 32 bits except the MSB, which locks the register
    }

    OperationRegion (HPZK, SystemMemory, 0xFED159A0, 0x64)   // Define memory region
    Field (HPZK, AnyAcc, Lock, Preserve)
    {
        LKZ1,   32,  // Map variable to the low 32 bits
        LKZ2,   32   // Map variable to the high 32 bits
    }
}
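
To make the effect of those two writes easier to see, here is a minimal Python sketch (not part of the SSDT) that joins the two 32-bit values into the 64-bit value they form in memory. It assumes that LKZ1 is the low dword and LKZ2 the high dword, which follows from their order in the Field, and that bit 63 is the lock bit, as discussed later in the thread. The helper name `combine` is made up for illustration.

```python
# Illustrative only: models the two 32-bit writes made by the SSDT.
LOCK_BIT = 1 << 63  # assumed lock bit of the 64-bit power limit register


def combine(low32: int, high32: int) -> int:
    """Join two 32-bit fields (low dword first) into one 64-bit value."""
    return ((high32 & 0xFFFFFFFF) << 32) | (low32 & 0xFFFFFFFF)


lkz1 = 0x00000000  # value written to LKZ1 (low dword)
lkz2 = 0x80000000  # value written to LKZ2 (high dword)

value = combine(lkz1, lkz2)
print(hex(value))              # 0x8000000000000000
print(bool(value & LOCK_BIT))  # True: only the lock bit is set
print(hex(value & ~LOCK_BIT))  # 0x0: every limit field is cleared
```

In other words, the SSDT zeroes every limit and time-window field and sets only the top bit.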
Edited by theroadw

Thank you @theroadw, I may give this a shot.

 

@deeveedee, See my latest CB23 undervolting results below comparing 3 i9 CPUs:

i9 9900 on HP Elitedesk 800 G5 with PL1/PL2 = 65W/85W

i9 10900 on Lenovo P340tiny without undervolting (which is about the same as the 10850K without undervolt) and PL1/PL2 = 65W/120W

i9 10850K on Lenovo P340tiny with -130mV/-30mV/-120mV undervolt and PL1/PL2 = 65W/120W

 

[Attached images: three Cinebench R23 result charts]

 

I wish I had saved my 8700T scores from my previous HP 800 Mini G4 for comparison.

Not much to see in the Single-Core results besides the sad outcome that there has been very little improvement going from Kaby Lake to Comet Lake. Power limits and undervolting do close to nothing here.

The most interesting result is the Multi-Core numbers in the middle chart, which show that the Mini G5 power limit was so constraining (power? thermal?) that the CPU behaves similarly to the 45W TDP 9880H, which is essentially the same CPU with lower power limits.

 

 


@rafale77 Looks like you are making good progress.  I haven't had time to play with my G5 Mini, but will circle back to here when I get a chance.

2 hours ago, deeveedee said:

@rafale77 Looks like you are making good progress.  I haven't had time to play with my G5 Mini, but will circle back to here when I get a chance.

It would be interesting to see if the Mini G5 UEFI can be read as easily by ControlMsrE2.efi. Once I figured it out, it was fairly easy to toggle both the CFG Lock and OC Lock with RU.efi.

 

I have also put together the Cinebench R23 graph below showing the drop in multicore performance on these constrained form factor machines, and comparing them to the last couple of generations of Apple offerings. You can see why I am so thrilled about my HackTiny... and I believe there is potential to do even better with power limit and tau modifications. I noticed that the default PL1/PL2/tau settings are completely overridden on my machine: tau on the "K" CPUs is supposed to be 56s, but I am observing 28s.
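
For reference, the 28s and 56s figures can be related to the 7-bit time window field of the power limit register. The sketch below assumes the encoding described in Intel's SDM (window = 2^Y × (1 + Z/4) × time unit, with Y in the low 5 bits and Z in the top 2 bits of the field) and the common default time unit of 1/1024 s; the real unit is reported by MSR 0x606 and may differ. The function names are made up for illustration.

```python
# Assumption: default RAPL time unit of 1/1024 s (read MSR 0x606 to confirm).
TIME_UNIT_S = 1 / 1024


def decode_tau(field7: int) -> float:
    """Convert a 7-bit time window field into seconds."""
    y = field7 & 0x1F        # exponent, low 5 bits
    z = (field7 >> 5) & 0x3  # fraction selector, top 2 bits
    return (2 ** y) * (1 + z / 4) * TIME_UNIT_S


def encode_tau(seconds: float) -> int:
    """Pick the 7-bit encoding whose window is closest to `seconds`."""
    return min(range(128), key=lambda f: abs(decode_tau(f) - seconds))


for target in (28, 56):
    field = encode_tau(target)
    print(target, hex(field), decode_tau(field))
```

Under these assumptions both values are exactly representable, and 56s is the 28s encoding with the exponent raised by one.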

 

 

[Attached image: Cinebench R23 multicore comparison graph]

 

 

Edit: The SSDT from @theroadw works! The PL values from Voltageshift now are effective and the tau override is also gone.

 

Edited by rafale77

@theroadw  It looks like your SSDT is writing to the 64-bit memory location of MSR @ 0x610.  Is that the way you interpret your SSDT?  If so, then as you indicated, 0x80000000 is writing a 1 to the MSB of the register which sets the MSR lock.  The bits in the 64-bit field are defined in the link I posted here.  If that is indeed the case, and your SSDT is writing MSR 0x610, then we can enhance your SSDT by using it to actually set new power limits and time windows.

 

Do you agree?  If so, could we use this same technique to write other MSR registers? If we can, we just need to find the addresses of those MSR registers (analogous to 0xFED159A0).

 

If we are going to use your SSDT to write other fields within the MSR, then it would probably be best to redefine Field (HPZK) so that it explicitly defines each variable within the field instead of just two arbitrary 32-bit fields within the 64-bit register.  Something like this:

 

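        // ASL names are limited to four characters, so the field names are abbreviated
        Field (HPZK, AnyAcc, Lock, Preserve)
        {
            PL1,    15,  // PL1 power limit
            P1EN,   1,   // PL1 enable (PL1_EN)
            P1CL,   1,   // PL1 clamp (PL1_CLMP)
            P1TW,   7,   // PL1 time window (PL1_T)
            ,       8,   // Reserved
            PL2,    15,  // PL2 power limit
            P2EN,   1,   // PL2 enable (PL2_EN)
            P2CL,   1,   // PL2 clamp (PL2_CLMP)
            P2TW,   7,   // PL2 time window (PL2_T)
            ,       7,   // Reserved
            MSLK,   1    // Lock bit (MSR_LK)
        }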
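
As a way to check values before putting them in an SSDT, the same bit layout can be sketched in Python. This is only an illustration under stated assumptions: the bit positions follow the field widths listed above, the power unit is assumed to be 0.125 W (the actual unit is reported by MSR 0x606), and the names `LAYOUT`, `pack`, `unpack` and `watts_to_raw` are made up here. The 65 W and 85 W figures are the G5 Mini limits quoted earlier in the thread.

```python
# Assumption: 1/8 W power units, the common default; confirm via MSR 0x606.
POWER_UNIT_W = 0.125

# name: (lowest bit, width), matching the field widths proposed above
LAYOUT = {
    "pl1": (0, 15), "pl1_en": (15, 1), "pl1_clamp": (16, 1), "pl1_time": (17, 7),
    "pl2": (32, 15), "pl2_en": (47, 1), "pl2_clamp": (48, 1), "pl2_time": (49, 7),
    "lock": (63, 1),
}


def unpack(raw: int) -> dict:
    """Split a 64-bit register value into its named fields."""
    return {name: (raw >> lo) & ((1 << width) - 1)
            for name, (lo, width) in LAYOUT.items()}


def pack(fields: dict) -> int:
    """Build a 64-bit register value from named fields (others stay zero)."""
    raw = 0
    for name, value in fields.items():
        lo, width = LAYOUT[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        raw |= value << lo
    return raw


def watts_to_raw(watts: float) -> int:
    """Convert watts to register units."""
    return round(watts / POWER_UNIT_W)


raw = pack({"pl1": watts_to_raw(65), "pl1_en": 1,
            "pl2": watts_to_raw(85), "pl2_en": 1})
fields = unpack(raw)
print(hex(raw))
print(fields["pl1"] * POWER_UNIT_W, fields["pl2"] * POWER_UNIT_W)
```

Packing and unpacking in one place makes it easy to confirm a round trip before any value is written to hardware.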

 

According to this, the undervolt MSR is at 0x150.

 

This post is interesting.

Edited by deeveedee

Looking forward to seeing what you can come up with @deeveedee.

I am not seeing a huge improvement but I can report what I am observing:

  1. The kext does remove the 28s locked tau from my testing on cinebench R23. This is from reading the wattage and frequency on HWmonitor during testing. I no longer see the drop to 65W after 28s about halfway through the test. 
  2. PL1 and PL2 values appear to be reversed on my machine Vs. what Voltage shift is reporting. I am seeing higher clocks and initial voltages varying PL1 and higher clocks and voltages varying PL2.

 

That being said, the GB5 scores are not impacted (multicore test is too short) and I see only a very small, albeit consistent improvement of 200+pts from the higher clocks at the beginning and at the end of the cinebench test. I don't know why power gets stuck at 95W with clocks at 4.3GHz throughout the middle of the test. I don't think I am thermally limited. It seems like AppleXCPM takes over...

 

Looking forward to seeing what you come up with, as I would love the idea of superseding Voltageshift with an SSDT...

It may be a while before I get more time to experiment.  If anyone else wants to investigate, my next step is to use Rehabman's ACPI Debug to print memory location values so that I can confirm the values prior to any modifications.

Using RM's ACPI Debug is my paranoid way of ensuring that the SSDT is actually referring to the correct memory locations.  I will need to install Windows, since I agree with you that the easiest way to test actual value changes is with Throttlestop.

Edited by deeveedee

EDIT: I'm testing on Big Sur now and ACPIDebug is working.  I'll go back and test on Monterey when I have more time, but I don't think it was a Monterey issue. I ended up rebuilding ACPIDebug.kext with XCode and now ACPIDebug is writing to the log.  I should now be able to use ACPIDebug to view memory locations as seen by the SSDT before attempting any MSR changes via the SSDT.

 

I haven't had much time to play with this.  I attempted to use Rehabman's ACPIDebug, but am not seeing any ACPIDebug output to system.log.   I think the last time I used ACPIDebug was with macOS Catalina and I'm currently attempting to use it with Monterey.  I've confirmed that Device (RMDT) is loaded and ACPIDebug.kext is loaded, so I'm not sure what's going on.  If anyone has used ACPIDebug with Monterey, I'm open to suggestions.  Thanks.

Edited by deeveedee

@theroadw I have my first output from ACPIDebug, but I don't understand it.  In the following debug output, LKZ2 and LKZ1 are the values read from your variables.  The other variables are values read from mine.  I read the values at memory address 0xFED159A0 (I didn't write anything).

 

I'm assuming I did something wrong and will look at this again when I get time.  I have confirmed that ACPIDebug is reporting correct values for other storage locations, so I must be doing something wrong with 0xFED159A0.

 

2022-05-28 13:26:47.726720-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "LKZ2, LKZ1", 0x0, 0x10000, }
2022-05-28 13:26:54.002451-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL1", 0x0, }
2022-05-28 13:26:54.002544-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL1 ENABLE", 0x0, }
2022-05-28 13:26:54.002717-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL1 CLAMP", 0x1, }
2022-05-28 13:26:54.002908-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL1 TIME", 0x0, }
2022-05-28 13:26:54.003096-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL2", 0x0, }
2022-05-28 13:26:54.003284-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL2 ENABLE", 0x0, }
2022-05-28 13:26:54.003472-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL2 CLAMP", 0x0, }
2022-05-28 13:26:54.003660-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "PL2 TIME", 0x0, }
2022-05-28 13:26:54.003848-0400 0x1d42     Default     0x0                  0      0    kernel: (ACPIDebug) ACPIDebug: { "MSR LOCK", 0x0, }
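
One sanity check on the dump above is to see whether the two 32-bit readings and the per-field readings describe the same bits. The Python sketch below assumes LKZ1 is the low dword and LKZ2 the high dword, and uses the field widths proposed earlier in the thread; the helper name `bits` is made up for illustration. It only shows that the two views agree with each other, not that this address mirrors MSR 0x610.

```python
# Values copied from the ACPIDebug output above
lkz1, lkz2 = 0x10000, 0x0
raw = (lkz2 << 32) | lkz1  # assumed: LKZ1 = low dword, LKZ2 = high dword


def bits(value: int, lo: int, width: int) -> int:
    """Extract `width` bits of `value`, starting at bit `lo`."""
    return (value >> lo) & ((1 << width) - 1)


decoded = {
    "PL1": bits(raw, 0, 15),
    "PL1 ENABLE": bits(raw, 15, 1),
    "PL1 CLAMP": bits(raw, 16, 1),
    "PL1 TIME": bits(raw, 17, 7),
    "PL2": bits(raw, 32, 15),
    "PL2 ENABLE": bits(raw, 47, 1),
    "PL2 CLAMP": bits(raw, 48, 1),
    "PL2 TIME": bits(raw, 49, 7),
    "MSR LOCK": bits(raw, 63, 1),
}

# Per-field values copied from the same log
logged = {
    "PL1": 0, "PL1 ENABLE": 0, "PL1 CLAMP": 1, "PL1 TIME": 0,
    "PL2": 0, "PL2 ENABLE": 0, "PL2 CLAMP": 0, "PL2 TIME": 0,
    "MSR LOCK": 0,
}

print(decoded == logged)  # True: 0x10000 is bit 16, the PL1 clamp bit
```

So the two sets of readings are at least self-consistent: the only bit set is the PL1 clamp bit, with no limits, enables or lock present.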

 

EDIT: The address 0xFED159A0 is MCHBAR + 0x59A0.  I'm venturing into unknown territory (for me).  Do I need to confirm the MCHBAR address for my EliteDesk G5 Mini?  Maybe it's not 0xFED10000?

 

EDIT2: After further reading about MCHBAR, I made a loose connection between MCHBAR and PNP0C02.  Then I found this in IORegistry.  Maybe 0xFED159A0 is write-only to MSR 0x610 and can't be read via an ACPI patch?  I'm reluctant to start experimenting with SSDT patches that write 0xFED159A0 until I can confirm its contents.  NOTE: This is an IOReg snapshot from my hackbookpro15,2 (thus the PNLF device which my HackMini8,1 does not have).  The PDRC address shown is the same for my HackBookPro and HackMini.

[Attached screenshot: IORegistry snapshot]

 

EDIT3: While I was trying to learn how to inspect physical memory in macOS, I stumbled upon boot-arg kmem=1 which does still work in Monterey.  Setting boot-arg kmem=1 enables /dev/kmem.

Edited by deeveedee

@theroadw Yes - I've read the same following your reference to uncleweb.  I may be on to something.  I re-read this which explains how to obtain MCHBAR (MCH Base Address).  

sudo setpci -s 0:0.0 48.l

Then I installed setpci for macos using @Vampire Cat 's repo here.

 

After adding boot-arg debug=0x144, I am able to run the setpci command from above and get this result:

sudo ./setpci -s 0:0.0 48.l
fed10001

Is it possible that my MCHBAR is 0xfed10001 and not 0xfed10000 on my rig?  I'll test and report back.
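
A possible reading of that result, sketched in Python: in Intel's host bridge datasheets, bit 0 of the MCHBAR register is an enable flag rather than part of the address, and the base is aligned to a large boundary. The sketch assumes a 32 KiB alignment and that the upper dword of the register (config offset 0x4C) is zero; both should be checked against the datasheet for the specific CPU. The function name is made up for illustration.

```python
# Assumptions: bit 0 = enable flag, base aligned to 32 KiB, upper dword is 0.
MCHBAR_ENABLE_BIT = 0x1
MCHBAR_BASE_MASK = 0xFFFF8000


def decode_mchbar(reg_low32: int):
    """Split the low dword of the MCHBAR register into (base, enabled)."""
    enabled = bool(reg_low32 & MCHBAR_ENABLE_BIT)
    base = reg_low32 & MCHBAR_BASE_MASK
    return base, enabled


base, enabled = decode_mchbar(int("fed10001", 16))  # value printed by setpci
print(hex(base), enabled)  # 0xfed10000 True
print(hex(base + 0x59A0))  # 0xfed159a0
```

Read this way, fed10001 would mean a base of 0xfed10000 with the mapping enabled, which gives the same 0xFED159A0 address used in the SSDT.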

 

 

EDIT: @theroadw Changing MCHBAR to 0xfed10001 didn't help.  The values read by ACPIDebug were slightly different from those using MCHBAR 0xfed10000, but still made no sense.  I'll think about this more before I continue experimenting.

 

If you want to try to duplicate my 'setpci' experiment, you should be able to use the attached executable as follows:

 

  1. Unzip setpci
  2. Add boot-arg debug=0x144 and reboot
  3. chmod +x setpci
  4. run 'sudo ./setpci -s 0:0.0 48.l'

 

EDIT2: @theroadw and @rafale77 - maybe I'm reading the memory mapped I/O relation incorrectly.  I reread this and noticed this:

610h MSR_PKG_POWER_LIMIT is not synchronized with MCHBAR Package Power Limit for unknown reasons. If you set only one of the two the lower value seems to take effect.

 

That makes it seem that reading the Package Power Limit from MCHBAR + 0x59A0 does not return the value of MSR 0x610.  Maybe my idea of reading the values at MCHBAR + 0x59A0 won't return anything until I actually write to it.  Since both of you have already used the SSDT-RM-PLIM to write to 0xfed159a0, if you don't mind running the setpci command above, I'd like to see the value setpci returns on your rigs.  If you also get 0xfed10001, that will be good to know.  Thank you.
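
The quoted behavior can be modeled very roughly in Python. This is only a toy model of the statement "the lower value seems to take effect", not a description of how the hardware actually arbitrates; `None` stands for a limit that has been removed, and the function name is made up for illustration.

```python
from typing import Optional


def effective_limit_w(msr_w: Optional[float],
                      mmio_w: Optional[float]) -> Optional[float]:
    """Return the lower of the two limits, ignoring any that are removed."""
    candidates = [w for w in (msr_w, mmio_w) if w is not None]
    return min(candidates) if candidates else None


print(effective_limit_w(120, 65))    # 65: raising only the MSR limit has no effect
print(effective_limit_w(120, None))  # 120: with the MCHBAR limit removed, the MSR value applies
```

Under this model, raising only one of the two limits changes nothing, which would fit the earlier report that the Voltageshift PL values became effective once the SSDT was in place.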

 

 

setpci.zip

 

 

Edited by deeveedee

@rafale77 Thanks for trying.  There must be something else that I installed to get it working.  Let me retrace my steps and I'll post a correction here.  Thanks.

 

EDIT: @rafale77 Try disabling SIP.  When I set boot-arg debug=0x144 and disable SIP, setpci is working for me.

Edited by deeveedee

@deeveedee, I have briefly tried disabling SIP and am still getting the same error. Not sure what I am doing wrong. I have been busy investigating other things on my mini...

I have also pretty much concluded that I am dealing with thermal throttling limits at this point and @theroadw's SSDT has unlocked what was probably most important for my setup, the OEM tau, which now seems to sit at 56s instead of 28s in my Cinebench runs. My machine appears to be hitting a wall at about 110W regardless of the power limits, and the long-term power limit appears to stabilize at 95W (up from the stock 65W limitation I started with). This is already pretty amazing for such a small box which doesn't get all that noisy. The only improvement I can think of would be to implement the undervolting from an SSDT instead of a kext, which gets reset whenever the machine goes to sleep.

 

I have also tinkered with CPUFriends to lower the idle frequency and boost the frequency profiles under load but concluded that nothing was to be gained. The lower frequency at idle (800MHz Vs 1.2GHz) doesn't yield observable power savings.

Edited by rafale77