
Clover General discussion


ErmaC

I got this in the console:

Loading 'mach_kernel'...
Could not open file 'mach_kernel'
Error loading kernel 'mach_kernel' (0xe)

and this on screen:

[Screenshot: 2021-03-17 at 11:35:05]

 

The kernel is not there so all that seems normal. No crash...

 

If you can reproduce it, try launching from the Qemu folder in Clover with the qemu in the qemu-portable folder. Maybe we have a qemu bug...

Also, the Clover you gave me doesn't boot by timeout. I tried compiling one myself, but got no crash either.


It is not a qemu problem as other users reported the problem on real hardware. 

1. This is legacy hardware with no UEFI. There is only BIOS boot. There is no problem with UEFI boot.

2. Boot by timeout is required to reproduce. If we press Enter, there is no crash.

I reproduced the behaviour with my qemu and then ran a git bisect to find the commit where the problem started.


One more condition:

3. There is no OpenRuntime.efi. If it is present, there is no crash.

First observation

https://applelife.ru/threads/clover.42089/page-1339#post-926032

Second confirmation

https://www.applelife.ru/threads/clover.42089/page-1351#post-928012

My confirmation

https://www.applelife.ru/threads/clover.42089/page-1351#post-928188


You sent me a CloverX64.efi that is supposed to reproduce the crash, but this efi doesn't boot by timeout.

5 hours ago, Slice said:

1. This is legacy hardware with no UEFI. There is only BIOS boot. There is no problem with UEFI boot.

2. Boot by timeout is required to reproduce. If we press Enter, there is no crash.

I've understood that well, but I can't reproduce it yet on my Qemu.

5 hours ago, Slice said:

I reproduced the behaviour with my qemu

Good, I'd like the same.

4 hours ago, Slice said:

I never said it's not true. No need to copy/paste link as "proof".

What I'm saying is that using an intermediary variable doesn't "solve" the problem. It probably just hides it by coincidence. The problem cannot come from using .wc_str() in args. There are plenty of those, and if we need to extract an intermediary variable for each, we'll never see the end of it. It's a basic compiler task; I doubt there is a bug in that.

Also, the fact that pressing Enter avoids the problem is a sign of a deeper problem.

Creating an intermediary variable changes the size used on the stack and probably hides the fact that a pointer is wrongly overwritten, by duplicating it. Or something else.
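To illustrate why passing a pointer taken from a temporary directly in the argument list is fine in standard C++, here is a minimal sketch. It assumes std::wstring and c_str() as stand-ins for XStringW and .wc_str(); makeLoaderPath, pathLength, directCall and viaVariable are hypothetical names, not Clover code. It only shows the language rule (a temporary lives until the end of the full expression), not Clover's string class.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical function returning a wide string by value
// (stand-in for something that returns an XStringW).
std::wstring makeLoaderPath() {
    return L"\\EFI\\CLOVER\\CLOVERX64.efi";
}

// Consumer taking a raw pointer, as a C-style API would.
std::size_t pathLength(const wchar_t* path) {
    assert(path != nullptr);
    return std::wcslen(path);
}

// Form 1: pointer taken from a temporary, directly in the argument list.
// The temporary is destroyed at the end of the full expression, so the
// pointer stays valid for the whole call.
std::size_t directCall() {
    return pathLength(makeLoaderPath().c_str());
}

// Form 2: the same call through a named intermediary variable.
std::size_t viaVariable() {
    std::wstring path = makeLoaderPath();
    return pathLength(path.c_str());
}
```

Both forms are well defined and return the same value. If one of them crashes and the other doesn't, the difference is more likely a change in stack layout than in the call itself.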

 

For me, it's crucial to find that. And I need help to reproduce and investigate.

If you think it doesn't matter, just let me know and I won't bother you anymore...


I agree there is a task here. I just can't understand why you can't reproduce this when I have done it many times.

It is a very strange bug, not logical. And it is very strange that there is no crash if OpenRuntime is present. Why does it influence this code?


Memory allocation and/or usage. Loading OpenRuntime.efi may change, reallocate, or move some memory. I'm pretty sure some memory is overwritten. If it's written on unused memory, it works...

What's even stranger is that there is no bug if you press Enter!

 

To be sure it doesn't depend on the qemu version, compiler version etc., the only solution is to prepare a self-contained img that has the bug when you launch it with qemu-portable (the version I put inside the QEMU folder) and send it to me. That way, I can execute the exact same code in the exact same emulator. Once I'm sure that my current computer config can create this bug, I can start to do different tests.

It's important to try with the same qemu as me because we may not have the same QEMU configuration (bios, vga bios, etc.). That can have an influence.

 

So I propose :

- make an img that creates the problem in your usual qemu, launched with your usual command.

- adapt the launch command from the launch script and try it with qemu portable from QEMU folder. If the crash is still there : send it to me, including the command you use.

 

NOTE: to try Qemu portable, go to your Clover folder and type ./Qemu/launch [path to CloverX64.efi to copy into the img]

 

path to CloverX64.efi is usually something like ./Build/Clover/DEBUG_GCC53/X64/CloverX64.efi


OK, I reproduced this one more time.

git checkout 0dc0a93045a9ff1490277cfc2738f4a1d0c637db

The exact commit doesn't matter; any commit between the bad one and the fix will do.

Insert one line into the sources, in nvram.c:

INTN
FindStartupDiskVolume (
  REFIT_MENU_SCREEN *MainMenu
  )
{
  INTN         Index;
//  LEGACY_ENTRY *LegacyEntry;
//  LOADER_ENTRY *LoaderEntry;
//  REFIT_VOLUME *Volume;
  REFIT_VOLUME *DiskVolume;
  BOOLEAN      IsPartitionVolume;
  XStringW     LoaderPath;
  XStringW     EfiBootVolumeStr;
//test crash by timeout
  return 0;
  //

This return 0 makes Clover boot the first entry by timeout.

./hebuild.sh -fr

cp /src/CloverBootloader/CloverPackage/CloverV2/EFI/CLOVER/CLOVERX64.efi Qemu/

./Qemu/launch Qemu/CLOVERX64.efi

And voila!

Screenshot 2021-03-17 at 23.30.54.png

 

CLOVERX64.efi


10 hours ago, Slice said:

cp /src/CloverBootloader/CloverPackage/CloverV2/EFI/CLOVER/CLOVERX64.efi Qemu/

./Qemu/launch Qemu/CLOVERX64.efi

And voila!

 

Ok, making progress. I did that and I got the crash !

 

If I use the img you sent me a few posts ago, I get stuck at:

[Screenshot: 2021-03-18 at 10:20:14]

I also have

ehci: PERIODIC list base register set while periodic schedule
      is enabled and HC is enabled
ehci: ASYNC list address register set while async schedule
      is enabled and HC is enabled

in the mac console.

 

Knowing which img you use is easy. The img committed in the Qemu folder has this theme:

[Screenshot: 2021-03-18 at 10:23:47]

Could you do a quick check by copying the img you sent me into the Qemu folder? Rename it to "disk_image_gpt.img" or modify the launch command.


Only in the evening, when I'm at home.

But if you get the result with your image, that is enough. I sent you a shortened image; I don't remember exactly what is inside.


Sure, sure. I just wonder why this difference. No hurry especially now that I can work on the crash by using the committed img.

 

I think I found why I couldn't recreate this crash at first: I was compiling the debug version. It seems to happen only in the release version!


I've added this in main.cpp, line 847

UINT32 a = GetDebugPrintErrorLevel ();
DBG("%d", a);

and the crash is also avoided. So that's one more clue that it's a stack or memory problem.

I've just added "--define=DEBUG_ON_SERIAL_PORT" to the release compilation and the crash is also avoided. So that's the main difference between DEBUG and RELEASE. I guess I have to debug without debug messages !


I have to leave for now, but I saw that a few lines below, after setting up the OpenCore structure, I try to start the OpenRuntime.efi image even if it's not loaded.

I only had time to test once, but the crash seems gone.

Will test more and finish that tomorrow.


One more issue: debug.log is not created when this crash happens.

That is very, very bad. The old method for debug.log guaranteed that it would be created at any crash. Not now.

In the old method the file was opened (or created) and closed at every line. In the new method it is opened once, and so it is not saved when a crash occurs.
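As a rough illustration of the "open, write, close at every line" approach, here is a minimal sketch using hosted C++ file streams rather than the EFI file protocol. appendLogLine, resetLog and countLines are hypothetical helpers, not Clover functions, and this is not the actual SaveMessageToDebugLogFile implementation.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Start from an empty log file (creates it if missing).
bool resetLog(const std::string& path) {
    std::ofstream out(path, std::ios::trunc);
    return static_cast<bool>(out);
}

// Append one line and close the file before returning, so every line
// written so far has been handed to the file system even if the program
// crashes right after this call.
bool appendLogLine(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::app);
    if (!out) {
        return false;  // could not open or create the log
    }
    out << line << '\n';
    out.close();
    return !out.fail();
}

// Demonstration helper: count the lines currently stored in the file.
std::size_t countLines(const std::string& path) {
    std::ifstream in(path);
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++n;
    }
    return n;
}
```

The trade-off is speed: reopening the file for every message is slower than keeping it open, but a log line that precedes the crash is much more likely to survive.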


For decades, I've told programmers: "check the return status of everything you call, even in experimental code. If you are sure of the return status, still check it and panic (or throw an exception) if it's not what you expect".

The obvious reason is to quickly know where the problem is when things change. If you don't panic, the crash happens later on, and you waste a lot of time understanding where and why.

Why also in experimental code? Because pieces of experimental code always crawl into production code!

I've killed a lot of programmers over the decades for not following this! :)

Look what I did when I was trying to integrate OC:

    EFI_HANDLE DriverHandle;
    Status = gBS->LoadImage(false, gImageHandle, FileDevicePath(self.getSelfLoadedImage().DeviceHandle, FileName), NULL, 0, &DriverHandle);
    DBG("Load OpenRuntime.efi : Status %s\n", efiStrError(Status));
    Status = gBS->StartImage(DriverHandle, 0, 0);
    DBG("Start OpenRuntime.efi : Status %s\n", efiStrError(Status));

Do you see it ?

I call StartImage even if LoadImage fails!!! So bad. At that time, I thought OpenRuntime.efi was mandatory. But that's no excuse. By not testing the result code (Status) of LoadImage, I just cost two programmers a lot of hours. Argh, shame on me.

If I had put something like this :

    EFI_HANDLE DriverHandle;
    Status = gBS->LoadImage(false, gImageHandle, FileDevicePath(self.getSelfLoadedImage().DeviceHandle, FileName), NULL, 0, &DriverHandle);
    DBG("Load OpenRuntime.efi : Status %s\n", efiStrError(Status));
    if ( EFI_ERROR(Status) ) panic("Cannot load OpenRuntime.efi");
    Status = gBS->StartImage(DriverHandle, 0, 0);
    DBG("Start OpenRuntime.efi : Status %s\n", efiStrError(Status));

See the "if ( EFI_ERROR(Status) ) panic("Cannot load OpenRuntime.efi");" ?

The message would have been very clear when people tried to boot without OpenRuntime.efi. The bug would have been fixed in 5 minutes, tops.
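The same fail-fast idea can be tried outside firmware with a small mock. This is only a sketch: Status, loadImage, startImage and loadAndStartRequired are hypothetical stand-ins for EFI_STATUS, gBS->LoadImage, gBS->StartImage and the Clover code, and an exception plays the role of panic().

```cpp
#include <stdexcept>

// Hypothetical status type standing in for EFI_STATUS.
enum class Status { Success, NotFound };

// Hypothetical loader: succeeds only when the driver file is present,
// and only then writes a valid handle.
Status loadImage(bool driverPresent, int& handleOut) {
    if (!driverPresent) {
        return Status::NotFound;  // handleOut is left untouched
    }
    handleOut = 42;
    return Status::Success;
}

// Hypothetical starter: only the handle produced by loadImage is valid.
Status startImage(int handle) {
    return handle == 42 ? Status::Success : Status::NotFound;
}

// Fail fast: stop with a clear message as soon as the load fails,
// instead of going on to start an image that was never loaded.
Status loadAndStartRequired(bool driverPresent) {
    int handle = 0;
    Status status = loadImage(driverPresent, handle);
    if (status != Status::Success) {
        throw std::runtime_error("Cannot load OpenRuntime.efi");
    }
    return startImage(handle);
}
```

With the driver missing, the failure shows up at the load step with an explicit message, rather than later as an unexplained crash.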

A good example that my advice was good and I should have followed it ;).

Sorry everyone.

@Slice : and please say sorry for me in Russian on Applelife.ru

 


Ah and for the log : in the config.plist in the Qemu disk image, the key was Log instead of Debug. I fixed it and it's in the zip. Delete "disk_image_gpt.img.zip" and the launch script will re-extract the zip.

I don't understand why you said that I'm not closing the log, and why you re-imported the old method. At the end of the new "SaveMessageToDebugLogFile", there is "closeDebugLog();"

If there is a log creation problem, tell me how to reproduce and I'll fix it.


4 hours ago, Jief_Machak said:

 

See the "if ( EFI_ERROR(Status) ) panic("Cannot load OpenRuntime.efi");" ?

 

 

Hmmm... I don't agree. Legacy BIOS boot can live without OpenRuntime.efi, so in this case the driver is not mandatory.

Maybe better:

if ( !EFI_ERROR(Status) ) Status = gBS->StartImage(DriverHandle, 0, 0);
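For the case where the driver is optional, the guarded call can be sketched with a mock in the same way. LoadStatus, DriverResult, tryLoadDriver, startDriver and loadOptionalDriver are hypothetical names for illustration only, not part of Clover or the UEFI API.

```cpp
// Hypothetical status type standing in for EFI_STATUS.
enum class LoadStatus { Success, NotFound };

// What happened to the optional driver.
struct DriverResult {
    bool loaded;
    bool started;
};

// Hypothetical loader: fills the handle only on success.
LoadStatus tryLoadDriver(bool driverPresent, int& handleOut) {
    if (!driverPresent) {
        return LoadStatus::NotFound;
    }
    handleOut = 7;
    return LoadStatus::Success;
}

// Hypothetical starter: reports whether the handle was a valid one.
bool startDriver(int handle) {
    return handle == 7;
}

// Optional driver: start it only if the load succeeded; otherwise carry
// on booting without it and never touch the uninitialized handle.
DriverResult loadOptionalDriver(bool driverPresent) {
    DriverResult result{false, false};
    int handle = -1;
    if (tryLoadDriver(driverPresent, handle) == LoadStatus::Success) {
        result.loaded = true;
        result.started = startDriver(handle);
    }
    return result;
}
```

When the driver is absent, both fields stay false and the start step is skipped entirely, which is the behaviour wanted for legacy BIOS boot.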

Just now, Slice said:

Hmmm... I don't agree. Legacy BIOS boot can live without OpenRuntime.efi, so in this case the driver is not mandatory.

Maybe better:


if ( !EFI_ERROR(Status) ) Status = gBS->StartImage(DriverHandle, 0, 0);

Yes, that's what I did now that I know it's not mandatory.

I was explaining that, when I thought it was mandatory and always there, I still should have checked the return code. And because it was experimental at that time, if I had put a panic, we would have caught that bug very easily the first time someone tried to boot without OpenRuntime.efi, therefore avoiding the painful process of reproducing and fixing it.

In short, when creating new code and focusing on one case (boot with OpenRuntime.efi), it's better to put some panic for non-handled cases instead of nothing.


As I remember, Pene can boot a Z390 without OpenRuntime in UEFI mode.

1 minute ago, Jief_Machak said:

Yes, that's what I did now that I know it's not mandatory.

I was explaining that, when I thought it was mandatory and always there, I still should have checked the return code. And because it was experimental at that time, if I had put a panic, we would have caught that bug very easily the first time someone tried to boot without OpenRuntime.efi, therefore avoiding the painful process of reproducing and fixing it.

In short, when creating new code and focusing on one case (boot with OpenRuntime.efi), it's better to put some panic for non-handled cases instead of nothing.

As a method, yes.


22 minutes ago, MifJpn said:

Hello, Slice, Jief.
Thank you for always doing your best for everyone.

 

I'm worried about Slice's rig malfunction.

 

For now, the latest commits can be compiled by me as well. Also, there is no problem with its movement.

From the place where * Dirty.efi was written in the middle of the compilation log, it seemed to me that the environment variables were adjusted again and the compilation was restarted.
I expect it to be an adjustment to avoid any problems.
Or is it due to enhancements?

 

I hope you can answer when you have time.

 

I hope the problem of qemu issue will be cured.

 

Thank you.

I didn't understand you.

1. Where is my rig malfunction?

2. "I expect it to be an adjustment to avoid any problems." - what is this about? What is Dirty.efi?


I continue to refactor SETTINGS_DATA to separate functional layers.

For most of the settings it is just a rename, which cannot really introduce new bugs. But there are, of course, some more complicated ones.

So if you use the SSDT or CPU settings EnableC2, EnableC4, EnableC6, C3Latency, HWPEnable or HWPValue, I'd very much appreciate a test with the latest commit, or the efi supplied in this post.

Thanks.

CLOVERX64.efi

 

