Lion kernel testing on AMD (don't ask help here: use the Help Topic)


ham4ever
613 posts in this topic

Not sure... This is pretty damn weird... It runs bsd_init(), bsdinit_task() and even the whole of kern_exec.c... But launchd doesn't want to run.

 

New kernel is in the attachments. It prints "Err: 0" just before bsd_init() is done... That message comes from a printf I added to execve():

 

 

int
execve(proc_t p, struct execve_args *uap, int32_t *retval)
{
    struct __mac_execve_args muap;
    int err;

    printf("execve()...\n");    /* debug: entry marker */

    /* forward the arguments unchanged, with no MAC label */
    muap.fname = uap->fname;
    muap.argp = uap->argp;
    muap.envp = uap->envp;
    muap.mac_p = USER_ADDR_NULL;
    err = __mac_execve(p, &muap, retval);
    printf("Err: %d\n", err);   /* <--- THERE */
    return(err);
}

 

And __mac_execve() returns 0 on success, so the exec itself reports no error, yet launchd doesn't run... :D

 

lion-test-16 diff: http://www.solidfiles.com/d/9a29b63572/

lion-test-16.zip


lion-test-16 -v arch=x86-64 npci=0x3000

BSD root: disk4s2, major 14, minor 16
bsd_utaskbootsrtap() started
Kernel is LP64
bsdinit_task() started
Setting security token
execve()...
__mac_execve()...
exec_activate_image() started
execargs_alloc() started
return (0)
exec_save_path () started
goto bad_notrans; - 1
goto bad_notrans; - 2
exec_check_permissions() started
pal_kernel_announce() started
goto bad; - 1
calling mountroot_post_hook
calling mountroot_post_hook (again)
bsd_init() done?
goto bad; - 2
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
in the for loop now...
exec_fat_imgact() started
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
exec_add_user_string() started
exec_apple_strings() started
exec_add_user_string() started
exec_add_user_string() started
Setting security token
goto again; - 1
bad:proc_transend(p, 0);
bad_notrans: returning error
check_for_signature() started
skipping KERN_FAILURE
proc_lock(p)
proc_unlock(p)
switch_protect
Signed or not?
CS_valid. Going to done.
Well, we went to done.
I think we're returning error... ?
Err: 0
end of bsdinit_task()?

It does, of course, just hang there.


Yep, no idea why it just can't run launchd... load_init_program() returns no error etc...


Hi, Andy!

 

I'm also convinced it's not the kernel itself; it boots just fine. Something is missing in the kernel that prevents userland processes from spawning in 64-bit mode on AMD machines. I used to think it was an SSSE3-related issue, thanks to an old paper written by David Elliott (dfe), but we have SSSE3 emulation now, so what else is left? I still think it's a CPUID issue elsewhere in the kernel that's preventing us from loading userland.

The obvious thing is to investigate kern_exec.c and mach_loader.c (and .h), but there's no reason the CPUID issue cannot occur elsewhere and prevent userland from running, even if the kernel boots fine. That was the issue Sinetek dealt with to make his 64-bit Snow Leopard kernel a winner, and he couldn't repeat his success with Lion, just like us.
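For anyone who wants to see what a CPUID-dependent check has to work with, here is a minimal sketch in plain userland C. It is not taken from the kernel source; the function names are made up for illustration. It only decodes register values: the vendor string comes from CPUID leaf 0 (bytes in EBX, EDX, ECX order) and SSSE3 is bit 9 of ECX in leaf 1.

```c
#include <stdint.h>
#include <string.h>

/* Build the 12-character vendor string from the CPUID leaf 0 registers.
 * The bytes are laid out in EBX, EDX, ECX order. out must hold 13 bytes.
 * Assumes a little-endian host, which is true for x86. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
}

/* SSSE3 support is reported in bit 9 of ECX for CPUID leaf 1. */
int cpuid_has_ssse3(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 9) & 1u);
}

/* Hypothetical helper: 1 if the vendor string is AMD's, 0 otherwise. */
int cpuid_is_amd(const char *vendor)
{
    return strcmp(vendor, "AuthenticAMD") == 0;
}
```

On an x86 machine with GCC or Clang, the register values can be read with `__get_cpuid()` from `<cpuid.h>` and passed to these helpers. Any code path that compares the vendor against "GenuineIntel", or assumes the SSSE3 bit is set, is a candidate for the kind of problem described above.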

 

Thank you for the time you're investing in it.

 

Hey, Delta! Your debug version is very promising. I think it can be made even more accurate. Say you did something like this:

{
    printf("exec_add_user_string() started\n");
    int error = 0;

It's cool to know when each function starts, but it would be even better to know which value each one returns, what it actually does, or what each statement results in. Something like this:

    /* print before returning; anything after the return never runs */
    printf("the xxxxxx function returned the value %d\n", error);
    return error;
}
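The idea of printing what each function returns, not just that it started, could be sketched in plain C roughly like this. It is only an illustration: `RETURN_LOGGED` and `demo_add_string()` are made-up names, and userland `printf` stands in for the kernel's.

```c
#include <errno.h>
#include <stdio.h>

/* Log the enclosing function's name and the value it is about to return,
 * then return that value. The do/while wrapper lets the macro be used
 * like a single statement, including after an unbraced if. */
#define RETURN_LOGGED(val)                                 \
    do {                                                   \
        int rv_ = (val);                                   \
        printf("%s() returned %d\n", __func__, rv_);       \
        return rv_;                                        \
    } while (0)

/* Hypothetical stand-in for a function such as exec_add_user_string(). */
int demo_add_string(const char *s)
{
    printf("%s() started\n", __func__);
    if (s == NULL)
        RETURN_LOGGED(EINVAL);  /* failure path is logged with its errno */
    RETURN_LOGGED(0);           /* success path is logged too */
}
```

With every `return` routed through one macro, the boot log shows a started/returned pair per call, so a nonzero value stands out right where it happens.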

 

By the way, there's lots of good info already in the debug version you made. Notice this:

goto bad_notrans; - 1
goto bad_notrans; - 2
exec_check_permissions() started
pal_kernel_announce() started
goto bad; - 1
calling mountroot_post_hook
calling mountroot_post_hook (again)
bsd_init() done?
goto bad; - 2
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
in the for loop now...
exec_fat_imgact() started
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
exec_add_user_string() started
exec_apple_strings() started
exec_add_user_string() started
exec_add_user_string() started
Setting security token
goto again; - 1
bad:proc_transend(p, 0);
bad_notrans: returning error
check_for_signature() started
skipping KERN_FAILURE
proc_lock(p)
proc_unlock(p)
switch_protect
Err: 0
end of bsdinit_task()?

 

I would build a version myself with the suggestions I made, but my Xcode suddenly stopped working, so I'll have to reinstall everything here.

 

Thank you all guys for your effort!


Yes, that is our problem. However, I think I've investigated the whole kern_exec.c, and it looks like it runs just fine.

I'll take a look at mach_loader.c later. :)


Delta, I edited my post: take the time to read it before doing anything with mach_loader.c, if you can.

I think kern_exec.c is not running as it should: it's returning errors (the "bad" function) where it should not. We should investigate why it's acting like that and correct the issues before focusing on another file. Or maybe solving these issues necessarily takes us to mach_loader.c or another file, who knows?


Thanks for the great idea! I'll add return values and remove some unnecessary info... :)

Will post another diff & kernel soon! :)

 

EDIT: And btw, for example:

goto bad_notrans; - 1

means we got PAST the first goto bad_notrans; (the jump was not taken) :D

I should have made the messages a bit clearer... :D

 

EDIT2: lion-test-21 compiled: http://www.solidfile...m/d/fcf9be63ed/

Diff coming soon... :)

 

Diff: http://www.solidfiles.com/d/ce042eba5a/


Delta, don't you see? These "bad" functions aren't supposed to be reached at all! If we're getting past them, it means the errors that trigger them are happening. They should've been skipped altogether. Still, take a look at the code: the "bad" function won't hang everything on the spot, but its output could perhaps prevent some important process from running later.

About the newest debug kernel, I'm going to test it now.


Ahh, now I get it... xD

It goes to bad and bad_notrans at some point... Needs more debugging. :D


P.S.: No, I'm not suggesting that we artificially skip them or remove them from the source. They tell us about issues that are happening, so we'd better take a look at those and fix them, and hopefully that will get us one step further. Too bad my Xcode is screwed.


Yea, it's bad to just skip them... Like we tried with the EACCES error... However, I can't even get that far anymore... :D

 

 

EDIT: Too bad that "bad" doesn't take any arguments; the code just jumps to it from somewhere... I added some messages to find the exact point, like this:

if (--iterlimit == 0) {
    printf("Going to bad (4)\n");
    error = EBADEXEC;
    goto bad;
}
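Since a label can't take arguments, another way to find the jump site is to store a site number in a local variable before each goto and print it at the label. This is only a sketch in userland C: `demo_activate()` and its error choices are invented, and ENOEXEC is used in place of the Darwin-specific EBADEXEC.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a function with several "goto bad" sites.
 * step_that_fails selects which check fails; 0 means none of them. */
int demo_activate(int step_that_fails)
{
    int error = 0;
    int site = 0;   /* 0 = no goto taken, label reached by falling through */

    if (step_that_fails == 1) { error = ENOENT;  site = 1; goto bad; }
    if (step_that_fails == 2) { error = EACCES;  site = 2; goto bad; }
    if (step_that_fails == 3) { error = ENOEXEC; site = 3; goto bad; }

bad:
    /* One message at the label covers every jump site, plus fall-through. */
    printf("bad: reached from site %d, error = %d\n", site, error);
    return error;
}
```

A site value of 0 at the label would mean no goto was taken at all and the code simply ran into the label from above.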

 

 

lion-test-22: http://www.solidfiles.com/d/e5a4e695a6/


Even more important would be knowing if and where else the outputs of the "bad" functions are used. Are the "bad" functions being called somewhere else?

 

I think the "bad" functions are more like sections inside another function: if the "main" function hits something wrong, the code jumps to its "bad" part.

The "bad" function I'm trying to figure out is located in kern_exec.c -> load_init_program() (the function that calls launchd).

I added those messages to all (gotta double-check) "goto bad;" parts, but it still goes to bad without giving me any of those "Going to bad (x)" messages, so it must be called from outside?

This is damn weird... :D

EDIT: I'm sorry, I meant the exec_activate_image() function... :D Not load_init_program().

EDIT2: And the return of the "bad" function is just the return of its main function? That's how I understand it.

EDIT3: I gotta go now, I'll be back in a few hours. :D


Thank you, Delta!

 

Andy, any idea how relevant this "bad" function could be? I'm looking at the source and found it nowhere but in kern_exec.c. Perhaps the search tool here is malfunctioning...?

 


Maybe the main function returns the value of Bad when certain conditions are not met. So when the main function is called elsewhere, it will give the value of bad and perhaps this would hang the processes.


Exactly what I was thinking. It must be done this way...

 

If only we could build a verbose launchd... Or something else to see if the code even tries to run it?

 

Okay, new kernel. This one has more specific debug messages about those "bad" functions like this:

bad:
    printf("We are in bad of exec_mach_imgact()\n");
    return(error);
}

 

lion-test-23: http://www.solidfile...m/d/9d673a70ae/

 

 

 

EDIT: How is this possible? The kernel seems to execute most (if not all) of the "bad" functions... Still needs some more work.

 

 

EDIT2: Meklort shared his wisdom in IRC... Bad functions will be executed. The problem is somewhere else... Or something.
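A minimal sketch of why that can be true, assuming bad and bad_notrans are ordinary C labels used for cleanup (which the "bad:proc_transend(p, 0);" line in the log suggests). Labels are not functions: if nothing jumps over them, execution falls straight into them even on success. `demo_exec()` and `labels_passed` are invented for the example.

```c
#include <stdio.h>

int labels_passed;  /* counts cleanup labels executed, for illustration only */

/* Hypothetical function shaped like a cleanup-label routine.
 * fail_early fails before the "transaction", fail_late fails inside it. */
int demo_exec(int fail_early, int fail_late)
{
    int error = 0;
    labels_passed = 0;

    if (fail_early) { error = 1; goto bad_notrans; }  /* skips the bad: block */
    /* ... the transaction would start here ... */
    if (fail_late)  { error = 2; goto bad; }
    /* success: no goto, execution simply continues into the labels below */

bad:
    labels_passed++;
    printf("bad: ending transaction (error = %d)\n", error);
bad_notrans:
    labels_passed++;
    printf("bad_notrans: returning %d\n", error);
    return error;
}
```

On the success path both messages print and the function still returns 0, which matches a log that shows the "bad" and "bad_notrans" lines followed by "Err: 0".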


We have some kind of progress, maybe...

If you're running AMD SL / Lion (or secretly even ML):

 

1. Download this: http://www.solidfiles.com/d/428fa4efbc/

2. sudo su in terminal

3. chmod +x tiny

4. ./tiny

5. Post here what happened.

What does this do?


It's just an ultra-small Mach-O executable:

http://osxbook.com/blog/2009/03/15/crafting-a-tiny-mach-o-executable/

 

nicertiny.asm.

Meklort told us to use it to test whether the kernel starts launchd. It doesn't need dyld, so that rules dyld out... :)

But I get "Illegal instruction" when running nicertiny on my AMD...

I changed /sbin/launchd to /tiny in the source, put tiny in the root of the HDD and booted. I got a panic! :)
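To check whether a given binary asks for dyld at all, one can walk its Mach-O load commands and look for LC_LOAD_DYLINKER. The sketch below is an illustration, not code from the kernel or from the tiny executable: `macho_wants_dyld()` is a made-up name, the constants mirror the values in `<mach-o/loader.h>`, and it assumes a thin (non-fat) image in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC         0xfeedfaceu  /* 32-bit Mach-O header magic */
#define MH_MAGIC_64      0xfeedfacfu  /* 64-bit Mach-O header magic */
#define LC_LOAD_DYLINKER 0xeu         /* load command naming the dynamic linker */

/* Returns 1 if the image requests a dynamic linker, 0 if it does not,
 * and -1 if the buffer is not a Mach-O image this sketch understands. */
int macho_wants_dyld(const unsigned char *buf, size_t len)
{
    uint32_t magic, ncmds, cmd, cmdsize;
    size_t off;

    if (len < 28) return -1;
    memcpy(&magic, buf, 4);
    if (magic == MH_MAGIC) off = 28;          /* size of the 32-bit header */
    else if (magic == MH_MAGIC_64) off = 32;  /* 64-bit header adds one field */
    else return -1;
    if (off > len) return -1;

    memcpy(&ncmds, buf + 16, 4);              /* ncmds is the fifth 32-bit field */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8) return -1;         /* no room for cmd + cmdsize */
        memcpy(&cmd, buf + off, 4);
        memcpy(&cmdsize, buf + off + 4, 4);
        if (cmdsize < 8 || cmdsize > len - off) return -1;
        if (cmd == LC_LOAD_DYLINKER) return 1;
        off += cmdsize;                       /* step to the next load command */
    }
    return 0;
}
```

Reading a file into a buffer and passing it to this function should return 0 for a binary like tiny that doesn't need dyld, and 1 for a normally linked executable.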


Good plan... then we can see if it is dyld :)

 

but we both get "Illegal instruction" when trying to run the nicertiny...

I tried to boot with it and got a panic... Most likely somehow related to the illegal instruction we get when it's run from Terminal.

But now we know that the kernel DOES start the launchd. :)

Probably something about dyld.
