instant idiot Posted December 26, 2012
Should I test all of them, or just the latest one?
Andy Vandijck Posted December 26, 2012
How is the test coming along? We need to know more before we can proceed...
Deltac0 Posted December 26, 2012
Not sure... This is pretty damn weird... It runs bsd_init(), bsdinit_task() and even the whole of kern_exec.c runs... But launchd doesn't want to run. The new kernel is in the attachments; it prints "Err: 0" just before bsd_init() is done... That message is one I added to execve():

    int
    execve(proc_t p, struct execve_args *uap, int32_t *retval)
    {
        printf("execve()...\n");
        struct __mac_execve_args muap;
        int err;

        muap.fname = uap->fname;
        muap.argp = uap->argp;
        muap.envp = uap->envp;
        muap.mac_p = USER_ADDR_NULL;
        err = __mac_execve(p, &muap, retval);
        printf("Err: %d\n", err); //<--- THERE
        return(err);
    }

And __mac_execve returns 0 on success, but launchd doesn't run...
lion-test-16 diff: http://www.solidfiles.com/d/9a29b63572/
lion-test-16.zip
instant idiot Posted December 27, 2012
lion-test-16 -v arch=x86-64 npci=0x3000

    BSD root: disk4s2, major 14, minor 16
    bsd_utaskbootsrtap() started
    Kernel is LP64
    bsdinit_task() started
    Setting security token
    execve()...
    __mac_execve()...
    exec_activate_image() started
    execargs_alloc() started
    return (0)
    exec_save_path () started
    goto bad_notrans; - 1
    goto bad_notrans; - 2
    exec_check_permissions() started
    pal_kernel_announce() started
    goto bad; - 1
    calling mountroot_post_hook
    calling mountroot_post_hook (again)
    bsd_init() done?
    goto bad; - 2
    goto bad; - 3
    in the for loop now...
    exec_mach_imgact() started
    in the for loop now...
    exec_fat_imgact() started
    goto bad; - 3
    in the for loop now...
    exec_mach_imgact() started
    exec_add_user_string() started
    exec_apple_strings() started
    exec_add_user_string() started
    exec_add_user_string() started
    Setting security token
    goto again; - 1
    bad:proc_transend(p, 0);
    bad_notrans: returning error
    check_for_signature() started
    skipping KERN_FAILURE
    proc_lock(p)
    proc_unlock(p)
    switch_protect
    Signed or not?
    CS_valid.
    Going to done.
    Well, we went to done.
    I think we're returning error... ?
    Err: 0
    end of bsdinit_task()?

It does, of course, just hang there.
Deltac0 Posted December 27, 2012
Yep, no idea why it just can't run launchd... load_init_program() returns no error etc...
Andy Vandijck Posted December 27, 2012
Like I said: I don't think it is the kernel... It's weird...
theconnactic Posted December 27, 2012
Hi, Andy! I'm also convinced it's not the kernel itself: it boots just fine. There's just something missing in the kernel that prevents userland processes from spawning in 64-bit mode on AMD machines. I used to think it was an SSSE3-related issue, thanks to an old paper written by David Elliott (dfe), but we have SSSE3 emulation now, so what then? I still think it's a CPUID issue elsewhere in the kernel that's preventing us from loading the userland. The obvious thing is to investigate kern_exec.c and mach_loader.c (and .h), but there's no reason the CPUID issue cannot occur elsewhere and prevent the userland from running, even if the kernel boots fine. That was the issue Sinetek dealt with to make his 64-bit Snow Leopard kernel a winner, and he couldn't repeat his success with Lion, just like us. Thank you for the time you're investing in it.

Hey, Delta! Your debug version is very promising. I think it can be made even more accurate. Say you did something like this:

    {
        printf("exec_add_user_string() started\n");
        int error = 0;

I think it's cool to know when each function starts, but it would be even better to know which value each one returns, which task it actually performs, or what results from each statement. Something like this:

        printf("the xxxxxx function returned the value %d\n", ERROR);
        return ERROR;
    }

By the way, there's lots of good info already in the debug version you made. Notice this:

    goto bad_notrans; - 1
    goto bad_notrans; - 2
    exec_check_permissions() started
    pal_kernel_announce() started
    goto bad; - 1
    calling mountroot_post_hook
    calling mountroot_post_hook (again)
    bsd_init() done?
    goto bad; - 2
    goto bad; - 3
    in the for loop now...
    exec_mach_imgact() started
    in the for loop now...
    exec_fat_imgact() started
    goto bad; - 3
    in the for loop now...
    exec_mach_imgact() started
    exec_add_user_string() started
    exec_apple_strings() started
    exec_add_user_string() started
    exec_add_user_string() started
    Setting security token
    goto again; - 1
    bad:proc_transend(p, 0);
    bad_notrans: returning error
    check_for_signature() started
    skipping KERN_FAILURE
    proc_lock(p)
    proc_unlock(p)
    switch_protect
    Err: 0
    end of bsdinit_task()?

I would do a version myself with the suggestions I made, but my Xcode stopped working all of a sudden, so I'll have to reinstall everything here. Thank you all, guys, for your effort!
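The idea of logging every return value, and not only function entry, can be sketched in plain user-space C. This is only an illustration under assumptions: TRACE_RETURN and check_path() are hypothetical names that do not exist in the XNU source, and a kernel build would use the kernel's own printf instead of the one from stdio.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: print the function name, the source line and the
 * value being returned, then return it. The value is evaluated only once. */
#define TRACE_RETURN(val)                               \
    do {                                                \
        int trace_val_ = (val);                         \
        printf("%s() line %d: returning %d\n",          \
               __func__, __LINE__, trace_val_);         \
        return trace_val_;                              \
    } while (0)

/* Hypothetical stand-in for a routine with several exit points. */
static int check_path(const char *path)
{
    printf("%s() started\n", __func__);

    if (path == NULL)
        TRACE_RETURN(EINVAL);   /* logged together with its line number */
    if (path[0] != '/')
        TRACE_RETURN(ENOENT);

    TRACE_RETURN(0);            /* the success path is logged as well */
}
```

Because the macro prints the line number, each exit point identifies itself without hand-numbered messages such as "goto bad; - 1".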
Deltac0 Posted December 27, 2012
theconnactic said: "Hi, people! It's not the kernel itself, it boots just fine. There's just something missing in the kernel that prevents userland processes from spawning in 64-bit mode on AMD machines. I used to think it was an SSSE3-related issue, thanks to an old paper written by David Elliott (dfe), but we have SSSE3 emulation now, so what then? I still think it's a CPUID issue elsewhere in the kernel that's preventing us from loading the userland. The obvious thing is to investigate kern_exec.c and mach_loader.c (and .h), but there's no reason the CPUID issue cannot occur elsewhere and prevent the userland from running, even if the kernel boots fine. That was the issue Sinetek dealt with to make his 64-bit Snow Leopard kernel a winner, and he couldn't repeat his success with Lion, just like us."
Yes, that is our problem. However, I think I've investigated the whole of kern_exec.c, and it looks like it runs just fine. I'll take a look at mach_loader.c later.
theconnactic Posted December 27, 2012
Delta, I edited my post: take time to read it before doing anything with mach_loader, if you can. I think kern_exec.c is not running as it should: it's returning errors (the "bad" function) where it should not. I think we should perhaps investigate why it's acting like that and correct the issues. Only after that should we focus on another file. Or maybe solving these issues necessarily takes us to mach_loader.c or another file, who knows?
Deltac0 Posted December 27, 2012
Thanks for the great idea! I'll add return values and remove some unnecessary info... Will post another diff & kernel soon!
EDIT: And btw, for example, "goto bad_notrans; - 1" means we got PAST the first goto bad_notrans; I should have made the messages a bit clearer...
EDIT2: lion-test-21 compiled: http://www.solidfile...m/d/fcf9be63ed/
Diff coming soon...
Diff: http://www.solidfiles.com/d/ce042eba5a/
theconnactic Posted December 27, 2012
Deltac0 said: "means we got PAST goto bad_notrans; (first of them)"
Delta, don't you see? These "bad" functions aren't to be accessed at all! If we're getting past them, it means the errors that justify them are happening. They should've been skipped altogether. Yet, take a look at the code: the "bad" function won't hang all processes at the scene of the crime. Its output, though, can perhaps prevent some important process from running later. About the newest debug kernel, I'm going to test it now.
Deltac0 Posted December 27, 2012
Ahh, now I get it... xD It goes to bad and bad_notrans at some point... Needs more debugging.
theconnactic Posted December 27, 2012
P.S.: No, I'm not suggesting that we artificially skip them or remove them from the source. Instead, they tell us about issues that are happening, so we'd better take a look at those issues and fix them, and hopefully that will get us one step further. Too bad my Xcode is screwed.
Deltac0 Posted December 27, 2012
Yeah, it's bad to just skip them... Like we tried with the EACCES error... However, I can't even get that far anymore...
EDIT: Too bad that "bad" doesn't take any arguments; the code just jumps to it from somewhere... I added some messages to find the exact point, like this:

    if (--iterlimit == 0) {
        printf("Going to bad (4)\n");
        error = EBADEXEC;
        goto bad;
    }

lion-test-22: http://www.solidfiles.com/d/e5a4e695a6/
theconnactic Posted December 27, 2012
Even more important would be knowing if and where else the outputs of the "bad" functions are used. Are the "bad" functions being called somewhere else?
Deltac0 Posted December 27, 2012
I think the "bad" functions are just like functions inside another function. Like, if the "main" function does something wrong -> the code skips to the "bad" part of the function. The "bad" function I'm trying to figure out is located in kern_exec.c -> load_init_program() (the function that calls launchd). I added those messages to all the "goto bad;" parts (gotta double-check), but it still goes to bad without giving me any of those "going to bad (x)" messages, so it must be called from outside? This is damn weird...
EDIT: I'm sorry, I meant the exec_activate_image() function... Not load_init_program().
EDIT2: And the return of the "bad" function is just like the return of its main function? That's how I understand it.
EDIT3: I gotta go now, I'll be back in a few hours.
theconnactic Posted December 27, 2012
Thank you, Delta! Andy, any idea how relevant this bad function could be? I'm looking at the source and found it nowhere but in kern_exec.c. Perhaps the search tool here is malfunctioning...?
Deltac0 said: "EDIT2: And the return of "bad" function is just like the return of its main function? That's how I understand it."
Maybe the main function returns the value of bad when certain conditions are not met. So when the main function is called elsewhere, it will give the value of bad, and perhaps this would hang the processes.
Deltac0 Posted December 27, 2012
Exactly what I was thinking. It must be done this way... If only we could build a verbose launchd? Or something to see if the code even tries to run it? Okay, new kernel. This one has more specific debug messages about those "bad" functions, like this:

    bad:
        printf("We are in bad of exec_mach_imgact()\n");
        return(error);
    }

lion-test-23: http://www.solidfile...m/d/9d673a70ae/
EDIT: How is this possible? The kernel seems to execute most (if not all) of the "bad" functions... Still needs some more work.
EDIT2: Meklort shared his wisdom in IRC... Bad functions will be executed. The problem is somewhere else... Or something.
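Meklort's remark that the "bad" code will be executed anyway matches how a goto label behaves in C: a label is not a function, just a position inside its enclosing function, so the success path falls through it as well. A minimal user-space sketch of that idiom follows; activate_image() and bad_label_hits are hypothetical names for this demo, not XNU code, and ENOEXEC stands in for whatever error the real function would set.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Counts how often the cleanup label is reached (demo only). */
static int bad_label_hits = 0;

/* Hypothetical function using the single-exit "goto bad" idiom. */
static int activate_image(int should_fail)
{
    int error = 0;

    if (should_fail) {
        error = ENOEXEC;
        goto bad;           /* error path: jump straight to the label */
    }

    /* ... normal work would happen here ... */

bad:
    /* Reached on BOTH paths: after a goto, or by simply falling through. */
    bad_label_hits++;
    printf("at bad: error = %d\n", error);
    return error;
}
```

So a message printed under bad: says nothing by itself about failure. The value of error at that point is what tells the two paths apart, which fits the earlier log where the bad: and bad_notrans: messages appear and execve() still reports "Err: 0".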
Deltac0 Posted December 27, 2012
We have some kind of progress, maybe... If you're running AMD SL / Lion (or secretly even ML):
1. Download this: http://www.solidfiles.com/d/428fa4efbc/
2. sudo su in terminal
3. chmod +x tiny
4. ./tiny
5. Post here what happened.
Andy Vandijck Posted December 27, 2012
What does this do?
theconnactic Posted December 27, 2012
Hi, Andy! It creates a static Mach-O executable (that is, one that does not use dyld). We intend to replace launchd with it, to see what the effect is. This binary must also be able to run on an AMD machine, otherwise the experiment is DOA. Best regards.
Deltac0 Posted December 27, 2012
Andy Vandijck said: "What does this do?"
It's just an ultra-small Mach-O executable: http://osxbook.com/blog/2009/03/15/crafting-a-tiny-mach-o-executable/ (nicertiny.asm). Meklort told us to test whether the kernel starts launchd with that. It doesn't need dyld, so it rules dyld out... but I get "Illegal instruction" when running nicertiny on my AMD... I changed /sbin/launchd to /tiny in the source, put tiny in the root of the HDD and booted. I got a panic!
Andy Vandijck Posted December 27, 2012
Good plan... then we can see if it is dyld
Deltac0 Posted December 27, 2012
But we both get "Illegal instruction" when trying to run nicertiny... I tried to boot with it: panic... Most likely it's somehow related to the illegal instruction we get when it's run from the terminal. But now we know that the kernel DOES start launchd. Probably something about dyld.
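For anyone repeating the ./tiny test, a small wrapper can report exactly how the binary died instead of relying on the shell's message. This is only a sketch: run_and_report() is a hypothetical helper name, and the exit code 127 for a failed exec is borrowed from the usual shell convention. It uses only the POSIX calls fork(), execl() and waitpid().

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `path` with no arguments. Returns the exit status (0..255) if the
 * child exited normally, the negated signal number if a signal killed it
 * (for example -SIGILL), or -1000 if fork() or waitpid() failed. */
static int run_and_report(const char *path)
{
    int status = 0;
    pid_t pid;

    fflush(stdout);                 /* avoid duplicated buffered output */
    pid = fork();
    if (pid < 0)
        return -1000;

    if (pid == 0) {
        execl(path, path, (char *)NULL);
        _exit(127);                 /* only reached if exec itself failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1000;

    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        printf("%s: killed by signal %d%s\n", path, sig,
               sig == SIGILL ? " (illegal instruction)" : "");
        return -sig;
    }
    if (WIFEXITED(status)) {
        printf("%s: exited with status %d\n", path, WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    return -1000;
}
```

Calling run_and_report("./tiny") and getting -SIGILL back would confirm that the process was started and then hit an unsupported instruction, as opposed to never being launched at all.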
theconnactic Posted December 27, 2012
64-bit kernel, Delta? Maybe it's just the dyld indeed... that would be good news.