Setting up the stack is easy. Setting the program counter to point to INJECT_ENTRY is also very easy. Unfortunately, while this is enough on PPC, it is far from sufficient on Intel.
This is enough to call INJECT_ENTRY, but it crashes as soon as any external function is called.
DYLD jump tables
As explained briefly in Jonathan Rentzsch’s Mac Hack paper, an implementation block that needs access to functions defined elsewhere contains a vector table whose entries are the actual external function addresses. This vector table is filled at bind time.
While this is always true on PPC, things are a bit different on Intel. The stub called when accessing an external function is rewritten directly at bind time into a very simple JMP instruction.
For instance, a call to an external function:
external_func();
becomes an assembler call to a stub:
call 0x305f <dyld_stub_external_func>
and dyld_stub_external_func is something like:
JMP 0x098ff110
This instruction is written at bind time.
Unfortunately, the JMP instruction is a relative jump (i.e. its operand is an offset from the address of the instruction that follows it). Thus, as soon as we move the code, this jump no longer points to the external function and generally leads to a crash.
As a consequence, when injecting a dyld image into the target task, one must manually adjust the offsets of the JMP instructions so that they point to the actual external functions. This means the dyld image must be copied, then modified, then injected into the target process.
Fortunately, the position of these JMP instructions within a dyld image is easy to get with the dyld API by accessing the ("__IMPORT", "__jump_table") section.
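The adjustment described above can be sketched as follows. This is only an illustration, not the mach_inject code: the function names are hypothetical, the table address and size are assumed to be already known from the section lookup, and the sketch assumes each entry is a 5-byte "JMP rel32" (opcode 0xE9 followed by a 32-bit little-endian offset). Entries that do not start with 0xE9 are left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { JMP_REL32_OPCODE = 0xE9, JUMP_ENTRY_SIZE = 5 };

/* Read a 32-bit little-endian value, whatever the host byte order. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value, whatever the host byte order. */
void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Absolute destination of a "JMP rel32" that lives at address entry_addr:
 * the offset is relative to the end of the 5-byte instruction. */
uint32_t jmp_target(const uint8_t *entry, uint32_t entry_addr)
{
    assert(entry[0] == JMP_REL32_OPCODE);
    return entry_addr + JUMP_ENTRY_SIZE + read_le32(entry + 1);
}

/* Patch a copy of the jump table that used to live at old_addr so that,
 * once mapped at new_addr, every jump reaches the same absolute target.
 * Arithmetic is done modulo 2^32, as on a 32-bit address space. */
void rebase_jump_table(uint8_t *table, size_t size,
                       uint32_t old_addr, uint32_t new_addr)
{
    size_t off;

    for (off = 0; off + JUMP_ENTRY_SIZE <= size; off += JUMP_ENTRY_SIZE) {
        uint8_t *entry = table + off;

        if (entry[0] != JMP_REL32_OPCODE)
            continue; /* not a JMP rel32: leave the entry as it is */
        write_le32(entry + 1, read_le32(entry + 1) + old_addr - new_addr);
    }
}
```

Since every entry moves by the same distance, each offset is shifted by the same amount (old address minus new address), which is why the loop never needs to know the individual targets.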
Once this is done, external functions get called correctly. Unfortunately, most of them crash very soon, and even a very simple INJECT_ENTRY function crashes on the mandatory thread_suspend() call.
pthread structure
This one is a bit tricky to explain.
Many libc functions try to access some data associated with the current pthread (POSIX thread). They generally get this information by using a call named pthread_self() defined in libc.
This call returns a pointer to the data structure associated with the current pthread.
These functions generally behave well if this call returns NULL.
We would expect pthread_self() to return NULL in INJECT_ENTRY since we call it by creating a Mach thread, and no pthread environment has been set up for this thread.
Unfortunately, on IA-32, if called in INJECT_ENTRY without any prior work, it does not return NULL: it crashes!
This is due to the way the POSIX thread data structure is accessed on IA-32. On many processors, a dedicated register points to this data. But not on IA-32. On this architecture, a segment register is used instead.
Segmented access is a kind of indirection for accessing memory zones. For example, segment 0 can be used to access linear address 0x1000000 and beyond, segment 1 to access linear address 0xffc000 and beyond, etc.
Then, if the %gs register is set to 1, the following instruction:
movl %gs:8, %eax
would put the content of address 0xffc000+8 into %eax.
When a pthread is created, a memory zone is allocated to store the thread data structure.
The address of this memory zone is then passed to a function named pthread_set_self().
This function asks the kernel to set up segment number 0x37 so that it points to the newly allocated data structure. It then sets %gs to 0x37.
From now on, %gs:OFFSET is a direct access to the thread data structure. Since this structure contains a pointer to itself at offset 0x48, address %gs:0x48 contains the address of the current thread data structure.
Guess what? This is exactly what pthread_self() uses to get the pointer to the current thread data structure.
The thing is, in our INJECT_ENTRY function, pthread_set_self() has never been called. Segment 0x37 has not been prepared, and any attempt to access it will certainly crash the process.
This is exactly what happens as soon as some function calls pthread_self(), and believe me, many do, including thread_suspend() (via mig_get_reply_port).
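The mechanism can be modelled in portable C to make the failure mode concrete. This is only a toy model under stated assumptions, not libc code: struct fake_pthread, gs_base, model_set_gs_base() and model_pthread_self() are hypothetical names, a plain pointer stands in for the base address attached to the segment selected by %gs, and an assertion stands in for the fault the processor would raise. Only the offset 0x48 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SELF_OFFSET 0x48 /* offset of the self pointer quoted in the text */

/* Hypothetical stand-in for the thread data structure: only the self
 * pointer is modelled, everything before it is opaque padding. */
struct fake_pthread {
    unsigned char opaque[SELF_OFFSET];
    struct fake_pthread *self;
};

/* Models the base address of the segment selected by %gs.
 * NULL models a segment that has never been set up. */
static unsigned char *gs_base = NULL;

/* Models the segment setup: after this, %gs:OFFSET reads base + OFFSET. */
void model_set_gs_base(void *base)
{
    gs_base = (unsigned char *)base;
}

/* Models "movl %gs:0x48, %eax": read the pointer stored at offset 0x48. */
struct fake_pthread *model_pthread_self(void)
{
    struct fake_pthread *self;

    assert(gs_base != NULL); /* the real access would fault here instead */
    memcpy(&self, gs_base + SELF_OFFSET, sizeof self);
    return self;
}
```

In this model, a structure whose self field was filled in reads back its own address, while memory that is mapped but zero-filled reads back as NULL rather than faulting. Calling model_pthread_self() before any setup trips the assertion, which is the analogue of the crash seen in INJECT_ENTRY.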
The solution to this problem is to allocate a fake thread data structure full of zeroes and then call pthread_set_self() at the beginning of INJECT_ENTRY.
Since malloc() itself calls pthread_self(), the memory region must be allocated before INJECT_ENTRY is called and passed to it as an argument. I used the upper part of the stack. One might prefer to allocate a dedicated memory zone, but I did not want to change mach_inject too much.
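Reserving the upper part of the stack can be sketched as below. This is a minimal illustration on a local buffer, not the actual mach_inject change: carve_fake_pthread(), struct stack_layout, the 1024-byte FAKE_PTHREAD_SIZE and the 16-byte alignment are all assumptions chosen for the example, and the real size of the thread data structure would have to be checked against the target system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAKE_PTHREAD_SIZE 1024u /* assumed placeholder, not the real size */
#define STACK_ALIGNMENT   16u   /* assumed alignment for the stack pointer */

struct stack_layout {
    unsigned char *fake_pthread; /* zeroed area to pass to INJECT_ENTRY */
    unsigned char *initial_sp;   /* where the thread's stack pointer starts */
};

/* Reserve the top of the stack for the fake thread data structure and
 * start the usable stack just below it (the stack grows downwards). */
struct stack_layout carve_fake_pthread(unsigned char *stack, size_t stack_size)
{
    struct stack_layout layout;
    uintptr_t top;
    uintptr_t area;

    /* Keep at least as much room for the stack as for the reserved area. */
    assert(stack_size >= 2 * FAKE_PTHREAD_SIZE);

    top = (uintptr_t)(stack + stack_size);
    /* Round down so that both the area and the stack pointer are aligned. */
    area = (top - FAKE_PTHREAD_SIZE) & ~(uintptr_t)(STACK_ALIGNMENT - 1);

    layout.fake_pthread = (unsigned char *)area;
    layout.initial_sp = (unsigned char *)area;

    /* The fake structure must be full of zeroes. */
    memset(layout.fake_pthread, 0, FAKE_PTHREAD_SIZE);
    return layout;
}
```

The address in fake_pthread is what would be handed to INJECT_ENTRY as its extra argument, so that it can be passed on before any libc function is called. Keeping the area above the initial stack pointer means normal stack growth never overwrites it.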