I intend to write my own JIT interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge of x86 assembly (or C, for that matter). Actually, I don't know how a JIT works, but here is my take on it:

1. Read in the program in some intermediate language.
2. Compile that to x86 instructions. Ensure that the last instruction returns to somewhere sane back in the VM code.
3. Store the instructions somewhere in memory.
4. Do an unconditional jump to the first instruction.
5. Voila!

So, with that in mind, I have the following small C program:

    #include <stdlib.h>

    int main() {
        int *m = malloc(sizeof(int));
        *m = 0x90; // NOP instruction code

        asm("jmp *%0"
            : /* outputs: */ /* none */
            : /* inputs: */ "d" (m)
            : /* clobbers: */ "eax");

        return 42;
    }

Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return to main).
Question: Am I on the right path? Question: Could you show me a modified program that manages to find its way back to somewhere inside main? Question: Other issues I should beware of?
PS: My goal is to gain understanding, not necessarily to do everything the right way.

Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    unsigned char *m;

    int main() {
        unsigned int pagesize = getpagesize();
        printf("pagesize: %u\n", pagesize);

        m = malloc(1023+pagesize+1);
        if(m==NULL) return(1);
        printf("%p\n", m);

        // round up to the next page boundary
        m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
        printf("%p\n", m);

        if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
            printf("mprotect fail...\n");
            return 0;
        }

        m[0] = 0xc9; //leave
        m[1] = 0xc3; //ret
        m[2] = 0x90; //nop
        printf("%p\n", m);

        asm("jmp *%0"
            : /* outputs: */ /* none */
            : /* inputs: */ "d" (m)
            : /* clobbers: */ "ebx");

        return 21;
    }

Tags: c, assembly, x86
Asked Jan 26 at 8:36 by Magnus Madsen; edited Feb 8 at 9:42.
Another option is to just interpret the instructions or intermediate code w/o executing anything directly. – Alex Jan 26 at 9:37

@Alex: that's another option to implement a language, but by definition it's not a JIT. – Steve Jessop Jan 26 at 10:06
Question: Am I on the right path?

I would say yes.

Question: Could you show me a modified program that manages to find its way back to somewhere inside main?

I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.

Question: Other issues I should beware of?

Yes - as a security measure, many operating systems prevent you from executing code on the heap unless you make special arrangements. Those arrangements typically amount to marking the relevant memory page(s) as executable. On Linux this is done using mprotect() with PROT_EXEC.
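The two points above (entering the generated code with call/ret, and marking its pages executable) can be put together in a minimal sketch. It assumes Linux on x86 or x86-64 and a compiler that accepts casting a data pointer to a function pointer, as GCC and Clang do. The names `jit_fn` and `run_generated_code` are made up for illustration. Calling through a function pointer makes the compiler emit the `call`, and the generated bytes end in `ret`, so control comes back to the caller without any inline assembly.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical type for generated code: takes nothing, returns an int. */
typedef int (*jit_fn)(void);

/* Runs a tiny generated function and returns its result.
 * Returns -1 if the mapping fails or the CPU is not x86/x86-64. */
int run_generated_code(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* mov eax, 42 ; ret  (same encoding in 32-bit and 64-bit mode) */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    const size_t len = 4096;

    /* mmap returns page-aligned memory, so no manual rounding is needed. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, code, sizeof code);

    /* Switch from writable to executable once the code is in place. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    jit_fn fn = (jit_fn)(uintptr_t)buf;
    int result = fn();           /* compiler emits call; generated code ends in ret */

    munmap(buf, len);
    return result;
#else
    return -1;                   /* the byte sequence above is x86-only */
#endif
}
```

Writing the code while the page is read/write and only then switching it to read/execute avoids having a page that is writable and executable at the same time, which some hardened systems refuse to allow.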
In addition, the instruction cache generally does not monitor the underlying memory, so an explicit cache flush may be required before executing the jump. – Simon Richter Jan 26 at 8:57

@Simon: agreed, and in general that's "after you've finished writing the instructions to memory, but before executing it". So in my experience you write the code to do it just after you've finished writing, rather than just before you execute. In this example code those are the same place, but in practice you might execute more times than you write. And it's important, as Simon remarks, to flush the instruction cache rather than the data cache. – Steve Jessop Jan 26 at 10:01

You don't need to flush the I-cache on x86; SMC only fails if it's extremely close to eip. Moderately close SMC imposes huge penalties. – harold Jan 26 at 13:08

There is an SO question somewhere about self-modifying code, and I remember figuring out that you have to mmap on aligned boundaries in order to access that memory. Does anyone know where that ticket is? It should show you how to generate instructions in memory and then execute them. – dwelch Jan 26 at 15:07

Here it is: stackoverflow.com/questions/4812869/… – dwelch Jan 26 at 15:08
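For code that should also run on non-x86 targets, the "flush right after writing" advice from the comments might look like the sketch below. It assumes GCC or Clang, whose `__builtin___clear_cache` builtin synchronizes the instruction cache for a range (on x86 it typically compiles to nothing), and it assumes `buf` is a page-aligned read/write mapping. The helper name `publish_code` is hypothetical.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy freshly generated code into buf, synchronize the
 * instruction cache for the written range, then make the buffer executable.
 * buf is assumed to be a page-aligned, read/write mapping of buf_len bytes.
 * Returns 0 on success, -1 on failure. */
int publish_code(unsigned char *buf, size_t buf_len,
                 const unsigned char *code, size_t code_len)
{
    if (code_len > buf_len)
        return -1;

    memcpy(buf, code, code_len);

    /* Flush once, right after writing, not before every execution. */
    __builtin___clear_cache((char *)buf, (char *)buf + code_len);

    if (mprotect(buf, buf_len, PROT_READ | PROT_EXEC) != 0)
        return -1;

    return 0;
}
```

Putting the flush in the "publish" step matches the point that code is usually written once and executed many times.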
If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:

    typedef void (*generated_function)(void);

    void *func = malloc(1024); // note: this memory must also be made executable, e.g. with mprotect()
    unsigned char *o = (unsigned char *)func;
    generated_function func_exec = (generated_function)func;

    *o++ = 0x90; // NOP
    *o++ = 0xc3; // RET (near return; 0xcb would be a far return)

    func_exec();
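To connect this with the "compile the intermediate language to x86 instructions" step from the question, here is a rough sketch of an emitter for a made-up three-instruction IL. The IL itself (`IL_LOAD`, `IL_ADD`, `IL_RET`) and the function names are invented for illustration. The x86 encodings used are `B8 imm32` (mov eax, imm32), `05 imm32` (add eax, imm32) and `C3` (ret). Because the result is left in eax and the code ends with a near return, the output follows the usual convention for a function of type `int (*)(void)`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intermediate language: one accumulator, three operations. */
enum il_op { IL_LOAD, IL_ADD, IL_RET };

struct il_insn {
    enum il_op op;
    int32_t imm;     /* operand for IL_LOAD and IL_ADD; ignored by IL_RET */
};

/* Write a 32-bit immediate in little-endian byte order. */
static void put_imm32(unsigned char *out, int32_t value)
{
    uint32_t v = (uint32_t)value;
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Translate n IL instructions into x86 machine code in out (capacity cap).
 * Returns the number of bytes written, or 0 if the buffer is too small
 * or an unknown opcode is seen. */
size_t compile_il(const struct il_insn *prog, size_t n,
                  unsigned char *out, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        size_t need = (prog[i].op == IL_RET) ? 1 : 5;
        if (o + need > cap)
            return 0;
        switch (prog[i].op) {
        case IL_LOAD:                       /* mov eax, imm32 */
            out[o++] = 0xB8;
            put_imm32(out + o, prog[i].imm);
            o += 4;
            break;
        case IL_ADD:                        /* add eax, imm32 */
            out[o++] = 0x05;
            put_imm32(out + o, prog[i].imm);
            o += 4;
            break;
        case IL_RET:                        /* ret */
            out[o++] = 0xC3;
            break;
        default:
            return 0;
        }
    }
    return o;
}
```

The bytes produced this way could then be copied into an executable mapping and called through a function pointer; the emitter itself only fills a buffer and never executes anything.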