Qhimm.com Forums
Miscellaneous Forums => Scripting and Reverse Engineering => Topic started by: sfx1999 on 2005-08-24 03:47:59
-
What exactly is the difference between JIT emulation as apposed to a regular interpreter? I plan on writing an emulator some day, so it would help me a lot if I knew this.
-
Shortly a "Just in Time"-Emulator translates the opcodes in real time whereas other approaches does this work before the actual execution.
-
More precisely, an interpreter simply chews, processes and executes opcode by opcode. It does not produce executable code as output (hence there is no caching), it simply does the appropriate action for each opcode by itself. A compiler (more precisely dynamic recompiler), on the other hand, translates the entire source program into host executable code and then runs this as a normal program on the host. Just-in-time compilation is sort of an optimized recompiler, in that it only recompiles code as needed. When the program jumps into a block of unevaluated code/data, the JIT compiler kicks in and translates the block/function/line/whatever-unit-size into host code, which is then executed. Thus, for large programs only the code that's actually used is translated/recompiled. So in closing, JIT is somewhere between recompiling and interpreting.
-
Well, I already found that information. The only problem I can't find a way past is getting C to execute code it just generated. The only thing I can think of is an array of function pointers that have assembly stored in them. I don't even know if that would work.
Then there is self modifying code. That has to be difficult.
-
Just a thought, but you might not need to use function pointers to build your emulator. Since you'll probably be dealing with the interpretation of opcodes, you can have a setup similar to the following:
char* memory = malloc( SOME_LARGE_AMOUNT );
int mem_pointer = 0;
while(true){
switch(memory[mem_pointer++]){ // Pick a function based upon the opcode in this part of memory
case 0: // NOP
break;
case 1: // Immediate Mode ADD A to B and store into address C
char a = memory[mem_pointer++];
char b = memory[mem_pointer++];
memory[mem_pointer] = a + b;
break;
// et cetra
default: printf("Unknown Opcode encountered!"); exit(-1);
}
}
As you can see, this allows for C to execute particular code based upon the contents of something loaded into memory. In this manner, it's also possible to write self-modifying code, by selectively altering the contents of the memory array.
If you're in a situation that requires you to create new functions on the fly, I suggest looking into Lisp. It's got great functionality for creating and passing around functions at runtime.
-
It is a lot faster to make a bunch of pointers to functions and use the opcodes as a look up table then to use switch/case. That's not what I was talking about, though.
-
It is a lot faster to make a bunch of pointers to functions and use the opcodes as a look up table then to use switch/case.
Are you sure? At one point I looked at the assembly generated by GCC for a 6502 emulator and it actually created a jump table for a big switch statement.
If you want to compile stuff, for example for a JIT compiler, you'll need to implement the instructions in assembler as position independant code. Then you can copy them into a new block of memory and patch immediate values and jump instructions. Of course, you can make that as complicated as you want, for example with an immediate code that you can optimise.
For old 8 and 16 bit cpus I wouldn't bother with a compiler, though.
-
There is an emulator called Generator that uses a JIT. Anyway, it would be better if I started with something simpler as opposed to trying to write a PS2 emulator first.
I would like to start with NES, though.
Micky, when you did that, did you use any optimazation switches?
-
Micky, when you did that, did you use any optimazation switches?
That was about 7 years ago, when the first emulators came up. Don't ask me which version of gcc, or which switches exactly. I'd guess it was simply -O2.
-
sfx1999:
suggestion for you and making an NES emulator :D
Emutalk/Programing/NES Thread (http://www.emutalk.net/programming/25700-nes.html)
Emutalk is a decent place, but it has some different rules than here. However that thread is relatively active. I would say about 30-40 emulation authors frequent the place.
As for JIT .. there is no point for JIT with a 6502. A modern processor.. can interpret the instructions significantly faster than the original processor could ever run them and have plenty of time left over for graphical and audio update.
Cyb
-
It's a learning experience. Anyway, it could help on PDAs and stuff.
-
It's a learning experience. Anyway, it could help on PDAs and stuff.
That's a possibility
interesting note PalmSource is moving too Linux. :o
The reason why is many Cell systems in China and India etc are going that route. Since they are 2 billion plus consumers guess what?
However on that sort of platform you'll need to do some mindful work.
So JIT might be helpful most importantly is optimization for emulating the graphics system.
Cyb
-
If you are working with such a simple instruction set as is in the Nintendo Entertainment System, it’s probably best to aggregate the list of instructions.
Nintendo Entertainment System instruction set has 151 opcodes, which, although tediously, can be fit into an aggregate array where each index into the array represents the relative instruction.
Each entry into the array should contain the size of the code, a bitmask for decoding, possibly text in case you want to print the code (extra feature? Learning experience? Help with debugging?), and a pointer to a function that interprets the code.
Using a switch case is not necessarily as slow as you would think because the compiler will generate a jump table where each four bytes is a DWORD indicating the location where to go to get the appropriate case.
Then the ASM to actually handle the case would be similar to “jmp DWORD PTR: [ESI+4*EAX]†where ESI already contains the location of the jump table and EAX contains the case number.
It will go to the jump table, go to the correct index in the jump table, and that index will contain a DWORD pointer to the code that has the case where it is jumping. All done in one instruction.
But it is not recommended; it is even more tedious to construct and you get less out of it. Every action you want to perform based on the instruction will have to be manually coded into each switch case.
If you use an aggregate list where the opcode is the actual index into the list, when you read the opcode you instantly know where to go to get all the information regarding the opcode.
L. Spiro
-
Now if someone could tell me how to dynamically create code into memory and call it using C, I would really appreciate it.
L. Spiro, I might just use that switch/case if that is how it is done. I can use inline functions to make it a little more readable. Anyway, does that only happen when you turn on optimization?
-
Now if someone could tell me how to dynamically create code into memory and call it using C, I would really appreciate it.
You'll need to know assembler. You can copy around functions written in C in memory under certain conditions (position-independent code), but I wouldn't depend on it...
1) allocate memory
2) write your machine instructions into memory. Don't forget a return
3) cast a pointer to your memory into a function pointer
4) call the function pointer
For a function without parameters this should work out of the box, if you need parameters you'll have to look up the ABI of the CPU and operating system you're using. Some parameters may be in registers, others are passed on the stack. And you may have to clean up the stack before leaving your generated code.
I don't have a windows box around, so this is from memory:
- don't put code into memory allocated form the stack, some operating systems allocate no-execute pages for that
- memory returned by malloc should be OK. If it isn't you'll have to use VirtualAlloc yourself and make sure you request executable pages
-
I am 92% sure that the switch cases will be optimized that way regardless of compiler optimizations.
Once you have 3 or more cases, they get this optimization.
Be sure to look into the __fastcall keyword, and also you can find the source code for Project64 via Google.
L. Spiro
-
Actually, you don't need to pass parameters at all. I found a tutorial. When you generate the code, you pass pointers to the generated code. It's somewhat simple.
-
Where is the tutorial?
L. Spiro
-
http://www.zenogais.net/projects/tutorials/Dynamic%20Recompiler.html
-
That link seems to have been short-lived; it is already down.
Either that or it likes to periodically go down and up.
I will keep checking on it periodically, but you may want to upload to mirex’s bin of trash if you like, and if it is fairly small.
L. Spiro
-
It works for me. Are you sure that it's not working?
-
I got it now.
After looking at it, I have to question if it is really what you want to be using.
That tutorial tells you how to recompile the entire target code all at once and then just execute it.
You said you want to compile on-the-fly.
Well, the tutorial can help a little with this, particularly with emitting x86 code, but you are going to have to do heavy modifications to their base concept.
Essentially the problem is this: They use a single thread for everything. They read the target code and emit x86 code. Of course while this process is happening, the registers in their main thread are changing.
Well, for them, this is no problem, because the last step will be to execute the written code all at once.
Since it is all done at once, the code will be able to set registers and retrieve them without fear of them being modified.
Now here is the problem with trying to do this instruction-by-instruction.
Your method would be to write code, execute code, write code, execute code, etc.
When you write code, your thread’s registers will change.
When you execute the code, the code might set a register to a specific value.
When you write the next set of code, that register may be overwritten because it may be needed for the writing of the code, so when the next set of code is executed, the register it wants is basically some random value, rather than what it had set previously.
You would need a separate thread for the execution of the code, and certainly you will need a way to make sure that thread can wait for new code to be written while maintaining the registers used by the emitted code.
If you use naked functions, this isn’t so difficult.
One solution off the top of my head would be to PUSH each register, then check a flag that determines if there is new code to be executed.
If there is, POP each register back into place and go directly to the new code, making sure not to change any registers on the way, and use a CALL to get to the new code.
The new code would end in a RETN, putting your thread back into its loop where it immediately PUSH’s all its registers again to keep them safe.
You could improve speed by using CMP directly on the value you are checking (to determine if new code has been written), so that nothing is loaded into registers. This would allow you to avoid all the PUSH’ing and POP’ing.
But then there is the problem of other thread flags being modified, such as EFlags.
By using CMP or TEST, you will modify flags in your thread that the emitted code may want to use.
You would have to work around this as well.
There is the option of storing fake copies of registers and thread flags and using them instead of using the real thread’s registers and flags, but you would be interpreting code rather than recompiling it at run-time.
Since you aren’t compiling it all at once and executing it, your challenge is to get your emitted-code thread to wait for new code to be written without changing its flags/registers.
L. Spiro
-
I am not sure what you mean exactly. Aren't the registers stored when a function is called? As for making sure that the emulated registers aren't destroyed, you put them in a global variable.
-
Aren't the registers stored when a function is called?
When necessary, the ESP and EBP registers are stored on the stack. The rest can change (functions often return the values through registers).
-
To sfx1999.
There is no need to store them in globals; it is slower than simply pushing/popping them, and more cumbersome.
If you store to globals, you have to use each global name (don't use array indexes or speed will be compromised) individually; a specific global set aside for a specific register.
Then you would have to code the assembly:
mov g_rEAX, EAX
mov g_rECX, ECX
mov g_rEDX, EDX
etc.
Then before you execute your translated code, you have to put the registers back:
mov EAX, g_rEAX
mov ECX, g_rECX
mov EDX, g_rEDX
etc.
It’s easier to just do:
PUSH EAX
PUSH ECX
PUSH EDX
…
POP EDX
POP ECX
POP EAX
This is all in regards to storing them to ensure they don’t get overwritten while you are waiting for new code to process.
To answer your question…
Aren't the registers stored when a function is called?
Generally EBP (the function’s local stack pointer) and ESP (the stack pointer) are stored at the start of a function, changed in some way, then restored at the end of the function, back to whatever they were originally.
Meanwhile, other registers will be changing, and those changes last even after the function returns.
Especially functions that return values, in which case EAX will contain the return value (and possibly EDX).
Well, normally you wouldn’t need to worry about this, since it should (in theory) only be the code you are emulating that will change registers, and of course whatever changes it makes, you don’t want to get in the way.
The problem is actually trying to stay out of the way.
You have one thread that sits in a loop and waits for new code to be written.
It checks a simple flag to determine if there is new code.
When the flag is changed, it goes immediately to the location where the new code was written, executes it, and comes back to the loop.
The new code may have stored a value in EAX that is intended for the next instruction (which has yet to be parsed/executed).
There is no second copy of EAX unless you specifically make one (with a global or by PUSH’ing).
So you go back to your loop.
It checks the global. Maybe the code looks like this:
MOV EAX, g_bCodeWritten
TEST EAX, EAX
Well now your loop has changed the same EAX that the emitted code is supposed to be using.
This is why you will definitely be required to write your loop in ASM yourself, so you know exactly what it is changing.
As I mentioned before, you don’t actually have to store backups of the registers at all, if you just write an ASM loop that uses “CMP g_bCodeWritten, 0†directly.
This will avoid changing any registers, so you don’t need to have globals or PUSH/POP.
However, it will change the EFlags, and this will cause a problem.
Registers are not actually the problem.
The thread flags used for conditional jumps now become the problem.
If you don’t understand why, read up on how CMP works in conjunction with JE, JNZ, or whatever conditional jump you like.
Luckily, there is a method for you to get around this problem.
Basically you’re going to have your own EIP pointer which keeps track of where you are in the code, so you know what instruction to translate next.
You load the instruction, translate it, write it in x86, set a flag, your second thread executes it, goes back to its loop, and you continue.
Well, you’re always saving the code to a specific location.
That means you actually can’t write CALL or JMP instructions (or conditional jumps).
Actually, these instructions would be parsed directly, and rather than translating them and executing them, you make the call or jump with your original EIP pointer, then continue from there.
To do this correctly, your original interpreter (with the EIP pointer) will be required to correctly interpret TEST and CMP instructions, and any others that modify EFlags. It will keep its own EFlags value for storing these results (I mean it will have a DWORD m_dwEFlags member; I am not suggesting it uses its real EFlags value in its thread context).
Then when it encounters a JNZ or whatever, it will check its m_dwEFlags to determine if the jump is taken.
If it is, it will need to correctly go to the location where it jumps and continue from there.
Remember that all of this is done without the steps of translating/executing the code.
This part is all interpreted.
I know this seems confusing.
You may want to really read up on ASM and thread context members.
You are going to need to code in ASM and set up a working system (partially in ASM) that allows two threads to communicate frequently and stay synchronized perfectly.
Your EIP thread can’t extract an instruction and write it to be executed while the executing thread is still executing the last bit if code it was passed.
You are also going to have to set up your own storage method to hold the values that the emulated code will be using.
The emulated code may try to access 0x003800EC, which quite likely won’t be a valid address in your program.
So you are going to create a method of mapping memory back and forth from the real location in your program to the respective location in the emulated code.
This part SHOULD be nothing more than a simple offset. Depends on what you are emulating.
L. Spiro
-
Aren't the registers stored when a function is called?
When necessary, the ESP and EBP registers are stored on the stack. The rest can change (functions often return the values through registers).
Aren't the registers stored when a function is called?
Generally EPB (the function’s local stack pointer) and ESP (the stack pointer) are stored at the start of a function, changed in some way, then restored at the end of the function, back to whatever they were originally.
Meanwhile, other registers will be changing, and those changes last even after the function returns.
Especially functions that return values, in which case EAX will contain the return value (and possibly EDX).
deja vu? :o
Also, I was quite inaccurate when talking about these registers. Only EBP (or EPB as L. Spiro names it :P - it's a Base Pointer btw.) is stored (on the stack). ESP is restored from the EBP when the call is completed.
However, it will change the EFlags, and this will cause a problem.
Registers are not actually the problem.
The thread flags used for conditional jumps now become the problem.
If you don’t understand why, read up on how CMP works in conjunction with JE, JNZ, or whatever conditional jump you like.
If the CMP is used then conditional jump usually directly follows that opcode (not always, but since the instructions between those two CAN'T change EFlags, I don't see any problem). When using two threads, it's OS job not to mess up two thread's registers/flags.
-
If the CMP is used then conditional jump usually directly follows that opcode (not always, but since the instructions between those two CAN'T change EFlags, I don't see any problem). When using two threads, it's OS job not to mess up two thread's registers/flags.
Normally it wouldn’t be a problem.
If he followed that tutorial page, he would end up recompiling the whole thing and executing it all at once.
But this isn’t what he wants to do.
To emulate with Just-in-Time emulation, he executes instruction-by-instruction.
Although he can work ahead and execute several instructions at once, he would still run the risk of resetting the EFlags flag while in the waiting loop.
The risk I am trying to explain is that when he checks his flag while in the waiting loop (via CMP), that thread’s EFlags is going to change, and unfortunately that is the same thread that is executing the emitted code.
Emitted code performs its own CMP instruction, setting thread’s EFlags a certain way.
Thread goes back to its loop to wait for the next instruction it will execute.
While in the loop, it uses CMP to check a flag that determines if there is new code to execute.
This is all on the same thread, which means when it checked to determine if there is new code, it just overwrote the flags set by the emitted code.
Generally speaking, the conditional jump will immediately or almost immediately follow a CMP, but don’t let that fool you into thinking you can get around this problem 100% safely by simply executing sequentially all the instructions after the CMP up to the first conditional jump.
It IS possible that multiple conditional jumps use the same CMP instruction to determine if they jump or not.
But again, this is not actually a problem.
Your executing thread won’t actually execute CMP, TEST, CALL, JMP, or any conditional jump instructions.
These few instructions will be interpreted separately.
Since you want your emitted code to always land in the same spot, just think what would happen if you actually executed a CALL instruction!
This won’t work at all with Just-in-Time emulation, because it would require you to recompile the location where the call goes, all the way up to the return of the CALL, and thus you would also have to recompile every CALL it makes, etc.
Instead, you have one thread that runs through the target code. This thread has two DWORD values it uses to keep track of its “emulated†EIP and EFlags values. It also has its own fake stack which it uses to keep track of CALL and RETN instructions.
When this thread encounters a CMP, it sets the emulated EFlags value accordingly and then continues to the next instruction, without sending it to the secondary thread.
This first thread uses its EIP value to keep track of its position inside the target code.
When this first thread hits a conditional jump, it checks its emulated EFlags value and sets its emulated EIP accordingly, then continues, again without sending the jump to be executed by the secondary thread.
When it encounters a CALL, it pushes the return location into its own fake stack (this is the only purpose of this stack; all other stack operations will be executed in the secondary thread and will use that thread’s real stack) and sets its fake EIP to the location of the call. It then continues, again, without sending the instruction to be executed.
When it encounters a RETN, it pops its fake stack and sets its fake EIP pointer to that location, exactly how a real thread would. It then continues, again, without sending the instruction to be executed.
All other instructions are sent to be actually executed by the secondary thread.
It has to work this way for Just-in-Time emulation to even work.
The secondary thread CAN’T execute JMP or CALL instructions because it will send the thread off somewhere that isn’t valid code and you die.
If you want the secondary thread to execute JMP or CALL instruction, then you are going to have to actually recompile ahead, which in turn (by the nature of how it works) will actually force you to recompile the entire program all at once, which is not Just-in-Time emulation.
L. Spiro
-
The risk I am trying to explain is that when he checks his flag while in the waiting loop (via CMP), that thread’s EFlags is going to change, and unfortunately that is the same thread that is executing the emitted code.
Now it makes sense :) Sorry, for not-fully-understanding your previous post.
-
Using threads seams like a good idea, but won't jumping back and forth from thread to thread cause a performance hit?
Also, would using Windows' thread system be faster than using the pthreads library? I would like to keep this portable.
Another problem will be returning the new registers from one thread to the other, though. That won't necessarily be easy. Since I plan to interpret and compile at the same time, then just run from memory the next time, I will need a way to pass registers, which involves global variables. If a block is really small, your thread system won't necessarily be faster.
So, I was thinking, why not just read the registers and store them at the beginning and end of a block execution?
Also, the memory handling wouldn't be that difficult. It would be hard if the processor had a BIOS for creating pointers. I could just create pointers to system memory and that case, but self modifying code would be a pain in the ass there and I could bring the computer to its knees with a bad program that doesn't run out of memory as fast as the real deal.
-
Using threads seams like a good idea, but won't jumping back and forth from thread to thread cause a performance hit?
The purpose of using two threads together is so that you can do true execution of the emitted code with minimal overhead used to store copies of registers and flags.
Switching back and forth shouldn’t cause any performance loss; based on the setup, you should see an increase in performance.
Without two threads, you would have to back up AND restore registers for each instruction you execute.
You can create a waiting loop that does not modify registers, which alleviates the need for restoring registers before executing emitted code. This alone decreases the actual code size by 8 instructions.
Imagine, for every instruction you execute, you add 8 more instructions to it to save the registers, then 8 more to restore the registers.
Suddenly one instruction becomes 17.
But your main thread needs a way to get the registers from the executing thread quickly, so it may be a good idea to back them up in globals (but not restore them, since they won’t be modified in your waiting loop).
This adds 8 steps to every instruction you execute, but it is still faster than performing it on one thread, where you will be required to both store and restore them.
But note that you will be at the mercy of Windows®’ thread management.
You must set each thread at the same priority or they will block each other.
But if they simply use signals to each other, there should not be a speed loss by switching back and forth.
As for PThreads, while I have never used them myself, they are popular for a reason.
On the other hand, if you are using Windows®, you have to go through their API anyway; using a wrapper is only slower.
Unless you plan to make your own device driver.
I plan to interpret and compile at the same time, then just run from memory the next time
Well that changes things a lot.
You’re going to have to map out a large space where you can put lots of code.
And create a fast method for determining if a section of code has already been executed.
Lunch time so I can’t finish my essay.
So sad!!!
L. Spiro