Author Topic: JIT Emulation? (Read 13500 times)

dziugo · « **Reply #25 on:** 2005-09-08 10:53:59 »

Quote from: I

Quote from: sfx1999
Aren't the registers stored when a function is called?
When necessary, the ESP and EBP registers are stored on the stack. The rest can change (functions often return the values through registers).

Quote from: L. Spiro

Quote
Aren't the registers stored when a function is called?

Generally EPB (the function’s local stack pointer) and ESP (the stack pointer) are stored at the start of a function, changed in some way, then restored at the end of the function, back to whatever they were originally.
Meanwhile, other registers will be changing, and those changes last even after the function returns.
Especially functions that return values, in which case EAX will contain the return value (and possibly EDX).

deja vu?

Also, I was quite inaccurate when talking about these registers. Only EBP (or EPB as L. Spiro names it - it's a Base Pointer btw.) is stored (on the stack). ESP is restored from the EBP when the call is completed.

Quote from: L. Spiro

However, it will change the EFlags, and this will cause a problem.
Registers are not actually the problem.
The thread flags used for conditional jumps now become the problem.
If you don’t understand why, read up on how CMP works in conjunction with JE, JNZ, or whatever conditional jump you like.

If a CMP is used, the conditional jump usually directly follows that opcode (not always, but since the instructions between those two CAN'T change EFlags, I don't see any problem). When using two threads, it's the OS's job not to mess up the two threads' registers/flags.

L. Spiro · « **Reply #26 on:** 2005-09-08 11:59:12 »

Quote

If a CMP is used, the conditional jump usually directly follows that opcode (not always, but since the instructions between those two CAN'T change EFlags, I don't see any problem). When using two threads, it's the OS's job not to mess up the two threads' registers/flags.

Normally it wouldn’t be a problem.
If he followed that tutorial page, he would end up recompiling the whole thing and executing it all at once.

But this isn’t what he wants to do.
To emulate with Just-in-Time emulation, he executes instruction-by-instruction.
Although he can work ahead and execute several instructions at once, he would still run the risk of resetting the EFlags flag while in the waiting loop.

The risk I am trying to explain is that when he checks his flag while in the waiting loop (via CMP), that thread’s EFlags is going to change, and unfortunately that is the same thread that is executing the emitted code.

Emitted code performs its own CMP instruction, setting the thread’s EFlags a certain way.
Thread goes back to its loop to wait for the next instruction it will execute.
While in the loop, it uses CMP to check a flag that determines if there is new code to execute.
This is all on the same thread, which means when it checked to determine if there is new code, it just overwrote the flags set by the emitted code.
Generally speaking, the conditional jump will immediately or almost immediately follow a CMP, but don’t let that fool you into thinking you can get around this problem 100% safely by simply executing sequentially all the instructions after the CMP up to the first conditional jump.
It IS possible that multiple conditional jumps use the same CMP instruction to determine if they jump or not.

But again, this is not actually a problem.
Your executing thread won’t actually execute CMP, TEST, CALL, JMP, or any conditional jump instructions.
These few instructions will be interpreted separately.
Since you want your emitted code to always land in the same spot, just think what would happen if you actually executed a CALL instruction!
This won’t work at all with Just-in-Time emulation, because it would require you to recompile the location where the call goes, all the way up to the return of the CALL, and thus you would also have to recompile every CALL it makes, etc.

Instead, you have one thread that runs through the target code. This thread has two DWORD values it uses to keep track of its “emulated” EIP and EFlags values. It also has its own fake stack which it uses to keep track of CALL and RETN instructions.
When this thread encounters a CMP, it sets the emulated EFlags value accordingly and then continues to the next instruction, without sending it to the secondary thread.
This first thread uses its EIP value to keep track of its position inside the target code.
When this first thread hits a conditional jump, it checks its emulated EFlags value and sets its emulated EIP accordingly, then continues, again without sending the jump to be executed by the secondary thread.
When it encounters a CALL, it pushes the return location into its own fake stack (this is the only purpose of this stack; all other stack operations will be executed in the secondary thread and will use that thread’s real stack) and sets its fake EIP to the location of the call. It then continues, again, without sending the instruction to be executed.
When it encounters a RETN, it pops its fake stack and sets its fake EIP pointer to that location, exactly how a real thread would. It then continues, again, without sending the instruction to be executed.
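
To make the split above concrete, here is a rough Python sketch of the first thread's loop. Everything in it is an assumption for illustration only: the tuple-based instruction format, the `run` and `execute` names, and the fact that CMP compares two literal numbers (a real emulator would need the register values from the secondary thread). Only ZF and CF are modelled, using their real EFLAGS bit positions. Note that one CMP feeds two conditional jumps here, which is the case mentioned earlier.

```python
ZF = 0x40  # zero flag, bit 6 of EFLAGS
CF = 0x01  # carry flag, bit 0 of EFLAGS


def run(program, execute):
    """Walk `program`; interpret control flow, forward everything else."""
    eip = 0          # emulated EIP (here: an index into `program`)
    eflags = 0       # emulated EFlags
    fake_stack = []  # holds return locations only
    while eip < len(program):
        op, *args = program[eip]
        next_eip = eip + 1
        if op == "CMP":
            a, b = args
            eflags &= ~(ZF | CF)
            if a == b:
                eflags |= ZF
            if a < b:            # unsigned "below"
                eflags |= CF
        elif op == "JMP":
            next_eip = args[0]
        elif op == "JE":
            if eflags & ZF:
                next_eip = args[0]
        elif op == "JNE":
            if not eflags & ZF:
                next_eip = args[0]
        elif op == "JB":
            if eflags & CF:
                next_eip = args[0]
        elif op == "CALL":
            fake_stack.append(eip + 1)   # push the return location
            next_eip = args[0]
        elif op == "RETN":
            if not fake_stack:           # outermost return: stop
                break
            next_eip = fake_stack.pop()
        else:
            execute(program[eip])        # hand off to the secondary thread
        eip = next_eip
    return eip, eflags


program = [
    ("MOV", "eax", 1),   # 0: forwarded
    ("CMP", 3, 5),       # 1: interpreted, sets CF, clears ZF
    ("JE", 6),           # 2: not taken
    ("JB", 5),           # 3: taken, still using the flags from line 1
    ("MOV", "eax", 2),   # 4: skipped
    ("CALL", 7),         # 5: pushes 6 on the fake stack
    ("RETN",),           # 6: fake stack is empty here, so the run stops
    ("ADD", "eax", 10),  # 7: forwarded
    ("RETN",),           # 8: pops 6
]

sent = []  # stand-in for "sent to the secondary thread"
final_eip, final_eflags = run(program, sent.append)
```

In this sketch only the MOV at index 0 and the ADD at index 7 ever reach `sent`; the CMP, the jumps, the CALL and both RETNs are handled entirely by the loop.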

All other instructions are sent to be actually executed by the secondary thread.
It has to work this way for Just-in-Time emulation to even work.
The secondary thread CAN’T execute JMP or CALL instructions because it will send the thread off somewhere that isn’t valid code and you die.
If you want the secondary thread to execute JMP or CALL instruction, then you are going to have to actually recompile ahead, which in turn (by the nature of how it works) will actually force you to recompile the entire program all at once, which is not Just-in-Time emulation.

L. Spiro

dziugo · « **Reply #27 on:** 2005-09-08 12:25:23 »

Quote from: L. Spiro

The risk I am trying to explain is that when he checks his flag while in the waiting loop (via CMP), that thread’s EFlags is going to change, and unfortunately that is the same thread that is executing the emitted code.

Now it makes sense

Sorry for not fully understanding your previous post.

sfx1999 · « **Reply #28 on:** 2005-09-09 03:22:35 »

Using threads seems like a good idea, but won't jumping back and forth from thread to thread cause a performance hit?

Also, would using Windows' thread system be faster than using the pthreads library? I would like to keep this portable.

Another problem will be returning the new registers from one thread to the other, though. That won't necessarily be easy. Since I plan to interpret and compile at the same time, then just run from memory the next time, I will need a way to pass registers, which involves global variables. If a block is really small, your thread system won't necessarily be faster.

So, I was thinking, why not just read the registers and store them at the beginning and end of a block execution?

Also, the memory handling wouldn't be that difficult. It would be hard if the processor had a BIOS for creating pointers. I could just create pointers to system memory in that case, but self-modifying code would be a pain in the ass there and I could bring the computer to its knees with a bad program that doesn't run out of memory as fast as the real deal.

L. Spiro · « **Reply #29 on:** 2005-09-09 06:01:47 »

Quote

Using threads seems like a good idea, but won't jumping back and forth from thread to thread cause a performance hit?

The purpose of using two threads together is so that you can do true execution of the emitted code with minimal overhead used to store copies of registers and flags.
Switching back and forth shouldn’t cause any performance loss; based on the setup, you should see an increase in performance.
Without two threads, you would have to back up AND restore registers for each instruction you execute.
You can create a waiting loop that does not modify registers, which alleviates the need for restoring registers before executing emitted code. This alone decreases the actual code size by 8 instructions.
Imagine, for every instruction you execute, you add 8 more instructions to it to save the registers, then 8 more to restore the registers.
Suddenly one instruction becomes 17.
But your main thread needs a way to get the registers from the executing thread quickly, so it may be a good idea to back them up in globals (but not restore them, since they won’t be modified in your waiting loop).
This adds 8 steps to every instruction you execute, but it is still faster than performing it on one thread, where you will be required to both store and restore them.

But note that you will be at the mercy of Windows®’ thread management.
You must set each thread at the same priority or they will block each other.
But if they simply use signals to each other, there should not be a speed loss by switching back and forth.
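
As a rough illustration of the "signals to each other" idea, here is a Python sketch of a strict hand-off between the two threads using a pair of `threading.Event` objects. The `Handoff` class, its `slot` field and the `None` shutdown sentinel are all made up for this example, and appending to a list stands in for actually running emitted code. It only shows the signalling pattern; it says nothing about real-world speed.

```python
import threading


class Handoff:
    """One-slot mailbox: the primary submits, the secondary executes."""

    def __init__(self):
        self.work_ready = threading.Event()  # primary -> secondary
        self.work_done = threading.Event()   # secondary -> primary
        self.slot = None
        self.executed = []

    def worker(self):
        # Secondary thread: sleep until signalled, "execute", signal back.
        while True:
            self.work_ready.wait()
            self.work_ready.clear()
            if self.slot is None:            # sentinel: shut down
                self.work_done.set()
                return
            self.executed.append(self.slot)  # stand-in for running the code
            self.work_done.set()

    def submit(self, instr):
        # Primary thread: publish one instruction and wait for completion.
        self.slot = instr
        self.work_ready.set()
        self.work_done.wait()
        self.work_done.clear()


h = Handoff()
t = threading.Thread(target=h.worker)
t.start()
for instr in ["mov eax, 1", "add eax, 2"]:
    h.submit(instr)
h.submit(None)  # ask the secondary thread to exit
t.join()
```

Because the primary always waits for `work_done` before submitting again, the two threads never touch `slot` at the same time, so no extra lock is needed in this particular arrangement.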

As for PThreads, while I have never used them myself, they are popular for a reason.
On the other hand, if you are using Windows®, you have to go through their API anyway; using a wrapper is only slower.
Unless you plan to make your own device driver.

Quote

I plan to interpret and compile at the same time, then just run from memory the next time

Well that changes things a lot.
You’re going to have to map out a large space where you can put lots of code.
And create a fast method for determining if a section of code has already been executed.
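
One common way to get that fast check is a table keyed by the guest address of each block. The Python sketch below assumes a hypothetical `translate` callback that turns a guest address into something runnable; the `BlockCache` name, the `misses` counter and the `invalidate` method (for the self-modifying code case raised earlier) are illustrative, not taken from any particular emulator.

```python
class BlockCache:
    """Maps a guest address to its already-translated block."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical: guest_addr -> block
        self._blocks = {}
        self.misses = 0

    def lookup(self, guest_addr):
        # Dictionary lookup answers "have we seen this code before?"
        if guest_addr not in self._blocks:
            self.misses += 1
            self._blocks[guest_addr] = self._translate(guest_addr)
        return self._blocks[guest_addr]

    def invalidate(self, guest_addr):
        # Drop a block, e.g. after the guest writes over its own code.
        self._blocks.pop(guest_addr, None)


cache = BlockCache(lambda addr: f"block@{addr:#x}")
first = cache.lookup(0x1000)    # translated on first use
second = cache.lookup(0x1000)   # served from the table
cache.invalidate(0x1000)
third = cache.lookup(0x1000)    # translated again after invalidation
```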

Lunch time so I can’t finish my essay.
So sad!!!

L. Spiro