Author Topic: JIT Emulation? (Read 11547 times)

sfx1999 · « **on:** 2005-08-24 03:47:59 »

What exactly is the difference between JIT emulation as opposed to a regular interpreter? I plan on writing an emulator some day, so it would help me a lot if I knew this.

Kislinskiy · « **Reply #1 on:** 2005-09-02 07:34:44 »

In short, a "Just in Time" emulator translates the opcodes in real time, whereas other approaches do this work before the actual execution.

Qhimm · « **Reply #2 on:** 2005-09-02 17:26:20 »

More precisely, an interpreter simply chews, processes and executes opcode by opcode. It does not produce executable code as output (hence there is no caching); it simply performs the appropriate action for each opcode itself. A compiler (more precisely, a static recompiler), on the other hand, translates the entire source program into host executable code and then runs this as a normal program on the host. Just-in-time compilation (dynamic recompilation) is sort of an optimized recompiler, in that it only recompiles code as needed. When the program jumps into a block of unevaluated code/data, the JIT compiler kicks in and translates the block/function/line/whatever-unit-size into host code, which is then executed. Thus, for large programs only the code that's actually used is translated/recompiled. So in closing, JIT is somewhere between recompiling and interpreting.
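The "only recompiles code as needed" part can be sketched in portable C as a cache lookup sitting in front of the translator. This is just an illustration under loud assumptions: `translate_block`, `run_from`, `block_stub` and the fixed-size cache are made-up names, and the "translation" here only hands back an ordinary C function instead of emitting real host machine code.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 256               /* assumption: guest PCs are 0..255 */

typedef int (*host_block)(void);      /* a translated block returns the next guest PC */

static host_block cache[CACHE_SLOTS]; /* NULL means "not translated yet" */
int translations = 0;                 /* how many times the translator had to run */

/* Stand-in for emitted host code: this block always jumps to guest PC 0. */
static int block_stub(void) { return 0; }

/* Hypothetical translator: a real JIT would emit host machine code here. */
static host_block translate_block(int guest_pc)
{
    (void)guest_pc;
    translations++;
    return block_stub;
}

/* Execute `steps` blocks starting at guest_pc, translating only on first use. */
int run_from(int guest_pc, int steps)
{
    while (steps-- > 0) {
        assert(guest_pc >= 0 && guest_pc < CACHE_SLOTS);
        if (cache[guest_pc] == NULL)                    /* never seen: translate now */
            cache[guest_pc] = translate_block(guest_pc);
        guest_pc = cache[guest_pc]();                   /* seen before: just run it */
    }
    return guest_pc;
}
```

Calling `run_from(5, 10)` translates the blocks at guest PC 5 and 0 once each and reuses the cached entry for the remaining steps. Code that is never reached is never translated.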

sfx1999 · « **Reply #3 on:** 2005-09-03 18:35:40 »

Well, I already found that information. The only problem I can't find a way past is getting C to execute code it just generated. The only thing I can think of is an array of function pointers that have assembly stored in them. I don't even know if that would work.

Then there is self modifying code. That has to be difficult.

NTmatter · « **Reply #4 on:** 2005-09-03 23:49:28 »

Just a thought, but you might not need to use function pointers to build your emulator. Since you'll probably be dealing with the interpretation of opcodes, you can have a setup similar to the following:

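Code:

#include <stdio.h>
#include <stdlib.h>

#define SOME_LARGE_AMOUNT 65536

int main(void) {
   unsigned char *memory = calloc(SOME_LARGE_AMOUNT, 1); // Zero-filled, so everything is a NOP until a program is loaded
   if (!memory) return 1;
   int mem_pointer = 0;
   while (mem_pointer + 3 < SOME_LARGE_AMOUNT) { // Stop before the operands could run past the end of memory
      switch (memory[mem_pointer++]) { // Pick an action based upon the opcode in this part of memory
         case 0: // NOP
            break;
         case 1: { // Immediate Mode ADD A to B and store into address C
            unsigned char a = memory[mem_pointer++];
            unsigned char b = memory[mem_pointer++];
            unsigned char c = memory[mem_pointer++];
            memory[c] = a + b;
            break;
         }
         // et cetera
         default:
            printf("Unknown opcode encountered!\n");
            exit(1);
      }
   }
   free(memory);
   return 0;
}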

As you can see, this allows for C to execute particular code based upon the contents of something loaded into memory. In this manner, it's also possible to write self-modifying code, by selectively altering the contents of the memory array.

If you're in a situation that requires you to create new functions on the fly, I suggest looking into Lisp. It's got great functionality for creating and passing around functions at runtime.

sfx1999 · « **Reply #5 on:** 2005-09-04 01:20:09 »

It is a lot faster to make a bunch of pointers to functions and use the opcodes as a lookup table than to use switch/case. That's not what I was talking about, though.
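For reference, the function-pointer lookup table being described could look roughly like this. It is a minimal sketch for a toy machine: the `Cpu` struct, the opcode numbers and the handler names are all invented for illustration, and nothing here measures whether it really beats switch/case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t mem[256];   /* toy guest memory */
    uint8_t pc;         /* program counter, wraps at 256 */
    uint8_t acc;        /* accumulator */
    int     halted;
} Cpu;

typedef void (*op_handler)(Cpu *);

static void op_nop(Cpu *c)  { (void)c; }
static void op_addi(Cpu *c) { c->acc = (uint8_t)(c->acc + c->mem[c->pc++]); } /* add immediate */
static void op_halt(Cpu *c) { c->halted = 1; }

/* The opcode byte is the index; entries not listed stay NULL. */
static const op_handler handlers[256] = {
    [0x00] = op_nop,
    [0x01] = op_addi,
    [0xFF] = op_halt,
};

void run(Cpu *c)
{
    assert(c != NULL);
    while (!c->halted) {
        op_handler h = handlers[c->mem[c->pc++]];
        if (h == NULL) {        /* unknown opcode: stop instead of crashing */
            c->halted = 1;
            break;
        }
        h(c);
    }
}
```

Loading the bytes `01 05 00 01 07 FF` at address 0 and calling `run` adds 5 and 7 into the accumulator and then halts.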

Micky · « **Reply #6 on:** 2005-09-04 12:20:42 »

Quote from: sfx1999

It is a lot faster to make a bunch of pointers to functions and use the opcodes as a lookup table than to use switch/case.

Are you sure? At one point I looked at the assembly generated by GCC for a 6502 emulator and it actually created a jump table for a big switch statement.
If you want to compile stuff, for example for a JIT compiler, you'll need to implement the instructions in assembler as position-independent code. Then you can copy them into a new block of memory and patch immediate values and jump instructions. Of course, you can make that as complicated as you want, for example with an intermediate code that you can optimise.
For old 8- and 16-bit CPUs I wouldn't bother with a compiler, though.

sfx1999 · « **Reply #7 on:** 2005-09-04 18:03:59 »

There is an emulator called Generator that uses a JIT. Anyway, it would be better if I started with something simpler as opposed to trying to write a PS2 emulator first.

I would like to start with NES, though.

Micky, when you did that, did you use any optimization switches?

Micky · « **Reply #8 on:** 2005-09-04 19:19:39 »

Quote from: sfx1999

Micky, when you did that, did you use any optimization switches?

That was about 7 years ago, when the first emulators came up. Don't ask me which version of gcc, or which switches exactly. I'd guess it was simply -O2.

Cyberman · « **Reply #9 on:** 2005-09-04 19:39:45 »

sfx1999:
A suggestion for you on making an NES emulator:

Emutalk/Programing/NES Thread
Emutalk is a decent place, but it has some different rules than here. However that thread is relatively active. I would say about 30-40 emulation authors frequent the place.

As for JIT... there is no point in a JIT for a 6502. A modern processor can interpret the instructions significantly faster than the original processor could ever run them and still have plenty of time left over for graphics and audio updates.

Cyb

sfx1999 · « **Reply #10 on:** 2005-09-04 22:16:12 »

It's a learning experience. Anyway, it could help on PDAs and stuff.

Cyberman · « **Reply #11 on:** 2005-09-05 02:25:52 »

Quote from: sfx1999

It's a learning experience. Anyway, it could help on PDAs and stuff.

That's a possibility.
An interesting note: PalmSource is moving to Linux.

The reason is that many Cell systems in China, India, etc. are going that route. Since that is 2 billion plus consumers, guess what?

However, on that sort of platform you'll need to do some mindful work.
So a JIT might be helpful, but most important is optimizing the emulation of the graphics system.

Cyb

L. Spiro · « **Reply #12 on:** 2005-09-05 03:02:07 »

If you are working with such a simple instruction set as the one in the Nintendo Entertainment System, it’s probably best to aggregate the list of instructions.
The Nintendo Entertainment System instruction set has 151 opcodes, which, although tedious, can be fit into an aggregate array where each index into the array represents the corresponding instruction.
Each entry into the array should contain the size of the code, a bitmask for decoding, possibly text in case you want to print the code (extra feature? Learning experience? Help with debugging?), and a pointer to a function that interprets the code.
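A rough sketch of such an aggregate array follows. The three opcodes shown (0xEA NOP, 0xA9 LDA immediate, 0x4C JMP absolute) and their sizes are real 6502 encodings, but the `OpInfo` layout, the `Cpu6502` struct and the function names are assumptions made up for this example, and the decoding bitmask is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint8_t a; uint16_t pc; } Cpu6502;    /* hypothetical CPU state */

typedef struct {
    uint8_t     size;                                  /* bytes incl. opcode; 0 = not in table */
    const char *name;                                  /* text for printing/debugging */
    void      (*exec)(Cpu6502 *, const uint8_t *insn); /* interpreter for this opcode */
} OpInfo;

static void exec_nop(Cpu6502 *c, const uint8_t *insn)     { (void)c; (void)insn; }
static void exec_lda_imm(Cpu6502 *c, const uint8_t *insn) { c->a = insn[1]; }

/* The opcode byte is the index into the array. */
const OpInfo optable[256] = {
    [0xEA] = { 1, "NOP",      exec_nop },
    [0xA9] = { 2, "LDA #imm", exec_lda_imm },
    [0x4C] = { 3, "JMP abs",  NULL },   /* handler omitted in this sketch */
};

/* List a code buffer using only the table's size and name fields.
   Returns the number of instructions listed; stops at an unknown opcode. */
int disassemble(const uint8_t *code, size_t len, FILE *out)
{
    int count = 0;
    size_t i = 0;
    assert(code != NULL);
    while (i < len) {
        const OpInfo *op = &optable[code[i]];
        if (op->size == 0 || i + op->size > len)
            break;
        fprintf(out, "%04zx  %s\n", i, op->name);
        i += op->size;
        count++;
    }
    return count;
}
```

The same table drives both the debugger-style listing and, through `exec`, the interpreter itself, which is the "you instantly know where to go" benefit.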

Using a switch case is not necessarily as slow as you would think because the compiler will generate a jump table where each four bytes is a DWORD indicating the location where to go to get the appropriate case.
Then the ASM to actually handle the case would be similar to “jmp DWORD PTR [ESI+4*EAX]” where ESI already contains the location of the jump table and EAX contains the case number.
It will go to the jump table, go to the correct index in the jump table, and that index will contain a DWORD pointer to the code that has the case where it is jumping. All done in one instruction.

But it is not recommended; it is even more tedious to construct and you get less out of it. Every action you want to perform based on the instruction will have to be manually coded into each switch case.
If you use an aggregate list where the opcode is the actual index into the list, when you read the opcode you instantly know where to go to get all the information regarding the opcode.

L. Spiro

sfx1999 · « **Reply #13 on:** 2005-09-05 15:38:55 »

Now if someone could tell me how to dynamically create code into memory and call it using C, I would really appreciate it.

L. Spiro, I might just use that switch/case if that is how it is done. I can use inline functions to make it a little more readable. Anyway, does that only happen when you turn on optimization?

Micky · « **Reply #14 on:** 2005-09-05 20:25:33 »

Quote from: sfx1999

Now if someone could tell me how to dynamically create code into memory and call it using C, I would really appreciate it.

You'll need to know assembler. You can copy around functions written in C in memory under certain conditions (position-independent code), but I wouldn't depend on it...
1) allocate memory
2) write your machine instructions into memory. Don't forget a return
3) cast a pointer to your memory into a function pointer
4) call the function pointer

For a function without parameters this should work out of the box, if you need parameters you'll have to look up the ABI of the CPU and operating system you're using. Some parameters may be in registers, others are passed on the stack. And you may have to clean up the stack before leaving your generated code.
I don't have a windows box around, so this is from memory:
- don't put code into memory allocated from the stack, some operating systems allocate no-execute pages for that
- memory returned by malloc may be OK on systems that don't enforce no-execute protection on the heap, but don't count on it. If it isn't, you'll have to use VirtualAlloc yourself and make sure you request executable pages
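The four steps above can be sketched as follows. This is a minimal example under stated assumptions, not a definitive recipe: it targets x86 or x86-64 on a POSIX system and uses `mmap`/`mprotect` instead of the `VirtualAlloc` route mentioned above. The function name `run_generated` is made up, and the six bytes are the encoding of `mov eax, 42` followed by `ret`.

```c
#define _DEFAULT_SOURCE               /* for MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*generated_fn)(void);

/* Returns the value produced by the generated code, or -1 if the platform
   is not x86 or the OS refused to hand out executable memory. */
int run_generated(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* mov eax, 42 ; ret  (same bytes in 32-bit and 64-bit mode) */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    const size_t len = 4096;

    /* 1) allocate memory: writable first, not yet executable */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* 2) write the machine instructions, including the return */
    memcpy(mem, code, sizeof code);

    /* flip the page from writable to executable */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* 3) cast the memory pointer to a function pointer (POSIX allows this) */
    generated_fn fn = (generated_fn)mem;

    /* 4) call it */
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;
#endif
}
```

On Windows the counterpart would be `VirtualAlloc` with an executable page protection such as `PAGE_EXECUTE_READWRITE`; that variant is not shown here.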

L. Spiro · « **Reply #15 on:** 2005-09-06 03:04:29 »

I am 92% sure that the switch cases will be optimized that way regardless of compiler optimizations.
Once you have 3 or more cases, they get this optimization.

Be sure to look into the __fastcall keyword, and also you can find the source code for Project64 via Google.

L. Spiro

sfx1999 · « **Reply #16 on:** 2005-09-06 03:34:57 »

Actually, you don't need to pass parameters at all. I found a tutorial. When you generate the code, you pass pointers to the generated code. It's somewhat simple.

L. Spiro · « **Reply #17 on:** 2005-09-06 03:57:32 »

Where is the tutorial?

L. Spiro

sfx1999 · « **Reply #18 on:** 2005-09-07 02:20:43 »

http://www.zenogais.net/projects/tutorials/Dynamic%20Recompiler.html

L. Spiro · « **Reply #19 on:** 2005-09-07 02:50:12 »

That link seems to have been short-lived; it is already down.
Either that or it likes to periodically go down and up.

I will keep checking on it periodically, but you may want to upload to mirex’s bin of trash if you like, and if it is fairly small.

L. Spiro

sfx1999 · « **Reply #20 on:** 2005-09-07 03:15:44 »

It works for me. Are you sure that it's not working?

L. Spiro · « **Reply #21 on:** 2005-09-07 07:44:33 »

I got it now.

After looking at it, I have to question if it is really what you want to be using.
That tutorial tells you how to recompile the entire target code all at once and then just execute it.

You said you want to compile on-the-fly.
Well, the tutorial can help a little with this, particularly with emitting x86 code, but you are going to have to do heavy modifications to their base concept.

Essentially the problem is this: They use a single thread for everything. They read the target code and emit x86 code. Of course while this process is happening, the registers in their main thread are changing.
Well, for them, this is no problem, because the last step will be to execute the written code all at once.
Since it is all done at once, the code will be able to set registers and retrieve them without fear of them being modified.

Now here is the problem with trying to do this instruction-by-instruction.
Your method would be to write code, execute code, write code, execute code, etc.
When you write code, your thread’s registers will change.
When you execute the code, the code might set a register to a specific value.
When you write the next set of code, that register may be overwritten because it may be needed for the writing of the code, so when the next set of code is executed, the register it wants is basically some random value, rather than what it had set previously.

You would need a separate thread for the execution of the code, and certainly you will need a way to make sure that thread can wait for new code to be written while maintaining the registers used by the emitted code.
If you use naked functions, this isn’t so difficult.
One solution off the top of my head would be to PUSH each register, then check a flag that determines if there is new code to be executed.
If there is, POP each register back into place and go directly to the new code, making sure not to change any registers on the way, and use a CALL to get to the new code.
The new code would end in a RETN, putting your thread back into its loop where it immediately PUSH’s all its registers again to keep them safe.

You could improve speed by using CMP directly on the value you are checking (to determine if new code has been written), so that nothing is loaded into registers. This would allow you to avoid all the PUSH’ing and POP’ing.

But then there is the problem of other thread flags being modified, such as EFlags.
By using CMP or TEST, you will modify flags in your thread that the emitted code may want to use.
You would have to work around this as well.

There is the option of storing fake copies of registers and thread flags and using them instead of using the real thread’s registers and flags, but you would be interpreting code rather than recompiling it at run-time.

Since you aren’t compiling it all at once and executing it, your challenge is to get your emitted-code thread to wait for new code to be written without changing its flags/registers.

L. Spiro

sfx1999 · « **Reply #22 on:** 2005-09-08 03:56:50 »

I am not sure what you mean exactly. Aren't the registers stored when a function is called? As for making sure that the emulated registers aren't destroyed, you put them in a global variable.
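The global-variable idea can be sketched like this. It is only an illustration: `GuestRegs`, `g_regs`, `chunk_a`, `chunk_b` and `translator_work` are hypothetical names, and the two "chunks" are plain C functions standing in for emitted code that would load and store the same fields by address.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t r[8];   /* emulated general-purpose registers */
    uint32_t pc;     /* emulated program counter */
} GuestRegs;

GuestRegs g_regs;    /* lives in memory, not in host registers */

/* Each function stands for one separately translated piece of guest code. */
static void chunk_a(void) { g_regs.r[0] = 5;                g_regs.pc += 4; }
static void chunk_b(void) { g_regs.r[1] = g_regs.r[0] + 7;  g_regs.pc += 4; }

/* Stand-in for the translator running between chunks. It is free to use any
   host register it likes, because no guest state is kept in them. */
static uint32_t translator_work(void)
{
    volatile uint32_t scratch = 0;
    for (uint32_t i = 0; i < 1000; i++)
        scratch += i;
    return scratch;
}

uint32_t run_two_chunks(void)
{
    chunk_a();                 /* guest code sets r0 */
    (void)translator_work();   /* host registers get clobbered here */
    chunk_b();                 /* guest code still sees r0 == 5 */
    return g_regs.r[1];
}
```

The cost is a memory access for every guest register use, which is the speed concern raised in the replies that follow.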

dziugo · « **Reply #23 on:** 2005-09-08 08:47:59 »

Quote from: sfx1999

Aren't the registers stored when a function is called?

When necessary, the ESP and EBP registers are stored on the stack. The rest can change (functions often return the values through registers).

L. Spiro · « **Reply #24 on:** 2005-09-08 10:05:12 »

To sfx1999.

There is no need to store them in globals; it is slower than simply pushing/popping them, and more cumbersome.

If you store to globals, you have to use each global name (don't use array indexes or speed will be compromised) individually; a specific global set aside for a specific register.
Then you would have to code the assembly:
mov g_rEAX, EAX
mov g_rECX, ECX
mov g_rEDX, EDX
etc.

Then before you execute your translated code, you have to put the registers back:
mov EAX, g_rEAX
mov ECX, g_rECX
mov EDX, g_rEDX
etc.

It’s easier to just do:
PUSH EAX
PUSH ECX
PUSH EDX

…
POP EDX
POP ECX
POP EAX

This is all in regards to storing them to ensure they don’t get overwritten while you are waiting for new code to process.

To answer your question…

Quote

Aren't the registers stored when a function is called?

Generally EBP (the function’s local stack pointer) and ESP (the stack pointer) are stored at the start of a function, changed in some way, then restored at the end of the function, back to whatever they were originally.
Meanwhile, other registers will be changing, and those changes last even after the function returns.
Especially functions that return values, in which case EAX will contain the return value (and possibly EDX).

Well, normally you wouldn’t need to worry about this, since it should (in theory) only be the code you are emulating that will change registers, and of course whatever changes it makes, you don’t want to get in the way.
The problem is actually trying to stay out of the way.

You have one thread that sits in a loop and waits for new code to be written.
It checks a simple flag to determine if there is new code.
When the flag is changed, it goes immediately to the location where the new code was written, executes it, and comes back to the loop.
The new code may have stored a value in EAX that is intended for the next instruction (which has yet to be parsed/executed).
There is no second copy of EAX unless you specifically make one (with a global or by PUSH’ing).
So you go back to your loop.
It checks the global. Maybe the code looks like this:

MOV EAX, g_bCodeWritten
TEST EAX, EAX

Well now your loop has changed the same EAX that the emitted code is supposed to be using.
This is why you will definitely be required to write your loop in ASM yourself, so you know exactly what it is changing.

As I mentioned before, you don’t actually have to store backups of the registers at all, if you just write an ASM loop that uses “CMP g_bCodeWritten, 0” directly.
This will avoid changing any registers, so you don’t need to have globals or PUSH/POP.

However, it will change the EFlags, and this will cause a problem.
Registers are not actually the problem.
The thread flags used for conditional jumps now become the problem.
If you don’t understand why, read up on how CMP works in conjunction with JE, JNZ, or whatever conditional jump you like.

Luckily, there is a method for you to get around this problem.
Basically you’re going to have your own EIP pointer which keeps track of where you are in the code, so you know what instruction to translate next.
You load the instruction, translate it, write it in x86, set a flag, your second thread executes it, goes back to its loop, and you continue.
Well, you’re always saving the code to a specific location.
That means you actually can’t write CALL or JMP instructions (or conditional jumps).
Actually, these instructions would be parsed directly, and rather than translating them and executing them, you make the call or jump with your original EIP pointer, then continue from there.
To do this correctly, your original interpreter (with the EIP pointer) will be required to correctly interpret TEST and CMP instructions, and any others that modify EFlags. It will keep its own EFlags value for storing these results (I mean it will have a DWORD m_dwEFlags member; I am not suggesting it uses its real EFlags value in its thread context).
Then when it encounters a JNZ or whatever, it will check its m_dwEFlags to determine if the jump is taken.
If it is, it will need to correctly go to the location where it jumps and continue from there.
Remember that all of this is done without the steps of translating/executing the code.
This part is all interpreted.
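The interpreted part could look something like this in C. It is a small sketch, assuming only the zero and carry flags matter: the `Interp` struct and function names are made up (its `eflags` field plays the role of `m_dwEFlags`), while the bit positions match the real x86 EFLAGS layout (CF is bit 0, ZF is bit 6). The other flags that CMP affects (SF, OF, PF, AF) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_CF (1u << 0)   /* carry flag */
#define FLAG_ZF (1u << 6)   /* zero flag */

typedef struct {
    uint32_t eip;      /* the interpreter's own instruction pointer */
    uint32_t eflags;   /* software copy of the flags, never the host's EFLAGS */
} Interp;

/* CMP a, b: compute a - b, throw the result away, keep only the flags. */
void interp_cmp(Interp *s, uint32_t a, uint32_t b)
{
    s->eflags &= ~(FLAG_ZF | FLAG_CF);
    if (a == b) s->eflags |= FLAG_ZF;   /* difference is zero */
    if (a < b)  s->eflags |= FLAG_CF;   /* unsigned borrow */
}

/* JNZ: take the jump when ZF is clear, otherwise fall through. */
void interp_jnz(Interp *s, uint32_t target, uint32_t fallthrough)
{
    s->eip = (s->eflags & FLAG_ZF) ? fallthrough : target;
}
```

Because the flags live in an ordinary variable, whatever CMP or TEST the host loop runs in between cannot disturb them.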

I know this seems confusing.
You may want to really read up on ASM and thread context members.
You are going to need to code in ASM and set up a working system (partially in ASM) that allows two threads to communicate frequently and stay synchronized perfectly.
Your EIP thread can’t extract an instruction and write it to be executed while the executing thread is still executing the last bit of code it was passed.

You are also going to have to set up your own storage method to hold the values that the emulated code will be using.
The emulated code may try to access 0x003800EC, which quite likely won’t be a valid address in your program.
So you are going to create a method of mapping memory back and forth from the real location in your program to the respective location in the emulated code.
This part SHOULD be nothing more than a simple offset. Depends on what you are emulating.
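For the simple-offset case, the mapping can be as small as the sketch below. The base address and size of the guest region are assumptions picked so that the 0x003800EC example falls inside it, and `guest_to_host`/`host_to_guest` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_BASE 0x00380000u   /* assumed start of the emulated region */
#define GUEST_SIZE 0x00001000u   /* assumed size: 4 KiB */

uint8_t guest_ram[GUEST_SIZE];   /* host buffer backing the guest region */

/* Guest address -> host pointer, or NULL if outside the emulated region. */
uint8_t *guest_to_host(uint32_t guest_addr)
{
    if (guest_addr < GUEST_BASE || guest_addr - GUEST_BASE >= GUEST_SIZE)
        return NULL;
    return &guest_ram[guest_addr - GUEST_BASE];
}

/* Host pointer (must point into guest_ram) -> guest address. */
uint32_t host_to_guest(const uint8_t *p)
{
    assert(p >= guest_ram && p < guest_ram + GUEST_SIZE);
    return GUEST_BASE + (uint32_t)(p - guest_ram);
}
```

With these values, `guest_to_host(0x003800EC)` points at `guest_ram[0xEC]`. A system with several memory regions or mirrored ranges would need more than one offset.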

L. Spiro
