Qhimm.com Forums

Miscellaneous Forums => Scripting and Reverse Engineering => Topic started by: Darkness on 2005-10-29 16:31:17

Title: 3d drawing order
Post by: Darkness on 2005-10-29 16:31:17
I'm working on a 3d rendering program. It's built on top of a simple 2d drawing library. I'm trying to make the polygons draw in the correct order, so that objects appear solid. Currently, I am ordering them by the distance between the camera and the furthest vertex of each triangle. It works pretty well, except for instances where two vertices are the same distance away. Then it orders them randomly.

Does anyone have any suggestions on how I should sort this?
Title: Re: 3d drawing order
Post by: Cyberman on 2005-10-29 18:15:03
Quote from: Darkness
I'm working on a 3d rendering program. It's built on top of a simple 2d drawing library. I'm trying to make the polygons draw in the correct order, so that objects appear solid. Currently, I am ordering them by the distance between the camera and the furthest vertex of each triangle. It works pretty well, except for instances where two vertices are the same distance away. Then it orders them randomly.

Does anyone have any suggestions on how I should sort this?

Surface culling is what I suggest.
First order the surfaces. Then reject surfaces that are obscured by closer surfaces. If you have 2 surfaces that intersect or are at the same distance, you should perform an intersection operation and surface splitting if needed. If they do not visually collide, everything is fine. If they hit one another, you need to prune off (split) each section that is obscured by the other surface. If they have the SAME location, go by SIZE, i.e. the bigger polygon wins for visibility, as this makes things simpler.

Cyb
Title: 3d drawing order
Post by: Darkness on 2005-10-29 19:24:25
Thanks for the quick reply. Unfortunately, I think this might be a bit too processor intensive for what I'm doing.

I've been trying something along the lines of the painter's method.

I've changed it so I average the vertices of a polygon to come up with a center point, and order the polygon by the distance between this point and the position of the 'camera.'

Ex:
Code: [Select]

avg_x = (vertices[triangles[i].a].x + vertices[triangles[i].b].x + vertices[triangles[i].c].x)/3.0;
avg_y = (vertices[triangles[i].a].y + vertices[triangles[i].b].y + vertices[triangles[i].c].y)/3.0;
avg_z = (vertices[triangles[i].a].z + vertices[triangles[i].b].z + vertices[triangles[i].c].z)/3.0;

d = sqrt((avg_x - cam.x)*(avg_x - cam.x) + (avg_y - cam.y)*(avg_y - cam.y) + (avg_z - cam.z)*(avg_z - cam.z));

lengths.push_back(d);


Works right most of the time, but:
(http://www.kadets.com/temp-t/oh_hell.JPG)

Anyway, I'd be interested in any way to correct this method, or an efficient way to z-buffer, etc.
Title: 3d drawing order
Post by: halkun on 2005-10-29 19:44:21
You are going to have problems when you start dealing with models that have convex sides.

I don't think Z-buffering is that CPU intensive. Take a look at it.
Title: 3d drawing order
Post by: Micky on 2005-10-29 22:50:49
What you could do is build a BSP tree. Then when drawing you test each node with the viewing direction, then first draw the branch "behind" the node, the node itself and then the branch "in front" of the node. That way you will always get perfect depth ordering without sorting by vertex. Games like Doom and Quake do something like this.
But if you have any chance, try to use OpenGL or DirectX. A Z-buffer makes life a lot easier, and you'll get hardware acceleration.
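
As a rough sketch of the traversal described above: the node layout and the names `BspNode`, `side` and `backToFront` are made up for illustration, and the test uses the camera position against each node's plane, which is the usual formulation. Building the tree (and splitting polygons that straddle a plane) is not shown.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical BSP node: a splitting plane (normal . p = d) and the id of
// the polygon lying on that plane. Field names are assumptions.
struct BspNode {
    Vec3 normal;
    float d;
    int polygonId;
    const BspNode* front;  // subtree on the side the normal points to
    const BspNode* back;   // subtree on the opposite side
};

// Signed distance of the eye from the node's plane (for a unit normal).
inline float side(const BspNode& n, const Vec3& eye) {
    return n.normal.x * eye.x + n.normal.y * eye.y + n.normal.z * eye.z - n.d;
}

// Appends polygon ids in back-to-front order as seen from `eye`:
// far subtree first, then the node itself, then the near subtree.
inline void backToFront(const BspNode* n, const Vec3& eye, std::vector<int>& out) {
    if (n == nullptr) return;
    const bool eyeInFront = side(*n, eye) >= 0.0f;
    const BspNode* farSide  = eyeInFront ? n->back  : n->front;
    const BspNode* nearSide = eyeInFront ? n->front : n->back;
    backToFront(farSide, eye, out);
    out.push_back(n->polygonId);
    backToFront(nearSide, eye, out);
}
```

The order falls out of the tree walk alone, so no per-frame sorting is needed once the tree is built.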
Title: 3d drawing order
Post by: L. Spiro on 2005-10-30 03:27:28
Just a note on optimization: if you don’t actually need to know the exact distance of objects, for example, if you are only calculating distances for comparison routines, then you should not perform the sqrt() operation.
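
A minimal sketch of that idea, assuming hypothetical `Vec3` and `Triangle` types (not the structures from the code posted earlier): the triangles are ordered by squared centroid distance, so sqrt() is never called. `std::stable_sort` is used so that triangles at equal distance keep their existing order instead of being shuffled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical types for illustration only.
struct Vec3 { float x, y, z; };
struct Triangle { int a, b, c; };  // indices into the vertex array

// Squared distance from the camera to the triangle's centroid.
// Skipping the square root does not change the resulting order.
inline float centroidDistSq(const Triangle& t, const std::vector<Vec3>& v, const Vec3& cam) {
    const float cx = (v[t.a].x + v[t.b].x + v[t.c].x) / 3.0f;
    const float cy = (v[t.a].y + v[t.b].y + v[t.c].y) / 3.0f;
    const float cz = (v[t.a].z + v[t.b].z + v[t.c].z) / 3.0f;
    const float dx = cx - cam.x, dy = cy - cam.y, dz = cz - cam.z;
    return dx * dx + dy * dy + dz * dz;
}

// Sorts far-to-near so that nearer triangles are painted last.
inline void sortFarToNear(std::vector<Triangle>& tris, const std::vector<Vec3>& v, const Vec3& cam) {
    std::stable_sort(tris.begin(), tris.end(),
        [&](const Triangle& l, const Triangle& r) {
            return centroidDistSq(l, v, cam) > centroidDistSq(r, v, cam);
        });
}
```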

And you may want to look into operator overloading since it seems you are writing your own vectors (and probably matrices).


(vertices[triangles[i].a] + vertices[triangles[i].b] + vertices[triangles[i].c]) / 3.0f is much easier to write.
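
For illustration, a minimal sketch of the overloads that expression relies on. The `Vec3` type and the `centroid` helper are assumptions, not code from this thread.

```cpp
#include <cassert>

// Hypothetical minimal 3-D vector type.
struct Vec3 {
    float x, y, z;
};

// Component-wise addition of two vectors.
inline Vec3 operator+(const Vec3& l, const Vec3& r) {
    return Vec3{l.x + r.x, l.y + r.y, l.z + r.z};
}

// Division of every component by a scalar.
inline Vec3 operator/(const Vec3& v, float s) {
    return Vec3{v.x / s, v.y / s, v.z / s};
}

// Centroid of a triangle, written the short way.
inline Vec3 centroid(const Vec3& a, const Vec3& b, const Vec3& c) {
    return (a + b + c) / 3.0f;
}
```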


The problem you described above can be fixed with culling.
Determine the order of the vertices as they appear on the screen and if they go clockwise, draw them, and if they go counter-clockwise, don’t.
You can switch the order as you desire.
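
One common way to find the on-screen winding is the sign of the triangle's signed area after projection. The sketch below assumes screen coordinates with y growing downward (typical for 2-D drawing libraries); `Point2`, `signedArea2` and `isFrontFacing` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical 2-D screen-space point.
struct Point2 { float x, y; };

// Twice the signed area of the screen-space triangle (a, b, c).
// With y growing downward, a positive value means the vertices
// appear in clockwise order on the screen.
inline float signedArea2(const Point2& a, const Point2& b, const Point2& c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Treats clockwise triangles as visible; flip the comparison if the
// models use the opposite winding. Zero-area triangles are rejected.
inline bool isFrontFacing(const Point2& a, const Point2& b, const Point2& c) {
    return signedArea2(a, b, c) > 0.0f;
}
```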


You will also want to implement a z-buffer.
Simply store a FLOAT for every pixel on the screen and when you draw a pixel, store its depth to the respective FLOAT.  Then check the respective FLOATs before drawing other pixels.
Before each frame you have to clear the z-buffer.  First you would want to decide how far back to set the z-buffer, or you could use -1.0f and declare it as meaning the z-buffer pixel is empty.  In the first case you just keep drawing and checking distances.
In the second case you would have to add a second check for -1.0f, and if found, draw.
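
A small sketch of the first case (buffer cleared to a "far" value), under the assumption that smaller z means closer to the camera. The `ZBuffer` struct and its member names are hypothetical, and infinity is used here as the far value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical software depth buffer: one float per pixel.
struct ZBuffer {
    int width, height;
    std::vector<float> depth;

    ZBuffer(int w, int h)
        : width(w), height(h), depth(static_cast<std::size_t>(w) * h) {
        clear();
    }

    // Resets every pixel to "infinitely far", so the first fragment
    // drawn at any pixel always passes the test. Call once per frame.
    void clear() {
        std::fill(depth.begin(), depth.end(), std::numeric_limits<float>::infinity());
    }

    // Returns true (and stores z) when the new fragment is closer than
    // what is already stored; the caller draws the pixel only then.
    bool testAndSet(int x, int y, float z) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        float& stored = depth[static_cast<std::size_t>(y) * width + x];
        if (z < stored) {
            stored = z;
            return true;
        }
        return false;
    }
};
```

With a far value instead of a -1.0f "empty" marker, the per-pixel test needs only the one comparison.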
But in either case, the fastest way to set the buffer is to use a hex calculator and calculate the DWORD representation of the FLOAT you desire and use memset() with that value.

So, for example, if I have a 640×480 display and my z-buffer should be set to -1.0f, I would do:
Code: [Select]
memset( g_fZBuffer, 0x000080BF, sizeof( FLOAT ) * 640 * 480 );


L. Spiro
Title: 3d drawing order
Post by: mirex on 2005-10-30 07:33:38
L.Spiro: I think that everything you said is all right, except the
Code: [Select]
memset( g_fZBuffer, 0x000080BF, sizeof( FLOAT ) * 640 * 480 );

... because memset() converts its 2nd parameter to unsigned char, so it won't help you to set the floats. I would use a for() loop instead:
Code: [Select]
int  i; float g_fZBuffer[ 640*480 ];
for( i=0; i<640*480; i++ )
  g_fZBuffer[ i ] = -1.0;
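
To make the memset() point concrete, here is a small sketch: `memsetPattern` (a hypothetical helper) shows what ends up in a 32-bit word when memset() is given a wide value, and `clearDepth` (also hypothetical) does the same job as the loop above using `std::fill_n` from the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fills a float buffer with -1.0f. std::fill_n works on whole elements,
// unlike memset(), which works on single bytes.
inline void clearDepth(float* buf, std::size_t count) {
    std::fill_n(buf, count, -1.0f);
}

// Shows what memset() does with a wide value: only the low byte is
// used, and it is repeated into every byte of the destination.
inline std::uint32_t memsetPattern(int value) {
    std::uint32_t word = 0;
    std::memset(&word, value, sizeof word);
    return word;
}
```

So passing 0x000080BF to memset() fills every byte with 0xBF, which is not the bit pattern of any intended float.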
Title: 3d drawing order
Post by: L. Spiro on 2005-10-30 08:42:14
Yup.

Better to try this then:


Code: [Select]
for ( INT I = 640 * 480; --I >= 0; ) {
*(DWORD *)&g_fZBuffer[I] = 0x000080BF;
}


The code that is generated should avoid using any form of floating-point registers.

Depending on your project optimization settings, however, it may compile into the same thing even if you use mirex’s code except of course the order (so both codes would work equally well).

The only way to be positive is to write it in assembly.
Clearing the buffer is something that will happen every frame, so you don’t want to half-ass it.


L. Spiro
Title: Abusing pointers properly
Post by: Cyberman on 2005-10-30 16:11:05
This is a good place to use pointers.

First, if you use an index into a pointer as an array, you are likely adding a lot of additional operations to your code. So it might be a little faster (in order to not depend on the compiler's optimization capabilities) to do this
Code: [Select]

DWORD *Ptr = (DWORD *)&g_fZBuffer;
for ( INT I = 640 * 480; --I >= 0; ) {
   *Ptr++ = 0x000080BF;
}

From what I've seen of most compiler optimizations, the code you provided would end up being something like this
// using pseudo ops
Code: [Select]

load I register with 307200
loop:
Load effective address of g_fZBuff to Ref
move I to Index
multiply index by sizeof(DWORD)
add result to Ref
mov [Ref], 0x000080BF
decrement I
jump if not zero loop

whereas the aforementioned pointer code would be more like
Code: [Select]

load effective address of g_fZBuff to Ref
load I register with 307200
Loop:
move [Ref], 0x000080BF
add Ref, sizeof(DWORD)
decrement I
jump if not zero loop

And does exactly the same thing :D

Cyb
Title: 3d drawing order
Post by: L. Spiro on 2005-10-30 17:34:49
I originally wrote my code similarly to the way you had it, but since it requires a full instruction to decrement I and another instruction to increment the pointer, I decided it would be faster to go the other way, since it will just use a single instruction to access the array location and set its value.

But when I tried to compare the actual compiled code to get the results, the method you posted seems to trick the compiler and with optimizations enabled, it simply isn’t added into the code.
Literally, the compiler, with full optimizations, will think the code is not doing anything and it won’t compile it.
You can get similar results by doing this:
Code: [Select]
for ( DWORD I =0 ; I < 765765; I++ ) {
INT KJHJH = 0;
}

With full optimizations, it will omit “useless” code such as this.
If I use my debug build, with no optimizations, both sets of code are compiled into the .exe.



As a result, I can not show the actual code produced by the method you posted, but here is what is compiled by the method I posted:

Code: [Select]
mov eax, 4B000h
mov ecx, 80BFh
LOOP :
dec eax
mov dword ptr [esp+eax*4], ecx
jns LOOP

Here, the loop consists of three total instructions, including the jns check.


To get the other method I have to use the debug build.
In debug, the method I posted:
Code: [Select]
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [I]
mov dword ptr g_fZBuffer[eax*4], 80BFh
jmp LOOP
END :

Holy crap that is inefficient!
That was the method I posted.


Now the method you posted, using “pVal” as my pointer through the list:
Code: [Select]
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [pVal]
mov dword ptr [eax], 80BFh
mov ecx, dword ptr [pVal]
add ecx, 4
mov dword ptr [pVal], ecx
jmp LOOP
END :

Both sets of code come out terribly in debug compilation.
But the problem I expected was at the end.
In debug there are 3 extra instructions used to increase the pointer.
I expected in retail compilation there would only be one (add [pVal], 4), but that is enough.




This is the code I would suggest:
Code: [Select]
mov eax, 0xBF800000
mov ecx, 4B000h
lea edi, [g_fZBuffer]
rep stos dword ptr [edi]

It is the fastest way to set a large number of bytes to the same value.
Also, it was my mistake above.  You should use 0xBF800000 instead of 0x000080BF.
I saw 0xBF800000 in my mind but typed it in reverse for whatever reason.
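
Instead of a hex calculator, the bit pattern can be checked in code. This is a small sketch with a hypothetical helper `floatBits`, assuming the usual 32-bit IEEE 754 float; it copies the bits out with memcpy() rather than through a pointer cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a float as a 32-bit unsigned integer.
inline std::uint32_t floatBits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On such a platform `floatBits(-1.0f)` gives 0xBF800000: sign bit set, exponent 0x7F, mantissa zero.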


L. Spiro
Title: 3d drawing order
Post by: ficedula on 2005-10-30 20:55:20
Oooh, assembly ;)

First rule of assembly programming: Don't do it unless you know more than the compiler does.

Second rule: The compiler always knows something you don't ;)


Case in point; cyberman, your idea that using the pointer is better (because if you use an index, you just have to "recalculate" the pointer anyway, each time around the loop) would be true on some processors ... not on the x86! Or at least a fair few x86 processors, there being a large number of dies out there. Address calculation is practically free on the x86, completely free in terms of instruction count. Hence LSpiro's example with only three instructions in the loop.


Second case in point: LSpiro's suggested code is also sub-optimal for most processors ... isn't rep movs / rep stos the quickest way to move data around? One instruction to blast a whole chunk of data around? Er, well, no ... it moves data in 4-byte chunks, which is frankly pretty small fry. Loading an FPU register with a zero value and then blasting 64 bytes at a time into memory (8 bytes per register, and store it 8 times per loop) may be faster; it's effectively a form of loop unrolling. The more time spent moving data rather than checking "are we done yet?" the better.

But is there a more efficient way? Of course. If you have an Athlon XP, or a P4, you've got SSE1 at your disposal. That's 16 bytes per register. And you can do non-temporal moves, meaning that it'll fire off a request to move the data to the memory controller and then continue on to process subsequent instructions without waiting for the move to finish. Don't want to rely on SSE? Use 3dnow/MMX for the same purpose, although then you're back to the 8 bytes per register of the FPU.

Better yet, find the routine in your runtime that does all of this work for you, I would hope that there was a function which did memset() but for DWORD / QWORD sized quantities. Then let the runtime worry about whether you have SSE, or are running on x86-64, or whatever, it's what it's there for. Not to suggest that learning assembler is completely useless ... but you all know about premature optimisation, right? ;)
Title: 3d drawing order
Post by: L. Spiro on 2005-10-31 02:53:12
Quote
Second rule: The compiler always knows something you don't
I would have agreed with that until last night when the compiler didn’t know well enough to compile Cyberman’s loop.
If it had been in a real-case scenario and found a bug in my program, I wouldn’t have suspected that my buffer-initialization code was simply not compiled into the final .exe.



Quote
Loading an FPU register with a zero value and then blasting 64 bytes at a time into memory (8 bytes per register, and store it 8 times per loop) may be faster; it's effectively a form of loop unrolling. The more time spent moving data rather than checking "are we done yet?" the better.
How many cycles does it take to write from the FPU to an address and then decrement your counter and then check for 0?
I don’t actually know, which is why I am asking.
There is no REP prefix with any of the FPU register operations, so you would have to write a loop with the check for 0.
Now, you’re going to be writing twice the information at once, which means this method can have up to 6 cycles before it becomes slower than REP STOS.
But I can’t find a full table to compare the actual results; I could only find results on REP STOS (3 cycles) because it seems it is the most favored method for filling aligned linear memory buffers.  I suspect it would be close.


L. Spiro



[EDIT]
I finally found some information on using the FPU to transfer 8 bytes at a time:
Quote
Floating point instructions can be used to move 8 bytes at a time:
FILD QWORD PTR [ESI] / FISTP QWORD PTR [EDI]
This is only an advantage if the destination is not in the cache. The
optimal way to move a block of data to uncached memory on the Pentium is:

TopOfLoop:
FILD QWORD PTR [ESI]
FILD QWORD PTR [ESI+8]
FXCH
FISTP QWORD PTR [EDI]
FISTP QWORD PTR [EDI+8]
ADD ESI,16
ADD EDI,16
DEC ECX
JNZ TopOfLoop

The source and destination should of course be aligned by 8. The extra time
used by the slow FILD and FISTP instructions is compensated for by the fact
that you only have to do half as many write operations.  Note that this
method is only advantageous on the Pentium and only if the destination is
not in the cache. On all other processors the optimal way to move blocks of
data is REP MOVSD, or if you have a processor with MMX you may use the MMX
instructions instead to write 8 bytes at a time.


But this is in regards to copying bytes rather than writing a constant repeatedly.
But it’s all I could find.
[/EDIT]
Title: 3d drawing order
Post by: mirex on 2005-10-31 08:03:14
Hehe we are absolutely off topic with this assembly stuff guys !! But it could be done also like this: ;)
Code: [Select]
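mov ecx, 4B000h      ; number of DWORDs to store (640 * 480)
mov edi, pVal        ; destination pointer
push ds
pop es               ; STOSD writes to ES:[EDI]
mov eax, 0BF800000h  ; bit pattern of -1.0f
rep stosd


I hope I remembered the instruction right (it is stosd, not stosdw), I don't remember this anymore.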
Title: 3d drawing order
Post by: ficedula on 2005-10-31 18:05:03
LSpiro: Try downloading the processor documentation from the CPU manufacturers. Again, it'll mostly be devoted to copies rather than constant stores; but, here's the figures from my Athlon XP processor manuals;

REP MOVSB: 570MB/s
REP MOVSD: 700MB/s
Simple loop: 720MB/s (so just writing the loop out without any optimisation is quicker than REP MOVS on a modern CPU!)
Unrolled/grouped loop: 750MB/s
MMX registers: 800MB/s
MMX registers, non-temporal move: 1120MB/s
MMX, non-temporal, prefetched: 1250MB/s
MMX, non-temporal, block prefetch: 1630MB/s

Kind of interesting; back in the days of the 486 and earlier, the simple rule was: the fewer instructions the better. Most instructions took about the same length of time to execute (not all, of course), so fewer instructions = less fetching from memory = quicker. Nowadays ... well, you can see from the figures above, not only is the optimised loop over twice as quick as a simple REP MOVS, just writing the loop out manually (using MOV/DEC/JNZ) is quicker too!
Title: 3d drawing order
Post by: Cyberman on 2005-10-31 20:02:08
Quote from: L. Spiro
I originally wrote my code similarly to the way you had it, but since it requires a full instruction to decrement I and another instruction to increment the pointer, I decided it would be faster to go the other way, since it will just use a single instruction to access the array location and set its value.

But when I tried to compare the actual compiled code to get the results, the method you posted seems to trick the compiler and with optimizations enabled, it simply isn’t added into the code.
Literally, the compiler, with full optimizations, will think the code is not doing anything and it won’t compile it.
You can get similar results by doing this:
Code: [Select]
for ( DWORD I =0 ; I < 765765; I++ ) {
INT KJHJH = 0;
}

With full optimizations, it will omit “useless” code such as this.
If I use my debug build, with no optimizations, both sets of code are compiled into the .exe.

That is useless code.. what is it doing inside the loop? Nothing, so eliminating it is perfectly legitimate. You are setting a variable to 0 thousands of times that only persists inside the for loop, thus it's doing nothing at all.  Technically it would check what it's doing with the variable inside the loop; if the variable affects nothing outside the loop it's eliminated, and so you have an empty loop as a result. Empty loops are removed and thus it comes out to nothing.

The variable needs to be defined outside the loop to persist (i.e. to mean something to the compiler).
This code I compiled
Code: [Select]
void __fastcall TForm1::Button1Click(TObject *Sender)
{
   // clear buffer
   DWORD *Ptr = (DWORD *)&DepthBuffer;
   for(int I = 640*480; --I >=0;)
   {
      *Ptr++ = 0x000080BF;
   }
}

This is the resulting assembly output, surprisingly close I think
Code: [Select]

@6:
push      ebp
mov       ebp,esp
?debug L 29
?live16390@16: ; EAX = this
add       eax,724
?debug L 31
?live16390@32: ; EAX = Ptr
@7:
mov       edx,307200
jmp       short @9
?debug L 33
?live16390@48: ; EDX = I, EAX = Ptr
@8:
mov       dword ptr [eax],32959
add       eax,4
?debug L 31
@10:
@9:
dec       edx
jns       short @8
?debug L 35
?live16390@80: ;
@12:
pop       ebp
ret

Cyb
Title: 3d drawing order
Post by: L. Spiro on 2005-11-01 01:49:11
Quote
Nowadays ... well, you can see from the loop above, not only is the optimised loop over twice as quick as a simple REP MOVS, just writing the loop out manually (using MOV/DEC/JNZ) is quicker too!

That’s copying memory.
Not at all as fast as setting a linear array to a specific value.
Of course I am not going to argue when it comes to MMX instructions, but for setting a linear block of memory to a specific value, REP STOS (clearly, not REP MOV*) is the fastest for Pentium® processors.
I also won’t argue that other routines are faster on other processors, but in general REP STOS is the fastest and most widely used.





Quote
That is useless code.. what is it doing inside the loop? Nothing so eliminating it is perfectly legitimate. You are setting a variable to 0 thousands of times that only persists inside the for loop thus it's doing nothing at all. Technically it would check what it's doing with the variable inside the loop, if the variable affects nothing outside the loop it's eliminated and so you have an empty loop as a result. Empty loops are removed and thus it comes out to nothing.

I don’t think you quite got my point.
I posted that code as an example of useless code.
And I explained why it would be omitted already.
The point was that your code is omitted also, because the compiler thinks it is useless, when of course we all know it is not.

Setting the variable outside the loop does nothing.
I have already written your loop with both the float array and the incremental pointer declared outside the loop, and I further went on to make sure the float array was being used outside the loop, but to the point, the compiler has a bug and does not compile your code.
That’s all my point was.



L. Spiro
Title: 3d drawing order
Post by: Cyberman on 2005-11-01 04:59:36
Quote from: L. Spiro
I don’t think you quite got my point.
I posted that code as an example of useless code.
And I explained why it would be omitted already.
The point was that your code is omitted also, because the compiler thinks it is useless, when of course we all know it is not.

Setting the variable outside the loop does nothing.
I have already written your loop with both the float array and the incremental pointer declared outside the loop, and I further went on to make sure the float array was being used outside the loop, but to the point, the compiler has a bug and does not compile your code.
That’s all my point was.



L. Spiro

LOL ok I get it now :D

I assume you are using MS's compiler.. I can guess that the output of their code generation engine is faulty OR their optimization engine is faulty (they aren't supposed to be the same thing).  Either way... it doesn't work correctly. (DOH!)

As for speed... I didn't think to abuse the MMX instruction set myself.

Back to the original subject:
Use a Z-buffer; it is not as time consuming as you might think, since it's used all the time as it is :)

Cyb
Title: 3d drawing order
Post by: ficedula on 2005-11-01 06:48:10
Quote from: LSpiro

That’s copying memory.
Not at all as fast as setting a linear array to a specific value.
Of course I am not going to argue when it comes to MMX instructions, but for setting a linear block of memory to a specific value, REP STOS (clearly, not REP MOV*) is the fastest for Pentium® processors.
I also won’t argue that other routines are faster on other processors, but in general REP STOS is the fastest and most widely used.


Of course copying isn't as fast as setting a constant. But: when copying, REP MOVS isn't as fast as copying manually. The obvious implication is that when setting, REP STOS isn't as fast as doing that manually either. Well, it will be on a Pentium 1. But REP STOS is just the easiest to write, by no means the fastest; I just quoted copy figures because, like you did, it was easier to put my hands on them.


Also back to the original question: if you were looking to get clever you could use a hierarchical Z-buffer, which would remove the need to clear the whole block of memory manually ... although it would be more complex!
Title: 3d drawing order
Post by: L. Spiro on 2005-11-01 07:04:02
On the discussion of using a z-buffer to solve your problem, I would be more worried about how it is used during rendering rather than just setting it to some value.
Comparing and writing floats is much slower than comparing and writing DWORDs.
Depending on the needs of your engine, you could consider using a fixed-point DWORD z-buffer, but that is something you would have to carefully consider, and you would need to be aware of its limitations.

But in any case, if you add a z-buffer, you no longer need to order the triangles at all.
However, if you want to be thrifty, you could add a z-buffer and order the triangles in reverse of what you have now.
Draw them from close to far.
The reason for this is that when you use the z-buffer, you are going to check each pixel for distance and write to it only if the new distance is less than the distance already stored there.
If you write all the close distances first, you won’t end up writing and rewriting as many pixels.
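
A toy model of that effect, purely for illustration: every "layer" covers the whole buffer at one depth, and the hypothetical `countDepthWrites` helper counts how many z-buffer writes happen for a given drawing order.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Draws full-screen layers at the given depths, in the given order,
// into a z-buffer of pixelCount pixels and returns the number of
// depth writes performed. Smaller depth means closer to the camera.
inline int countDepthWrites(const std::vector<float>& layerDepths, int pixelCount) {
    std::vector<float> zbuf(pixelCount, std::numeric_limits<float>::infinity());
    int writes = 0;
    for (float z : layerDepths) {
        for (float& stored : zbuf) {
            if (z < stored) {  // closer than what is already there
                stored = z;
                ++writes;
            }
        }
    }
    return writes;
}
```

With three layers over 10 pixels, near-to-far order writes each pixel once (10 writes), while far-to-near order writes each pixel three times (30 writes). The depth comparisons still happen in both cases; only the writes (and the pixel drawing that goes with them) are saved.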


L. Spiro
Title: 3d drawing order
Post by: ficedula on 2005-11-01 18:41:04
If I had to guess, LSpiro, I would say that your day job involves programming on 10 year old processors ;)

Remember (not just in 3d problems, everywhere in computing), premature optimisation causes cancer. And is the root of all evil.

On a modern processor, floating point operations are perhaps slower than integer work. But your first question should be: does it matter? If your code is fast enough already, no. If the slowdown is elsewhere, no. Even if they are, you get the code working first, writing it in the easiest and clearest manner possible, then you optimise it. After finding out exactly where the slowdown is. Which probably isn't where you first guessed it would be; if it were that easy, anybody could play!

Bear in mind, here, I just ran some quick profiles on my laptop (an Intel Celeron-M), and if you use SSE, floating point operations are over twice as quick as integer...
Title: 3d drawing order
Post by: L. Spiro on 2005-11-02 02:14:33
Quote
On a modern processor, floating point operations are perhaps slower than integer work. But your first question should be: does it matter?
You’re saying slower-than-necessary code is acceptable?
He’s writing a software 3-D engine.  There isn’t hardware acceleration here.  Order triangles, rasterize them to the screen, manually fill them, checking the z-buffer along the way, and you see how fast it goes without all the optimizations you can give it.



Quote
Even if they are, you get the code working first, writing it in the easiest and clearest manner possible, then you optimise it.
Absolutely.
But the optimizations I suggested won’t cause problems, and they are things that may cause problems in the future if you have to change them to optimize.
If he decided to go with an integer z-buffer after having already written it using floats, he is going to have to be very careful about which parts of his engine that will affect.
As for the organizing of the triangles from close to far, I am going off something he is already doing.
He already posted he has a working system for drawing triangles far-to-near, so such a change as I suggested shouldn’t be a problem.
That doesn’t mean I disagree with what you said.
If he looks at his code and thinks it may be something that can wait, or that it may cause problems for whatever reason, then certainly, keep it for last.
I just generally assume people can gauge these types of things for themselves.


Quote
Bear in mind, here, I just ran some quick profiles on my laptop (an Intel Celeron-M), and if you use SSE, floating point operations are over twice as quick as integer...
That’s crazy.
I didn’t expect SSE to be THAT fast.
I have no doubts they would be close or faster by a bit, but SSE is not supported on all processors, and I have noticed that even among these boards many people have been stuck with low-end machines.
I’m not programming on 10-year-old processors, but I prefer compatible code.  That’s all.


L. Spiro
Title: 3d drawing order
Post by: Cyberman on 2005-11-02 04:55:34
Quote from: ficedula
If I had to guess, LSpiro, I would say that your day job involves programming on 10 year old processors ;)

That would be me I tend to program things like ARM7 and ARM9 processors ;)

Cyb
Title: 3d drawing order
Post by: ficedula on 2005-11-02 06:51:08
The assumption I'm objecting to, really (partly because I see it among the developers at work too) is that you should make some changes "because it'll make the code quicker" even though you don't know whether it will make the code quicker! Hence the whole premature optimisation malarkey. Rather than wasting time guessing which bits need to be made faster, it's preferable to get it working and then benchmark.

(How could using an integer zbuffer make it slower? Well, quite apart from the fact that if you want maximum speed, you'd later convert it back to floats to use 3dnow or SSE, do you know for sure what the overhead for converting all the incoming data from floats would be?)

I still have memories of the time I optimised all our string parsing code at work only to find out it wasn't the bottleneck after all and it was the database causing the slowdown ... not good.

Cyberman: ARM7+9 says Nintendo DS (http://www.sylphds.net/ev2/contentview.php?id=268) to me... ;)
Title: 3d drawing order
Post by: RPGillespie on 2005-11-02 15:28:42
Slightly off topic, but,
ficedula: how did you make data transfers between your computer and your Nintendo DS? Must've done it wireless or through one of the cartridge ports, correct?
Title: 3d drawing order
Post by: ficedula on 2005-11-02 16:44:21
Yep; I already had a flash2advance cartridge for my GBA, so originally I was using Wifime to boot the DS from the cartridge. I got tired of rewriting the cartridge every time, though, so I flashed the firmware on the DS to remove the signature checks; now I can boot the code directly over Wifime. The DS wifi bounty is nearly complete, so soon we'll have a TCP/IP stack on it...
Title: 3d drawing order
Post by: Cyberman on 2005-11-03 06:17:25
ficedula you are having entirely too much fun with your DS LOL

It's too bad the GBA simply doesn't have enough internal memory to even fake running FF7 :D

Cyb
Title: 3d drawing order
Post by: Darkness on 2005-11-06 23:24:01
Another (more or less unrelated) question:

I'm working on the camera angle now, and I'm using the equation:

screen_x = distance_from_screen * tan(arccos(a dot b))

where a is the direction of the camera vector and b is the vector from the camera through the vertex.

This always gives a positive angle (at least in the viewable area)

How can I make this give me a negative value when the vertex lies below this camera vector? Or is there a more effective way to do this?

Does this make any sense?
Title: 3d drawing order
Post by: mirex on 2005-11-07 09:22:17
I don't get it, what do you want to get as a result?
Title: 3d drawing order
Post by: Darkness on 2005-11-07 16:58:23
(http://www.x0r.net/images/angle.PNG)

I need to find the distance between the long black line and the red line along the short black line. In this case it needs to be negative because the red line falls below the black one.
Title: 3d drawing order
Post by: mirex on 2005-11-08 10:25:35
Still not enough input. What is that angle going to be used for? Why should it be negative?

Also, there are tons of math algorithms on the net, try to google for them (an approximate name should be sufficient for google).
Title: 3d drawing order
Post by: dziugo on 2005-11-08 11:47:11
The point? Displaying this on screen. You can add 400 (if the output is in range <-400 ; 400>) to the result and you'll get X coordinate in 800x600 resolution. Now... since that algorithm always gives positive numbers, after displaying, everything will be on the right side of the screen leaving the left side blank.

I think, that using arccos there is no easy way to get negative results. You'll need to check if the point is actually on the left "side" of that black line and correct the result if necessary.
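
One possible way to avoid arccos entirely, sketched under the assumption that unit vectors for the camera's forward direction and its right (or up) axis are available: project the camera-to-vertex vector onto both and divide. The names `dot` and `screenOffset` are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Signed offset on the screen plane along `axis` (camera right or up).
// `forward` and `axis` must be unit length and perpendicular, and
// `toVertex` is (vertex - camera position). Only meaningful when the
// vertex is in front of the camera, i.e. dot(toVertex, forward) > 0.
inline float screenOffset(const Vec3& toVertex, const Vec3& forward,
                          const Vec3& axis, float screenDistance) {
    return screenDistance * dot(toVertex, axis) / dot(toVertex, forward);
}
```

For a vertex lying in the plane spanned by `forward` and `axis`, the magnitude matches distance_from_screen * tan(angle), and the sign comes from which side of the forward vector the vertex is on.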

dziugo
Title: 3d drawing order
Post by: L. Spiro on 2005-11-08 15:10:52
If you are referring to rasterization, you may find it easier to use these:


Code: [Select]
INT iHalfScreenX = g_pntResolution.x >> 1;
INT iHalfScreenY = g_pntResolution.y >> 1;
pntReturn.x = (LONG)((FLOAT)iHalfScreenX + (FLOAT)iHalfScreenX * (d3vTarg.fX / d3vTarg.fZ));
pntReturn.y = (LONG)((FLOAT)iHalfScreenY - (FLOAT)iHalfScreenX * (d3vTarg.fY / d3vTarg.fZ));

Do not perform this equation if d3vTarg.fZ is 0.
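
The same formula as a self-contained sketch with the z check folded in. `ScreenPoint` and `project` are hypothetical names, and the input is assumed to be already in camera space (x right, y up, z forward).

```cpp
#include <cassert>

// Hypothetical pixel coordinate pair.
struct ScreenPoint { long x, y; };

// Projects a camera-space point to pixel coordinates. Returns false,
// leaving `out` untouched, when the point is not in front of the camera.
// Half the width is used as the scale on both axes to keep a 1-to-1
// aspect ratio.
inline bool project(float x, float y, float z, int width, int height, ScreenPoint& out) {
    if (z <= 0.0f) return false;  // avoids the division by zero
    const float halfW = width * 0.5f;
    const float halfH = height * 0.5f;
    out.x = static_cast<long>(halfW + halfW * (x / z));
    out.y = static_cast<long>(halfH - halfW * (y / z));  // screen y grows downward
    return true;
}
```

For a 640x480 screen, the point (0, 0, 1) lands on the centre pixel (320, 240), and (1, 1, 2) lands on (480, 80).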

This small section does not account for the rotation of the camera.
This is the whole code written for my Doom® 3 perfect-lock auto-aim.
Code: [Select]
// Transform a 3-D point into screen coordinates.  Returns TRUE if the coordinate is actually on the screen.
BOOL Rasterize( PDoom3Vector pd3vPlayerPos, PDoom3Direction pd3dPlayerDirection, PDoom3Vector pd3vTargetPos, POINT &pntReturn ) {
Doom3Vector d3vTarg;
// Copy the target vector so we don’t change it.
memcpy( &d3vTarg, pd3vTargetPos, sizeof( d3vTarg ) );

// Translate the vector to the center of the world, which would literally mean
// our player.
d3vTarg.fX -= pd3vPlayerPos->fX;
d3vTarg.fY -= pd3vPlayerPos->fY;
d3vTarg.fZ -= pd3vPlayerPos->fZ;

// The pitch and yaw are stored in degrees; convert them to radians.
FLOAT fPitch = 0.0f - D3DXToRadian( pd3dPlayerDirection->fPitch );
FLOAT fYaw = D3DXToRadian( pd3dPlayerDirection->fYaw - 90.0f );

// Rotate accordingly.  This must be done in this order!
if ( fYaw != 0.0f ) {
FLOAT fTemp = d3vTarg.fZ * (FLOAT)cos( fYaw ) - d3vTarg.fX * (FLOAT)sin( fYaw );
d3vTarg.fX = d3vTarg.fX * (FLOAT)cos( fYaw ) + d3vTarg.fZ * (FLOAT)sin( fYaw );
d3vTarg.fZ = fTemp;
}
if ( fPitch != 0.0f ) {
FLOAT fTemp = d3vTarg.fY * (FLOAT)cos( fPitch ) - d3vTarg.fZ * (FLOAT)sin( fPitch );
d3vTarg.fZ = d3vTarg.fZ * (FLOAT)cos( fPitch ) + d3vTarg.fY * (FLOAT)sin( fPitch );
d3vTarg.fY = fTemp;
}

// The result will be where the enemy is in relation to our player.

// From this 3-D point, we can directly determine the 2-D screen point where the 3-D point would be drawn.
// This is useful for drawing targets over objects or determining if an object is on the screen.
if ( d3vTarg.fZ == 0.0f ) {
pntReturn.x = (LONG)(d3vTarg.fX * (FLOAT)(g_pntResolution.x >> 1) + (FLOAT)(g_pntResolution.x >> 1));
pntReturn.y = (LONG)(d3vTarg.fY * (FLOAT)(g_pntResolution.y >> 1) + (FLOAT)(g_pntResolution.x >> 1));
}
else {
pntReturn.x = (LONG)((FLOAT)(g_pntResolution.x >> 1) + (FLOAT)(g_pntResolution.x >> 1) * (d3vTarg.fX / d3vTarg.fZ));
pntReturn.y = (LONG)((FLOAT)(g_pntResolution.y >> 1) - (FLOAT)(g_pntResolution.x >> 1) * (d3vTarg.fY / d3vTarg.fZ));
}

if ( d3vTarg.fZ < 0.0f ) { return FALSE; }
if ( pntReturn.x < 0 || pntReturn.x >= g_pntResolution.x ) { return FALSE; }
if ( pntReturn.y < 0 || pntReturn.y >= g_pntResolution.y ) { return FALSE; }


return TRUE;
}



For my auto-aim, I manually transform according to the rotation of my player’s head (camera [math shown]) but in your engine you are going to have more control over how your rotations work and you should use an optimized rotation routine.
This routine is also not optimized for handling a massive number of points.
You would want to handle your rotations separately from your rasterization, using the first code I posted as your primary rasterization method.
Pass it the final X, Y, and Z points of each vertex (or whatever) after you have rotated the whole scene according to the camera.

You can modify the formula to handle view ports also, and note that I used iHalfScreenX in both sides of the formula.  This is to keep a 1-to-1 aspect ratio.  Aspect ratios can be added to the equation as well.

I have not benchmarked it for speed.


L. Spiro