Topic: 3d drawing order

Darkness

3d drawing order
« on: 2005-10-29 16:31:17 »
I'm working on a 3d rendering program. It's built on top of a simple 2d drawing library. I'm trying to make the polygons draw in the correct order so that objects appear solid. Currently, I am ordering them by the distance between the camera and the furthest vertex of each triangle. It works pretty well, except in cases where two vertices are the same distance away. Then it orders them randomly.

Does anyone have any suggestions on how I should sort this?

Cyberman

Re: 3d drawing order
« Reply #1 on: 2005-10-29 18:15:03 »
Quote from: Darkness
I'm working on a 3d rendering program. It's built on top of a simple 2d drawing library. I'm trying to make the polygons draw in the correct order so that objects appear solid. Currently, I am ordering them by the distance between the camera and the furthest vertex of each triangle. It works pretty well, except in cases where two vertices are the same distance away. Then it orders them randomly.

Does anyone have any suggestions on how I should sort this?

Surface culling is what I suggest.
First order the surfaces. Then reject surfaces that are obscured by closer surfaces. If you have two surfaces that intersect or are at the same distance, you should perform an intersection operation, and surface splitting if needed. If they do not visually collide, everything is fine. If they hit one another, you need to prune off (split) each section that is obscured by the other surface. If they have the SAME location go by SIZE, i.e. the bigger polygon wins for visibility, as this makes things simpler.

Cyb

Darkness

3d drawing order
« Reply #2 on: 2005-10-29 19:24:25 »
Thanks for the quick reply. Unfortunately, I think this might be a bit too processor intensive for what I'm doing.

I've been trying something along the lines of the painter's method.

I've changed it so I average the vertices of a polygon to come up with a center point, and order the polygon by the distance between this point and the position of the 'camera.'

Ex:
Code: [Select]

avg_x = (vertices[triangles[i].a].x + vertices[triangles[i].b].x + vertices[triangles[i].c].x)/3.0;
avg_y = (vertices[triangles[i].a].y + vertices[triangles[i].b].y + vertices[triangles[i].c].y)/3.0;
avg_z = (vertices[triangles[i].a].z + vertices[triangles[i].b].z + vertices[triangles[i].c].z)/3.0;

d = sqrt((avg_x - cam.x)*(avg_x - cam.x) + (avg_y - cam.y)*(avg_y - cam.y) + (avg_z - cam.z)*(avg_z - cam.z));

lengths.push_back(d);


Works right most of the time, but:


Anyway, I'd be interested in any way to correct this method, an efficient way to do z-buffering, etc.

halkun

3d drawing order
« Reply #3 on: 2005-10-29 19:44:21 »
You are going to have problems when you start dealing with models that have concave sides.

I don't think Z-buffering is that CPU intensive. Take a look at it.

Micky

3d drawing order
« Reply #4 on: 2005-10-29 22:50:49 »
What you could do is build a BSP tree. Then when drawing you test the camera position against each node's splitting plane, then draw the branch "behind" the node first, then the node itself, and then the branch "in front" of the node. That way you will always get perfect depth ordering without sorting by vertex. Games like Doom and Quake do something like this.
But if you have any chance, try to use OpenGL or DirectX. A Z-buffer makes life a lot easier, and you'll get hardware acceleration.
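For what it's worth, the back-to-front walk described above could be sketched roughly like this. It is only a minimal sketch: BspNode, sideOf and backToFront are made-up names for illustration, it assumes the tree has already been built with one polygon per node, and it tests the camera position against each node's plane.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical BSP node: a splitting plane (normal . p + d = 0), the polygon
// lying in that plane, and the two subtrees on either side of it.
struct BspNode {
    Vec3 normal;                      // plane normal
    float d;                          // plane offset
    int polygonId;                    // polygon stored at this node
    std::unique_ptr<BspNode> front;   // side the normal points towards
    std::unique_ptr<BspNode> back;    // opposite side
};

// Positive when point p lies on the front side of the node's plane.
inline float sideOf(const BspNode& n, const Vec3& p) {
    return n.normal.x * p.x + n.normal.y * p.y + n.normal.z * p.z + n.d;
}

// Appends polygon ids in back-to-front order as seen from cam.
void backToFront(const BspNode* node, const Vec3& cam, std::vector<int>& out) {
    if (!node) return;
    if (sideOf(*node, cam) >= 0.0f) {
        // Camera is in front of the plane, so the back subtree is farther away.
        backToFront(node->back.get(), cam, out);
        out.push_back(node->polygonId);
        backToFront(node->front.get(), cam, out);
    } else {
        // Camera is behind the plane, so the front subtree is farther away.
        backToFront(node->front.get(), cam, out);
        out.push_back(node->polygonId);
        backToFront(node->back.get(), cam, out);
    }
}
```

Drawing the polygons in the order they land in `out` gives the painter's ordering without any per-frame sort; building the tree (and splitting polygons that straddle a plane) is the expensive part and is not shown here.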

L. Spiro

3d drawing order
« Reply #5 on: 2005-10-30 03:27:28 »
Just a note on optimization: if you don’t actually need to know the exact distance of objects, for example, if you are only calculating distances for comparison routines, then you should not perform the sqrt() operation.
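As a rough sketch of that idea: squared distances sort in the same order as real distances, so the sqrt() can simply be dropped from the comparison. The names distSq and farToNear are invented for this example, and using std::stable_sort is my own addition so that triangles at equal distance at least keep a consistent order from frame to frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance: orders points exactly like the true distance, without sqrt().
inline float distSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns indices into `centers`, sorted far-to-near as seen from `cam`.
// std::stable_sort keeps equal-distance entries in their original order.
std::vector<int> farToNear(const std::vector<Vec3>& centers, const Vec3& cam) {
    std::vector<int> order(centers.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return distSq(centers[a], cam) > distSq(centers[b], cam);
    });
    return order;
}
```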

And you may want to look into operator overloading since it seems you are writing your own vectors (and probably matrices).


(vertices[triangles[i].a] + vertices[triangles[i].b] + vertices[triangles[i].c]) / 3.0f is much easier to write.
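A minimal sketch of the overloads that make that expression compile. Vec3, Triangle and centroid are assumed names here, not anything taken from the original program:

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Component-wise addition of two vectors.
inline Vec3 operator+(const Vec3& a, const Vec3& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

// Division of a vector by a scalar.
inline Vec3 operator/(const Vec3& v, float s) {
    assert(s != 0.0f);
    return {v.x / s, v.y / s, v.z / s};
}

// Triangle stored as three indices into a vertex array.
struct Triangle { int a, b, c; };

// Average of the three corners, written with the overloaded operators.
inline Vec3 centroid(const Vec3* vertices, const Triangle& t) {
    return (vertices[t.a] + vertices[t.b] + vertices[t.c]) / 3.0f;
}
```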


The problem you described above can be fixed with culling.
Determine the order of the vertices as they appear on the screen and if they go clockwise, draw them, and if they go counter-clockwise, don’t.
You can switch the order as you desire.
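The winding test can be sketched with the sign of the projected triangle's area. This assumes the vertices have already been projected to screen coordinates with y growing downward (as in most 2D drawing libraries); Point2, signedArea2 and isFrontFacing are illustrative names only.

```cpp
#include <cassert>

struct Point2 { float x, y; };

// Twice the signed area of the projected triangle (2D cross product of two edges).
inline float signedArea2(const Point2& a, const Point2& b, const Point2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// With y pointing down the screen, a positive value means a, b, c appear
// in clockwise order. Flip the comparison if your convention is the opposite.
inline bool isFrontFacing(const Point2& a, const Point2& b, const Point2& c) {
    return signedArea2(a, b, c) > 0.0f;
}
```

Triangles for which isFrontFacing() returns false can be skipped before sorting or rasterizing, which is why this helps with closed models: the far side of the model never gets drawn over the near side.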


You will also want to implement a z-buffer.
Simply store a FLOAT for every pixel on the screen, and when you draw a pixel, store its depth in the respective FLOAT.  Then check the respective FLOAT before drawing other pixels.
Before each frame you have to clear the z-buffer.  First you would want to decide how far back to set the z-buffer, or you could use -1.0f and declare it as meaning the z-buffer pixel is empty.  In the first case you just keep drawing and checking distances.
In the second case you would have to add a second check for -1.0f, and if found, draw.
But in either case, the fastest way to set the buffer is to use a hex calculator and calculate the DWORD representation of the FLOAT you desire and use memset() with that value.

So, for example, if I have a 640×480 display and my z-buffer should be set to -1.0f, I would do:
Code: [Select]
memset( g_fZBuffer, 0x000080BF, sizeof( FLOAT ) * 640 * 480 );
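The per-pixel check described above can be sketched like this. It is a sketch under assumptions, not a drop-in implementation: the ZBuffer class and its method names are invented, smaller z is taken to mean closer to the camera, and the buffer is cleared to "infinitely far" instead of using a -1.0f marker, so no second check is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

class ZBuffer {
public:
    ZBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height)) {
        clear();
    }

    // Call once per frame: every pixel starts out "infinitely far away".
    void clear() {
        std::fill(depth_.begin(), depth_.end(), std::numeric_limits<float>::infinity());
    }

    // Returns true (and records z) if the new pixel is closer than whatever
    // was drawn there before; the caller only draws the pixel when this is true.
    bool testAndSet(int x, int y, float z) {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        float& stored = depth_[static_cast<std::size_t>(y) * width_ + x];
        if (z >= stored) return false;
        stored = z;
        return true;
    }

private:
    int width_;
    int height_;
    std::vector<float> depth_;
};
```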


L. Spiro

mirex

3d drawing order
« Reply #6 on: 2005-10-30 07:33:38 »
L.Spiro: I think that everything you said is all right, except the
Code: [Select]
memset( g_fZBuffer, 0x000080BF, sizeof( FLOAT ) * 640 * 480 );
... because memset() converts its 2nd parameter to unsigned char, so it won't help you set the floats. I would use a for() loop instead:
Code: [Select]
int  i; float g_fZBuffer[ 640*480 ];
for( i=0; i<640*480; i++ )
  g_fZBuffer[ i ] = -1.0;
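To make the memset() point concrete, here is a tiny check of what actually ends up in memory. wordWrittenByMemset is a made-up helper for illustration only:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns the 32-bit word that memset() leaves behind for a given value.
// memset() converts the value to unsigned char, so only its low byte is
// used, and that byte is repeated in every position.
inline std::uint32_t wordWrittenByMemset(int value) {
    std::uint32_t word = 0;
    std::memset(&word, value, sizeof word);
    return word;
}
```

With the value from the earlier post, only the low byte 0xBF survives, so every word comes out as 0xBFBFBFBF, which is not the bit pattern of -1.0f.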

L. Spiro

3d drawing order
« Reply #7 on: 2005-10-30 08:42:14 »
Yup.

Better to try this then:


Code: [Select]
for ( INT I = 640 * 480; --I >= 0; ) {
*(DWORD *)&g_fZBuffer[I] = 0x000080BF;
}


The code that is generated should avoid using any form of floating-point registers.

Depending on your project optimization settings, however, it may compile into the same thing even if you use mirex’s code except of course the order (so both codes would work equally well).

The only way to be positive is to write it in assembly.
Clearing the buffer is something that will happen every frame, so you don’t want to half-ass it.


L. Spiro

Cyberman

Abusing pointers properly
« Reply #8 on: 2005-10-30 16:11:05 »
This is a good place to use pointers.

First, if you index into a pointer as if it were an array, you are likely adding a lot of additional operations to your code. So it might be a little faster (and would not depend on the compiler's optimization capabilities) to do this:
Code: [Select]

DWORD *Ptr = (DWORD *)&g_fZBuffer;
for ( INT I = 640 * 480; --I >= 0; ) {
   *Ptr++ = 0x000080BF;
}

From what I've seen of most compiler optimizations, the code you provided would end up being something like this
// using pseudo ops
Code: [Select]

load I register with 307200
loop:
Load effective address of g_fZBuff to Ref
move I to Index
multiply index by sizeof(DWORD)
add result to Ref
mov [Ref], 0x000080BF
decrement I
jump if not zero loop

whereas the pointer version above would be more like
Code: [Select]

load effective address of g_fZBuff to Ref
load I register with 307200
Loop:
move [Ref], 0x000080BF
add Ref, sizeof(DWORD)
decrement I
jump if not zero loop

And does exactly the same thing :D

Cyb

L. Spiro

3d drawing order
« Reply #9 on: 2005-10-30 17:34:49 »
I originally wrote my code similarly to the way you had it, but since it requires a full instruction to decrement I and another instruction to increment the pointer, I decided it would be faster to go the other way, since it will just use a single instruction to access the array location and set its value.

But when I tried to compare the actual compiled code to get the results, the method you posted seems to trick the compiler and with optimizations enabled, it simply isn’t added into the code.
Literally, the compiler, with full optimizations, will think the code is not doing anything and it won’t compile it.
You can get similar results by doing this:
Code: [Select]
for ( DWORD I =0 ; I < 765765; I++ ) {
INT KJHJH = 0;
}

With full optimizations, it will omit “useless” code such as this.
If I use my debug build, with no optimizations, both sets of code are compiled into the .exe.



As a result, I can not show the actual code produced by the method you posted, but here is what is compiled by the method I posted:

Code: [Select]
mov eax, 4B000h
mov ecx, 80BFh
LOOP :
dec eax
mov dword ptr [esp+eax*4], ecx
jns LOOP

Here, the loop consists of three total instructions, including the jns check.


To get the other method I have to use the debug build.
In debug, the method I posted:
Code: [Select]
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [I]
mov dword ptr g_fZBuffer[eax*4], 80BFh
jmp LOOP
END :

Holy crap that is inefficient!
That was the method I posted.


Now the method you posted, using “pVal” as my pointer through the list:
Code: [Select]
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [pVal]
mov dword ptr [eax], 80BFh
mov ecx, dword ptr [pVal]
add ecx, 4
mov dword ptr [pVal], ecx
jmp LOOP
END :

Both sets of code come out terribly in debug compilation.
But the problem I expected was at the end.
In debug there are 3 extra instructions used to increase the pointer.
I expected in retail compilation there would only be one (add [pVal], 4), but that is enough.




This is the code I would suggest:
Code: [Select]
mov eax, 0xBF800000
mov ecx, 4B000h
lea edi, [g_fZBuffer]
rep stos dword ptr [edi]

It is the fastest way to set a large number of bytes to the same value.
Also, it was my mistake above.  You should use 0xBF800000 instead of 0x000080BF.
I saw 0xBF800000 in my mind but typed it in reverse for whatever reason.
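Instead of a hex calculator, the bit pattern can be checked in code. A small sketch, assuming a 32-bit IEEE-754 float; floatBits and floatFromBits are names made up for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Raw bits of a float, copied out with memcpy to avoid pointer-cast aliasing issues.
inline std::uint32_t floatBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The reverse: reinterpret a 32-bit pattern as a float.
inline float floatFromBits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

floatBits(-1.0f) gives 0xBF800000, matching the correction above, while the byte-reversed 0x000080BF decodes to a tiny positive denormal rather than -1.0f.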


L. Spiro

ficedula

3d drawing order
« Reply #10 on: 2005-10-30 20:55:20 »
Oooh, assembly ;)

First rule of assembly programming: Don't do it unless you know more than the compiler does.

Second rule: The compiler always knows something you don't ;)


Case in point; cyberman, your idea that using the pointer is better (because if you use an index, you just have to "recalculate" the pointer anyway, each time around the loop) would be true on some processors ... not on the x86! Or at least a fair few x86 processors, there being a large number of dies out there. Address calculation is practically free on the x86, completely free in terms of instruction count. Hence LSpiro's example with only three instructions in the loop.


Second case in point: LSpiro's suggested code is also sub-optimal for most processors ... isn't rep movs / rep stos the quickest way to move data around? One instruction to blast a whole chunk of data around? Er, well, no ... it moves data in 4-byte chunks, which is frankly pretty small fry. Loading an FPU register with a zero value and then blasting 64 bytes at a time into memory (8 bytes per register, and store it 8 times per loop) may be faster; it's effectively a form of loop unrolling. The more time spent moving data rather than checking "are we done yet?" the better.

But is there a more efficient way? Of course. If you have an Athlon XP, or a P4, you've got SSE1 at your disposal. That's 16 bytes per register. And you can do non-temporal moves, meaning that it'll fire off a request to move the data to the memory controller and then continue on to process subsequent instructions without waiting for the move to finish. Don't want to rely on SSE? Use 3dnow/MMX for the same purpose, although then you're back to the 8 bytes per register of the FPU.

Better yet, find the routine in your runtime that does all of this work for you; I would hope that there is a function that does what memset() does but for DWORD / QWORD sized quantities. Then let the runtime worry about whether you have SSE, or are running on x86-64, or whatever; it's what it's there for. Not to suggest that learning assembler is completely useless ... but you all know about premature optimisation, right? ;)
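In C++, one candidate for such a routine is std::fill_n from the standard library. Whether a particular compiler and runtime turn it into SSE or rep stos code is an assumption you would have to check by measuring; clearDepth is just an illustrative wrapper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sets `count` depth values to farValue and leaves the choice of
// instructions to the compiler and standard library.
inline void clearDepth(float* depth, std::size_t count, float farValue) {
    assert(depth != nullptr || count == 0);
    std::fill_n(depth, count, farValue);
}
```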

L. Spiro

3d drawing order
« Reply #11 on: 2005-10-31 02:53:12 »
Quote
Second rule: The compiler always knows something you don't
I would have agreed with that until last night when the compiler didn’t know well enough to compile Cyberman’s loop.
If it had been in a real-case scenario and found a bug in my program, I wouldn’t have suspected that my buffer-initialization code was simply not compiled into the final .exe.



Quote
Loading an FPU register with a zero value and then blasting 64 bytes at a time into memory (8 bytes per register, and store it 8 times per loop) may be faster; it's effectively a form of loop unrolling. The more time spent moving data rather than checking "are we done yet?" the better.
How many cycles does it take to write from the FPU to an address and then decrement your counter and then check for 0?
I don’t actually know, which is why I am asking.
There is no REP prefix with any of the FPU register operations, so you would have to write a loop with the check for 0.
Now, you’re going to be writing twice the information at once, which means this method can have up to 6 cycles before it becomes slower than REP STOS.
But I can’t find a full table to compare the actual results; I could only find results on REP STOS (3 cycles) because it seems it is the most favored method for filling aligned linear memory buffers.  I suspect it would be close.


L. Spiro



[EDIT]
I finally found some information on using the FPU to transfer 8 bytes at a time:
Quote
Floating point instructions can be used to move 8 bytes at a time:
FILD QWORD PTR [ESI] / FISTP QWORD PTR [EDI]
This is only an advantage if the destination is not in the cache. The
optimal way to move a block of data to uncached memory on the Pentium is:

TopOfLoop:
FILD QWORD PTR [ESI]
FILD QWORD PTR [ESI+8]
FXCH
FISTP QWORD PTR [EDI]
FISTP QWORD PTR [EDI+8]
ADD ESI,16
ADD EDI,16
DEC ECX
JNZ TopOfLoop

The source and destination should of course be aligned by 8. The extra time
used by the slow FILD and FISTP instructions is compensated for by the fact
that you only have to do half as many write operations.  Note that this
method is only advantageous on the Pentium and only if the destination is
not in the cache. On all other processors the optimal way to move blocks of
data is REP MOVSD, or if you have a processor with MMX you may use the MMX
instructions in stead to write 8 bytes at a time.


But this is in regards to copying bytes rather than writing a constant repeatedly.
But it’s all I could find.
[/EDIT]

mirex

3d drawing order
« Reply #12 on: 2005-10-31 08:03:14 »
Hehe we are absolutely off topic with this assembly stuff guys !! But it could be done also like this: ;)
Code: [Select]
mov cx, 4B000h
mov ax, ptr pVal
mov di, ax
push ds
pop es
mov eax, 80BFh
rep stosdw


I hope there is stosdw, I don't remember this anymore.

ficedula

3d drawing order
« Reply #13 on: 2005-10-31 18:05:03 »
LSpiro: Try downloading the processor documentation from the CPU manufacturers. Again, it'll mostly be devoted to copies rather than constant stores; but, here's the figures from my Athlon XP processor manuals;

REP MOVSB: 570MB/s
REP MOVSD: 700MB/s
Simple loop: 720MB/s (so just writing the loop out without any optimisation is quicker than REP MOVS on a modern CPU!)
Unrolled/grouped loop: 750MB/s
MMX registers: 800MB/s
MMX registers, non-temporal move: 1120MB/s
MMX, non-temporal, prefetched: 1250MB/s
MMX, non-temporal, block prefetch: 1630MB/s

Kind of interesting; back in the days of 486 and earlier, the simple rule was: the less instructions the better, most instructions took about the same length of time to execute (not all, of course), so less instructions = less fetching from memory = quicker. Nowadays ... well, you can see from the loop above, not only is the optimised loop over twice as quick as a simple REP MOVS, just writing the loop out manually (using MOV/DEC/JNZ) is quicker too!

Cyberman

3d drawing order
« Reply #14 on: 2005-10-31 20:02:08 »
Quote from: L. Spiro
I originally wrote my code similarly to the way you had it, but since it requires a full instruction to decrement I and another instruction to increment the pointer, I decided it would be faster to go the other way, since it will just use a single instruction to access the array location and set its value.

But when I tried to compare the actual compiled code to get the results, the method you posted seems to trick the compiler and with optimizations enabled, it simply isn’t added into the code.
Literally, the compiler, with full optimizations, will think the code is not doing anything and it won’t compile it.
You can get similar results by doing this:
Code: [Select]
for ( DWORD I =0 ; I < 765765; I++ ) {
INT KJHJH = 0;
}

With full optimizations, it will omit “useless” code such as this.
If I use my debug build, with no optimizations, both sets of code are compiled into the .exe.

That is useless code... what is it doing inside the loop? Nothing, so eliminating it is perfectly legitimate. You are setting a variable to 0 thousands of times that only persists inside the for loop, thus it's doing nothing at all.  Technically it would check what it's doing with the variable inside the loop; if the variable affects nothing outside the loop it's eliminated, and so you have an empty loop as a result. Empty loops are removed and thus it comes out to nothing.

The variable needs to be defined outside the loop to persist in the first place (i.e. to do something as far as the compiler is concerned).
This code I compiled
Code: [Select]
void __fastcall TForm1::Button1Click(TObject *Sender)
{
   // clear buffer
   DWORD *Ptr = (DWORD *)&DepthBuffer;
   for(int I = 640*480; --I >=0;)
   {
      *Ptr++ = 0x000080BF;
   }
}

This is the resulting assembly output; surprisingly close, I think
Code: [Select]

@6:
push      ebp
mov       ebp,esp
?debug L 29
?live16390@16: ; EAX = this
add       eax,724
?debug L 31
?live16390@32: ; EAX = Ptr
@7:
mov       edx,307200
jmp       short @9
?debug L 33
?live16390@48: ; EDX = I, EAX = Ptr
@8:
mov       dword ptr [eax],32959
add       eax,4
?debug L 31
@10:
@9:
dec       edx
jns       short @8
?debug L 35
?live16390@80: ;
@12:
pop       ebp
ret

Cyb

L. Spiro

3d drawing order
« Reply #15 on: 2005-11-01 01:49:11 »
Quote
Nowadays ... well, you can see from the loop above, not only is the optimised loop over twice as quick as a simple REP MOVS, just writing the loop out manually (using MOV/DEC/JNZ) is quicker too!

That’s copying memory.
Not at all as fast as setting a linear array to a specific value.
Of course I am not going to argue when it comes to MMX instructions, but for setting a linear block of memory to a specific value, REP STOS (clearly, not REP MOV*) is the fastest for Pentium® processors.
I also won’t argue that other routines are faster on other processors, but in general REP STOS is the fastest and most widely used.





Quote
That is useless code... what is it doing inside the loop? Nothing, so eliminating it is perfectly legitimate. You are setting a variable to 0 thousands of times that only persists inside the for loop, thus it's doing nothing at all. Technically it would check what it's doing with the variable inside the loop; if the variable affects nothing outside the loop it's eliminated, and so you have an empty loop as a result. Empty loops are removed and thus it comes out to nothing.

I don’t think you quite got my point.
I posted that code as an example of useless code.
And I explained why it would be omitted already.
The point was that your code is omitted also, because the compiler thinks it is useless, when of course we all know it is not.

Setting the variable outside the loop does nothing.
I have already written your loop with both the float array and the incremental pointer declared outside the loop, and I further went on to make sure the float array was being used outside the loop, but to the point, the compiler has a bug and does not compile your code.
That’s all my point was.



L. Spiro

Cyberman

3d drawing order
« Reply #16 on: 2005-11-01 04:59:36 »
Quote from: L. Spiro
I don’t think you quite got my point.
I posted that code as an example of useless code.
And I explained why it would be omitted already.
The point was that your code is omitted also, because the compiler thinks it is useless, when of course we all know it is not.

Setting the variable outside the loop does nothing.
I have already written your loop with both the float array and the incremental pointer declared outside the loop, and I further went on to make sure the float array was being used outside the loop, but to the point, the compiler has a bug and does not compile your code.
That’s all my point was.



L. Spiro

LOL ok I get it now :D

I assume you are using MS's compiler.. I can guess that the output of their code generation engine is faulty OR their optimization engine is faulty (they aren't supposed to be the same thing).  Either way... it doesn't work correctly. (DOH!)

As for speed... I didn't think to abuse the MMX instruction set myself.

Back to the original subject:
Use a Z-buffer; it is not as time consuming as you might think, since it's used all the time as it is :)

Cyb

ficedula

3d drawing order
« Reply #17 on: 2005-11-01 06:48:10 »
Quote from: LSpiro

That’s copying memory.
Not at all as fast as setting a linear array to a specific value.
Of course I am not going to argue when it comes to MMX instructions, but for setting a linear block of memory to a specific value, REP STOS (clearly, not REP MOV*) is the fastest for Pentium® processors.
I also won’t argue that other routines are faster on other processors, but in general REP STOS is the fastest and most widely used.


Of course copying isn't as fast as setting a constant. But: when copying, REP MOVS isn't as fast as copying manually. The obvious implication is that when setting, REP STOS isn't as fast as doing that manually either. Well, it will be on a Pentium 1. But REP STOS is just the easiest to write, by no means the fastest; I just quoted copy figures because, like you did, it was easier to put my hands on them.


Also back to the original question: if you were looking to get clever you could use a hierarchical Z-buffer, which would remove the need to clear the whole block of memory manually ... although it would be more complex!

L. Spiro

3d drawing order
« Reply #18 on: 2005-11-01 07:04:02 »
On the discussion of using a z-buffer to solve your problem, I would be more worried about how it is used during rendering rather than just setting it to some value.
Comparing and writing floats is much slower than comparing and writing DWORDs.
Depending on the needs of your engine, you could consider using fixed-point DWORD z-buffer, but that is something you would have to carefully consider and be aware of its limitations.
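One way a fixed-point depth value could be produced is sketched below. Everything here is an assumption made for illustration: the linear mapping, the zNear/zFar range, and the name toFixedDepth. Depths outside the range are clamped, which is one of the limitations you would have to live with.

```cpp
#include <cassert>
#include <cstdint>

// Maps a depth in [zNear, zFar] linearly onto 0 .. 0xFFFFFFFF, so that a
// nearer depth always gives a smaller (or equal) integer. Values outside
// the range are clamped to the ends.
inline std::uint32_t toFixedDepth(float z, float zNear, float zFar) {
    assert(zFar > zNear);
    double t = (static_cast<double>(z) - zNear) / (static_cast<double>(zFar) - zNear);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    return static_cast<std::uint32_t>(t * 4294967295.0);
}
```

The z-buffer then stores and compares plain 32-bit integers; the conversion itself still costs floating-point work per pixel or per vertex, so it is worth measuring before committing to it.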

But in any case, if you add a z-buffer, you no longer need to order the triangles at all.
However, if you want to be thrifty, you could add a z-buffer and order the triangles in reverse of what you have now.
Draw them from close to far.
The reason for this is that when you use the z-buffer, you are going to check each pixel for distance and write to it only if the new distance is less than the distance already stored there.
If you write all the close distances first, you won’t end up writing and rewriting as many pixels.
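A toy illustration of why the order matters: it "draws" a few full-screen layers at different depths and counts how many pixel writes pass the depth test. countDepthWrites is a made-up helper, and real scenes will not be this clear-cut.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Draws full-screen layers at the given depths, in the given order, into a
// small depth buffer and returns how many writes passed the depth test.
inline std::size_t countDepthWrites(const std::vector<float>& layerDepths,
                                    std::size_t pixelCount) {
    std::vector<float> zbuf(pixelCount, std::numeric_limits<float>::infinity());
    std::size_t writes = 0;
    for (float z : layerDepths) {
        for (float& stored : zbuf) {
            if (z < stored) {   // new pixel is closer: overwrite it
                stored = z;
                ++writes;
            }
        }
    }
    return writes;
}
```

With three layers over 100 pixels, near-to-far order (1, 2, 3) writes each pixel once for 100 writes in total, while far-to-near order (3, 2, 1) writes every pixel three times for 300.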


L. Spiro

ficedula

3d drawing order
« Reply #19 on: 2005-11-01 18:41:04 »
If I had to guess, LSpiro, I would say that your day job involves programming on 10 year old processors ;)

Remember (not just in 3d problems, everywhere in computing), premature optimisation causes cancer. And is the root of all evil.

On a modern processor, floating point operations are perhaps slower than integer work. But your first question should be: does it matter? If your code is fast enough already, no. If the slowdown is elsewhere, no. Even if they are, you get the code working first, writing it in the easiest and clearest manner possible, then you optimise it. After finding out exactly where the slowdown is. Which probably isn't where you first guessed it would be; if it were that easy, anybody could play!

Bear in mind, here, I just ran some quick profiles on my laptop (an Intel Celeron-M), and if you use SSE, floating point operations are over twice as quick as integer...

L. Spiro

3d drawing order
« Reply #20 on: 2005-11-02 02:14:33 »
Quote
On a modern processor, floating point operations are perhaps slower than integer work. But your first question should be: does it matter?
You’re saying slower-than-necessary code is acceptable?
He’s writing a software 3-D engine.  There isn’t hardware acceleration here.  Order triangles, rasterize them to the screen, manually fill them, checking the z-buffer along the way, and you see how fast it goes without all the optimizations you can give it.



Quote
Even if they are, you get the code working first, writing it in the easiest and clearest manner possible, then you optimise it.
Absolutely.
But the optimizations I suggested won’t cause problems, and they are things that may cause problems in the future if you have to change them to optimize.
If he decided to go with an integer z-buffer after having already written it using floats, he is going to have to be very careful about which parts of his engine that will affect.
As for the organizing of the triangles from close to far, I am going off something he is already doing.
He already posted he has a working system for drawing triangles far-to-near, so such a change as I suggested shouldn’t be a problem.
That doesn’t mean I disagree with what you said.
If he looks at his code and thinks it may be something that can wait, or that it may cause problems for whatever reason, then certainly, keep it for last.
I just generally assume people can gauge these types of things for themselves.


Quote
Bear in mind, here, I just ran some quick profiles on my laptop (an Intel Celeron-M), and if you use SSE, floating point operations are over twice as quick as integer...
That’s crazy.
I didn’t expect SSE to be THAT fast.
I have no doubts they would be close or faster by a bit, but SSE is not supported on all processors, and I have noticed that even on these boards many people are stuck with low-end machines.
I’m not programming on 10-year-old processors, but I prefer compatible code.  That’s all.


L. Spiro

Cyberman

3d drawing order
« Reply #21 on: 2005-11-02 04:55:34 »
Quote from: ficedula
If I had to guess, LSpiro, I would say that your day job involves programming on 10 year old processors ;)

That would be me; I tend to program things like ARM7 and ARM9 processors ;)

Cyb

ficedula

3d drawing order
« Reply #22 on: 2005-11-02 06:51:08 »
The assumption I'm objecting to, really (partly because I see it among the developers at work too), is that you should make some changes "because it'll make the code quicker" even though you don't know whether it will make the code quicker! Hence the whole premature optimisation malarkey. Rather than wasting time guessing which bits need to be made faster, it's preferable to get it working and then benchmark.

(How could using an integer zbuffer make it slower? Well, quite apart from the fact that if you want maximum speed, you'd later convert it back to floats to use 3dnow or SSE, do you know for sure what the overhead for converting all the incoming data from floats would be?)

I still have memories of the time I optimised all our string parsing code at work only to find out it wasn't the bottleneck after all and it was the database causing the slowdown ... not good.

Cyberman: ARM7+9 says Nintendo DS to me... ;)

RPGillespie

3d drawing order
« Reply #23 on: 2005-11-02 15:28:42 »
Slightly off topic, but,
ficedula: how did you make data transfers between your computer and your Nintendo DS? Must've done it wireless or through one of the cartridge ports, correct?

ficedula

3d drawing order
« Reply #24 on: 2005-11-02 16:44:21 »
Yep; I already had a flash2advance cartridge for my GBA, so originally I was using Wifime to boot the DS from the cartridge. I got tired of rewriting the cartridge every time, though, so then I flashed the firmware on the DS to remove the signature checks; now I can boot the code directly over Wifime. The DS wifi bounty is nearly complete, so soon we'll have a TCP/IP stack on it...