I originally wrote my code much the way you had it, but that approach needs one full instruction to decrement I and another to increment the pointer. I decided it would be faster to go the other way, since a single instruction can then both address the array location and set its value.
But when I tried to compare the actual compiled code, the method you posted seemed to trick the compiler: with optimizations enabled, it simply isn't added into the code.
Literally, the compiler, with full optimizations, will think the code is not doing anything and it won’t compile it.
You can get similar results by doing this:
for ( DWORD I =0 ; I < 765765; I++ ) {
INT KJHJH = 0;
}
With full optimizations, it will omit "useless" code such as this.
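One way to keep an optimizing build from throwing the loop away is to make sure something reads the results afterwards. Below is a minimal sketch of that idea. The function names are hypothetical, and the two loop bodies are only my reconstruction of the two methods being compared, not the exact code from either post. Whether a particular compiler keeps or drops a given loop still depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index counting down: one indexed store per iteration.
// (Reconstruction of "the method I posted"; names are hypothetical.)
void FillByIndex(std::uint32_t* buffer, std::size_t count, std::uint32_t value) {
    assert(buffer != nullptr);
    for (std::size_t i = count; i-- > 0; ) {
        buffer[i] = value;
    }
}

// Pointer walking forward: a counter decrement plus a pointer increment.
// (Reconstruction of "the method you posted"; names are hypothetical.)
void FillByPointer(std::uint32_t* buffer, std::size_t count, std::uint32_t value) {
    assert(buffer != nullptr);
    std::uint32_t* pVal = buffer;
    for (std::size_t i = count; i-- > 0; ) {
        *pVal++ = value;
    }
}

// Reading every element back and returning the result gives the stores
// an observable effect, so they are no longer "useless" to the optimizer
// as long as the caller actually uses the return value.
std::uint32_t XorAll(const std::uint32_t* buffer, std::size_t count) {
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < count; ++i) {
        result ^= buffer[i];
    }
    return result;
}
```

Printing or otherwise using the value returned by XorAll after each fill should be enough to get both loops into an optimized build so their disassembly can be compared.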
If I use my debug build, with no optimizations, both sets of code are compiled into the .exe.
As a result, I cannot show the optimized code produced by the method you posted, but here is what the optimized build generates for the method I posted:
mov eax, 4B000h
mov ecx, 80BFh
LOOP :
dec eax
mov dword ptr [esp+eax*4], ecx
jns LOOP
Here, the loop consists of three total instructions, including the jns check.
To get the other method I have to use the debug build.
In debug, the method I posted:
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [I]
mov dword ptr g_fZBuffer[eax*4], 80BFh
jmp LOOP
END :
Holy crap that is inefficient!
That was the method I posted.
Now the method you posted, using "pVal" as my pointer through the list:
mov dword ptr [I], 4B000h
LOOP :
mov eax, dword ptr [I]
sub eax, 1
mov dword ptr [I], eax
js END
mov eax, dword ptr [pVal]
mov dword ptr [eax], 80BFh
mov ecx, dword ptr [pVal]
add ecx, 4
mov dword ptr [pVal], ecx
jmp LOOP
END :
Both sets of code come out terribly in debug compilation.
But the problem I expected was at the end.
In debug there are 3 extra instructions used to increase the pointer.
I expected that in retail compilation there would be only one (add [pVal], 4), but even one extra instruction per iteration is enough to make it the slower method.
This is the code I would suggest:
mov eax, 0xBF800000
mov ecx, 4B000h
lea edi, [g_fZBuffer]
rep stos dword ptr [edi]
As far as I know, it is the fastest way to set a large block of memory to the same value.
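If you would rather not write the assembly by hand, the closest thing in plain C++ is std::fill_n. This is only a sketch: the function name is hypothetical, and it treats the buffer as 32-bit words so that the raw bit pattern can be stored directly, which is an assumption about how the buffer is declared. An optimizing compiler is free to turn the call into rep stos or into wider vector stores, so check the disassembly rather than assuming which one you get.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fills `count` 32-bit words starting at `buffer` with the same bit pattern.
// Same inputs as the rep stos sequence: a value (eax), a count (ecx),
// and a destination (edi).
void FillDwords(std::uint32_t* buffer, std::size_t count, std::uint32_t pattern) {
    assert(buffer != nullptr);
    std::fill_n(buffer, count, pattern);
}
```

For the buffer in the listings, the call would pass the buffer address, a count of 0x4B000, and the pattern 0xBF800000.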
Also, I made a mistake above. You should use 0xBF800000 instead of 0x000080BF (the 80BFh in the listings).
I saw 0xBF800000 in my mind but typed it in reverse for whatever reason.
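For anyone who wants to check the two constants, here is a small sketch, assuming 32-bit IEEE 754 floats (true on x86). The helper names are hypothetical. It shows that 0xBF800000 is the bit pattern of -1.0f and that reversing its bytes gives exactly the 0x000080BF I typed.

```cpp
#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a float without breaking aliasing rules.
float BitsToFloat(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Reverses the order of the four bytes in a 32-bit value.
std::uint32_t ByteSwap32(std::uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

BitsToFloat(0xBF800000) gives -1.0f, while ByteSwap32(0xBF800000) gives 0x000080BF, which as a float is a tiny positive denormal rather than -1.0f.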
L. Spiro