[Freeciv-Dev] Re: [RFC] Path finding implementation.
[Top] [All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
At 04:31 PM 02/06/30 +0200, Raimar Falke wrote:
>On Sun, Jun 30, 2002 at 06:22:45AM -0400, Ross W. Wetmore wrote:
>> At 05:09 PM 02/06/29 +0200, Raimar Falke wrote:
>> >On Sat, Jun 29, 2002 at 09:56:31AM -0400, Ross W. Wetmore wrote:
[...]
>> >So in the dynamic case we no imull but an extra memory access. The
>> >other commands are cheap.
>>
>> Which is basically the same code as before because it still does the
>> index memory lookup, and doesn't have to precompute the value, i.e. no
>> compiler optimization was broken here because this is the step that
>> you took over from the compiler.
>
>It looks like I have made my point not clear in the previous post. The
No, you just aren't understanding/listening to what others are saying
about it :-).
>dynamic_but_indexed code has to do a multiplication. However if the
>compiler knows the position and/or the sizes it can optimize this. In
>the path_finding case the compiler knows nothing. To model this I made
>the array sizes globals.
Bad model ...
>While I agree with you in general that
>globals should be avoided they solve here the problem and prevent the
>compiler to assume anything about the sizes. You can also pass the
>sizes as parameters...you get the same result.
No you don't ... arguments are locals not globals and are handled
correctly as far as optimizations are concerned.
>So in general the dynamic_but_indexed has to do a multiplication with
>unknown factors. And this is the problem spot because such imull
>instructions are expensive. More expensive than a memory access. I
>know that the last statement is only valid for certain architectures
>and than in the future the imull may be cheaper.
And if you carry your logic through, in the general case your dynamic
precomputed multiplications cannot be used for this case because they
make assumptions about the values i.e. don't recompute them on the spot.
So you are just fooling everyone with an apples and oranges comparison
or setting up for some subtle runtime errors if you believe what you
just said about the need to force the sizes to be truly "unknown".
>> The other bad thing about the dynamic reference system you propose is
>> that it is very easy to generate memory leaks, and very clumsy to set up
>> and tear down. It mistakenly gives people the idea they are dealing with
>> a single double indexed array, rather than a subtly complex set of
>> references. Coding bugs and errors are nasty.
>
>I don't see a problem here.
I know (sigh) ...
>> >> root@wetrec[318] ./dynarr_00
>> >> declared = 15220000
>> >> indexed = 10680000
>> >> dynamic = 15010000
>> >
>> >$ ./a.out
>> >declared = 4000000
>> >indexed = 6430000
>> >dynamic = 4060000
>
>If I run the attached program I get (P2-400):
>
>mull = 3.860000s
>access = 3.800000s
>
>On a sparc ultra-2 (RISC so no mull opcode) the numbers are much
>worse:
>
>mull = 31.469999s
>access = 7.880000s
All totally irrelevant numbers :-)
> Raimar
>--
Here is a tweaked example with fewer case specific optimization
differences, and no false comparisons.
The key difference is down to the leal (computations) vs movl
(memory lookups) with a couple extra register variable ops in
the leal case. The latter are probably eaten by a dual integer
pipeline (i.e. executed in parallel) so they don't impact the
overall clock cycle times. But even on an older processor one
cycle register ops vs a single memory stall probably wins.
The key differences are in the "outer" loop in all cases.
It uses global variables for the index limits, but accesses them
through locals within the loop to avoid Raimar's selective
optimization breaking practices.
This has the loops arranged to do count down so this optimization
is not dependent on any differences in compiler pattern recognition.
It is probably the cmpl instructions for the upper bound in normal
loop coding practices vs test against zero that gives the benefit.
This can be exacerbated if the upper bound is a global or computed
value.
It also times just key loops so the malloc/free times are excluded.
This removes any noise level initialization penalty for the dynamic
case.
Finally it runs things three times in case the runtime order had
some iniital load or memory management side effect (with a reduced
loop count to offset my impatience at waiting for results).
The fascinating element this time is that the key code for static
and dynamic_but_indexed is absolutely identical, so one doesn't have
to look to subtleties of flow to understand differences here :-).
Again this shows that if you leave the index computations to the
compiler/optimizer and don't do known bad things that destroy its
effectiveness, it will find the best solution in a given context.
Indexed (linearized) arrays are just as effective as static unless
you introduce type differences in the index/bounds values that
defeat optimization.
Cheers,
RossW
=====
rww@wetrec[296] gcc -O3 -o dynarr dynarr.c
rww@wetrec[297] gcc -O3 -S dynarr.c
rww@wetrec[298] ./dynarr
dynamic: 1960000
dynindex: 1520000
declared: 1520000
dynamic: 1960000
dynindex: 1510000
declared: 1520000
dynamic: 1970000
dynindex: 1520000
declared: 1520000
=====
#include<stdlib.h>
int Xsize=80, Ysize=50;
long clock0;
/********************************/
long declared()
{
int i, j;
unsigned int k, xsize = Xsize, ysize = Ysize;
int tiles[xsize][ysize];
clock0 = clock();
for(k = 0; k < 100000; k++) {
for(i = xsize; --i >= 0; ) {
for(j = ysize; --j >= 0; ) {
tiles[i][j] = i + j;
}
}
}
return (clock0 = clock() - clock0);
}
/********************************/
long dynamic_but_indexed()
{
int i, j;
unsigned int k, xsize = Xsize, ysize = Ysize;
int *tiles;
tiles = malloc(ysize * xsize * sizeof(int));
clock0 = clock();
for(k = 0; k < 100000; k++) {
for(i = xsize; --i >= 0; ) {
for(j = ysize; --j >= 0; ) {
tiles[ysize * i + j] = i + j;
}
}
}
clock0 = clock() - clock0;
free(tiles);
return clock0;
}
/********************************/
long dynamic()
{
int i, j;
unsigned int k, xsize = Xsize, ysize = Ysize;
int *tiles_storage;
int **tiles;
tiles_storage = malloc(xsize * ysize * sizeof(int));
tiles = malloc(xsize * sizeof(int *));
for(i = 0; i < xsize; i++) {
tiles[i] = tiles_storage + (i * ysize);
}
clock0 = clock();
for(k = 0; k < 100000; k++) {
for(i = xsize; --i >= 0; ) {
for(j = ysize; --j >= 0; ) {
tiles[i][j] = i + j;
}
}
}
clock0 = clock() - clock0;
free(tiles);
free(tiles_storage);
return clock0;
}
/********************************/
int main()
{
int k;
for( k = 3; --k >= 0; ) {
// clock0 = clock();
dynamic();
// clock0 = clock() - clock0;
printf("dynamic: %d\n", clock0);
// clock0 = clock();
dynamic_but_indexed();
// clock0 = clock() - clock0;
printf("dynindex: %d\n", clock0);
// long clock0 = clock();
declared();
// clock0 = clock() - clock0;
printf("declared: %d\n", clock0);
}
return 0;
}
=====
.file "dynarr.c"
.version "01.01"
gcc2_compiled.:
.globl Xsize
.data
.align 4
.type Xsize,@object
.size Xsize,4
Xsize:
.long 80
.globl Ysize
.align 4
.type Ysize,@object
.size Ysize,4
Ysize:
.long 50
.text
.align 4
.globl declared
.type declared,@function
declared:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
movl Xsize, %eax
subl $28, %esp
movl %eax, -20(%ebp)
movl Ysize, %eax
movl %eax, -24(%ebp)
sall $2, %eax
movl %eax, -28(%ebp)
imull -20(%ebp), %eax
addl $15, %eax
andl $-16, %eax
movl %esp, -32(%ebp)
subl %eax, %esp
movl %esp, -36(%ebp)
call clock
movl %eax, clock0
movl $0, -16(%ebp)
.p2align 2
.L21:
movl -20(%ebp), %ebx
decl %ebx
js .L34
movl %ebx, %esi
movl -28(%ebp), %edi
imull -28(%ebp), %esi
negl %edi
.p2align 2
.L25:
movl -24(%ebp), %edx
decl %edx
js .L35
movl -36(%ebp), %ecx
leal (%ecx,%esi), %eax
leal (%eax,%edx,4), %ecx
.p2align 2
.L29:
leal (%edx,%ebx), %eax
movl %eax, (%ecx)
subl $4, %ecx
decl %edx
jns .L29
.L35:
addl %edi, %esi
decl %ebx
jns .L25
.L34:
incl -16(%ebp)
cmpl $99999, -16(%ebp)
jbe .L21
call clock
subl clock0, %eax
movl %eax, clock0
movl -32(%ebp), %esp
leal -12(%ebp), %esp
popl %ebx
popl %esi
popl %edi
popl %ebp
ret
.Lfe1:
.size declared,.Lfe1-declared
.align 4
.globl dynamic
.type dynamic,@function
dynamic:
pushl %ebp
movl %esp, %ebp
pushl %edi
movl Xsize, %edi
pushl %esi
movl %edi, %eax
movl Ysize, %esi
pushl %ebx
imull %esi, %eax
subl $24, %esp
sall $2, %eax
pushl %eax
call malloc
movl %eax, -20(%ebp)
leal 0(,%edi,4), %eax
movl %eax, (%esp)
xorl %ebx, %ebx
call malloc
addl $16, %esp
cmpl %edi, %ebx
movl %eax, -24(%ebp)
jae .L73
xorl %edx, %edx
.p2align 2
.L56:
movl -20(%ebp), %ecx
leal (%ecx,%edx,4), %eax
movl -24(%ebp), %ecx
movl %eax, (%ecx,%ebx,4)
incl %ebx
addl %esi, %edx
cmpl %edi, %ebx
jb .L56
.L73:
call clock
movl %eax, clock0
movl $0, -16(%ebp)
.p2align 2
.L61:
movl %edi, %ebx
decl %ebx
js .L75
.p2align 2
.L65:
movl %esi, %edx
decl %edx
js .L76
movl -24(%ebp), %eax
movl (%eax,%ebx,4), %ecx
.p2align 2
.L69:
leal (%edx,%ebx), %eax
movl %eax, (%ecx,%edx,4)
decl %edx
jns .L69
.L76:
decl %ebx
jns .L65
.L75:
incl -16(%ebp)
cmpl $99999, -16(%ebp)
jbe .L61
call clock
subl clock0, %eax
subl $12, %esp
pushl -24(%ebp)
movl %eax, clock0
call free
popl %eax
pushl -20(%ebp)
call free
addl $16, %esp
leal -12(%ebp), %esp
popl %ebx
popl %esi
movl clock0, %eax
popl %edi
popl %ebp
ret
.Lfe2:
.size dynamic,.Lfe2-dynamic
.section .rodata
.LC0:
.string "dynamic: %d\n"
.LC1:
.string "dynindex: %d\n"
.LC2:
.string "declared: %d\n"
.text
.align 4
.globl main
.type main,@function
main:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $28, %esp
movl $2, -16(%ebp)
.p2align 2
.L81:
call dynamic
subl $8, %esp
pushl clock0
pushl $.LC0
call printf
movl Xsize, %eax
movl %eax, -24(%ebp)
movl Ysize, %eax
movl %eax, -28(%ebp)
imull -24(%ebp), %eax
sall $2, %eax
movl %eax, (%esp)
call malloc
movl %eax, -32(%ebp)
call clock
movl %eax, clock0
movl $0, -20(%ebp)
addl $16, %esp
.p2align 2
.L84:
movl -24(%ebp), %ebx
decl %ebx
js .L101
movl %ebx, %esi
movl -28(%ebp), %edi
imull -28(%ebp), %esi
negl %edi
.p2align 2
.L87:
movl -28(%ebp), %edx
decl %edx
js .L102
leal (%edx,%esi), %ecx
movl -32(%ebp), %eax
leal (%eax,%ecx,4), %ecx
.p2align 2
.L90:
leal (%edx,%ebx), %eax
movl %eax, (%ecx)
subl $4, %ecx
decl %edx
jns .L90
.L102:
addl %edi, %esi
decl %ebx
jns .L87
.L101:
incl -20(%ebp)
cmpl $99999, -20(%ebp)
jbe .L84
call clock
subl clock0, %eax
subl $12, %esp
movl %eax, clock0
pushl -32(%ebp)
call free
popl %ebx
popl %esi
pushl clock0
pushl $.LC1
call printf
call declared
popl %edx
popl %ecx
pushl clock0
pushl $.LC2
call printf
addl $16, %esp
decl -16(%ebp)
jns .L81
leal -12(%ebp), %esp
popl %ebx
popl %esi
xorl %eax, %eax
popl %edi
popl %ebp
ret
.Lfe3:
.size main,.Lfe3-main
.comm clock0,4,4
.align 4
.globl dynamic_but_indexed
.type dynamic_but_indexed,@function
dynamic_but_indexed:
pushl %ebp
movl %esp, %ebp
pushl %edi
pushl %esi
pushl %ebx
movl Xsize, %eax
subl $40, %esp
movl %eax, -20(%ebp)
movl Ysize, %eax
movl %eax, -24(%ebp)
imull -20(%ebp), %eax
sall $2, %eax
pushl %eax
call malloc
movl %eax, -28(%ebp)
call clock
movl %eax, clock0
movl $0, -16(%ebp)
addl $16, %esp
.p2align 2
.L40:
movl -20(%ebp), %ebx
decl %ebx
js .L104
movl %ebx, %esi
movl -24(%ebp), %edi
imull -24(%ebp), %esi
negl %edi
.p2align 2
.L44:
movl -24(%ebp), %edx
decl %edx
js .L105
leal (%edx,%esi), %ecx
movl -28(%ebp), %eax
leal (%eax,%ecx,4), %ecx
.p2align 2
.L48:
leal (%edx,%ebx), %eax
movl %eax, (%ecx)
subl $4, %ecx
decl %edx
jns .L48
.L105:
addl %edi, %esi
decl %ebx
jns .L44
.L104:
incl -16(%ebp)
cmpl $99999, -16(%ebp)
jbe .L40
call clock
subl clock0, %eax
subl $12, %esp
pushl -28(%ebp)
movl %eax, clock0
call free
addl $16, %esp
leal -12(%ebp), %esp
popl %ebx
popl %esi
movl clock0, %eax
popl %edi
popl %ebp
ret
.Lfe4:
.size dynamic_but_indexed,.Lfe4-dynamic_but_indexed
.ident "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.0)"
=====
- [Freeciv-Dev] Re: [RFC] Path finding implementation., (continued)
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/28
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., G. Dyke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Ross W. Wetmore, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Ross W. Wetmore, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Ross W. Wetmore, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation.,
Ross W. Wetmore <=
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/29
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Raimar Falke, 2002/06/30
- [Freeciv-Dev] Re: [RFC] Path finding implementation., Gregory Berkolaiko, 2002/06/30
|
|