Complete.Org: Mailing Lists: Archives: freeciv-dev: June 2002:
[Freeciv-Dev] Re: [RFC] Path finding implementation.
Home

[Freeciv-Dev] Re: [RFC] Path finding implementation.

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: Raimar Falke <rf13@xxxxxxxxxxxxxxxxx>
Cc: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>, Gregory Berkolaiko <Gregory.Berkolaiko@xxxxxxxxxxxx>, Freeciv Development List <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: [RFC] Path finding implementation.
From: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>
Date: Sun, 30 Jun 2002 16:17:00 -0400

At 04:31 PM 02/06/30 +0200, Raimar Falke wrote:
>On Sun, Jun 30, 2002 at 06:22:45AM -0400, Ross W. Wetmore wrote:
>> At 05:09 PM 02/06/29 +0200, Raimar Falke wrote:
>> >On Sat, Jun 29, 2002 at 09:56:31AM -0400, Ross W. Wetmore wrote:
[...]
>> >So in the dynamic case we no imull but an extra memory access. The
>> >other commands are cheap.
>> 
>> Which is basically the same code as before because it still does the
>> index memory lookup, and doesn't have to precompute the value, i.e. no
>> compiler optimization was broken here because this is the step that
>> you took over from the compiler.
>
>It looks like I have made my point not clear in the previous post. The

No, you just aren't understanding/listening to what others are saying 
about it :-).

>dynamic_but_indexed code has to do a multiplication. However if the
>compiler knows the position and/or the sizes it can optimize this. In
>the path_finding case the compiler knows nothing. To model this I made
>the array sizes globals. 

Bad model ...

>While I agree with you in general that
>globals should be avoided they solve here the problem and prevent the
>compiler to assume anything about the sizes. You can also pass the
>sizes as parameters...you get the same result.

No you don't ... arguments are locals not globals and are handled 
correctly as far as optimizations are concerned.

>So in general the dynamic_but_indexed has to do a multiplication with
>unknown factors. And this is the problem spot because such imull
>instructions are expensive. More expensive than a memory access. I
>know that the last statement is only valid for certain architectures
>and than in the future the imull may be cheaper.

And if you carry your logic through, in the general case your dynamic
precomputed multiplications cannot be used for this case because they
make assumptions about the values i.e. don't recompute them on the spot.

So you are just fooling everyone with an apples and oranges comparison
or setting up for some subtle runtime errors if you believe what you
just said about the need to force the sizes to be truly "unknown".

>> The other bad thing about the dynamic reference system you propose is 
>> that it is very easy to generate memory leaks, and very clumsy to set up 
>> and tear down. It mistakenly gives people the idea they are dealing with
>> a single double indexed array, rather than a subtly complex set of
>> references. Coding bugs and errors are nasty.
>
>I don't see a problem here.

I know (sigh) ...

>> >> root@wetrec[318] ./dynarr_00
>> >> declared =  15220000
>> >> indexed =   10680000
>> >> dynamic =   15010000
>> >
>> >$ ./a.out
>> >declared =   4000000
>> >indexed =    6430000
>> >dynamic =    4060000
>
>If I run the attached program I get (P2-400):
>
>mull   = 3.860000s
>access = 3.800000s
>
>On a sparc ultra-2 (RISC so no mull opcode) the numbers are much
>worse:
>
>mull   = 31.469999s
>access = 7.880000s

All totally irrelevant numbers :-)

>       Raimar
>-- 

Here is a tweaked example with fewer case specific optimization
differences, and no false comparisons. 

The key difference is down to the leal (computations) vs movl
(memory lookups) with a couple extra register variable ops in
the leal case. The latter are probably eaten by a dual integer
pipeline (i.e. executed in parallel) so they don't impact the
overall clock cycle times. But even on an older processor one
cycle register ops vs a single memory stall probably wins.

The key differences are in the "outer" loop in all cases.

It uses global variables for the index limits, but accesses them
through locals within the loop to avoid Raimar's selective
optimization breaking practices.

This has the loops arranged to do count down so this optimization
is not dependent on any differences in compiler pattern recognition.
It is probably the cmpl instructions for the upper bound in normal
loop coding practices vs test against zero that gives the benefit.
This can be exacerbated if the upper bound is a global or computed
value. 

It also times just key loops so the malloc/free times are excluded.
This removes any noise level initialization penalty for the dynamic
case.

Finally it runs things three times in case the runtime order had
some iniital load or memory management side effect (with a reduced
loop count to offset my impatience at waiting for results).


The fascinating element this time is that the key code for static
and dynamic_but_indexed is absolutely identical, so one doesn't have
to look to subtleties of flow to understand differences here :-).

Again this shows that if you leave the index computations to the 
compiler/optimizer and don't do known bad things that destroy its
effectiveness, it will find the best solution in a given context.

Indexed (linearized) arrays are just as effective as static unless
you introduce type differences in the index/bounds values that
defeat optimization.

Cheers,
RossW
=====

rww@wetrec[296] gcc -O3 -o dynarr dynarr.c
rww@wetrec[297] gcc -O3 -S dynarr.c
rww@wetrec[298] ./dynarr
dynamic:  1960000
dynindex: 1520000
declared: 1520000
dynamic:  1960000
dynindex: 1510000
declared: 1520000
dynamic:  1970000
dynindex: 1520000
declared: 1520000
=====

#include<stdlib.h>

int Xsize=80, Ysize=50;
long clock0;

/********************************/
long declared()
{
  int i, j;
  unsigned int k, xsize = Xsize, ysize = Ysize;
  int tiles[xsize][ysize];

  clock0 = clock();
  for(k = 0; k < 100000; k++) {

    for(i = xsize; --i >= 0; ) {
      for(j = ysize; --j >= 0; ) {
        tiles[i][j] = i + j;
      }
    }
  }
  return (clock0 = clock() - clock0);
}

/********************************/
long dynamic_but_indexed()
{
  int i, j;
  unsigned int k, xsize = Xsize, ysize = Ysize;
  int *tiles;

  tiles = malloc(ysize * xsize * sizeof(int));

  clock0 = clock();
  for(k = 0; k < 100000; k++) {

    for(i = xsize; --i >= 0; ) {
      for(j = ysize; --j >= 0; ) {
        tiles[ysize * i + j] = i + j;
      }
    }
  }
  clock0 = clock() - clock0;

  free(tiles);
  return clock0;
}

/********************************/
long dynamic()
{
  int i, j;
  unsigned int k, xsize = Xsize, ysize = Ysize;
  int *tiles_storage;
  int **tiles;

  tiles_storage = malloc(xsize * ysize * sizeof(int));
  tiles = malloc(xsize * sizeof(int *));

  for(i = 0; i < xsize; i++) {
    tiles[i] = tiles_storage + (i * ysize);
  }

  clock0 = clock();
  for(k = 0; k < 100000; k++) {
 
    for(i = xsize; --i >= 0; ) {
      for(j = ysize; --j >= 0; ) {
        tiles[i][j] = i + j;
      }
    }
  }
  clock0 = clock() - clock0;

  free(tiles);
  free(tiles_storage);
  return clock0;
}

/********************************/
int main()
{
  int k;
  for( k = 3; --k >= 0; ) {
    // clock0 = clock();
    dynamic();
    // clock0 = clock() - clock0;
    printf("dynamic:  %d\n", clock0);
    // clock0 = clock();
    dynamic_but_indexed();
    // clock0 = clock() - clock0;
    printf("dynindex: %d\n", clock0);
    // long clock0 = clock();
    declared();
    // clock0 = clock() - clock0;
    printf("declared: %d\n", clock0);
  }

  return 0;
}
=====
        .file   "dynarr.c"
        .version        "01.01"
gcc2_compiled.:
.globl Xsize
.data
        .align 4
        .type    Xsize,@object
        .size    Xsize,4
Xsize:
        .long   80
.globl Ysize
        .align 4
        .type    Ysize,@object
        .size    Ysize,4
Ysize:
        .long   50
.text
        .align 4
.globl declared
        .type    declared,@function
declared:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    Xsize, %eax
        subl    $28, %esp
        movl    %eax, -20(%ebp)
        movl    Ysize, %eax
        movl    %eax, -24(%ebp)
        sall    $2, %eax
        movl    %eax, -28(%ebp)
        imull   -20(%ebp), %eax
        addl    $15, %eax
        andl    $-16, %eax
        movl    %esp, -32(%ebp)
        subl    %eax, %esp
        movl    %esp, -36(%ebp)
        call    clock
        movl    %eax, clock0
        movl    $0, -16(%ebp)
        .p2align 2
.L21:
        movl    -20(%ebp), %ebx
        decl    %ebx
        js      .L34
        movl    %ebx, %esi
        movl    -28(%ebp), %edi
        imull   -28(%ebp), %esi
        negl    %edi
        .p2align 2
.L25:
        movl    -24(%ebp), %edx
        decl    %edx
        js      .L35
        movl    -36(%ebp), %ecx
        leal    (%ecx,%esi), %eax
        leal    (%eax,%edx,4), %ecx
        .p2align 2
.L29:
        leal    (%edx,%ebx), %eax
        movl    %eax, (%ecx)
        subl    $4, %ecx
        decl    %edx
        jns     .L29
.L35:
        addl    %edi, %esi
        decl    %ebx
        jns     .L25
.L34:
        incl    -16(%ebp)
        cmpl    $99999, -16(%ebp)
        jbe     .L21
        call    clock
        subl    clock0, %eax
        movl    %eax, clock0
        movl    -32(%ebp), %esp
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret
.Lfe1:
        .size    declared,.Lfe1-declared
        .align 4
.globl dynamic
        .type    dynamic,@function
dynamic:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        movl    Xsize, %edi
        pushl   %esi
        movl    %edi, %eax
        movl    Ysize, %esi
        pushl   %ebx
        imull   %esi, %eax
        subl    $24, %esp
        sall    $2, %eax
        pushl   %eax
        call    malloc
        movl    %eax, -20(%ebp)
        leal    0(,%edi,4), %eax
        movl    %eax, (%esp)
        xorl    %ebx, %ebx
        call    malloc
        addl    $16, %esp
        cmpl    %edi, %ebx
        movl    %eax, -24(%ebp)
        jae     .L73
        xorl    %edx, %edx
        .p2align 2
.L56:
        movl    -20(%ebp), %ecx
        leal    (%ecx,%edx,4), %eax
        movl    -24(%ebp), %ecx
        movl    %eax, (%ecx,%ebx,4)
        incl    %ebx
        addl    %esi, %edx
        cmpl    %edi, %ebx
        jb      .L56
.L73:
        call    clock
        movl    %eax, clock0
        movl    $0, -16(%ebp)
        .p2align 2
.L61:
        movl    %edi, %ebx
        decl    %ebx
        js      .L75
        .p2align 2
.L65:
        movl    %esi, %edx
        decl    %edx
        js      .L76
        movl    -24(%ebp), %eax
        movl    (%eax,%ebx,4), %ecx
        .p2align 2
.L69:
        leal    (%edx,%ebx), %eax
        movl    %eax, (%ecx,%edx,4)
        decl    %edx
        jns     .L69
.L76:
        decl    %ebx
        jns     .L65
.L75:
        incl    -16(%ebp)
        cmpl    $99999, -16(%ebp)
        jbe     .L61
        call    clock
        subl    clock0, %eax
        subl    $12, %esp
        pushl   -24(%ebp)
        movl    %eax, clock0
        call    free
        popl    %eax
        pushl   -20(%ebp)
        call    free
        addl    $16, %esp
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        movl    clock0, %eax
        popl    %edi
        popl    %ebp
        ret
.Lfe2:
        .size    dynamic,.Lfe2-dynamic
                .section        .rodata
.LC0:
        .string "dynamic:  %d\n"
.LC1:
        .string "dynindex: %d\n"
.LC2:
        .string "declared: %d\n"
.text
        .align 4
.globl main
        .type    main,@function
main:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $28, %esp
        movl    $2, -16(%ebp)
        .p2align 2
.L81:
        call    dynamic
        subl    $8, %esp
        pushl   clock0
        pushl   $.LC0
        call    printf
        movl    Xsize, %eax
        movl    %eax, -24(%ebp)
        movl    Ysize, %eax
        movl    %eax, -28(%ebp)
        imull   -24(%ebp), %eax
        sall    $2, %eax
        movl    %eax, (%esp)
        call    malloc
        movl    %eax, -32(%ebp)
        call    clock
        movl    %eax, clock0
        movl    $0, -20(%ebp)
        addl    $16, %esp
        .p2align 2
.L84:
        movl    -24(%ebp), %ebx
        decl    %ebx
        js      .L101
        movl    %ebx, %esi
        movl    -28(%ebp), %edi
        imull   -28(%ebp), %esi
        negl    %edi
        .p2align 2
.L87:
        movl    -28(%ebp), %edx
        decl    %edx
        js      .L102
        leal    (%edx,%esi), %ecx
        movl    -32(%ebp), %eax
        leal    (%eax,%ecx,4), %ecx
        .p2align 2
.L90:
        leal    (%edx,%ebx), %eax
        movl    %eax, (%ecx)
        subl    $4, %ecx
        decl    %edx
        jns     .L90
.L102:
        addl    %edi, %esi
        decl    %ebx
        jns     .L87
.L101:
        incl    -20(%ebp)
        cmpl    $99999, -20(%ebp)
        jbe     .L84
        call    clock
        subl    clock0, %eax
        subl    $12, %esp
        movl    %eax, clock0
        pushl   -32(%ebp)
        call    free
        popl    %ebx
        popl    %esi
        pushl   clock0
        pushl   $.LC1
        call    printf
        call    declared
        popl    %edx
        popl    %ecx
        pushl   clock0
        pushl   $.LC2
        call    printf
        addl    $16, %esp
        decl    -16(%ebp)
        jns     .L81
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        xorl    %eax, %eax
        popl    %edi
        popl    %ebp
        ret
.Lfe3:
        .size    main,.Lfe3-main
        .comm   clock0,4,4
        .align 4
.globl dynamic_but_indexed
        .type    dynamic_but_indexed,@function
dynamic_but_indexed:
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        movl    Xsize, %eax
        subl    $40, %esp
        movl    %eax, -20(%ebp)
        movl    Ysize, %eax
        movl    %eax, -24(%ebp)
        imull   -20(%ebp), %eax
        sall    $2, %eax
        pushl   %eax
        call    malloc
        movl    %eax, -28(%ebp)
        call    clock
        movl    %eax, clock0
        movl    $0, -16(%ebp)
        addl    $16, %esp
        .p2align 2
.L40:
        movl    -20(%ebp), %ebx
        decl    %ebx
        js      .L104
        movl    %ebx, %esi
        movl    -24(%ebp), %edi
        imull   -24(%ebp), %esi
        negl    %edi
        .p2align 2
.L44:
        movl    -24(%ebp), %edx
        decl    %edx
        js      .L105
        leal    (%edx,%esi), %ecx
        movl    -28(%ebp), %eax
        leal    (%eax,%ecx,4), %ecx
        .p2align 2
.L48:
        leal    (%edx,%ebx), %eax
        movl    %eax, (%ecx)
        subl    $4, %ecx
        decl    %edx
        jns     .L48
.L105:
        addl    %edi, %esi
        decl    %ebx
        jns     .L44
.L104:
        incl    -16(%ebp)
        cmpl    $99999, -16(%ebp)
        jbe     .L40
        call    clock
        subl    clock0, %eax
        subl    $12, %esp
        pushl   -28(%ebp)
        movl    %eax, clock0
        call    free
        addl    $16, %esp
        leal    -12(%ebp), %esp
        popl    %ebx
        popl    %esi
        movl    clock0, %eax
        popl    %edi
        popl    %ebp
        ret
.Lfe4:
        .size    dynamic_but_indexed,.Lfe4-dynamic_but_indexed
        .ident  "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.0)"
=====




[Prev in Thread] Current Thread [Next in Thread]