Complete.Org: Mailing Lists: Archives: freeciv-dev: June 2002:
[Freeciv-Dev] Re: [RFC] Path finding implementation.
Home

[Freeciv-Dev] Re: [RFC] Path finding implementation.

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: Raimar Falke <rf13@xxxxxxxxxxxxxxxxx>
Cc: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>, Gregory Berkolaiko <Gregory.Berkolaiko@xxxxxxxxxxxx>, Freeciv Development List <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: [RFC] Path finding implementation.
From: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>
Date: Sun, 30 Jun 2002 06:22:45 -0400

At 05:09 PM 02/06/29 +0200, Raimar Falke wrote:
>On Sat, Jun 29, 2002 at 09:56:31AM -0400, Ross W. Wetmore wrote:
>> 
>> This sort of (unnecessary) attempt to duplicate compiler optimizations
>> has been tried and shown to be less efficient. Compiler writers and
>> chip makers have worked really hard over the years to make these things
>> as efficient as possible as part of the "benchmark" wars. The assumption
>> that one can hack code to improve on their efforts is alwasy interesting
>> to watch.
>> 
>> The best thing to do is stay out of the way of the compiler by not
>> inserting special algorithmic hacks, or optimization killers like
>> (inline) function calls.
>> 
>> 
>> When using functions or inline functions (which are the worst
>> of all worlds) rather than macros to do the address computations,
>> one blocks the compiler from generating the sort of efficient
>> LEA instructions that do (index * base +offset) calculations in a 
>> single instruction cycle.
>
>> In general the compiler will sort out common expressions and cache
>> the results in registers unless one does things to defeat its pattern
>> recognition, or allowed optimization constraints.
>
>Ack.

Glad you showed you recognized this before using it to write some poor
code to try and make a worthless point :-).

[...]
>Long explanation. But misses the point.

Actually, it hit the point. You are missing the point about letting
the compiler handle things when you use bad code example to try and
reverse the timings.

>> /********************************/
>> void dynamic_but_indexed()
>> {
>...
>>     for(i = 0; i < 80; i++) {
>>       for(j = 0; j < 50; j++) {
>>         tiles[j + i * 50] = i + j;
>>       }
>>     }
>
>is compiled to
>
>> .L41:
>>       movl    $49, %eax
>>       leal    196(%ebx), %ecx
>>       leal    49(%esi), %edx
>> .L45:
>>       movl    %edx, (%ecx)
>>       subl    $4, %ecx
>>       decl    %edx
>>       decl    %eax
>>       jns     .L45
>>       incl    %esi
>>       addl    $200, %ebx
>>       cmpl    $79, %esi
>>       jle     .L41
>
>which is nice work from the compiler. It is essentially:
>
>  int *p=&tiles[80*50-1];
>  int x,y;
>  for(x=0;x<80;x++) {
>    for(y=49;y>=0;y--) {
>      *p--=y;
>    }
>  }
>
>but this optimization is only possible because the compiler know the
>array sizes.

It doesn't know anything about array sizes. This was a malloc'd block.
It does know about loop values though.

However, you correctly state the point, if you let the compiler do the
optimizations it will choose an appropriate optimization for this case
from many different possibilities.

>If the 80 and 50 is replaced with global vars (unsigned int to get an
>easier assembly) you get somthing like this:

Which is a very stupid thing to do, as globals block the compiler from
optimizing by forcing a load from memory. It is the same performance 
killer as using a function call to do a one liner. One presumes most
people design code to do the efficient thing and not the inefficient, 
and will choose the former over the latter when given a choice.

Globals are evil. Repeat it to yourself 10 times ...

Store the loop values in a local first, and you will find they get placed 
in registers with simpler precomputed form in the outer loop, rather than 
loaded from memory everytime and the computation redone explicitly in 
the inner loop. 

This code is essentially the same as the original case except the hard
constants in the outer loop are replaced by stack memory loads (which 
are almost as fast as registers). The extra memory access as well as the 
more expensive imull instructions are now gone.

>-24 == ysize
>-20 == tiles
>-36 == xsize
>
>.L43:
>        movl    -24(%ebp), %ebx
>        xorl    %ecx, %ecx
>        cmpl    %ebx, %ecx
>        jae     .L53
>.L47:
>        movl    %esi, %eax
>        movl    -20(%ebp), %edi
>        imull   %ebx, %eax
>        leal    (%ecx,%esi), %edx
>        leal    (%eax,%ecx), %eax
>        incl    %ecx
>        movl    %edx, (%edi,%eax,4)
>        cmpl    %ebx, %ecx
>        jb      .L47
>        movl    -36(%ebp), %edx
>.L53:
>        incl    %esi
>        cmpl    %edx, %esi
>        jb      .L43
>
>And now the dynamic case:
>
>        movl    ysize, %eax
>.L67:
>        xorl    %edx, %edx
>        cmpl    %eax, %edx
>        jae     .L78
>        movl    (%esi,%ebx,4), %ecx
>.L71:
>        leal    (%edx,%ebx), %eax
>        movl    %eax, (%ecx,%edx,4)
>        incl    %edx
>        movl    ysize, %eax
>        cmpl    %eax, %edx
>        jb      .L71
>        movl    xsize, %ecx
>.L78:
>        incl    %ebx
>        cmpl    %ecx, %ebx
>        jb      .L67
>
>So in the dynamic case we no imull but an extra memory access. The
>other commands are cheap.

Which is basically the same code as before because it still does the
index memory lookup, and doesn't have to precompute the value, i.e. no
compiler optimization was broken here because this is the step that
you took over from the compiler.


The other bad thing about the dynamic reference system you propose is 
that it is very easy to generate memory leaks, and very clumsy to set up 
and tear down. It mistakenly gives people the idea they are dealing with
a single double indexed array, rather than a subtly complex set of
references. Coding bugs and errors are nasty.

[...]
>> root@wetrec[318] ./dynarr_00
>> declared =  15220000
>> indexed =   10680000
>> dynamic =   15010000
>
>$ ./a.out
>declared =   4000000
>indexed =    6430000
>dynamic =    4060000
>
>       Raimar
>-- 

As I said, this has been thrashed through before, many times :-).

The lesson is the same ... keep the code simple and let the compiler
handle this problem.

Cheers,
RossW
=====




[Prev in Thread] Current Thread [Next in Thread]