Complete.Org: Mailing Lists: Archives: freeciv-dev: July 2002:
[Freeciv-Dev] Re: [RFC] Path finding implementation.
Home

[Freeciv-Dev] Re: [RFC] Path finding implementation.

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>
Cc: Gregory Berkolaiko <Gregory.Berkolaiko@xxxxxxxxxxxx>, Freeciv Development List <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: [RFC] Path finding implementation.
From: Raimar Falke <rf13@xxxxxxxxxxxxxxxxx>
Date: Mon, 1 Jul 2002 22:09:00 +0200

On Sun, Jun 30, 2002 at 04:17:00PM -0400, Ross W. Wetmore wrote:
> At 04:31 PM 02/06/30 +0200, Raimar Falke wrote:
> >On Sun, Jun 30, 2002 at 06:22:45AM -0400, Ross W. Wetmore wrote:
> >> At 05:09 PM 02/06/29 +0200, Raimar Falke wrote:
> >> >On Sat, Jun 29, 2002 at 09:56:31AM -0400, Ross W. Wetmore wrote:
> [...]
> >> >So in the dynamic case we no imull but an extra memory access. The
> >> >other commands are cheap.
> >> 
> >> Which is basically the same code as before because it still does the
> >> index memory lookup, and doesn't have to precompute the value, i.e. no
> >> compiler optimization was broken here because this is the step that
> >> you took over from the compiler.
> >
> >It looks like I have made my point not clear in the previous post. The
> 
> No, you just aren't understanding/listening to what others are saying 
> about it :-).
> 
> >dynamic_but_indexed code has to do a multiplication. However if the
> >compiler knows the position and/or the sizes it can optimize this. In
> >the path_finding case the compiler knows nothing. To model this I made
> >the array sizes globals. 
> 
> Bad model ...
> 
> >While I agree with you in general that
> >globals should be avoided they solve here the problem and prevent the
> >compiler to assume anything about the sizes. You can also pass the
> >sizes as parameters...you get the same result.
> 
> No you don't ... arguments are locals not globals and are handled 
> correctly as far as optimizations are concerned.
> 
> >So in general the dynamic_but_indexed has to do a multiplication with
> >unknown factors. And this is the problem spot because such imull
> >instructions are expensive. More expensive than a memory access. I
> >know that the last statement is only valid for certain architectures
> >and than in the future the imull may be cheaper.
> 
> And if you carry your logic through, in the general case your dynamic
> precomputed multiplications cannot be used for this case because they
> make assumptions about the values i.e. don't recompute them on the spot.
> 
> So you are just fooling everyone with an apples and oranges comparison
> or setting up for some subtle runtime errors if you believe what you
> just said about the need to force the sizes to be truly "unknown".
> 
> >> The other bad thing about the dynamic reference system you propose is 
> >> that it is very easy to generate memory leaks, and very clumsy to set up 
> >> and tear down. It mistakenly gives people the idea they are dealing with
> >> a single double indexed array, rather than a subtly complex set of
> >> references. Coding bugs and errors are nasty.
> >
> >I don't see a problem here.
> 
> I know (sigh) ...
> 
> >> >> root@wetrec[318] ./dynarr_00
> >> >> declared =  15220000
> >> >> indexed =   10680000
> >> >> dynamic =   15010000
> >> >
> >> >$ ./a.out
> >> >declared =   4000000
> >> >indexed =    6430000
> >> >dynamic =    4060000
> >
> >If I run the attached program I get (P2-400):
> >
> >mull   = 3.860000s
> >access = 3.800000s
> >
> >On a sparc ultra-2 (RISC so no mull opcode) the numbers are much
> >worse:
> >
> >mull   = 31.469999s
> >access = 7.880000s
> 
> All totally irrelevant numbers :-)
> 
> >     Raimar
> >-- 
> 
> Here is a tweaked example with fewer case specific optimization
> differences, and no false comparisons. 
> 
> The key difference is down to the leal (computations) vs movl
> (memory lookups) with a couple extra register variable ops in
> the leal case. The latter are probably eaten by a dual integer
> pipeline (i.e. executed in parallel) so they don't impact the
> overall clock cycle times. But even on an older processor one
> cycle register ops vs a single memory stall probably wins.
> 
> The key differences are in the "outer" loop in all cases.
> 
> It uses global variables for the index limits, but accesses them
> through locals within the loop to avoid Raimar's selective
> optimization breaking practices.
> 
> This has the loops arranged to do count down so this optimization
> is not dependent on any differences in compiler pattern recognition.
> It is probably the cmpl instructions for the upper bound in normal
> loop coding practices vs test against zero that gives the benefit.
> This can be exacerbated if the upper bound is a global or computed
> value. 
> 
> It also times just key loops so the malloc/free times are excluded.
> This removes any noise level initialization penalty for the dynamic
> case.
> 
> Finally it runs things three times in case the runtime order had
> some iniital load or memory management side effect (with a reduced
> loop count to offset my impatience at waiting for results).
> 
> 
> The fascinating element this time is that the key code for static
> and dynamic_but_indexed is absolutely identical, so one doesn't have
> to look to subtleties of flow to understand differences here :-).
> 
> Again this shows that if you leave the index computations to the 
> compiler/optimizer and don't do known bad things that destroy its
> effectiveness, it will find the best solution in a given context.
> 
> Indexed (linearized) arrays are just as effective as static unless
> you introduce type differences in the index/bounds values that
> defeat optimization.

Ok and another time. Repeat please: the variable map of type struct
civ_map is a global one. The compiler have to assume that the value
has changed for every call of a function foo.

Also note that the user of this code isn't a tight loop but something
like this:

bool pf_get_next_location(pf_map_t pf_map, pf_location_t * location)
{
...
  adjc_dir_iterate(current_cell->pos.x, current_cell->pos.y, x1, y1, dir) {
    struct cell tmp, *possible_neighbour = &tmp, *neighbour;
    int extra_cost2;
...
    neighbour = &pf_map->cells[x1][y1];

The last line is the only time the cells array is accessed except in
built-up and tear down. The compiler can do clever things in tight
loops but not in one like this. Well except if you can change the way
the map is stored and so adjc_dir_iterate becomes a simply loop.

> rww@wetrec[296] gcc -O3 -o dynarr dynarr.c
> rww@wetrec[297] gcc -O3 -S dynarr.c
> rww@wetrec[298] ./dynarr
> dynamic:  1960000
> dynindex: 1520000
> declared: 1520000
> dynamic:  1960000
> dynindex: 1510000
> declared: 1520000
> dynamic:  1970000
> dynindex: 1520000
> declared: 1520000

The following is with -O2 since there is no gain from inlining:

P2-400:

dynamic:  2360000
dynindex: 5580000
declared: 3420000
dynamic:  2360000
dynindex: 5580000
declared: 3420000
dynamic:  2370000
dynindex: 5580000
declared: 3420000

        Raimar

-- 
 email: rf13@xxxxxxxxxxxxxxxxx
  "Heuer's Law: Any feature is a bug unless it can be turned off."


[Prev in Thread] Current Thread [Next in Thread]