Complete.Org: Mailing Lists: Archives: freeciv-dev: July 2002:
[Freeciv-Dev] Re: [RFC] Path finding implementation.
Home

[Freeciv-Dev] Re: [RFC] Path finding implementation.

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>
Cc: Gregory Berkolaiko <Gregory.Berkolaiko@xxxxxxxxxxxx>, Freeciv Development List <freeciv-dev@xxxxxxxxxxx>
Subject: [Freeciv-Dev] Re: [RFC] Path finding implementation.
From: Raimar Falke <rf13@xxxxxxxxxxxxxxxxx>
Date: Sun, 7 Jul 2002 12:26:48 +0200

On Sat, Jul 06, 2002 at 12:43:52PM -0400, Ross W. Wetmore wrote:
> At 10:09 PM 02/07/01 +0200, Raimar Falke wrote:
> [...]
> >> Again this shows that if you leave the index computations to the 
> >> compiler/optimizer and don't do known bad things that destroy its
> >> effectiveness, it will find the best solution in a given context.
> >> 
> >> Indexed (linearized) arrays are just as effective as static unless
> >> you introduce type differences in the index/bounds values that
> >> defeat optimization.
> >
> >Ok and another time. Repeat please: the variable map of type struct
> >civ_map is a global one. The compiler have to assume that the value
> >has changed for every call of a function foo.
> 
> Ok, to repeat one last time ...
> 
> If you know that the loop limit/array bounds values are not going to
> change, then you should store them into local variables before entering
> the loop, and access the local variables. 
> 
> If you *insist* on doing things in ways that defeat optimizations, then
> also do this for your precomputed array so that it needs to be accessed
> rather than cached in the inner loop as well.
> 
> The effect is now to move the leal (computation) and movl (lookup) 
> operations into the inner loop and kill performance in comparable ways.
> 
> If you make a "fair" comparison, then the (computed) index will almost 
> always win as the memory access is far more damaging than the computation. 
> This should be obvious from the example, even if you want to dispute the
> environmental conditions in general cases.
> 
> The memory access also causes cache line flushes and cache capacity issues
> that internal or local variable register/stack values don't.
> 
> >Also note that the user of this code isn't a tight loop but something
> >like this:
> >
> >bool pf_get_next_location(pf_map_t pf_map, pf_location_t * location)
> >{
> >...
> >  adjc_dir_iterate(current_cell->pos.x, current_cell->pos.y, x1, y1, dir) {
> >    struct cell tmp, *possible_neighbour = &tmp, *neighbour;
> >    int extra_cost2;
> >...
> >    neighbour = &pf_map->cells[x1][y1];
> >
> >The last line is the only time the cells array is accessed except in
> >built-up and tear down. The compiler can do clever things in tight
> >loops but not in one like this. 
> 
> Actually it can ... as long as you don't actively seek to defeat it.

The Dijkstra algorithm doesn't contain a simple loop that you the
compiler can optimize.

> >Well except if you can change the way
> >the map is stored and so adjc_dir_iterate becomes a simply loop.
> 
> The code to optimize adjacent iterations has been around for a year now.
> It doesn't involve any changes to the map storage.

Also corecleanup14-cvs-May11.dif doesn't help here.

Sample code:

int foo(int x,int y)
{
  adjc_iterate(x, y, x1, y1) {
      if(x1==y1) {return 12;}
  } adjc_iterate_end;
  return 34;
}

is preprocessed to (indented):

int foo(int x, int y)
{
  { {
      int x1;
      int y1;
      int _nn = (8);
      int _ni = (1);
      int _x0 = (x);
      int _y0 = (y);
      int _border =
          (((map.
             type & (MAP_TYPE_ISO)) ? (*(&_FC_MAP_Y) =
                                       ((((_x0))) + (((_y0))) - map.xsize),
                                       *(&_FC_MAP_X) =
                                       ((((_x0))) + (((_x0))) -
                                        *(&_FC_MAP_Y) -
                                        (*(&_FC_MAP_Y) & 1)) /
                                       2) : (*(&_FC_MAP_Y) =
                                             (((_y0))), *(&_FC_MAP_X) =
                                             (((_x0))))),
           (!((((map.
                 type & (MAP_TYPE_ISO)) ? 3 : 1)) <= _FC_MAP_Y
              && _FC_MAP_Y <
              map.ysize - (((map.type & (MAP_TYPE_ISO)) ? 3 : 1)))
            || !((((map.type & (MAP_TYPE_ISO)) ? 3 : 1)) <= _FC_MAP_X
                 && _FC_MAP_X <
                 map.xsize - (((map.type & (MAP_TYPE_ISO)) ? 3 : 1)))));
      while ((_nn -= _ni) >= 0) {
        (*(&x1) = DIR_dx[(_nn + 1)], *(&y1) = DIR_dy[(_nn + 1)]), x1 +=
                  ^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^
Problem
            _x0, y1 += _y0;
        if (!_border
            ||
            ((((map.
                type & (MAP_TYPE_ISO)) ? (*(&_FC_MAP_Y) =
                                          (((*(&x1))) + ((*(&y1))) -
                                           map.xsize), *(&_FC_MAP_X) =
                                          (((*(&x1))) + ((*(&x1))) -
                                           *(&_FC_MAP_Y) -
                                           (*(&_FC_MAP_Y) & 1)) /
                                          2) : (*(&_FC_MAP_Y) =
                                                ((*(&y1))), *(&_FC_MAP_X) =
                                                ((*(&x1))))),
              (((map.type & (MAP_TYPE_WRAPX)) == (MAP_TYPE_WRAPX))
               || ((unsigned) (_FC_MAP_X) < (unsigned) (map.xsize)))
              && (((map.type & (MAP_TYPE_WRAPY)) == (MAP_TYPE_WRAPY))
                  || ((unsigned) (_FC_MAP_Y) < (unsigned) (map.ysize))))
             && (!((map.type & (MAP_TYPE_WRAPX)) == (MAP_TYPE_WRAPX))
                 || ((unsigned) (*(&_FC_MAP_X)) < (unsigned) ((map.xsize)))
                 || (*(&_FC_MAP_X) =
                     ((*(&_FC_MAP_X)) % (map.xsize) <
                      0 ? (*(&_FC_MAP_X)) % (map.xsize) +
                      (map.xsize) : (*(&_FC_MAP_X)) % (map.xsize)), (1)))
             && (!((map.type & (MAP_TYPE_WRAPY)) == (MAP_TYPE_WRAPY))
                 || ((unsigned) (*(&_FC_MAP_Y)) < (unsigned) ((map.ysize)))
                 || (*(&_FC_MAP_Y) =
                     ((*(&_FC_MAP_Y)) % (map.ysize) <
                      0 ? (*(&_FC_MAP_Y)) % (map.ysize) +
                      (map.ysize) : (*(&_FC_MAP_Y)) % (map.ysize)), (1)))
             &&
             (((map.
                type & (MAP_TYPE_ISO)) ? (*((&x1)) =
                                          ((_FC_MAP_Y) +
                                           ((_FC_MAP_Y) & 1)) / 2 +
                                          (_FC_MAP_X), *((&y1)) =
                                          (_FC_MAP_Y) - *((&x1)) +
                                          map.xsize) : (*((&y1)) =
                                                        (_FC_MAP_Y),
                                                        *((&x1)) =
                                                        (_FC_MAP_X))), (1)))) { 
{
            if (x1 == y1) {
              return 12;
            }
        }
        }
      }
  }
  };
  return 34;
}

This is huge compared to the current code:

int foo(int x, int y)
{
  { { {
        int _dummy_x, _dummy_y;
        bool _is_border = (((x)) < ((1)) || ((x)) >= map.xsize - ((1))
                           || ((y)) < ((1)) || ((y)) >= map.ysize - ((1)));
        ((void) 0);
        for (_dummy_y = -(1); _dummy_y <= (1); _dummy_y++) {
          for (_dummy_x = -(1); _dummy_x <= (1); _dummy_x++) {
            int x1 = _dummy_x + (x), y1 = _dummy_y + (y);
            if (_is_border && !normalize_map_pos(&x1, &y1)) {
              continue;
            };

            {
              if (x1 == x && y1 == y) {
                continue;
              }
              {
                if (x1 == y1) {
                  return 12;
                }
              }
            }
          }
        }
  }
  };
  };
  return 34;
}

So while I agree that your code does more (so the code is "allow to be
bigger" you have the heart of the problem is here (manually
simplified):

      while ((_nn -= _ni) >= 0) {
        x1 = DIR_dx[(_nn + 1)];
        y1 = DIR_dy[(_nn + 1)];

So the compiler has no knowledge what x1 and y1 are
afterwards. Without any idea it can optimize it. This is the
difference to a simple loop.

> Once you get past the performance aspects, there are still the bad coding 
> practice issues from using an array of refernces to simulate a doubly 
> indexed array.

        Raimar

-- 
 email: rf13@xxxxxxxxxxxxxxxxx
  The trick is to keep breathing.


[Prev in Thread] Current Thread [Next in Thread]