Complete.Org: Mailing Lists: Archives: freeciv-dev: April 1999:
Re: [Freeciv-Dev] Some performance issues
Home

Re: [Freeciv-Dev] Some performance issues

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: Fabrice.Noilhan@xxxxxx (Fabrice Noilhan)
Cc: freeciv-dev@xxxxxxxxxxx
Subject: Re: [Freeciv-Dev] Some performance issues
From: Rizos Sakellariou <rizos@xxxxxxxxxxx>
Date: Mon, 19 Apr 1999 18:04:12 -0500 (CDT)

> 
> On Sat, Apr 17, 1999 at 10:48:59PM -0500, Rizos Sakellariou wrote:
> > Let me say here that multiplication is still an expensive operation on most 
> > modern processors with a cost that varies from 10 to 14 CPU cycles in many 
> > cases. Compilers are typically smart enough to convert multiplications of a 
> > variable by an integer to a series of ADDs and SHIFTs (thereby reducing
> > the cost of a MUL), but they are left with no choice in the case of
> > the multiplication of two unknowns as it is our case here. As a solution,
> > i suggest to replace all occurrences of y*map.xsize with something like 
> > an array ymult[y] whose elements (from 0 to map.ysize) are precomputed
> > at the beginning of each game. Depending on the machine used this may
> > have a drastical effect on reducing the time spent here.
> 
> Since I'm working on this subject (cryptography and modular arithmeric),
> I think I can have some useful comments:
> 
> - integer multiplication is an expensive operation on UltraSparc
> processors (20 cycles for a 32*32 -> 64 bits operation). On Pentium
> Pro/II/III, it only takes 4 cycles. It is really faster (in terms of
> cycles) on old sparc processors (4-6 cycles).
> 
> - Sun's C compiler does NOT replace multiplications by a constant integer
> by shifts and ADDS (even if it is stated in the documentation)
> 
> - Floating point operations are really faster on UltraSparc (26*26->52)
> but the cost of conversion (using casts or assembly code) is really huge.
> Anyway, you can do it properly if you know that you're handling integers
> with appropriate substraction and masks.
> 
> - The cost of a lookup-table is similar to the cost of a multiplication
> on most other processors. This table should be tiny if you want to stay
> in the cache.
> 
> Now, it's up to you to decide if you want portability (in case you can
> just leave these multiplications), efficiency (in case you can define
> a macro for doing these multiplications) or use the not-most-efficient-but-
> cool-on-all-processors solution (lookup table).

Ok - let's get some things straight:
- Portability is defined as whether your code can be easily compiled
  and run on different architectures. Changing  y * map.xsize  with
  something like  yoffset[y] for instance is not a machine dependent
  operation and has nothing to do with portability.
- Efficiency is associated with the performance of the code which 
  results from the code that the compiler generates. In the particular 
  case, there is scope of replacing a generally expensive multiplication 
  with a not so expensive load.

Finally: Performance tuning, especially for a variety of platforms, 
is a tedious process; if we also take into account the complexity 
of modern processors all this implies that one has to be investigative 
rather than authoritative.

--rizos

[Prev in Thread] Current Thread [Next in Thread]