Complete.Org: Mailing Lists: Archives: freeciv-dev: January 2004:
[Freeciv-Dev] Re: (PR#7279) Macro optimizations
Home

[Freeciv-Dev] Re: (PR#7279) Macro optimizations

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: a-l@xxxxxxx
Subject: [Freeciv-Dev] Re: (PR#7279) Macro optimizations
From: "rwetmore@xxxxxxxxxxxx" <rwetmore@xxxxxxxxxxxx>
Date: Tue, 20 Jan 2004 17:10:39 -0800
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=7279 >


Arnstein Lindgard wrote:
> <URL: http://rt.freeciv.org/Ticket/Display.html?id=7279 >
> 
> This patch makes the server 17.6% faster.

Corecleanups generally ran 50% faster than CVS.

> I wrote 3 macros to see if I could learn something about the
> preprocessor etc, looking at the Corecleanups and the profile
> data (PR#7278).
> 
> 1.  Profile shows that 96.7% of the calls to
>         normalize_map_pos()
>     are generated by
>         is_normal_map_pos()
>     so let's optimize the latter. This small macro alone is
>     responsible for 10.2% speedup.

is_normal_map_pos is a completely unnecessary function and like income
taxes was only implemented as a temporary hack to detect unnormal
coordinates during the normalization purges. Although born deprecated,
its (ab)use has continued to grow for various religious reasons.

It really is time to get rid of it, though.

> 2.  Turned normalize_map_pos() into a macro as well. Another 5% gain.
>     (It was neccessary to change "pf_map *map" into "pf_map *pf_map"
>     throughout the code, since the global variable "map" is now used
>     more often.)

Global variables are bad.

Though this is a repeat, it bears repeating since the message seems to
be very difficult to absorb for some people.

> 3.  Macro base_city_map_to_map(). Another 2.4%.
> 
> It wasn't that hard, presuming I didn't completely miss something.

No, not hard at all.

> It looks like the
>     map_pos_to_index()
> macro, which recently was inlined, was more complex. When I tried to
> write it as a single macro in the same fashion, I got warnings about
> undefined behaviour. Didn't figure out why yet.

Post your code and the compiler messages, answer is probably a quickie.

> I am a bit surprised by the results; I had assumed Intel, AMD
> wizardry would speculatively pre-proccess all of these functions
> and put them more or less permanently in the CPU cache.

The overhead of a function call tends to far outweigh the code in these
simple macros.

When "C" code is optimized, macro operations can often be collapsed and
optimized to the top of a code block. This cannot be done with a function
that will be called over and over again even if the arguments and return
results are the same.

> Furthermore, I additionally tried macrofying 4 more functions
> which are not included in this patch:
>     map_get_tile()
>     map_get_terrain()
>     map_get_continent()
>     map_get_city()

A lot of code is written as map_get_*(x,y) rather than caching the
map_get_tile() as ptile and calling a tile_get_*(ptile) function instead.
Some simple code changes in a few places here can reduce times significantly
as well.

map_get_tile() even if not being fully utilized should still show up unless
there is some extra garbage being thrown in that destroys performance.

But more importantly one should document a few of these "best practices" and
try to make sure future coding uses efficient and not inefficient templates.
People will usually follow current examples rather than invent new ones.

> ...and guess what, I got NO additional speed improvement at all. Is
> it conceivable that the previous macros freed the CPU cache, so that
> the wizardry now handles these functions properly?

I don't think CPU cache wizardry is really the magic spell involved :-).

> If that's correct, it just underlines the obvious - optimize the few
> core functions that matter, and that's it. Still I think the 4 should
> also be macros, since cache and other architectural features will
> vary.

Yes this is the important point. Measure ... then optimize ... then
measure. You save more time by first finding out the few problem areas
then verifying that you have found them.

But it still doesn't hurt to always use good coding practices rather than
bad ones plus a rationalization that it doesn't matter.

> Question:
> To make the last 4 work, I had to modify this:
> #define MAP_TILE(x,y) (map.tiles + map_pos_to_index(x, y))
> #define MAP_TILE(x,y) (map.tiles + map_pos_to_index((x), (y)))
> 
> Is it always correct to use paranthesis in macros?

Yes, all arguments should be enclosed in parentheses to insure that
the result is unambiguously interpreted as a single expression and not
split into separate parts. There can never be too many brackets :-).

#define multiply(x,y)  x * y

   val = multiply( 2+3, 3+4 )

gives you the idea when you look at the expanded "C" code.

   val = 2 + 3 * 3 + 4;

> Why not start using these simple tricks to gain free speed. The
> Corecleanup is an old patch. Of course, inlining should give exactly
> the same results, if you're using the gcc compiler :-) I guess
> the trick is to decide on something.

Inlining is always inferior to macro optimizations as Stroustrup states
in several books. I have included a few quotes from this master below
that may help you understand why even he does not recommend it for "C"
coding.

> * All numbers are percentage points off of the current cvs, measured
> "user time" by the command "time" on my box. Savegames identical.
> Compiled with gcc 3.3.2, CFLAGS="-pipe -O2 -march=athlon-tbird" and
> --enable-debug.
> 
> Arnstein

Some interesting comments on inline functions from Stroustrup.

   As stated, a compiler is free to ignore the hint to inline a function.

   Limitations on inlining will vary from implementation to implementation.

   Determining whether a function can be inlined or not is in general not
   possible.

   A call of an inline function through a pointer to function typically
   results in a non-inlined call.

   Ordinary calls may be generated for inline functions for more mundane
   reasons. For example,
   - An inline function was too large for inlining to be worthwhile.
   - An inline function was recursive.
   - An inline function was invoked twice within an expression.
   - An inline function contained a loop, a switch or a goto.

   The order of evaluation of actual arguments to functions is undefined.
   In particular, an inline expansion is not required to preserve the
   order of evaluation of actual arguments that the implementation would
   have produced for an out-of-line call. Since one call of a function in a
   program may be inlined while another is handled by the normal function
   call mechanism, this can lead to two calls to the same function with
   identical actual arguments yielding different results.

Clearly staying away from inline avoids a host of random coding factors
and problems even for compilers where it is supported in some form. Since
it provides no additional technical value over other coding constructs
available in "C", the certainty of macro code being used in performance
optimizations with loops or switches for instance, using it anytime would
appear to be a foolish mistake.

Having it used in some places and not in others would be a recipe for
code non-standardization that also has dubious merit. The Freeciv coding
rule against using inline in any way or form is thus the only sensible
policy.

Cheers,
RossW
=====




[Prev in Thread] Current Thread [Next in Thread]