[Freeciv-Dev] Re: (PR#7279) Macro optimizations
To: a-l@xxxxxxx
Subject: [Freeciv-Dev] Re: (PR#7279) Macro optimizations
From: "Jason Short" <jdorje@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 20 Jan 2004 20:18:15 -0800
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=7279 >

rwetmore@xxxxxxxxxxxx wrote:
> Arnstein Lindgard wrote:
> 
>><URL: http://rt.freeciv.org/Ticket/Display.html?id=7279 >
>>
>>This patch makes the server 17.6% faster.
> 
> Corecleanups generally ran 50% faster than CVS.

You obviously never actually *played* with the corecleanups.  The client 
was unusably slow.

>>It looks like the
>>    map_pos_to_index()
>>macro, which recently was inlined, was more complex. When I tried to
>>write it as a single macro in the same fashion, I got warnings about
>>undefined behaviour. Didn't figure out why yet.
> 
> Post your code and the compiler messages, answer is probably a quickie.

map_pos_to_index, as written, needs temporary variables.  If it were 
written in macro form (without any change to the interface), these 
temporaries would have to be made global.  That doesn't work, since 
multiple calls to this function can be made between two sequence points. 
For instance

   hmap(x, y) = hmap(x1, y1) + foo(); /* hmap calls map_pos_to_index */

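is not valid C if map_pos_to_index is a macro that writes to global 
temporaries: both expansions modify the same globals with no sequence 
point between them, which is undefined behaviour.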

This is, of course, why it was written as an inline function in the 
first place.
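
A minimal sketch of the problem, assuming a hypothetical two-step coordinate conversion (the real map_pos_to_index may differ; XSIZE, YSIZE, height_map, foo and copy_plus_one are illustrative stand-ins). The intermediate row value is needed twice, so a one-expression macro with the same interface would have to park it in a global, while the inline function gets a private local on every call.

```c
/* Hypothetical map dimensions. Assumes 0 <= x < XSIZE, 0 <= y < YSIZE
   and YSIZE <= XSIZE, so every intermediate below stays nonnegative. */
#define XSIZE 80
#define YSIZE 50

/*
 * Broken macro form, shown only as a comment.  The row value is used
 * twice, so without changing the interface it must live in a global:
 *
 *   static int tmp_row;
 *   #define MAP_POS_TO_INDEX(x, y)                       \
 *     (tmp_row = ((x) + (y)) % YSIZE,                    \
 *      tmp_row * XSIZE + ((x) - tmp_row + XSIZE) % XSIZE)
 *
 * In  hmap(x, y) = hmap(x1, y1) + foo();  both expansions write tmp_row
 * with nothing ordering them: undefined behaviour, the kind of thing
 * gcc's -Wsequence-point (enabled by -Wall) warns about.
 */

/* Inline-function form: each call has its own locals, so any number of
   calls can appear in a single expression. */
static inline int map_pos_to_index(int map_x, int map_y)
{
  int row = (map_x + map_y) % YSIZE;       /* hypothetical intermediate */
  int col = (map_x - row + XSIZE) % XSIZE; /* uses row a second time */
  return row * XSIZE + col;
}

static int height_map[XSIZE * YSIZE];

/* Hypothetical accessor in the style of the mail's hmap(). */
#define hmap(x, y) (height_map[map_pos_to_index((x), (y))])

static int foo(void)
{
  return 1;
}

/* The statement from the mail; well-defined with the inline function. */
static void copy_plus_one(int x, int y, int x1, int y1)
{
  hmap(x, y) = hmap(x1, y1) + foo();
}
```

With the inline version, copy_plus_one(1, 2, 3, 4) reliably stores height_map at (3, 4) plus one into (1, 2); with the global-temporary macro the result would depend on how the compiler interleaved the two expansions.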

>>I am a bit surprised by the results; I had assumed Intel/AMD
>>wizardry would speculatively pre-process all of these functions
>>and put them more or less permanently in the CPU cache.
> 
> The overhead of a function call tends to far outweigh the code in these
> simple macros.
> 
> When "C" code is optimized, macro operations can often be collapsed and
> optimized to the top of a code block. This cannot be done with a function
> that will be called over and over again even if the arguments and return
> results are the same.

Gcc will auto-inline small functions when compiling with -O3.  However, 
it cannot inline functions that are defined in a different source file.

For instance, normalize_map_pos is compiled into a function in map.o. 
Then mapgen.c (which calls normalize_map_pos) is compiled into mapgen.o. 
But since this second compilation doesn't know about map.o (or map.c 
at all), there is no possibility for inlining.

Theoretically you can get around this if you "manually" compile the 
sources.  Something like

   gcc $CFLAGS -O3 *.c  ../ai/*.c ../common/*.c

should do it.  However, I've done this before and there was no 
significant speedup.  So obviously I don't know everything about gcc.

[$CFLAGS can be determined by looking at a standard gcc line that is 
called by make.]
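
Another way around the separate-compilation limit, sketched here under assumptions: move the body of a small, hot function into a header as static inline, so every .c file that includes it sees the definition and gcc can inline the call there when optimizing. The signature below is assumed, and the wrap-around logic, MAP_XSIZE and MAP_YSIZE are hypothetical stand-ins rather than freeciv's actual normalize_map_pos.

```c
/* ---- would live in a header such as map.h (hypothetical layout) ---- */
#include <stdbool.h>

/* Hypothetical map size; assumes a map that wraps in x but not in y. */
#define MAP_XSIZE 80
#define MAP_YSIZE 50

/* Because the body is visible wherever the header is included
   (mapgen.c and friends), each caller can get an inlined copy instead
   of a call to one out-of-line function compiled into map.o. */
static inline bool normalize_map_pos(int *x, int *y)
{
  if (*y < 0 || *y >= MAP_YSIZE) {
    return false;                                  /* off top/bottom */
  }
  *x = ((*x % MAP_XSIZE) + MAP_XSIZE) % MAP_XSIZE; /* wrap east-west */
  return true;
}
/* ---- end of header ---- */
```

The cost of this design is that every change to the function body forces all includers to recompile, which is why it is usually reserved for a handful of very small, very frequently called functions.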

> A lot of code is written as map_get_*(x,y) rather than caching the
> map_get_tile() as ptile and calling a tile_get_*(ptile) function instead.

Yep.
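
The pattern Ross describes, sketched with hypothetical stand-ins: struct tile, its fields, the tiles array and the score_* functions are illustrative, and only the map_get_* / tile_get_* naming follows the mail. Each map_get_*(x, y) call repeats the coordinate-to-tile lookup; fetching the tile pointer once and reading fields through it pays that cost a single time.

```c
/* Hypothetical map size and tile layout. */
#define MAP_XSIZE 80
#define MAP_YSIZE 50

struct tile {
  int terrain;
  int special;
};

static struct tile tiles[MAP_XSIZE * MAP_YSIZE];

/* The lookup every map_get_*() call repeats (real code would also
   normalize and check the coordinates here). */
static inline struct tile *map_get_tile(int x, int y)
{
  return &tiles[y * MAP_XSIZE + x];
}

/* Style 1: coordinate-based accessors, one lookup per call. */
static inline int map_get_terrain(int x, int y)
{
  return map_get_tile(x, y)->terrain;
}

static inline int map_get_special(int x, int y)
{
  return map_get_tile(x, y)->special;
}

/* Style 2: tile-based accessors, no lookup at all. */
static inline int tile_get_terrain(const struct tile *ptile)
{
  return ptile->terrain;
}

static inline int tile_get_special(const struct tile *ptile)
{
  return ptile->special;
}

/* The same hypothetical computation written both ways. */
static int score_by_coords(int x, int y)
{
  return map_get_terrain(x, y) * 10 + map_get_special(x, y); /* 2 lookups */
}

static int score_by_tile(int x, int y)
{
  const struct tile *ptile = map_get_tile(x, y); /* 1 lookup, cached */
  return tile_get_terrain(ptile) * 10 + tile_get_special(ptile);
}
```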

>>...and guess what, I got NO additional speed improvement at all. Is
>>it conceivable that the previous macros freed the CPU cache, so that
>>the wizardry now handles these functions properly?
> 
> I don't think CPU cache wizardry is really the magic spell involved :-).

Aside from agreeing with Ross, I have no idea on this one.

You could test it, however, by macro-izing just these functions while 
leaving the others intact.
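
One way to run that experiment, assuming a hypothetical build switch (USE_POS_MACRO is not an existing freeciv option, and native_pos_to_index and sum_indices are illustrative): keep the inline function as the default and let a -D flag swap in a macro for only the function under test, then time the server both ways.

```c
/* Hypothetical map width. */
#define MAP_XSIZE 80

#ifdef USE_POS_MACRO
/* Macro variant: safe only because it needs no temporaries and
   evaluates each argument exactly once. */
#define native_pos_to_index(nat_x, nat_y) ((nat_y) * MAP_XSIZE + (nat_x))
#else
/* Default variant: an ordinary inline function. */
static inline int native_pos_to_index(int nat_x, int nat_y)
{
  return nat_y * MAP_XSIZE + nat_x;
}
#endif

/* Hypothetical hot loop to time under both builds. */
static long sum_indices(int xsize, int ysize)
{
  long sum = 0;

  for (int y = 0; y < ysize; y++) {
    for (int x = 0; x < xsize; x++) {
      sum += native_pos_to_index(x, y);
    }
  }
  return sum;
}
```

Build once as-is and once with -DUSE_POS_MACRO added to CFLAGS, then compare timings on the same workload; if the numbers match, gcc was most likely already inlining the function.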

>>Why not start using these simple tricks to gain free speed. The
>>Corecleanup is an old patch. Of course, inlining should give exactly
>>the same results, if you're using the gcc compiler :-) I guess
>>the trick is to decide on something.

Fine with me.

Nobody's ever really objected to macro-izing/inlining the commonly called 
functions, but neither has anyone taken the initiative to do it.  My 
personal feeling is that such optimization will speed up the server, but 
will make *no difference* in actual game speed (because the client is 
the limiting factor).

[from a book by Stroustrup]
>    A call of an inline function through a pointer to function typically
>    results in a non-inlined call.

This is funny.  Are we supposed to use pointers to macros instead?

jason
