[Freeciv-Dev] Re: Profiling Civserver again
To: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: Profiling Civserver again
From: Vasco Alexandre Da Silva Costa <vasc@xxxxxxxxxxxxxx>
Date: Thu, 2 Aug 2001 00:30:53 +0100 (WET DST)

On Thu, 2 Aug 2001, Gaute B Strokkenes wrote:

> On Wed, 1 Aug 2001, vasc@xxxxxxxxxxxxxx wrote:
> > Well, does gcc optimize it this way?
> > 
> > #define map_adjust_x(X) \
> >   (((X) % map.xsize) + ((((X) >= 0) - 1) & map.xsize))
> 
> You mean * rather than &, right?

No, I do mean &. Try the macro and you'll see. It IS unreadable; don't say
I didn't warn you. :-)   Tip: (X >= 0) is 0 or 1, so 0-1 gives a mask of
all ones (0xFFFFFFFF) and 1-1 gives a mask of all zeros (0x00000000).
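
To make the mask trick concrete, here is a minimal sketch. It assumes a
stand-in struct with only an xsize field (the real map struct has more),
a two's complement machine, and a % that truncates toward zero, as C99
requires and as gcc does on x86. The wrap_mask name is made up for
illustration.

```c
/* Stand-in for the real map struct; only xsize matters here (assumption). */
struct demo_map { int xsize; };
static struct demo_map map = { 80 };

/* (X) >= 0 is 1 or 0, so subtracting 1 gives 0 (no bits set) or
 * -1 (all bits set on a two's complement machine).  ANDing that with
 * map.xsize yields 0 or map.xsize, without a branch. */
#define wrap_mask(X) ((((X) >= 0) - 1) & map.xsize)

/* The macro from the mail, with X parenthesized throughout. */
#define map_adjust_x(X) (((X) % map.xsize) + wrap_mask(X))
```

With xsize set to 80, map_adjust_x(-1) gives 79 and map_adjust_x(85)
gives 5. One caveat under these assumptions: a negative exact multiple
such as -80 comes out as 80 rather than 0, because the remainder is 0 and
the mask still adds xsize.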

> > This is faster. It's faster on all CPUs. Notice it doesn't have any
> > branches. Of course it's also pretty darn unreadable :-)
> 
> I'm sceptical.  Note that Gregory's patch, which replaced the modulus
> operations with two while loops, doubled the speed of

I've read a bit on this. It seems the divide operation takes 18 clocks or
so on an i686, and a branch takes 3 clocks or so in the worst case. If the
loops run only a few iterations, they are cheaper than the divide. &, 1-,
and + take 1 clock each. So the unreadable macro above should shave off
about 1 clock, and only in the worst case for the branch penalty. Of course
I don't want us to use this unreadable trash instead of what we have; there
is no justification for it. I was just pointing out that gcc isn't perfect.
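
A rough sketch of the two approaches being compared. This is not the
actual patch: the function names and the xsize parameter are made up to
keep the example self-contained.

```c
/* Modulus version: always pays for a divide, even when x is in range. */
static int wrap_x_mod(int x, int xsize)
{
  int r = x % xsize;            /* may be negative when x is negative */
  return r < 0 ? r + xsize : r;
}

/* Loop version, in the spirit of the two-while-loop idea quoted above:
 * no divide, and zero iterations when x is already in range. */
static int wrap_x_loop(int x, int xsize)
{
  while (x < 0) {
    x += xsize;
  }
  while (x >= xsize) {
    x -= xsize;
  }
  return x;
}
```

Both return the same value for any x when xsize is positive. The loop
version only wins when x is at most a few map widths out of range, which
is the usual case for coordinates computed as "position plus small
offset".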

> normalize_map_pos().  It's too bad that there's no easy way to use
> loop constructs in macros, short of using inline functions instead.

Actually, you can use loops in macros. The problem is that this macro in
particular must return a value. :)
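
To illustrate the difference, a small sketch with made-up names: a
statement-style macro can contain loops but cannot be used inside an
expression, so it has to write its result back into its argument, while a
function can loop and still return a value.

```c
/* Statement macro (hypothetical name): loops are fine here, but the
 * macro expands to a statement, so the result goes back into X. */
#define WRAP_X_IN_PLACE(X, XSIZE) \
  do {                            \
    while ((X) < 0) {             \
      (X) += (XSIZE);             \
    }                             \
    while ((X) >= (XSIZE)) {      \
      (X) -= (XSIZE);             \
    }                             \
  } while (0)

/* Function (hypothetical name): may loop and still return a value,
 * so it can be used in the middle of an expression. */
static inline int wrap_x(int x, int xsize)
{
  WRAP_X_IN_PLACE(x, xsize);
  return x;
}
```

So `WRAP_X_IN_PLACE(x, 80);` works as a statement on an int variable x,
but something like `y = WRAP_X_IN_PLACE(x, 80);` does not compile, whereas
`y = wrap_x(x, 80);` does.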

> Perhaps the best thing would be to change it to trigger on
> 
>   ((X) < 0 || (X) >= map.xsize)
> 
> instead.  Of course, the proof of the pudding is in the eating:
> Somebody really ought to profile all these different approaches,
> etc. etc.

Well, I think map_adjust_x() is good enough as it is, since someone showed
here that gcc optimizes out the first % op. As for normalize_map_pos()...
well... it seems that in most places either it isn't needed or a simpler
test works. So I guess we should optimize it by using it less instead of
trying to improve it further.
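
As one illustration of what a "simpler test" could look like (an assumed
example, not taken from the Freeciv source): when a caller starts from a
valid x and moves by at most one tile, a single compare and add is enough.
The step_x name and its parameters are hypothetical.

```c
/* Hypothetical helper: x is assumed to be in [0, xsize) already and dx
 * is assumed to be -1, 0, or +1, so at most one correction is needed
 * and no divide or loop is involved. */
static int step_x(int x, int dx, int xsize)
{
  x += dx;
  if (x < 0) {
    x += xsize;
  } else if (x >= xsize) {
    x -= xsize;
  }
  return x;
}
```

For example, with xsize 80, stepping left from x = 0 gives 79 and stepping
right from x = 79 gives 0.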

---
Vasco Alexandre da Silva Costa @ Instituto Superior Tecnico, Lisboa


