[Freeciv-Dev] Re: [PATCH] Map cleanups (PR#1208)

To: freeciv-dev@xxxxxxxxxxx
Cc: bugs@xxxxxxxxxxxxxxxxxxx
Subject: [Freeciv-Dev] Re: [PATCH] Map cleanups (PR#1208)
From: jdorje@xxxxxxxxxxxxxxxxxxxxx
Date: Wed, 9 Jan 2002 13:53:09 -0800 (PST)

Raimar Falke wrote:

> On Mon, Jan 07, 2002 at 09:32:44PM -0500, Ross W. Wetmore wrote:

>>g)  RANGE_CHECK macros are added and used to improve performance by
>>    reducing execution in most cases where applicable by a factor of two.
>>
>
> I need numbers. It looks like I have to gather them myself. Either
> from the mail archive or by testing.

I implemented RANGE_CHECK_0 and used it in as many places as I could
quickly find (mostly normalize_map_pos and a bunch of assertions).  In
two autogames, runtimes decreased by 1.2% and 0.7%.

I also did some simple disassembly to look at the compiled forms.  The
results varied: under gcc with -O1, in the first test I did, the
standard and range_check_0 versions were identical, while the
range_check variant was actually longer than the standard one.  But when
I moved the code into separate functions (range_check() and
range_check_0()), gcc no longer optimized quite the same way.  Now the
range_check_0/unsigned-int form kept the same optimization (7
operations, no branching), but the range_check_0/straight check was
optimized differently (10 operations, with branches, so that it'd be
slightly faster if the first check failed).  The range_check test had 11
operations with the optimization and 10 with the straight check.  In
most freeciv uses both tests will pass, so short-circuit evaluation
won't gain much.  Hence the speedup.
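
For reference, here is roughly the macro pair being compared.  The exact
definitions in the patch may differ; this is just a sketch of the usual
unsigned-compare idiom:

  /* Straight check: two signed comparisons, short-circuited. */
  #define RANGE_CHECK(val, min, max)  ((min) <= (val) && (val) < (max))

  /* Zero-based form: when the lower bound is 0, a single unsigned
   * comparison suffices, since a negative value cast to unsigned
   * becomes larger than any sane upper bound. */
  #define RANGE_CHECK_0(val, max)  ((unsigned)(val) < (unsigned)(max))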

Here's the x86 assembly for normalize_map_pos under -O3.

1.  Current CVS:

0x809e998 <normalize_map_pos>:    push   %ebp
0x809e999 <normalize_map_pos+1>:  mov    %esp,%ebp
0x809e99b <normalize_map_pos+3>:  mov    0x8(%ebp),%ecx
0x809e99e <normalize_map_pos+6>:  mov    (%ecx),%eax
0x809e9a0 <normalize_map_pos+8>:  test   %eax,%eax
0x809e9a2 <normalize_map_pos+10>: mov    %eax,%edx
0x809e9a4 <normalize_map_pos+12>: jns    0x809e9b8 <normalize_map_pos+32>
0x809e9a6 <normalize_map_pos+14>: mov    %esi,%esi
0x809e9a8 <normalize_map_pos+16>: mov    %edx,%eax
0x809e9aa <normalize_map_pos+18>: add    0x816fe40,%eax
0x809e9b0 <normalize_map_pos+24>: test   %eax,%eax
0x809e9b2 <normalize_map_pos+26>: mov    %eax,(%ecx)
0x809e9b4 <normalize_map_pos+28>: mov    %eax,%edx
0x809e9b6 <normalize_map_pos+30>: js     0x809e9a8 <normalize_map_pos+16>
0x809e9b8 <normalize_map_pos+32>: mov    %eax,%edx
0x809e9ba <normalize_map_pos+34>: jmp    0x809e9c2 <normalize_map_pos+42>
0x809e9bc <normalize_map_pos+36>: sub    %eax,%edx
0x809e9be <normalize_map_pos+38>: mov    %edx,%eax
0x809e9c0 <normalize_map_pos+40>: mov    %eax,(%ecx)
0x809e9c2 <normalize_map_pos+42>: mov    0x816fe40,%eax
0x809e9c7 <normalize_map_pos+47>: cmp    %eax,%edx
0x809e9c9 <normalize_map_pos+49>: jge    0x809e9bc <normalize_map_pos+36>
0x809e9cb <normalize_map_pos+51>: mov    0xc(%ebp),%eax
0x809e9ce <normalize_map_pos+54>: mov    (%eax),%eax
0x809e9d0 <normalize_map_pos+56>: xor    %edx,%edx
0x809e9d2 <normalize_map_pos+58>: test   %eax,%eax
0x809e9d4 <normalize_map_pos+60>: js     0x809e9e3 <normalize_map_pos+75>
0x809e9d6 <normalize_map_pos+62>: cmp    0x816fe44,%eax
0x809e9dc <normalize_map_pos+68>: jge    0x809e9e3 <normalize_map_pos+75>
0x809e9de <normalize_map_pos+70>: mov    $0x1,%edx
0x809e9e3 <normalize_map_pos+75>: mov    %edx,%eax
0x809e9e5 <normalize_map_pos+77>: pop    %ebp
0x809e9e6 <normalize_map_pos+78>: ret

2.  With range_check_0:

0x809e97c <normalize_map_pos>:    push   %ebp
0x809e97d <normalize_map_pos+1>:  mov    %esp,%ebp
0x809e97f <normalize_map_pos+3>:  mov    0x8(%ebp),%ecx
0x809e982 <normalize_map_pos+6>:  mov    (%ecx),%eax
0x809e984 <normalize_map_pos+8>:  test   %eax,%eax
0x809e986 <normalize_map_pos+10>: mov    %eax,%edx
0x809e988 <normalize_map_pos+12>: jns    0x809e99c <normalize_map_pos+32>
0x809e98a <normalize_map_pos+14>: mov    %esi,%esi
0x809e98c <normalize_map_pos+16>: mov    %edx,%eax
0x809e98e <normalize_map_pos+18>: add    0x816fec0,%eax
0x809e994 <normalize_map_pos+24>: test   %eax,%eax
0x809e996 <normalize_map_pos+26>: mov    %eax,(%ecx)
0x809e998 <normalize_map_pos+28>: mov    %eax,%edx
0x809e99a <normalize_map_pos+30>: js     0x809e98c <normalize_map_pos+16>
0x809e99c <normalize_map_pos+32>: mov    %eax,%edx
0x809e99e <normalize_map_pos+34>: jmp    0x809e9a6 <normalize_map_pos+42>
0x809e9a0 <normalize_map_pos+36>: sub    %eax,%edx
0x809e9a2 <normalize_map_pos+38>: mov    %edx,%eax
0x809e9a4 <normalize_map_pos+40>: mov    %eax,(%ecx)
0x809e9a6 <normalize_map_pos+42>: mov    0x816fec0,%eax
0x809e9ab <normalize_map_pos+47>: cmp    %eax,%edx
0x809e9ad <normalize_map_pos+49>: jge    0x809e9a0 <normalize_map_pos+36>
0x809e9af <normalize_map_pos+51>: mov    0xc(%ebp),%ecx
0x809e9b2 <normalize_map_pos+54>: mov    0x816fec4,%edx
0x809e9b8 <normalize_map_pos+60>: xor    %eax,%eax
0x809e9ba <normalize_map_pos+62>: cmp    %edx,(%ecx)
0x809e9bc <normalize_map_pos+64>: setb   %al
0x809e9bf <normalize_map_pos+67>: pop    %ebp
0x809e9c0 <normalize_map_pos+68>: ret
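
For anyone without the source handy, the C being compiled is roughly
the following (a sketch, not the exact CVS code; map.xsize and
map.ysize are the globals loaded from fixed addresses above):

  int normalize_map_pos(int *x, int *y)
  {
    /* Wrap x around the cylinder. */
    while (*x < 0)
      *x += map.xsize;
    while (*x >= map.xsize)
      *x -= map.xsize;

    /* Listing 1 compiles the plain signed check, with branches:  */
    /*   return (*y >= 0 && *y < map.ysize);                      */

    /* Listing 2 compiles the RANGE_CHECK_0 form, which gcc turns
     * into a single cmp/setb with no branch:                     */
    return RANGE_CHECK_0(*y, map.ysize);
  }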

Conclusion?  The 1% savings isn't insignificant, but there are a lot of
other optimizations that will net a much larger gain.  The cost here
isn't as high as that of macro-izing every function, though.

I still strongly believe that the macros should be placed into shared.h
rather than map.h.

jason




