[Freeciv-Dev] Re: [PATCH] Map cleanups (PR#1208)
Raimar Falke wrote:
> On Mon, Jan 07, 2002 at 09:32:44PM -0500, Ross W. Wetmore wrote:
>>g) RANGE_CHECK macros are added and used to improve performance by
>> reducing execution in most cases where applicable by a factor of two.
>>
>
> I need numbers. It looks like I have to gather them myself. Either
> from the mail archive or by testing.
I implemented RANGE_CHECK_0 and used it in as many places as I could
quickly find (mostly normalize_map_pos and a bunch of assertions). In
two autogames, run times decreased by 1.2% and 0.7%.
I also did some simple disassembling to look at the compiled forms. The
results varied. Under gcc with -O1, in my first test the standard and
range_check_0 versions compiled to identical code, while the range_check
variant was actually longer than the standard one. But when I moved the
code into separate functions (range_check() and range_check_0()), gcc
no longer optimized quite the same way. The range_check_0/unsigned-int
form stayed the same (7 operations, no branching), but the
range_check_0/straight check was optimized differently (10 operations,
with branches, so it'd be slightly faster if the first check failed).
The range_check test had 11 operations with the unsigned-int
optimization and 10 with the straight check. In most freeciv uses both
tests will pass, so short-circuit testing won't gain much. Hence the
speedup.
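For anyone who hasn't seen the trick, here is a minimal sketch of the two forms being compared. The macro bodies, the argument order, and the tile_index() helper are my assumptions for illustration, not necessarily what the patch contains. The idea is that casting a negative int to unsigned int produces a very large value, so a single unsigned comparison covers both "val >= 0" and "val < max" (valid only when max itself is non-negative).

```c
#include <assert.h>

/* Straight form: two signed comparisons, short-circuited by &&. */
#define STRAIGHT_CHECK_0(max, val) ((val) >= 0 && (val) < (max))

/* Assumed unsigned-int form of RANGE_CHECK_0: tests 0 <= val < max with
 * one comparison.  Only valid when max >= 0. */
#define RANGE_CHECK_0(max, val) ((unsigned int)(val) < (unsigned int)(max))

/* Assumed general form: tests min <= val < max, requires min <= max.
 * Unsigned subtraction wraps, so val < min becomes a huge value. */
#define RANGE_CHECK(min, max, val)                        \
  ((unsigned int)(val) - (unsigned int)(min)              \
   < (unsigned int)(max) - (unsigned int)(min))

/* Hypothetical helper showing the "assertion" style of use: compute a
 * row-major tile index after asserting both coordinates are in range. */
static int tile_index(int x, int y, int xsize, int ysize)
{
  assert(RANGE_CHECK_0(xsize, x));
  assert(RANGE_CHECK_0(ysize, y));
  return y * xsize + x;
}
```

With these definitions, RANGE_CHECK_0(10, -1) and RANGE_CHECK_0(10, 10) are both false, while RANGE_CHECK_0(10, 0) and RANGE_CHECK_0(10, 9) are true, matching the straight check for every int value as long as max is non-negative.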
Here's the x86 assembly for normalize_map_pos under -O3.
1. Current CVS:
0x809e998 <normalize_map_pos>: push %ebp
0x809e999 <normalize_map_pos+1>: mov %esp,%ebp
0x809e99b <normalize_map_pos+3>: mov 0x8(%ebp),%ecx
0x809e99e <normalize_map_pos+6>: mov (%ecx),%eax
0x809e9a0 <normalize_map_pos+8>: test %eax,%eax
0x809e9a2 <normalize_map_pos+10>: mov %eax,%edx
0x809e9a4 <normalize_map_pos+12>: jns 0x809e9b8 <normalize_map_pos+32>
0x809e9a6 <normalize_map_pos+14>: mov %esi,%esi
0x809e9a8 <normalize_map_pos+16>: mov %edx,%eax
0x809e9aa <normalize_map_pos+18>: add 0x816fe40,%eax
0x809e9b0 <normalize_map_pos+24>: test %eax,%eax
0x809e9b2 <normalize_map_pos+26>: mov %eax,(%ecx)
0x809e9b4 <normalize_map_pos+28>: mov %eax,%edx
0x809e9b6 <normalize_map_pos+30>: js 0x809e9a8 <normalize_map_pos+16>
0x809e9b8 <normalize_map_pos+32>: mov %eax,%edx
0x809e9ba <normalize_map_pos+34>: jmp 0x809e9c2 <normalize_map_pos+42>
0x809e9bc <normalize_map_pos+36>: sub %eax,%edx
0x809e9be <normalize_map_pos+38>: mov %edx,%eax
0x809e9c0 <normalize_map_pos+40>: mov %eax,(%ecx)
0x809e9c2 <normalize_map_pos+42>: mov 0x816fe40,%eax
0x809e9c7 <normalize_map_pos+47>: cmp %eax,%edx
0x809e9c9 <normalize_map_pos+49>: jge 0x809e9bc <normalize_map_pos+36>
0x809e9cb <normalize_map_pos+51>: mov 0xc(%ebp),%eax
0x809e9ce <normalize_map_pos+54>: mov (%eax),%eax
0x809e9d0 <normalize_map_pos+56>: xor %edx,%edx
0x809e9d2 <normalize_map_pos+58>: test %eax,%eax
0x809e9d4 <normalize_map_pos+60>: js 0x809e9e3 <normalize_map_pos+75>
0x809e9d6 <normalize_map_pos+62>: cmp 0x816fe44,%eax
0x809e9dc <normalize_map_pos+68>: jge 0x809e9e3 <normalize_map_pos+75>
0x809e9de <normalize_map_pos+70>: mov $0x1,%edx
0x809e9e3 <normalize_map_pos+75>: mov %edx,%eax
0x809e9e5 <normalize_map_pos+77>: pop %ebp
0x809e9e6 <normalize_map_pos+78>: ret
2. With range_check_0:
0x809e97c <normalize_map_pos>: push %ebp
0x809e97d <normalize_map_pos+1>: mov %esp,%ebp
0x809e97f <normalize_map_pos+3>: mov 0x8(%ebp),%ecx
0x809e982 <normalize_map_pos+6>: mov (%ecx),%eax
0x809e984 <normalize_map_pos+8>: test %eax,%eax
0x809e986 <normalize_map_pos+10>: mov %eax,%edx
0x809e988 <normalize_map_pos+12>: jns 0x809e99c <normalize_map_pos+32>
0x809e98a <normalize_map_pos+14>: mov %esi,%esi
0x809e98c <normalize_map_pos+16>: mov %edx,%eax
0x809e98e <normalize_map_pos+18>: add 0x816fec0,%eax
0x809e994 <normalize_map_pos+24>: test %eax,%eax
0x809e996 <normalize_map_pos+26>: mov %eax,(%ecx)
0x809e998 <normalize_map_pos+28>: mov %eax,%edx
0x809e99a <normalize_map_pos+30>: js 0x809e98c <normalize_map_pos+16>
0x809e99c <normalize_map_pos+32>: mov %eax,%edx
0x809e99e <normalize_map_pos+34>: jmp 0x809e9a6 <normalize_map_pos+42>
0x809e9a0 <normalize_map_pos+36>: sub %eax,%edx
0x809e9a2 <normalize_map_pos+38>: mov %edx,%eax
0x809e9a4 <normalize_map_pos+40>: mov %eax,(%ecx)
0x809e9a6 <normalize_map_pos+42>: mov 0x816fec0,%eax
0x809e9ab <normalize_map_pos+47>: cmp %eax,%edx
0x809e9ad <normalize_map_pos+49>: jge 0x809e9a0 <normalize_map_pos+36>
0x809e9af <normalize_map_pos+51>: mov 0xc(%ebp),%ecx
0x809e9b2 <normalize_map_pos+54>: mov 0x816fec4,%edx
0x809e9b8 <normalize_map_pos+60>: xor %eax,%eax
0x809e9ba <normalize_map_pos+62>: cmp %edx,(%ecx)
0x809e9bc <normalize_map_pos+64>: setb %al
0x809e9bf <normalize_map_pos+67>: pop %ebp
0x809e9c0 <normalize_map_pos+68>: ret
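As a reading aid, this is roughly the C that the two listings appear to correspond to. It is reconstructed from the disassembly, not copied from the tree, so treat it as a sketch: the map_sketch struct, its sizes, and the two function names are hypothetical stand-ins. Both versions wrap x the same way (the two loops at +16 and +36); they differ only in the final y test, which is two conditional jumps in the first listing and a single cmp/setb in the second.

```c
/* Hypothetical stand-in for the global map dimensions (the two adjacent
 * globals loaded in the listings).  The sizes are arbitrary. */
struct map_sketch {
  int xsize, ysize;
};
static struct map_sketch map = { 80, 50 };

/* Listing 1: straight check on y (test/js, then cmp/jge). */
static int normalize_map_pos_straight(int *x, int *y)
{
  while (*x < 0) {
    *x += map.xsize;
  }
  while (*x >= map.xsize) {
    *x -= map.xsize;
  }
  return 0 <= *y && *y < map.ysize;
}

/* Listing 2: unsigned comparison on y (one cmp followed by setb). */
static int normalize_map_pos_unsigned(int *x, int *y)
{
  while (*x < 0) {
    *x += map.xsize;
  }
  while (*x >= map.xsize) {
    *x -= map.xsize;
  }
  return (unsigned int)*y < (unsigned int)map.ysize;
}
```

Both functions return the same result for any y as long as map.ysize is non-negative; only the generated code differs.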
Conclusion? The 1% savings isn't insignificant, but there are a lot of
other optimizations that will net a much larger gain. The cost here
isn't as high as that of macro-izing every function, though.
I still strongly believe that the macros should be placed into shared.h
rather than map.h.
jason
[Freeciv-Dev] Re: [PATCH] Map cleanups (PR#1208),
jdorje <=