Complete.Org: Mailing Lists: Archives: freeciv-dev: January 2002:
[Freeciv-Dev] Re: [PATCH] bug in vnotify_conn_ex()
Home

[Freeciv-Dev] Re: [PATCH] bug in vnotify_conn_ex()

[Top] [All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index] [Thread Index]
To: jdorje@xxxxxxxxxxxx
Cc: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: [PATCH] bug in vnotify_conn_ex()
From: "Ross W. Wetmore" <rwetmore@xxxxxxxxxxxx>
Date: Fri, 04 Jan 2002 00:20:18 -0500

At 07:51 AM 02/01/03 -0500, Jason Short wrote:
>Ross W. Wetmore wrote:
>
>> At 03:44 AM 02/01/02 -0500, Jason Short wrote:
>> 
[...]
>> Remember, is_normal_map_pos() was only deemed useful in asserts and 
>> allowed for the case of CHECK_POS(), i.e. to find passed coordinates 
>> now considered bogus during the process of debugging the new policy. 
>
>That was your position, but not mine.  I agreed that that was mostly true.

But since then you have repeatedly used it everywhere, it might possibly 
(but usually wrongly) has a vague connection.

Something this useless (as you mostly agreed) should be used sparingly and 
only after serious thought, especially in all the cases where it has been 
decided it was bad :-).

Seriously, this was Gaute's original mistake which was roundly trashed by
the mailing list. You have picked it up and really need to let it go :-).


>
>> The general rule is ... 
>> is_normal_map_pos() use is almost always a mistake or hiding one :-).
                              ^^^^^^^^^^^^^^^^^^^^^^^

>In this case, the mistake is that (-1,-1) is being used to indicate that 
>there is no tile position, when it is possible (in theory, and hopefully 
>in practice soon) that (-1,-1) is also a real, valid position.  Our 
>choices then become:

It is the generic (old) flag for a filler, unknown or bogus position
depending on the event context. This is in the codebase, and is not
going to change quickly, certainly not by these fixes. It is required
that the old behaviour remain for use in those contexts.

For now one just has to live with it, and add a new meaning of valid
tile to the mix. Sometimes shit happens :-).

You will note there was no coordinate context in the case that caused
the original known failure.

>1.  Use is_real_tile to check the tile's validity, and special-case 
>(-1,-1) as a nonexistent position.  This is totally wrong IMO; 

Only partly correct. (-1,-1) is special with several meanings some of
which have nothing to do with coordinates. YO isn't worth much here.

Note either codepath from this point will result in (-1,-1) being
passed on. It used to do this. It should still do this. There is no
problem with just doing it. There are no side effects in the test
code that doing it directly bypasses, as failure there would send
it into the (-1,-1) case directly as well. Downstream code understands 
how to deal with (-1,-1) since it has been dealing with it fine up to
now - so there is no detrimental effect from just passing it on.

The only bug, is the bogus access problems that have been created for 
non-normal coordinates by other foolish changes. That is all that needs 
fixing here.

>if 
>non-normal positions are ever passed in eventually (-1,-1) will be 
>legitimately passed in and will be treated as a nonexistent position. 

Non-normal is meaningless. I repeat, non-normal is meaningless, I repeat
non-normal is meaningless ... get it through your skull :-)

First, this doesn't apply to (-1,-1). This is a special case and should
be handled specially. For the rest ...

is_real_tile() is the correct, necessary and sufficient test to separate 
real from unreal coordinates. It is correctly used here, and should not 
be replaced with a useless ambiguous and incomplete soft condition test.

The code is correct in this regard. Your fix will break it.

What has changed is that map_get_known() no longer deals with real but
not normalized coordinates because of incomplete changes put in. This 
is one of the bugs those changes created and the fix is to move the 
normalization step that was dropped up into this code.

One could move it further up (to the caller code or original client
codepaths), but that is probably not wise. One needs to be sure that all 
code paths that get here are fixed including backwards compatible old 
client packets that may trigger this, which are by definition unfixable. 

It is safest and not unreasonable to limit the problems caused by earlier 
changes to this point, and not propagate them further.

One day when all the other cases are dealt with one can consider relaxing
the normalization here. IMHO, this would still likely be a mistake.

> I 
>am baffled as to why you have chosen this way...

Non-normal is meaningless. I repeat, non-normal is meaningless, I repeat
non-normal is meaningless ... get it through your skull :-)

>2.  Force all map positions passed in to be "normal", and check for 
>is_normal_map_pos.  This is the same as #1 except that we force the 
>caller to deal with calling normalize_map_pos, and thus avoid the 
>potential introduced by #1.  It _does_ present a problem 

Yes, you are facing the problems from the previous ill-considered
actions and there is no sensible reason to make things worse without
first cleanly up the upstream issues. 

>because on the 
>client side the set of normal positions may be different, but it's safe 
>enough to special-case (-1,-1) there because we trust that the server 
>has taken care of any problems.

Only for the clients you can fix. You break all current clients by this
which is not particularly good policy. Current clients send various
event/packets and expect various event/replies. Part of what they
understand is how to ignore or deal with (-1,-1).

>3.  Do a real cleanup and pass in extra data to let us know whether the 
>position is relevant.  For instance, we could pass in pointer to a 
>struct map_position (which may be NULL) or simply an extra flag 
>containing the data (which is better since it can be sent across the 
>network as such).  This would be the "correct" solution, but it would 
>take more work.  Raimar has expressed the opinion that the fix isn't 
>worth the effort.

Then follow Raimar's advice and apply the tourniquet solution to the 
problem here :-).

It's quick, and restores previous behaviour to full functionality.

>Now, you may point out that all we're doing is sending (-1,-1) as the 
>position to the client in place of (-1,-1).  But suppose such an event 
>happens and (-1,-1) is sent to the client. 

This is what the code is currently designed to do ... send (-1,-1) to
the client. Your concern about "sending (-1,-1) ... in place of (-1,-1)"
definitely seems a few marbles short of a full set :-).

> Should the client think that 
>this event has no coordinates associated with it (for instance, 

That is one of the ways it thinks of it, yes. ENOEVENTs must be sent
this way. The coordinates are ignored.

>if it 
>happens out-of-sight), or that the coordinates (-1,-1) are associated 
>with the event?  If we use your system, there is no way to tell. 

Not my system ... it is THE CURRENT SYSTEM. That's the way it is designed 
to work, the way it is currently coded to work.

> But if 
>we force these positions to be "normal" and specify that (-1,-1) can 
>never be "normal" then there is no ambiguity.

It works fine as is/was. The correct restoration fix has been thoroughly 
exercised in both Torus world and current scenarios and there is no problem 
here that needs any "forcing".

You will just breaks 29 different existing codepaths and correctly 
functionning behaviour that is just waiting there to receive this packet 
if you now decide they shouldn't have it :-).

All this just to ram another is_normal_map_pos() call into code ... sheesh!

><change of topic>

>When we talk about the similar goto_dest problem, the use of #1 is even 
>worse, because the solution you propose doesn't even special-case out 
>the "nonexistent position" values! 

It most certainly does! Go figure out what normalize_map_pos() does.
Even in the current buggy version it still does this :-).

Normalize_map_pos() is what is_normal_map_pos() keeps trying to be, but
can't because it doesn't understand the critical hard condition realness.

> Under an arbitrary topology, when 
>position (0,0) (or (-1,-1) which would be better) is initialized as the 
>goto_dest, your code will call normalize_map_pos on it to check for 
>existence.  But this is totally wrong! 

No, it is perfectly correct, and the check is for realness in case you
are trying to regularize it with a new 1984ish term existence.

> A position is either (1) 
>non-existent, in which case the coordinates will be a fixed (0,0) which 
>may either be real or not or (2) existent, in which case the coordinates 
>had better be real and should probably be normal as well. 

A position is either unreal, in which case you can't do anything with
it and better have coded the failure path, or it is real. This is an
unambiguous hard condition requiring real separate codepaths. 

If you want to use it for things that require a further soft condition 
of "normalized" then you can *always* do this if it is real. The code 
did this all the time until the practice was recently ruled out-of-favour.
But, there is nothing to stop you from doing it.

In these, cases you normalize it. Since a correctly functionning
normalize_map_pos() does all this, tests for realness, tests for normal 
and normalizes only if necessary it can be called to return the unreal 
condition, or the correct useful coordinates with which to continue.

In such a world, there is no need to abort the server because a faulty
incomplete test failed to detect a perfectly reasonable and fixable 
condition it just wasn't designed to detect or fix. Gaute never would 
admit the uselessness of the false condition, and the problems that 
ignoring the ambiguities in it raise. 

> Again, making 
>the code "correct" would be the best choice, but choice #1 must be 
>avoided at all costs.  Essentially, you are "initializing" the goto_dest 
>to just point at some random position on the map.

No, the return condition correctly detects the unreal state and does
the correct thing. That there are side effects to current buggy version
of normalize_map_pos() is 1) a separate problem to be fixed, 2) not of
concern here because even it doesn't yet convert unreal to real, it just
scrambles the current values, 3) not a problem with the fields because
if it had garbage before and has different garbage now, then this could
not possibly affect correctly functionning code.

></change of topic>
>
>Perhaps we can just get off this argument and do the full fix instead?

Full fix to what? 

>>>There are still some minor pitfalls: (1) it 
>>>obviously won't work if is_normal_map_pos(-1,-1), 

No problem. This isn't a bug in the current code until after you have
made it one.

>and (2) we need to be 
>>>careful about non-normal positions other than (-1,-1) being passed in; 
>>>these would probably indicate a bug.  Both of these can be detected with 
>>>a well-placed assertion.

Both are currently detected fine, thank you. The results are then sent
to the client. There is no need to break this code by introducing a panic
fix.

>> The use of assertions is really unnecessary. This is a reasonable point
>> to do full checks and fixups as an entry/exit data portal. Killing a server
>> everytime a client does something bogus is not good form.
>
>This data is being sent to the client, not taken from it.  It is the 
>server that is generating these positions, and it should not be possible 
>for the client to get the server to generate a bogus position.  But if 
>it is possible, then this is the place to catch it.

  This is the basis for following notify_conn* and notify_player* functions.
  Notify specified connections of an event of specified type (from events.h)
  and specified (x,y) coords associated with the event.  Coords will only
  apply if game has started and the conn's player knows that tile (or
  pconn->player==NULL && pconn->observer).  If coords are not required,
  caller should specify (x,y) = (-1,-1).  For generic event use E_NOEVENT.
  (But current clients do not use (x,y) data for E_NOEVENT events.)
   

I think the original error case was generated as a response to a client.
Yes, the server is sending back appropriate data here including the 
ENOEVENT (-1,-1) event in which the coordinate values are fillers.

No "position" is being generated. Nothing is "wrong" here. The problem
was simply that is_real_tile() passed this on as a real coordinate for
further checking and the earlier changes broke the checking code. This
particular coordinate pair never needs to be checked. Whether real or
unreal, it will always be sent on as (-1,-1) if you read the code.

>> The original code did full checks. There might actually be a reason for 
>> this, other than happenstance. Certainly it has not caused problems, so
>> there is no reason to change it here.
>
>Currently it is safe to pass in (-1,-1) to mark a "non-existent" 
>position, because (-1,-1) will always be unreal.  Under different 
>topologies, (-1,-1) will not always be unreal.  Thus, a change is necessary.

No, change is necesssary ... if the code detected a failure condition
it would still send (-1,-1), if it passed all the checks it would send
(-1,-1).

No problems with this code ... there maybe some elsewhere though :-).

>The reason the original code did the full check is that coordinates were 
>not assumed to be normal when they were passed in, so the code knew it 
>had better check for full realness (and normalize them too). 

Yup.

> This is 
>the situation that CHECK_MAP_POS has changed. 

Nope. 

> You may argue that the 
>change is unnecessary, but it has three definite advantages: (1) less 
>code overall, 

Dropping out required checks is not a particularly good way to reduce 
code.

(2) the code is faster, 

Versions of is_normal_map_pos(), I have seen do a full normalize_map_pos()
as part of their check, but don't return either all or complete results.
This would be slower even when incomplete.

>and (3) hacks like this will work 
>without introducing bugs.

Your hack introduces bugs, by propagating the current problem upstream
without dealing with the upstream first.

Restoring the original functionality with a safe realness and proper 
normalization here will put things back to the way they were before
1984 broke them.

>jason

Cheers,
RossW
=====




[Prev in Thread] Current Thread [Next in Thread]