[Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets

To: Kenn.Munro@xxxxxxxxxxxxxx, jdwheeler42@xxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, pawel@xxxxxxxxxxxxxxx, per@xxxxxxxxxxx
Cc: mrproper@xxxxxxxxxx, jlangley@xxxxxxx
Subject: [Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets
From: "Jason Short" <jdorje@xxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2004 10:00:51 -0700
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >

Vasco Alexandre Da Silva Costa wrote:
> On Fri, 7 May 2004, Raimar Falke wrote:
> 
>><URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >
> 
>>Can you explain the reason for ascii? I have the feeling that you have
>>too much of the current model in your mind. If we do this we should do
>>it right. So instead of adding two types (black+blue and black+red) we
>>should only have one (blue in the server, red in the client). This
>>will help the understanding a lot ("no field name of struct nation is
>>ascii because it is read from the ruleset but field name of struct
>>team is unicode because ..." better would be "all strings are foo at
>>the server, bar on the wire and foobar at the client").
> 
> Same here. I think the server should use UTF-8, the wire UTF-8 and the
> client whatever it likes. UTF-8 for GTK+ 2.2, UTF-16 for SDL, locale
> encoding for the others. Easier this way.

Some strings are in ascii.  My design takes advantage of the fact that 
ascii is a subset of both the unicode (blue) and the locale (green) 
encodings.  So although the rulesets are in unicode, most strings can 
simply be loaded into memory and treated as ascii.
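The property this relies on can be checked mechanically: a UTF-8 string whose bytes are all below 0x80 is byte-for-byte the same string in ascii. A minimal sketch, assuming the ruleset data is UTF-8; `string_is_ascii()` is a hypothetical helper, not an existing freeciv function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of freeciv): returns true if every byte
 * of s is 7-bit ascii. Such a string has the same bytes in UTF-8 and in
 * ascii, so it can be loaded from a UTF-8 ruleset and used as ascii
 * without any conversion. */
bool string_is_ascii(const char *s)
{
  assert(s != NULL);
  for (; *s != '\0'; s++) {
    if ((unsigned char) *s > 0x7F) {
      return false;   /* lead or continuation byte of a multibyte char */
    }
  }
  return true;
}
```

For example, a sprite tag such as `u.warriors` passes the check, while a UTF-8 name containing `é` (bytes 0xC3 0xA9) does not.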

You're right that this is not strictly correct, but if you think it's 
easier to do the full conversion, I think you're quite mistaken.  For 
instance, the ruleset contains numerous sprite tags.  These tags are 
currently in ascii (converting them to unicode leaves them unchanged). 
If we do it my way, they are loaded as ascii, sent as ascii, and used as 
ascii.  But if we do it your way, they are loaded as unicode (which they 
are), sent as unicode, converted to the GUI encoding at the client end, 
and then must be converted _back_ to unicode before they can be used (we 
cannot use them in the GUI encoding because it may be incompatible, 
e.g., UTF-16).  This means many lines of code in tilespec.c must be 
changed.  Code like

   ut->sprite = lookup_sprite_tag_alt(ut->graphic_str, ut->graphic_alt,
                                      unit_type_exists(id), "unit_type",
                                      ut->name);

must be changed to

   ut->sprite = lookup_sprite_tag_alt(W(ut->graphic_str),
                                      W(ut->graphic_alt),
                                      unit_type_exists(id), "unit_type",
                                      W(ut->name));

where W() is some function that converts from the GUI encoding back to 
unicode.  This conversion cannot be done inside lookup_sprite_tag_alt 
because most of the strings it takes are ascii.  It could be done 
somewhere in the network code, but that wouldn't be any easier (you'd 
still need to change 2 lines of code to do it, and this can't be 
automated AFAICT); it would also mean that some strings are stored in 
ASCII, which you said you didn't want.
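W() is not specified here. As a sketch of what it would have to do for a GUI whose encoding is not ascii-compatible, the function below converts UTF-16 back to UTF-8. It assumes the GUI encoding is UTF-16 held as host-order `uint16_t` code units and that the client's internal unicode encoding is UTF-8; the name `utf16_to_utf8()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical W(): convert n_units UTF-16 code units (assumed GUI
 * encoding) back to NUL-terminated UTF-8 (assumed internal encoding).
 * Returns the number of bytes written, excluding the NUL, or -1 if the
 * input has an unpaired surrogate or out is too small. */
long utf16_to_utf8(const uint16_t *in, size_t n_units,
                   char *out, size_t out_size)
{
  size_t o = 0;

  assert(in != NULL || n_units == 0);
  if (out == NULL || out_size == 0) {
    return -1;
  }

  for (size_t i = 0; i < n_units; i++) {
    uint32_t cp = in[i];

    if (cp >= 0xD800 && cp <= 0xDBFF) {
      /* High surrogate: must be followed by a low surrogate. */
      if (i + 1 >= n_units || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
        return -1;
      }
      cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (in[i + 1] - 0xDC00);
      i++;
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return -1;                /* unpaired low surrogate */
    }

    /* Encode the code point as 1 to 4 UTF-8 bytes. */
    unsigned char buf[4];
    size_t len;

    if (cp < 0x80) {
      buf[0] = (unsigned char) cp;
      len = 1;
    } else if (cp < 0x800) {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 2;
    } else if (cp < 0x10000) {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 3;
    } else {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      len = 4;
    }

    if (o + len >= out_size) {
      return -1;                /* no room for these bytes plus the NUL */
    }
    for (size_t k = 0; k < len; k++) {
      out[o++] = (char) buf[k];
    }
  }

  out[o] = '\0';
  return (long) o;
}
```

Because UTF-16 and UTF-8 strings have different byte lengths, the result has to go into a separate buffer; the conversion cannot be done in place on `ut->graphic_str`.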

You may have noticed that ut->name is also passed to W().  This is 
because it also comes from the server and is also in the GUI encoding. 
However, unlike the sprite tags, this string is used EVERYWHERE.  It's 
even used by the common code, where we can't easily call W() (because 
doing so for the server would be incorrect).  So for this string the 
only reasonable solution is to call W() on it when it's received from 
the network.

But if we do that, then what's the point of converting it to the GUI 
encoding just so that we can convert it right back?  These strings have 
to be stored in ASCII inside the client, so why not automate the 
process?  Give the net code a list of which strings are ascii and which 
are not.  Then they can all be converted automatically.
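One way the net code could be given such a list is a per-packet table marking each string field as either ascii or text. A minimal sketch, assuming the wire encoding is UTF-8 and, for illustration only, that the GUI encoding is ISO-8859-1 (an ascii superset); the enum, the `unit_type_fields` table and both functions are hypothetical, not freeciv's actual packet code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* How the network code should treat a received string field. */
enum string_kind {
  STRING_ASCII,   /* sprite tags, identifiers: checked and copied verbatim */
  STRING_TEXT     /* user-visible text: converted to the GUI encoding */
};

struct string_field {
  const char *name;
  enum string_kind kind;
};

/* Hypothetical field list for a unit-type packet. */
const struct string_field unit_type_fields[] = {
  {"name", STRING_TEXT},
  {"graphic_str", STRING_ASCII},
  {"graphic_alt", STRING_ASCII},
};

/* Convert UTF-8 to ISO-8859-1 (the assumed GUI encoding). Fails on
 * characters above U+00FF, on malformed input, or if out is too small. */
static bool utf8_to_latin1(const char *in, char *out, size_t out_size)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t o = 0;

  if (out_size == 0) {
    return false;
  }
  while (*p != '\0') {
    unsigned int c;

    if (*p < 0x80) {
      c = *p++;
    } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
      c = ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);   /* U+0080..U+00FF */
      p += 2;
    } else {
      return false;
    }
    if (o + 1 >= out_size) {
      return false;
    }
    out[o++] = (char) c;
  }
  out[o] = '\0';
  return true;
}

/* Store one string received from the network according to its kind. */
bool receive_string(enum string_kind kind, const char *wire,
                    char *out, size_t out_size)
{
  assert(wire != NULL && out != NULL);

  if (kind == STRING_TEXT) {
    return utf8_to_latin1(wire, out, out_size);
  }

  /* STRING_ASCII: same bytes in every ascii-compatible encoding, so
   * only validate and copy. */
  size_t len = strlen(wire);
  for (size_t i = 0; i < len; i++) {
    if ((unsigned char) wire[i] > 0x7F) {
      return false;
    }
  }
  if (len >= out_size) {
    return false;
  }
  memcpy(out, wire, len + 1);
  return true;
}
```

A receive loop would look up `unit_type_fields[i].kind` for each string in the packet, so ascii fields such as sprite tags never pass through the GUI conversion and the call sites in tilespec.c could stay as they are.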

This whole argument is based on the idea that the GUI encoding is not 
necessarily a superset of ascii.  If it were, then all transformations 
of ascii strings would be identities and we could be sloppy about which 
encoding we're pretending a string is in at any given moment.  This 
sloppiness isn't really better (it's sloppy, after all), but it is 
certainly easier.  So another alternative is to require the GUI 
encoding to be a superset of ascii.  This would mean that if gui-sdl 
wants to use UTF-16, it must do the conversion internally (I imagine 
this is what the GTK2 library does).  Other clients probably won't be 
affected.
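The superset requirement can be stated concretely: encoding an ascii string in the GUI encoding must give back exactly the same bytes. UTF-8 and ISO-8859-1 satisfy this; UTF-16 does not, because every ascii character gains a zero byte, which would also end a C string early. A minimal sketch of that check; `ascii_to_utf16le()` and `encoding_is_identity()` are hypothetical helpers, and UTF-16 is shown in little-endian byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encode an ascii string as UTF-16LE bytes.
 * Stores the byte count in *out_len; fails if out is too small. */
bool ascii_to_utf16le(const char *in, unsigned char *out, size_t out_size,
                      size_t *out_len)
{
  size_t n;

  assert(in != NULL && out_len != NULL);
  n = strlen(in);
  if (2 * n > out_size) {
    return false;
  }
  for (size_t i = 0; i < n; i++) {
    out[2 * i] = (unsigned char) in[i];   /* low byte: the ascii code */
    out[2 * i + 1] = 0;                   /* high byte: always zero */
  }
  *out_len = 2 * n;
  return true;
}

/* True if the encoded bytes are exactly the ascii bytes, i.e. the
 * transformation from ascii into this encoding is the identity. */
bool encoding_is_identity(const char *ascii, const unsigned char *enc,
                          size_t enc_len)
{
  return enc_len == strlen(ascii) && memcmp(ascii, enc, enc_len) == 0;
}
```

For a UTF-8 GUI the encoded bytes of an ascii string are the string itself, so `encoding_is_identity()` holds trivially; for UTF-16 it fails as soon as the first zero high byte appears.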

jason



