Complete.Org: Mailing Lists: Archives: freeciv-dev: January 2003:
[Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets

To: jdorje@xxxxxxxxxxxxxxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, Kenn.Munro@xxxxxxxxxxxxxx
Cc: jlangley@xxxxxxx, mrproper@xxxxxxxxxx, freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets
From: "Raimar Falke via RT" <rt@xxxxxxxxxxxxxx>
Date: Sat, 25 Jan 2003 01:57:37 -0800
Reply-to: rt@xxxxxxxxxxxxxx

On Fri, Jan 24, 2003 at 02:36:51PM -0800, Jason Short via RT wrote:
> 
> After discussing with Vasco and Per, I think we should do the following...
> 
> There are three encodings freeciv uses: local_encoding,
> freeciv_encoding, and display_encoding.  

> The local encoding is determined by $LANG, and is the encoding
> gettext translates into. printf's should use this encoding.

According to
<http://www.gnu.org/manual/gettext/html_chapter/gettext_10.html#SEC149>
we can change the output charset of gettext to another charset
(freeciv_encoding for example). That leaves only the printf calls used
by freelog, scorelog and gamelog.
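
For reference, the gettext call that controls the output charset is
bind_textdomain_codeset(). A minimal sketch, assuming a text domain
named "freeciv" and UTF-8 as freeciv_encoding; the helper name
fc_set_gettext_charset is made up for illustration:

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper: ask gettext to return translations for `domain`
 * in `charset` instead of the charset of the current locale.
 * Returns whatever bind_textdomain_codeset() returns, which may be NULL
 * (on error, and on some C libraries always). */
const char *fc_set_gettext_charset(const char *domain, const char *charset)
{
  assert(domain != NULL);
  return bind_textdomain_codeset(domain, charset);
}
```

It would be called once at startup, after textdomain(), for example
fc_set_gettext_charset("freeciv", "UTF-8").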

> The freeciv_encoding is used for
> storage (including in savegames, rulesets, in memory, and across the
> network) of post-translation or untranslated strings (that is, strings
> that have already been translated or those that don't need to be); this
> should be UTF-8.  display_encoding is that used by the frontend - this
> will be UTF-8 for the GTK2 client, UTF-16 for the SDL client, and
> (AFAIK) local_encoding for all other clients and the server.
> 
> We provide conversion functions for switching between encodings.  Three
> forms of the functions are possible:
> 
>   char *xxx_to_xxx_string(char *buf, size_t bufsz, const char *text);
>   char *xxx_to_xxx_string_malloc(const char *text);
>   char *xxx_to_xxx_string_static(const char *text);

I'm for providing only functions like: (sc=string_convert)

  void sc_xxx_to_yyy(void *dest, size_t dest_size_in_bytes, 
                     const void *src);

The others can be built with macros/inline functions.
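
A rough sketch of how that could look, assuming iconv() does the real
work. Everything here is an assumption for illustration: the names
sc_convert, sc_latin1_to_utf8 and sc_latin1_to_utf8_malloc, the int
return value (the prototype above returns void), plain malloc instead
of fc_malloc, and latin-1 standing in for local_encoding. It only
handles target encodings that end with a single '\0' byte:

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base converter. Converts the '\0'-terminated string src
 * from encoding `from` to encoding `to`, writing at most dest_size bytes
 * (terminator included) into dest. Returns 0 on success, -1 if the
 * conversion failed or the result had to be truncated. */
int sc_convert(char *dest, size_t dest_size, const char *src,
               const char *from, const char *to)
{
  iconv_t cd;
  char *in = (char *) src;      /* iconv() takes a non-const char ** */
  char *out = dest;
  size_t in_left = strlen(src);
  size_t out_left;
  size_t res;

  assert(dest != NULL && dest_size > 0);
  out_left = dest_size - 1;     /* reserve one byte for the terminator */

  cd = iconv_open(to, from);
  if (cd == (iconv_t) -1) {
    dest[0] = '\0';
    return -1;
  }
  res = iconv(cd, &in, &in_left, &out, &out_left);
  if (res != (size_t) -1) {
    /* Flush any shift state; a no-op for stateless encodings. */
    res = iconv(cd, NULL, NULL, &out, &out_left);
  }
  iconv_close(cd);
  *out = '\0';
  return (res == (size_t) -1) ? -1 : 0;
}

/* One concrete sc_xxx_to_yyy built on the base converter. */
int sc_latin1_to_utf8(char *dest, size_t dest_size, const char *src)
{
  return sc_convert(dest, dest_size, src, "ISO-8859-1", "UTF-8");
}

/* The "_malloc" form as a thin wrapper. The caller frees the result. */
char *sc_latin1_to_utf8_malloc(const char *src)
{
  /* Each latin-1 byte becomes at most two UTF-8 bytes. */
  size_t size = 2 * strlen(src) + 1;
  char *dest = malloc(size);

  if (dest != NULL && sc_latin1_to_utf8(dest, size, src) != 0) {
    free(dest);
    dest = NULL;
  }
  return dest;
}
```

With this layout each additional encoding pair costs one line, and the
malloc form only has to know the worst-case growth factor of its pair.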

> The first one converts the string into the provided buffer (as much of
> it as is possible).  The second calls fc_malloc to allocate an
> appropriately-sized buffer for the string.  The third one may not be
> desirable; this is just a shortcut to place the text into a static
> buffer (it saves a few lines at the cost of 512 bytes or so).
> 
> As many conversion functions as are needed will be provided, but I
> suspect we will only need to convert local->freeciv->display.  This
> means three types of conversions, times 2-3 conversion forms, is 6-9
> functions.  Of course most of these are just wrappers for the base
> conversion functions.
> 
> Note that there is really a fourth type of encoding, which I will call
> ascii_encoding, used for not-yet-translated strings (i.e., those that
> need to be translated).  We don't need to do any explicit conversion on
> this since gettext will take care of it.  It is likely (but not certain)
> that this encoding needs to be ascii so that these strings will work for
> people without NLS.
> 
> Then, we simply convert between forms as needed.  When a translated
> string is created and used directly in the GUI, we call
> local_to_display_string() on it.  When a string is loaded from a
> ruleset, translated, and stored, we call gettext and then
> local_to_freeciv_string() on it.  When we get a city name from the
> network and want to display it, we call freeciv_to_display_string() on
> it.  As long as we follow the basic tenet that any string stored beyond
> local scope is in freeciv_encoding, it should be easy for anybody who
> needs to use this string to convert it.
> 
> It may even be desirable to provide super-short wrapper macros for these
> operations - particularly the static one - so that we could do something
> like
> 
>    gtk_label_set_text(GTK_LABEL(label), FtoD(pcity->name));
> 
> 
> All of this sounds crazy and complicated.  But it is the only way we
> have hit upon to simplify the handling of the different encodings that
> may be involved.  It makes the current complicated global problem a much
> simpler local one.

About the choice of the freeciv_encoding: if we don't want to change a
lot of code we have to keep the '\0' property:

  While utf-8 and other charsets/encodings like latin-1 only contain
  the byte 0 ('\0') as a terminating value, utf-16 may contain this
  byte inside the normal text.

Otherwise we would have to replace a lot of calls to strlen, strdup and
so on.
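
To make the '\0' problem concrete, here is a small sketch with the
text "Hi" stored once as utf-8 and once as utf-16 (little endian).
The array and function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" in utf-8: two text bytes plus a one-byte terminator. */
static const char hi_utf8[] = { 'H', 'i', '\0' };

/* "Hi" in utf-16 little endian: every character is two bytes, and the
 * high byte of an ASCII character is 0. The terminator is two 0 bytes. */
static const char hi_utf16le[] = { 'H', '\0', 'i', '\0', '\0', '\0' };

/* strlen() on the utf-8 bytes gives the expected 2. */
size_t naive_len_utf8(void)
{
  return strlen(hi_utf8);
}

/* strlen() on the utf-16 bytes stops at the 0 high byte of 'H' and
 * gives 1, although the text occupies 4 bytes. */
size_t naive_len_utf16(void)
{
  return strlen(hi_utf16le);
}

/* What a replacement would have to do: scan two bytes at a time until
 * a 16-bit unit of 0 is found. Returns the length in bytes. */
size_t utf16_len_bytes(const char *s)
{
  size_t n = 0;

  while (s[n] != '\0' || s[n + 1] != '\0') {
    n += 2;
  }
  return n;
}
```

Every strlen, strdup or strcpy on such a string would need an
equivalent of utf16_len_bytes instead.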

Also gettext returns the msgid unchanged if no translation could be
found. This means that freeciv_encoding has to be compatible with the
charset of the msgids (these are English, so the charset is ascii).
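
A small sketch of both halves of that argument, assuming no message
catalog is loaded for the msgid in question. The helper names
is_plain_ascii and gettext_gave_msgid are made up for illustration:

```c
#include <assert.h>
#include <libintl.h>
#include <string.h>

/* Returns 1 if every byte of s is 7-bit ASCII. Such a string has the
 * same byte sequence in ascii, latin-1 and utf-8, so it needs no
 * conversion between these three. */
int is_plain_ascii(const char *s)
{
  for (; *s != '\0'; s++) {
    if ((unsigned char) *s >= 0x80) {
      return 0;
    }
  }
  return 1;
}

/* Returns 1 if gettext() handed back text equal to the msgid, which is
 * what happens when no translation is found (or when the translation
 * happens to be identical). */
int gettext_gave_msgid(const char *msgid)
{
  return strcmp(gettext(msgid), msgid) == 0;
}
```

A string for which both helpers return 1 can be stored as-is in any of
the three encodings; for utf-16 or ucs-4 it would still have to be
converted.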

Both properties are satisfied by ascii, latin-1 and utf-8 but not by
utf-16 or ucs-4. See also http://czyborra.com/utf/ for an overview.

        Raimar

-- 
 email: rf13@xxxxxxxxxxxxxxxxx
 Make a software that is foolproof, and only fools will want to use it.



