[Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets

To: jrg45@xxxxxxxxxxxxxxxxx
Cc: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets
From: "Jason Short via RT" <rt@xxxxxxxxxxxxxx>
Date: Fri, 3 Jan 2003 05:02:09 -0800
Reply-to: rt@xxxxxxxxxxxxxx

Freeciv's ruleset data is mostly in ASCII.  But some is in latin1.  Some
may be in other charsets (latin2).

Beyond their shared ASCII subset, these charsets are generally not
compatible with one another.  Unless the strings are manually converted
into the local charset, a user with a different charset is likely to
see garbage when they are displayed.
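
To make the "garbage" concrete, here is a minimal sketch in Python (not
Freeciv code; the city name and the helper decode_or_none are only
illustrative assumptions).  It takes one latin1 byte string and reads it
back under three charsets.

```python
# Hypothetical city name, stored as latin1 bytes in a ruleset file.
raw = "España".encode("latin-1")  # the ñ becomes the single byte 0xF1


def decode_or_none(data, charset):
    """Decode bytes in the given charset, or return None if they are invalid."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


as_latin1 = decode_or_none(raw, "latin-1")     # the intended reading
as_latin2 = decode_or_none(raw, "iso-8859-2")  # decodes, but to another letter
as_utf8 = decode_or_none(raw, "utf-8")         # not valid UTF-8 at all
```

Read as latin2 the name silently turns into a different word; read as
UTF-8 the bytes are rejected outright.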

A simple solution would be to say all data must be in ASCII.  Then any
ASCII-compatible charset will work.  But this is inferior, since names
that need accented characters could no longer be written correctly.

Yet, if we have latin1 and latin2 characters for some nations, why
shouldn't we have Chinese or Japanese characters for those nations?
Where, if anywhere, do we draw the line?  And if we do allow this, how
much work are we willing to expend to get it to work?

The ruleset currently has the concept of two "types" of strings:
translatable and non-translatable.  The name strings are, rightly,
marked as non-translatable.  But there are really two kinds of
non-translatable strings: name strings (such as leader, nation, and city
names) and data strings (such as names of other ruleset files).  To
handle this correctly the ruleset needs a way to mark all three types,
or the program needs to know which is which.  The translatable strings
should be in ASCII, and are translated into the local charset by
gettext.  The name strings are in their own charset (specified in the
ruleset), and should be converted into the local charset (by iconv) when
loaded.  The data strings are in ASCII, and shouldn't be touched.
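
As a rough sketch of the three-way handling described above (in Python
rather than Freeciv's C, with hypothetical names StringKind and
load_ruleset_string), loading might dispatch on the kind of string.
Python's codec machinery stands in for iconv here, and the translate
argument stands in for gettext; converting to the local charset would
then happen when the resulting text is written out.

```python
from enum import Enum


class StringKind(Enum):
    TRANSLATABLE = 1  # ASCII message, handed to the translation function
    NAME = 2          # leader/nation/city name in the ruleset's charset
    DATA = 3          # ASCII identifier such as a file name; left untouched


def load_ruleset_string(raw, kind, ruleset_charset, translate=lambda s: s):
    """Turn the raw bytes of one ruleset string into text, by kind.

    `translate` is a stand-in for gettext; by default it does nothing.
    """
    if kind is StringKind.NAME:
        # Names are interpreted in the charset the ruleset declares.
        return raw.decode(ruleset_charset)
    # Both remaining kinds must be pure ASCII; anything else is an error.
    text = raw.decode("ascii")
    if kind is StringKind.TRANSLATABLE:
        return translate(text)
    return text
```

For example, load_ruleset_string(b"data/nation/polish.ruleset",
StringKind.DATA, "iso-8859-2") returns the file name unchanged, while a
NAME string is decoded as latin2 first.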

This is all simple if the server and client have the same charset (which
in most cases they do).  But if that's not the case then we have a
problem.  The best/only way to solve this is to say all network
communications must be in Unicode (probably UTF-8); each end of the
connection may convert the UTF-8 into its local encoding.  Again we have
different types of strings: name strings (which should be sent in UTF-8,
then converted) and data strings (which are sent in ASCII and left
untouched).  (This may be affected by patches that aim to improve the
translation of server-side messages.)  Even with this, you may end up
with an impossible conversion (for instance Japanese characters into
latin1), but at least iconv should give you a valid string (as opposed
to the current situation where latin1 strings aren't even valid in UTF-8
and are not displayed at all), and if the user's local charset is UTF-8
they should be able to deal with anything.
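
A minimal sketch of that round trip, again in Python with hypothetical
helper names to_wire and from_wire (Python's codecs stand in for iconv,
and replacing unconvertible characters with "?" is one possible policy,
not something the proposal fixes):

```python
def to_wire(text):
    """Sender side: every string goes over the network as UTF-8."""
    return text.encode("utf-8")


def from_wire(payload, local_charset):
    """Receiver side: convert UTF-8 from the network into the local charset.

    Characters the local charset cannot represent become '?', so the
    result is always a valid string in that charset.
    """
    text = payload.decode("utf-8")
    return text.encode(local_charset, errors="replace")


latin1_name = from_wire(to_wire("España"), "latin-1")   # converts cleanly
impossible = from_wire(to_wire("東京"), "latin-1")       # no latin1 equivalent
data_string = from_wire(to_wire("civ2.ruleset"), "latin-1")
```

Because ASCII is a subset of UTF-8, data strings pass through both steps
byte for byte, so they need no special casing on the wire.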

jason


