[Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets

To: jrg45@xxxxxxxxxxxxxxxxx
Cc: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets
From: "Jason Short via RT" <rt@xxxxxxxxxxxxxx>
Date: Mon, 6 Jan 2003 11:21:26 -0800
Reply-to: rt@xxxxxxxxxxxxxx

Please send your responses to RT, so we have a full record of the
discussion there...

On Mon, 2003-01-06 at 12:15, Reinier Post wrote:
> On Fri, Jan 03, 2003 at 05:02:09AM -0800, Jason Short via RT wrote:
> > 
> > Freeciv's ruleset data is mostly in ASCII.  But some is in latin1.  Some
> > may be in other charsets (latin2).
> > 
> > These charsets are generally not compatible with anything except
> > themselves.  And without manual conversion of the strings into the local
> > charset, a user that is using a different charset is likely to see
> > garbage when they are displayed.
> > 
> > A simple solution would be to say all data must be in ASCII.  Then any
> > ascii-compatible charset will work.  But this is inferior.
> >
> > Yet, if we have latin1 and latin2 characters for some nations, why
> > shouldn't we have Chinese or Japanese characters for those nations?
> > Where, if anywhere, do we draw the line?  And if we do allow this, how
> > much work are we willing to expend to get it to work?
> 
> I know little about this, but my gut feeling is: the goal should be
> to handle everything in terms of GNU gettext locales and GNU iconv
> character set encodings, and to provide support for any locale that this
> software will support, with the constraint that any incarnation of a
> Freeciv client or server supports only a single locale (and charset)
> at the same time (plus the default C locale for unlocalized parts).
> This means you can see stuff from the C locale and from the German or
> Japanese locale in the same client incarnation, but you can never see
> both German and Japanese in the same client incarnation.

Remember that Unicode may be able to show both.

> BTW, while a full locale consists of
> 
>   + a character set encoding
>   + a message catalog
>   + a notation for numbers
>   + a notation for time
>   + a notation for money
>   + a collating sequence for string sorting
> 
> (and perhaps even more?) Freeciv aims to support only the first two,
> as far as I know.

This bug is caused by the charset problem that freeciv has, although
there is also a translation problem that is closely related.

> > The ruleset currently has the concept of two "types" of strings:
> > translatable and non-translatable.  The name strings are, rightly,
> > marked as non-translatable.  But there are two types of non-translatable
> > strings: name strings (such as leader, nation, and city names) and data
> > strings (such as names of other ruleset files).
> 
> I don't think there should be.  There should be only two types:
> "translatable" strings (marked for localization) and other strings.
> How to mark is defined for the Freeciv source code; for documentation,
> localization is supposed to be done on a per-file basis.  This breaks down
> for the website, which has many pages that mix code and translatable text.
> This is being worked on.

So you think all the name strings should be ASCII?

> > To handle this
> > correctly the ruleset needs a way to mark all three types - or the
> > program needs to know which is which.  The translatable strings should
> > be in ascii, and are translated into the local charset by gettext.  The
> > name strings are in their own locale (specified in the ruleset), and
> > should be converted into the local charset (by iconv) when loaded.  The
> > data strings are in ascii, and shouldn't be touched.
> 
> I think letting names appear in a "native" locale will complicate matters
> and will also confuse users.  Most English speakers won't like to see
> Japanese city names in Japanese script.  It's much better to let, in the
> Japanese client for example, all translatable strings be either localized
> in the same (Japanese) locale, or if no translation is available, leave
> them untranslated (i.e. in English).  That way you only have two types
> of strings.  I don't know how the Japanese client handles untranslated
> strings at the moment.
> 
> Unmarked text should not be called "ascii" text.  

So all names are marked for translation?  That's a lot of work for 
translators.

A user using latin1 will not be able to display a Japanese-encoded name.
I wonder if iconv is able to convert non-Kanji Japanese characters into
ASCII ones (thus forming words like "sushi" and "mitsubishi" from their
Japanese originals)?
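
As a rough illustration of which names fit into which charset, here is a
small sketch in Python (its codecs stand in for iconv here; the sample
names and the helper names are my own, not taken from any ruleset or from
Freeciv code):

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Sample names chosen for illustration only.
NAMES = ["Berlin", "München", "Győr", "東京"]
CHARSETS = ["ascii", "iso-8859-1", "iso-8859-2", "utf-8"]


def coverage(names, charsets):
    """Map each name to the list of charsets able to represent it."""
    return {n: [cs for cs in charsets if can_encode(n, cs)] for n in names}


if __name__ == "__main__":
    for name, usable in coverage(NAMES, CHARSETS).items():
        print(name, "->", usable)
```

Only the plain-ASCII name fits everywhere; the Hungarian sample needs
latin2 or UTF-8, and the Japanese one needs UTF-8 (or a Japanese charset
not listed here).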

And if we don't call it "ASCII" text, what should we call it?

> > This is all simple if the server and client have the same charset (which
> > in most cases they do).  But if that's not the case then we have a
> > problem.
> 
> I don't see why the charset makes a difference.  What should be sent
> is the unlocalized text including localization markings.  That text can
> always be 7-bit clean ASCII.

The charset makes all the difference, since that is the origin of the
bug.  The ruleset is in a latin1 encoding, while my client is using a
UTF-8 encoding.  This means many of the names are incompatible and
cannot be displayed.
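
To make the failure concrete, here is a minimal sketch (Python again, with
its codec machinery standing in for iconv; the function names and the
sample city are hypothetical, not Freeciv code). It shows latin1 bytes
being rejected by a UTF-8 decoder, and the explicit conversion that works
once the ruleset's charset is known:

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_name(raw, ruleset_charset, local_charset):
    """Re-encode a name string from the ruleset's charset to the local one.

    Characters missing from the local charset become '?' instead of
    aborting the load.
    """
    text = raw.decode(ruleset_charset)
    return text.encode(local_charset, errors="replace")


# A name as it might be stored in a latin1 ruleset file (illustrative).
latin1_bytes = "Zürich".encode("iso-8859-1")

if __name__ == "__main__":
    print(is_valid_utf8(latin1_bytes))
    print(convert_name(latin1_bytes, "iso-8859-1", "utf-8"))
    print(convert_name(latin1_bytes, "iso-8859-1", "ascii"))
```

Data strings (file names and the like) would simply never be passed
through convert_name.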

> > The best/only way to solve this is to say all network
> > communications must be in UTF (probably UTF-8); each end of the
> > connection may convert the UTF into their local encoding.  Again we have
> > different types of strings: name strings (which should be sent in UTF-8,
> > then converted) and data strings (which are sent in ASCII and left
> > untouched).
> 
> There is no reason to do this, you can always send English language
> strings plus any localization markings required.

The name of a Hungarian city is not an "English language string".
Translating names is a problem for modpack authors, although this is
already the case for unit names and such.

And currently, the translation is done at the server, into the server's
locale and charset.  This is a problem if the server's and client's
locales differ, although that is rare (most games are single-player).
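
For what it's worth, the UTF-8-on-the-wire idea could look roughly like
the sketch below. It is Python rather than C, the length-prefixed framing
and the names pack_name/unpack_name are assumptions made up for the
example, and nothing here reflects the actual Freeciv packet code:

```python
import struct


def pack_name(name):
    """Server side: send the name as length-prefixed UTF-8.

    The result does not depend on the server's locale or charset.
    """
    payload = name.encode("utf-8")
    return struct.pack("!H", len(payload)) + payload


def unpack_name(packet, local_charset):
    """Client side: decode the UTF-8 payload, then re-encode it for the
    client's own charset, using '?' for characters it cannot represent."""
    (length,) = struct.unpack("!H", packet[:2])
    text = packet[2:2 + length].decode("utf-8")
    return text.encode(local_charset, errors="replace")


if __name__ == "__main__":
    packet = pack_name("Győr")
    for charset in ("utf-8", "iso-8859-2", "iso-8859-1"):
        print(charset, unpack_name(packet, charset))
```

Each end converts only at its own boundary, so a latin2 client and a
UTF-8 client can talk to the same server; a latin1 client still loses
the characters its charset lacks.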

jason



