[Freeciv-Dev] Re: GTK+ 2.0 client port
To: Vasco Alexandre Da Silva Costa <vasc@xxxxxxxxxxxxxx>
Cc: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: GTK+ 2.0 client port
From: Gaute B Strokkenes <gs234@xxxxxxxxx>
Date: Sun, 25 Nov 2001 02:35:49 +0000

On Wed, 21 Nov 2001, vasc@xxxxxxxxxxxxxx wrote:
> Hello,
> 
> I have been porting the GTK+ client to GTK+ 2.0, and starting with
> GTK+ 2.0 all strings should be stored as UTF-8.  UTF-8 includes
> ASCII as a subset as you probably know.
> 
> Therefore I propose we store all the .po files in UTF-8 format.

Bad idea.  A better idea is to use bind_textdomain_codeset(), which
instructs gettext to do the conversion for you.  I think it is better
to leave it to the translators to decide what format they want their
po files in; after all it is they who have to edit them.

Since gettext will convert the contents of a .mo file to the locale
encoding unless instructed otherwise, your proposed change is in any
case a no-op.
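
As a rough illustration of the conversion described above, the Python
sketch below mimics what gettext does once an output codeset has been
requested (in C, via bind_textdomain_codeset(domain, "UTF-8")): the
stored message is decoded from the catalog's charset and re-encoded in
the requested codeset.  The function name and the sample string are
assumptions made for illustration; this is not Freeciv or gettext code.

```python
def convert_message(raw: bytes, catalog_charset: str, output_codeset: str) -> bytes:
    """Re-encode one catalog message from the PO file's charset to the output codeset."""
    return raw.decode(catalog_charset).encode(output_codeset)


# A translator keeps the PO file in ISO-8859-1 ...
stored = "Città".encode("iso-8859-1")
# ... while a GTK+ 2.0 client asks for UTF-8 output.
for_gtk2 = convert_message(stored, "iso-8859-1", "utf-8")
```

The point of the sketch is that the on-disk encoding and the encoding
handed to the toolkit are independent of each other.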

> This will reduce the myriad of encodings we use in the translation
> files to just one.

What's the benefit in that?

> We have to take care to make the server compatible with these
> changes.

Yes.  I think it is best to let each client/server choose their own
internal encoding.

> There is one remaining problem: some nation files include strings in
> ISO-8859-1 (latin1) encoding. Either we strip them to ASCII form or
> store them as UTF-8 with the inconvenience this will create on
> platforms without gettext().

The gettext() interface has a weakness in that the codeset used for
strings in the source must be a subset of the codeset used in any
given PO file.

The proper way to solve this is to use a different domain for message
strings in the rulesets, and use UTF-8 for that domain.  This has the
additional benefit of allowing translation of strings in modpacks.
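
A minimal sketch of the separate-domain idea, using Python's standard
gettext module rather than the C API.  The domain name
"freeciv-rulesets" and the locale directory are hypothetical; the
assumption is that catalogs for this domain would be kept in UTF-8.

```python
import gettext

# Hypothetical domain reserved for ruleset strings (assumed to use UTF-8 catalogs).
RULESET_DOMAIN = "freeciv-rulesets"


def ruleset_translator(localedir: str, languages=None):
    """Return a lookup function for the ruleset domain.

    With fallback=True the untranslated string is returned when no
    catalog for the domain is installed.
    """
    catalog = gettext.translation(
        RULESET_DOMAIN, localedir=localedir, languages=languages, fallback=True
    )
    return catalog.gettext


# No catalog exists at this path, so the lookup falls back to the msgid.
translate = ruleset_translator("/nonexistent/locale", languages=["de"])
name = translate("Roman")
```

Because the lookup goes through its own domain, a modpack could ship
its own catalog without touching the main one.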

Another consideration is the codeset used by the protocol.  This
should be UTF-8 in all cases.  It is easy to convert whenever you send
or receive a packet, but there are a few things to take care of.
Firstly, the client needs to replace characters that cannot be
represented in its codeset with something else.  It is relatively easy
to use a table that does this for all the Latin codepoints, and then
fall back on ? for anything else.  Secondly, the server should be able
to store all UTF-8 names that it receives unchanged.
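
The two points above could be sketched roughly as follows.  The
function names and the fallback table are assumptions for
illustration; the table holds only a few sample entries, whereas a
real one would cover all the Latin codepoints.

```python
# Sample fallback table (illustrative only, far from complete).
LATIN_FALLBACK = {
    "\u0141": "L",  # Ł
    "\u0142": "l",  # ł
    "\u017a": "z",  # ź
    "\u0151": "o",  # ő
    "\u0161": "s",  # š
}


def to_wire(text: str) -> bytes:
    """Encode a string for the protocol, which always carries UTF-8."""
    return text.encode("utf-8")


def from_wire(packet: bytes, client_codeset: str) -> bytes:
    """Convert UTF-8 packet text to the client's codeset.

    Characters the codeset cannot represent are replaced from the
    table, or by '?' when the table has no entry.
    """
    out = []
    for ch in packet.decode("utf-8"):
        try:
            ch.encode(client_codeset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(LATIN_FALLBACK.get(ch, "?"))
    return "".join(out).encode(client_codeset, errors="replace")


wire = to_wire("Łódź")                       # what the server stores, unchanged
latin1_view = from_wire(wire, "iso-8859-1")  # what a Latin-1 client displays
cjk_view = from_wire(to_wire("東京"), "iso-8859-1")
```

The server only ever keeps the bytes in wire, so a client with a
richer codeset still sees the original name.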

> How well do Win32 and Amiga cope with UTF-8?

Windows has decent Unicode support; unfortunately that support is in
terms of UTF-16 rather than UTF-8.  Unicode is nonexistent on the
Amiga.
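
For the Windows case, bridging the two forms is a plain re-encoding.
A small sketch, with hypothetical helper names:

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to the little-endian UTF-16 form used by Windows."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes back to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


wide = utf8_to_utf16le("Zürich".encode("utf-8"))
```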

-- 
Big Gaute                               http://www.srcf.ucam.org/~gs234/
I'm EXCITED!!  I want a FLANK STEAK WEEK-END!!  I think I'm JULIA
 CHILD!!

