[Freeciv-Dev] Re: (PR#1824) charset discussion

To: Kenn.Munro@xxxxxxxxxxxxxx, jdorje@xxxxxxxxxxxxxxxxxxxxx, jdwheeler42@xxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, pawel@xxxxxxxxxxxxxxx, per@xxxxxxxxxxx
Cc: mrproper@xxxxxxxxxx, i-freeciv-1824@xxxxxxxxxxxxx, jlangley@xxxxxxx
Subject: [Freeciv-Dev] Re: (PR#1824) charset discussion
From: "Raimar Falke" <hawk@xxxxxxxxxxxxxxxxx>
Date: Thu, 19 Aug 2004 21:04:38 -0700
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >

On Thu, Aug 19, 2004 at 04:18:10PM +0000, Per Inge Mathisen wrote:
> With regard to the big charset discussion, I have spent some hours reading
> up on the new standards, searching Google, and looking at the issue, and I
> have a few comments.
> 
> First, I agree with Vasco that UTF-8 internally is the best long-term
> solution. UCS-2 is a dead end that is rapidly being obsoleted, as 2 bytes
> are no longer enough to support all charsets. In particular, exotic
> characters used for names had trouble fitting into UCS-2, and we have lots
> of names.

I think you are wrong here: the characters that do not fit into UCS-2 really
are exotic. http://www.unicode.org/Public/UNIDATA/UnicodeData.txt lists, for
example, SHAVIAN, UGARITIC, AEGEAN and OSMANYA. However, I will not argue this
point (much).
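
To make the 2-byte limit concrete, here is a minimal C sketch (not Freeciv
code; the function names fits_in_ucs2 and utf8_encode are hypothetical). It
assumes the Unicode range up to U+10FFFF and shows that a code point such as
U+10450 (the first SHAVIAN letter) does not fit into one 16-bit unit and needs
4 bytes in UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the code point fits into one 16-bit UCS-2 unit. */
static int fits_in_ucs2(uint32_t cp)
{
  return cp <= 0xFFFF;
}

/* Encodes one code point as UTF-8 into out[] (room for 4 bytes needed).
 * Returns the number of bytes written, or 0 for an invalid code point. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
  assert(out != NULL);

  if (cp < 0x80) {
    out[0] = (unsigned char) cp;
    return 1;
  } else if (cp < 0x800) {
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  } else if (cp < 0x10000) {
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return 0; /* surrogate values are not characters */
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  } else if (cp <= 0x10FFFF) {
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
  }
  return 0; /* beyond the Unicode range */
}
```

Called with 0x00E9 (e with acute accent), utf8_encode writes 2 bytes; called
with 0x10450 it writes 4 bytes, and fits_in_ucs2 reports that the value does
not fit.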

> We could go with UTF-32 (or UCS-4) internally and UTF-8 in data files, but
> I do not like this much. It seems wasteful and the least supported format
> externally.

UCS-4 would be an internal format. UTF-8 as an external format for the data
files seems like a good idea.
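
As a rough illustration of that split, the following C sketch converts UTF-8
text, as it might be read from a data file, into an array of 32-bit code
points for internal use. This is only a sketch under assumptions: the name
utf8_to_ucs4 is hypothetical, the input is taken to be NUL-terminated, and
nothing here is existing Freeciv code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes NUL-terminated UTF-8 from src into dst, one code point per
 * element. Returns the number of code points stored, or (size_t) -1 if
 * the input is malformed or dst (dst_len elements) is too small. */
static size_t utf8_to_ucs4(const char *src, uint32_t *dst, size_t dst_len)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t n = 0;

  assert(src != NULL && dst != NULL);

  while (*s != '\0') {
    uint32_t cp, min;
    int extra, i;

    /* The lead byte tells how many continuation bytes follow. */
    if (*s < 0x80) {
      cp = *s;        extra = 0; min = 0;
    } else if ((*s & 0xE0) == 0xC0) {
      cp = *s & 0x1F; extra = 1; min = 0x80;
    } else if ((*s & 0xF0) == 0xE0) {
      cp = *s & 0x0F; extra = 2; min = 0x800;
    } else if ((*s & 0xF8) == 0xF0) {
      cp = *s & 0x07; extra = 3; min = 0x10000;
    } else {
      return (size_t) -1; /* stray continuation byte or invalid lead */
    }
    s++;

    for (i = 0; i < extra; i++, s++) {
      if ((*s & 0xC0) != 0x80) {
        return (size_t) -1; /* also catches a string that ends early */
      }
      cp = (cp << 6) | (uint32_t) (*s & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return (size_t) -1;
    }
    if (n >= dst_len) {
      return (size_t) -1;
    }
    dst[n++] = cp;
  }
  return n;
}
```

With this layout every element of dst is exactly one character, so indexing
and length checks inside the program no longer depend on how many bytes a
character took in the file.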

> The rest of the world seems to go for either UTF-16 (Java) or UTF-8 (W3C,
> Unix, Linux, GNU). Microsoft implemented UCS-2 very early, before the
> Unicode Consortium decided 2 bytes were not enough, but their .NET stuff
> seems to support a variety of encodings (AFAICT UTF-32 is not supported).

> The simpler solution is, however, to just note that we have a fixed length
> buffer and a variable number of characters. If you use characters that
> need more bytes, the maximum length of your string is shorter.

Sorry, but this is just asking for problems in the future.
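
One such problem can be sketched in C. With UTF-8 in a fixed-length buffer,
a plain byte-wise truncation may cut a multi-byte character in half, so every
copy has to look for a character boundary, and the number of characters that
fit varies with the text. The code below is only an illustration: the names
utf8_char_count and utf8_copy_truncate are hypothetical, and it assumes the
source string is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts characters by counting every byte that is not a continuation
 * byte (10xxxxxx). Assumes s is valid, NUL-terminated UTF-8. */
static size_t utf8_char_count(const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++) {
    if (((unsigned char) *s & 0xC0) != 0x80) {
      n++;
    }
  }
  return n;
}

/* Copies src into dst, which holds dst_size bytes including the NUL.
 * If src is too long it is cut at a character boundary, never in the
 * middle of a multi-byte sequence. Returns the bytes copied, not
 * counting the NUL. */
static size_t utf8_copy_truncate(char *dst, size_t dst_size, const char *src)
{
  size_t len;

  assert(dst != NULL && src != NULL && dst_size > 0);

  len = strlen(src);
  if (len >= dst_size) {
    len = dst_size - 1;
    /* src[len] is the first byte that gets dropped. While it is a
     * continuation byte, the character it belongs to started earlier,
     * so back up until the cut falls in front of a lead byte. */
    while (len > 0 && ((unsigned char) src[len] & 0xC0) == 0x80) {
      len--;
    }
  }
  memcpy(dst, src, len);
  dst[len] = '\0';
  return len;
}
```

With an 8-byte buffer, an ASCII name keeps 7 characters, while a name made of
2-byte characters keeps only 3, and one byte of the buffer stays unused.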

        Raimar



