[Freeciv-Dev] Re: (PR#1824) charset discussion
<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >
On Thu, Aug 19, 2004 at 04:18:10PM +0000, Per Inge Mathisen wrote:
> With regard to the big charset discussion, I have spent some hours reading
> up on the new standards, searching Google, and looking at the issue, and I
> have a few comments.
>
> First, I agree with Vasco that UTF-8 internally is the best long-term
> solution. UCS-2 is a dead end that is rapidly being obsoleted, as 2 bytes
> are no longer enough to support all charsets. In particular, exotic
> characters used for names had trouble fitting into UCS-2, and we have lots
> of names.
I think that you are wrong here and that the characters are really exotic.
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt lists:
SHAVIAN, UGARITIC, AEGEAN, OSMANYA. However, I will not argue this point (much).
> We could go with UTF-32 (or UCS-4) internally and UTF-8 in data files, but
> I do not like this much. It seems wasteful and the least supported format
> externally.
UCS-4 would be an internal format. UTF-8 as an external format for the data
files seems like a good idea.
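
To make the internal/external split concrete, here is a minimal sketch of
converting one UCS-4 code point to its UTF-8 byte sequence. The function name
ucs4_to_utf8 is hypothetical and not part of Freeciv; it only illustrates that
a character such as U+10450 (the first code point of the SHAVIAN block) does
not fit in 16 bits and takes 4 bytes in UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Freeciv code: encode one UCS-4 code point as
 * UTF-8.  Writes up to 4 bytes into out and returns the number of bytes
 * written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
  assert(out != NULL);

  if (cp < 0x80) {
    /* 7 bits: plain ASCII, one byte. */
    out[0] = (unsigned char) cp;
    return 1;
  } else if (cp < 0x800) {
    /* 11 bits: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  } else if (cp < 0x10000) {
    /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx; surrogates are not characters. */
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return 0;
    }
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  } else if (cp <= 0x10FFFF) {
    /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```

With this sketch, 'A' (U+0041) encodes to 1 byte, U+00E9 to 2 bytes, and
U+10450 to the 4 bytes F0 90 91 90.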
> The rest of the world seems to go either for UTF-16 (Java) or UTF-8 (W3C,
> Unix, Linux, GNU). Microsoft implemented UCS-2 very early, before the
> Unicode consortium decided 2 bytes were not enough, but their .NET stuff
> seems to support a variety of encodings (AFAICT UTF-32 is not supported).
> The simpler solution is, however, to just note that we have a fixed length
> buffer and a variable number of characters. If you use characters that
> need more bytes, the maximum length of your string is shorter.
Sorry, but this just screams for problems in the future.
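
One such problem, as a rough sketch: with a fixed-length byte buffer, a plain
byte-count truncation can cut a multi-byte UTF-8 character in half, and the
number of characters that fit depends on which characters are used. The
helpers below (utf8_copy_bounded and utf8_char_count, both hypothetical names,
not Freeciv functions) assume the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which holds cap bytes including
 * the terminating NUL, without splitting a multi-byte UTF-8 sequence.
 * Assumes src is valid UTF-8.  Returns the bytes copied, excluding NUL. */
size_t utf8_copy_bounded(char *dst, size_t cap, const char *src)
{
  size_t len;

  assert(dst != NULL && src != NULL && cap > 0);

  len = strlen(src);
  if (len > cap - 1) {
    len = cap - 1;
    /* If the first dropped byte is a continuation byte (10xxxxxx), the cut
     * falls inside a character; back up to the start of that character. */
    while (len > 0 && ((unsigned char) src[len] & 0xC0) == 0x80) {
      len--;
    }
  }
  memcpy(dst, src, len);
  dst[len] = '\0';
  return len;
}

/* Hypothetical helper: count characters by counting every byte that is
 * not a continuation byte.  Assumes s is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++) {
    if (((unsigned char) *s & 0xC0) != 0x80) {
      n++;
    }
  }
  return n;
}
```

With an 8-byte buffer, an ASCII-only name keeps 7 characters, while a name
made of 2-byte characters such as U+00E9 keeps only 3 (6 bytes), since the
fourth character would not fit completely.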
Raimar