Complete.Org: Mailing Lists: Archives: freeciv-dev: May 2004:
[Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets
To: Kenn.Munro@xxxxxxxxxxxxxx, jdwheeler42@xxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, pawel@xxxxxxxxxxxxxxx, per@xxxxxxxxxxx
Cc: mrproper@xxxxxxxxxx, jlangley@xxxxxxx
Subject: [Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets
From: "Jason Short" <jdorje@xxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 7 May 2004 11:32:30 -0700
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >

> [jdorje - Fri May 07 17:00:46 2004]:

> This whole argument is based on the idea that the GUI encoding may not 
> necessarily be a superset of ascii.  If it were, then all 
> transformations of ascii strings would be identities and we can be 
> sloppy with what encoding we're pretending the string is in right now. 
> This sloppiness isn't really better - it's sloppy, after all - but it is 
> certainly easier.  So another alternative is that we require the GUI 
> encoding to be a superset of ascii.  This would mean if gui-sdl wants to 
> use UTF-16 it must make the conversion internally (I imagine this is 
> what the GTK2 library does).  Other clients probably won't be affected.

After some more discussion with Vasco I think that's what we should do.

It may sound like this will screw over gui-sdl, but it won't.
Gui-sdl already has a full infrastructure to convert from UTF-16 to
latin1.  All we have to do is change "latin1" to "utf-8" and things
will work fine.
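
As a rough illustration of the kind of conversion involved, here is a
minimal sketch using iconv(3).  The helper name convert_charset() and
the charset names passed to it are assumptions for this example; this
is not the gui-sdl code.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the Freeciv source): convert in_len bytes
 * of `in` from charset `from` to charset `to`, writing at most out_size
 * bytes into `out`.  Returns the number of bytes written, or (size_t)-1
 * on any failure.  The output is not NUL-terminated. */
static size_t convert_charset(const char *from, const char *to,
                              const char *in, size_t in_len,
                              char *out, size_t out_size)
{
  assert(in != NULL && out != NULL);

  /* Note the argument order: the target charset comes first. */
  iconv_t cd = iconv_open(to, from);
  if (cd == (iconv_t)-1) {
    return (size_t)-1;
  }

  /* iconv() takes a non-const pointer but does not modify the input. */
  char *in_ptr = (char *)in;
  char *out_ptr = out;
  size_t in_left = in_len;
  size_t out_left = out_size;

  size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != (size_t)-1) {
    /* Flush any pending shift state (a no-op for stateless charsets). */
    rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
  }
  iconv_close(cd);

  return (rc == (size_t)-1) ? (size_t)-1 : out_size - out_left;
}
```

With a helper like this, switching the non-UTF-16 side from latin1 to
utf-8 only means passing "UTF-8" where "ISO-8859-1" was passed before.
Pure-ascii input comes out byte-for-byte unchanged for either one.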

So with these changes the flow chart becomes much simpler.  See
http://freeciv.org/~jdorje/iconv.png.

In this chart each box accepts one encoding and emits one encoding.  For
black (ascii), blue (unicode), red (GUI encoding), and green (locale)
the box emits the same type of encoding it accepts.  For iconv boxes
(purple) and gettext (brown) the charset is changed in the process.

Gettext accepts only ascii.  It emits whatever we tell it to.  In the
server this is unicode, in the client it is the GUI encoding.
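
A small sketch of how "whatever we tell it to" could be wired up with
bind_textdomain_codeset() from libintl.  The helper names, and the use
of "UTF-8" as the server's unicode encoding, are assumptions made for
this example only.

```c
#include <assert.h>
#include <libintl.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: pick the charset gettext() should emit.
 * Assumption: "unicode" in the server means UTF-8. */
static const char *gettext_codeset_for(bool is_server,
                                       const char *gui_encoding)
{
  assert(gui_encoding != NULL);
  return is_server ? "UTF-8" : gui_encoding;
}

/* Hypothetical helper: tell gettext which charset to emit for the given
 * message domain.  Returns whatever bind_textdomain_codeset() returns;
 * the exact return value depends on the libintl implementation. */
static const char *init_gettext_codeset(const char *domain, bool is_server,
                                        const char *gui_encoding)
{
  assert(domain != NULL);
  return bind_textdomain_codeset(domain,
                                 gettext_codeset_for(is_server,
                                                     gui_encoding));
}
```

The client would call something like init_gettext_codeset("freeciv",
false, gui_encoding) once at startup, before any translated string is
looked up.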

Strings are stored in the server in unicode.  In the client they are
stored in the GUI encoding.

You'll note the use of ascii remains.  This is the "sloppy" handling
that I referred to above.  It saves us a number of iconv conversions
that would probably take hundreds or thousands of lines of code, plus an
extensive audit.  Instead we just assume it works.

Because we assume all encodings are a superset of ascii, it is safe to
feed ascii into any box.  However, in a few places we take ascii strings
out of a box.  There we're basically assuming that those strings don't
have any non-ascii characters.  For instance, we assume that the sprite
tags ("tu.phalanx") and unit names ("Phalanx") in the server don't have
any non-ascii characters in them.  Other ruleset data (city names,
ruler names) may have non-ascii characters, though.  So unless we do a
full audit we basically have to hope that things work out here.  Of
course we're doing this already, so this isn't a step backward.
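
If we ever wanted to check that assumption rather than hope, a test
along these lines would do.  This is only a sketch; is_ascii_string()
is a hypothetical helper, not something in the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the Freeciv source): returns true if
 * every byte of the NUL-terminated string is 7-bit ascii (0x00..0x7F).
 * Such a string is byte-identical in any ascii-superset encoding. */
static bool is_ascii_string(const char *s)
{
  assert(s != NULL);
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
    if (*p > 0x7F) {
      return false;
    }
  }
  return true;
}
```

Sprite tags like "tu.phalanx" pass this check, while a city name with
an accented letter would not.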

jason
