Re: [Freeciv-Dev] Internationalization - city names
Peter Schaefer wrote:
> It works fine for me as it is now; I think a lot of people will
> be comfortable with the special characters in latin-1 encoding.
> Maybe ( guessing ) it would be possible to easily use Unicode in the gtk
> client.
> To prepare this, adding a field that names the encoding used,
> like char default_german_city_names_encoding[8]="latin-1" might be useful one
> day.
Adding Unicode support is quite complicated once you think about the
fact that the server and client have to communicate. All current
messages use just 8-bit chars. Changing that would require a lot of work:
printf works only with chars, and I suppose gettext was also designed
with a single codepage in mind.
If we really want Unicode, my proposal is to always use
escape sequences for chars > 127. The exact form is not important, but this
would allow:
1) Incremental migration - a client that does not support Unicode would
display some garbage instead of the letter, but would not crash; plain
English messages could still be sent without any changes
2) Full compatibility with all standard C functions and gettext
3) The server does not need to know anything about Unicode - it just
processes slightly longer strings
This way all the work would be done in the client.
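
A minimal sketch of the sending side, assuming the city names start out in
Latin-1 (where each byte value equals its Unicode code point) and assuming
the \uXXXX form discussed further down. The function name escape_latin1 is
made up for illustration and is not existing Freeciv code:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy the Latin-1 string `in` into `out`, replacing
 * every byte above 127 with a six-character \uXXXX escape. The result is
 * pure 7-bit ASCII, so printf, strlen and friends keep working on it.
 * Returns the length of the escaped string, or -1 if `out` is too small. */
int escape_latin1(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return -1;
    }
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        size_t need = (c > 127) ? 6 : 1;

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + need + 1 > out_size) {
            return -1;
        }
        if (c > 127) {
            /* Writes 6 characters plus a NUL that is overwritten later. */
            snprintf(out + used, 7, "\\u%04x", (unsigned)c);
        } else {
            out[used] = (char)c;
        }
        used += need;
    }
    out[used] = '\0';
    return (int)used;
}
```

With this, the Latin-1 bytes for "München" come out as the 12-character
ASCII string M\u00fcnchen, which the server can pass along untouched.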
For my Java client it would be a breeze - Java already has full Unicode
support. Other clients would need special parsing routines - if
a given toolkit supports Unicode, good; if not, the client could either:
a) display ? instead of each character > 127
b) use a translation table mapping each Unicode character to the ASCII
char that looks most similar to it (u instead of u with umlaut,
etc.).
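
A rough sketch of such a parsing routine for a client without Unicode
support, combining options a) and b): known characters go through a small
translation table, everything else becomes ?. All names and the
three-umlaut table are assumptions for illustration only; a real table would
be much larger.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny translation table (option b). */
struct translit { unsigned code; char ascii; };
static const struct translit translit_table[] = {
    {0x00e4, 'a'}, {0x00f6, 'o'}, {0x00fc, 'u'},
    {0x00c4, 'A'}, {0x00d6, 'O'}, {0x00dc, 'U'},
};

/* Map a code point to a similar-looking ASCII char, or '?' (option a). */
char fallback_char(unsigned code)
{
    size_t i;

    if (code > 0 && code < 128) {
        return (char)code;
    }
    for (i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++) {
        if (translit_table[i].code == code) {
            return translit_table[i].ascii;
        }
    }
    return '?';
}

/* Parse exactly four hex digits at `s`; returns 1 on success, 0 otherwise.
 * Stops at the first non-hex char, so it never reads past the NUL. */
int parse_hex4(const char *s, unsigned *code)
{
    unsigned value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isxdigit(c)) {
            return 0;
        }
        value = value * 16
                + (unsigned)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *code = value;
    return 1;
}

/* Replace every \uXXXX escape in `s` in place. The result is never longer
 * than the input; malformed escapes are copied through unchanged. */
void unescape_to_ascii(char *s)
{
    const char *src = s;
    char *dst = s;
    unsigned code;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'u' && parse_hex4(src + 2, &code)) {
            *dst++ = fallback_char(code);
            src += 6;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Given M\u00fcnchen this yields Munchen, and an escape the table does not
know, such as \u0142, becomes ?.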
As far as the escape sequence is concerned, Java uses \uxxxx, where xxxx
is a hexadecimal number. It takes 6 bytes to transport 2 bytes of data. It
has the direct bonus of being entirely < 128 and thus easily edited. The
second option would be UTF-8 encoding, which uses 2 or 3 bytes for chars
> 127, but this could not be edited directly as easily - but maybe we could
use \u encoding for text files and UTF-8 for network packets/internal
structures.
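
To make the size comparison concrete, here is a small sketch of UTF-8
encoding for the same 16-bit range that \uxxxx covers. The function name is
hypothetical; the bit layout is the standard UTF-8 one.

```c
#include <stddef.h>

/* Encode one code point in the range U+0000..U+FFFF as UTF-8.
 * Returns the number of bytes written to `out` (1 to 3), or 0 for
 * values that cannot be encoded here (surrogates, or above U+FFFF). */
size_t utf8_encode_bmp(unsigned code, unsigned char out[3])
{
    if (code < 0x80) {
        /* Plain ASCII stays a single byte. */
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code < 0x800) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (code >> 6));
        out[1] = (unsigned char)(0x80 | (code & 0x3F));
        return 2;
    }
    if (code >= 0xD800 && code <= 0xDFFF) {
        /* Surrogate values are not valid characters on their own. */
        return 0;
    }
    if (code <= 0xFFFF) {
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (code >> 12));
        out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (code & 0x3F));
        return 3;
    }
    return 0;
}
```

So u with umlaut (U+00FC) costs 2 bytes (0xC3 0xBC) on the wire instead of
the 6 bytes of \u00fc, and all bytes of a multi-byte sequence are > 127, so
they cannot be mistaken for ASCII.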
Artur