[Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets

To:	Kenn.Munro@xxxxxxxxxxxxxx, jdorje@xxxxxxxxxxxxxxxxxxxxx, jdwheeler42@xxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, pawel@xxxxxxxxxxxxxxx, per@xxxxxxxxxxx
Cc:	mrproper@xxxxxxxxxx, jlangley@xxxxxxx
Subject:	[Freeciv-Dev] Re: (PR#1824) ruleset data is in incompatible charsets
From:	"Raimar Falke" <i-freeciv-lists@xxxxxxxxxxxxx>
Date:	Sun, 9 May 2004 22:26:15 -0700
Reply-to:	rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >

On Sun, May 09, 2004 at 08:50:34PM -0700, Vasco Alexandre da Silva Costa wrote:
> 
> <URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >
> 
> On Sat, 8 May 2004, Raimar Falke wrote:
> 
> FWIW I agree with:
> <http://www.freeciv.org/~rfalke/iconv-rf.png>
> 
> "Unicode" is in UTF-8

If the assumption is "make as few changes as possible and reuse as
much code as possible", utf-8 is the one and only choice here.

> "Network" is in UTF-8

Here I want a measurement comparing utf8 and utf16 (and maybe ucs4)
across different locales. The outcome may look obvious, but it isn't,
since we compress the data afterwards. So if we use utf8 for the
network encoding, it should not be because we use utf8 in the rest of
the code, but because it is the best tool for the job.

> Because it makes no sense to use UCS-2. It gives nothing. You need
> to change the code all over the map and make your own complete C
> library for string handling functions. If we use UTF-8, many
> functions such as strlen(), will still work.

<rant>

The question is whether the assumption from above is a good one. It is
a lazy one, and I think that is bad. Almost all code (90+%) keeps
working if you change from latin1 to utf8, but you have a hard time
catching the problem cases. Code like:

char *s = ...
size_t len=strlen(s);
int i;

for(i=0;i<len;i++) {
  if(s[i]=='ä') ...
}

will just break. And there is no help from the compiler or from tools
like splint, because the type (char *) stays the same while the content
changes. Code like:

...
for(i=0;i<len;i++) {
  if(s[i]=='/') ...
}

will work because in utf8 an ASCII byte never occurs inside a
multi-byte sequence, but I consider such code wrong or at least very
misleading. Code like:

...
char deli=_(",")[0];

for(i=0;i<len;i++) {
  if(s[i]==deli) ...
}

is broken in multiple ways. Yet it may work in all cases except when
the Japanese locale is active.
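
The three cases above can be sketched in a form that does not depend on
the encoding of the source file. This is only an illustration: the
helper names find_byte() and find_seq() are made up for this example
and are not Freeciv functions, and the non-ASCII characters are written
as explicit byte escapes (assuming "\xc3\xa4" for utf8 'ä' and 0xE4 for
latin1 'ä').

```c
#include <stddef.h>
#include <string.h>

/* Byte-wise search, as in the loops above: compares one byte at a time.
 * Returns the byte offset of the first match, or -1. */
int find_byte(const char *s, char c)
{
  size_t i, len = strlen(s);

  for (i = 0; i < len; i++) {
    if (s[i] == c) {
      return (int) i;
    }
  }
  return -1;
}

/* Search for a whole, possibly multi-byte, utf8 sequence.
 * Returns the byte offset of the first match, or -1. */
int find_seq(const char *s, const char *seq)
{
  const char *p = strstr(s, seq);

  return p ? (int) (p - s) : -1;
}
```

For the utf8 string "B\xc3\xa4r/x" ("Bär/x"), find_byte() with the
latin1 byte 0xE4 returns -1, find_byte() with '/' returns 4, and
find_seq() with "\xc3\xa4" returns 1. Taking only the first byte of a
translated delimiter, as the third loop does, matches the lead byte
0xE3 of the ideographic comma (E3 80 81) but also the lead byte of
unrelated characters such as E3 81 82.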

So while I agree that utf8 will require fewer changes, it isn't at all
clear to me that it requires less work.

The advantage of ucs2 is not really that it has the one character==one
element property but that it forces you to go over all your code and
inspect it. With ucs2 you have a good chance of being able to say
afterwards "we have complete unicode support".

I don't really think that you agree with me on this point. Laziness is
one of the inherent attributes of human nature. And you can say that
the gtk2 and the c++ people also made this choice.

</rant>

> > For utf8 number of chars isn't number of bytes. There were problems
> > with some buffers. Also is_sane_name uses is_iso_latin1.
> 
> Is that the problem? There is a maximum number of bytes a character can
> take in UTF-8:
> <http://en.wikipedia.org/wiki/UTF-8>

> 4 bytes. Just increase the size of all string buffers * 4. So, it is twice
> as much as UCS-2. Big deal. UCS-2 is twice as big as ISO-8859-1. :-)

Hack.
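
The gap between byte count and character count behind this exchange
can be sketched as follows. This is a minimal illustration that assumes
well-formed utf8 input; utf8_codepoints() and utf8_buffer_size() are
hypothetical names, not existing Freeciv functions, and the factor 4 is
the per-character maximum quoted above.

```c
#include <stddef.h>
#include <string.h>   /* strlen() gives the byte count for comparison */

/* Number of code points in a well-formed utf8 string: every byte that
 * is not a continuation byte (10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++) {
    if (((unsigned char) *s & 0xC0) != 0x80) {
      n++;
    }
  }
  return n;
}

/* Worst-case size in bytes of a buffer holding max_chars code points,
 * including the terminating NUL. */
size_t utf8_buffer_size(size_t max_chars)
{
  return max_chars * 4 + 1;
}
```

For "B\xc3\xa4r" ("Bär"), strlen() reports 4 bytes while
utf8_codepoints() reports 3 characters, so any limit expressed in
characters has to be checked separately from the buffer size in bytes.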

> > You agree that it is the cleaner approach?!
> >
> > For gui-sdl and gui-fs these changes aren't this much of a problem
> > since all string data already goes through only a few functions.
> 
> Using UTF-8, everywhere you can, makes more sense. No need for network
> translations either.

You agree that gui-sdl uses a gui encoding of ucs2?!

You agree that because of this the red part is a lot smaller than in
Jason's image?!

        Raimar

-- 
 email: rf13@xxxxxxxxxxxxxxxxx
 "#!/usr/bin/perl -w
  if ( `date +%w` != 1 ) {
    die "This script only works on Mondays." ;
  }"
    -- from chkars.pl by Cornelius Krasel in de.comp.os.linux.misc
