Complete.Org: Mailing Lists: Archives: freeciv-dev: January 2003:
[Freeciv-Dev] Re: (PR#2559) Font problem under BeOS

To: bernd.korz@xxxxxxxxxxxxx
Cc: freeciv-dev@xxxxxxxxxxx
Subject: [Freeciv-Dev] Re: (PR#2559) Font problem under BeOS
From: "Raimar Falke via RT" <rt@xxxxxxxxxxxxxx>
Date: Wed, 29 Jan 2003 09:33:24 -0800
Reply-to: rt@xxxxxxxxxxxxxx

On Tue, Jan 28, 2003 at 12:02:03PM -0800, Jason Short via RT wrote:
> 
> So, to start the explanation from scratch...
> 
> The problem is that UTF-16 can be in either little-endian or big-endian
> form.  The first two-byte character of the string is a byte-order-mark
> (BOM) that shows what the endian-ness of the string is.  See
> http://www.unicode.org/unicode/faq/utf_bom.html#22.
> 
> So, when using iconv to translate from the local encoding into UTF-16,
> iconv sticks the BOM onto the front of the string.
> 
> gui-sdl just skips past this character.  Rafal apparently thought it was
> a bug.  SDL_ttf cannot handle the BOM, and in fact it appears SDL_ttf
> needs the unicode to be little-endian (at least on little-endian
> machines, see below).  The way to work around this is to use UTF-16LE,
> which is always little-endian (and doesn't need the BOM).  See the
> attached patch.
> 
> But, this could fail on big-endian machines.  gui-sdl breaks on Davide's
> alpha and Bernd's beos machine.  Both are little-endian (I'm told), but
> have big-endian UTF-16 strings for some reason.  But on a big-endian
> machine it's possible (even likely) that we will need to use big-endian
> unicode.  So in this case UTF-16LE wouldn't work - we'd probably have to
> check the endian-ness of the machine (at compile time, for instance) and
> use UTF-16BE for big-endian machines.
> 
> All of this seems like a gross hack.  SDL_ttf should be able to handle
> UTF-16 with a BOM.
> 
> P.S.  The gui-sdl conversion using iconv is unnecessarily long.  Since
> we know the target string's length (or think we do), we can do the
> entire conversion with one call to iconv.  The loop is unnecessary, and
> most of the extra variables are too.  But this function should go away
> when this feature is provided by the core code, so we can wait until
> then to change this.
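
To make the byte-order point in the quoted explanation concrete, here is a minimal sketch of reading a UTF-16 byte stream by hand: it checks for a leading BOM, falls back to a caller-supplied order when there is none, and assembles each 16-bit unit arithmetically so the result does not depend on the host's endianness. The names `utf16_order` and `utf16_bytes_to_units` are hypothetical and are not part of gui-sdl, iconv, or SDL_ttf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte order of a UTF-16 byte stream (hypothetical helper type). */
enum utf16_order { UTF16_LE, UTF16_BE };

/*
 * Convert a UTF-16 byte stream into 16-bit code units in host order.
 * A leading BOM (FF FE = little-endian, FE FF = big-endian) selects the
 * byte order and is dropped; without a BOM, `fallback` is used.
 * Returns the number of code units written to `out` (at most `out_cap`).
 */
size_t utf16_bytes_to_units(const unsigned char *in, size_t in_len,
                            enum utf16_order fallback,
                            uint16_t *out, size_t out_cap)
{
    enum utf16_order order = fallback;
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    if (in_len >= 2) {
        if (in[0] == 0xFF && in[1] == 0xFE) {
            order = UTF16_LE;
            i = 2;              /* skip the BOM */
        } else if (in[0] == 0xFE && in[1] == 0xFF) {
            order = UTF16_BE;
            i = 2;              /* skip the BOM */
        }
    }

    /* Build each unit from its two bytes; no host-endian assumptions. */
    for (; i + 1 < in_len && n < out_cap; i += 2) {
        out[n++] = (order == UTF16_LE)
            ? (uint16_t)(in[i] | (in[i + 1] << 8))
            : (uint16_t)((in[i] << 8) | in[i + 1]);
    }
    return n;
}
```

With a routine like this, the bytes FE FF 00 41 and FF FE 41 00 both come out as the single unit 0x0041, whatever machine runs it.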

From SDL_ttf 
(http://www.libsdl.org/cgi/cvsweb.cgi/SDL_ttf2/SDL_ttf.c?rev=1.12&content-type=text/x-cvsweb-markup):

static FT_Error Load_Glyph( TTF_Font* font, Uint16 ch, c_glyph* cached, int want )
{
...
        /* Load the glyph */
        if ( ! cached->index ) {
                cached->index = FT_Get_Char_Index( face, ch );
        }

ch is a 16-bit UTF-16 code unit (I'm not sure in which byte order).

The signature of FT_Get_Char_Index
(http://freetype.sourceforge.net/freetype2/docs/reference/ft2-base_interface.html#FT_Get_Char_Index)
however is:

  FT_UInt FT_Get_Char_Index(FT_Face face, FT_ULong charcode);

  Returns the glyph index of a given character code. This function
  uses a charmap object to do the translation.

Charmaps are related to FT_Encoding, and there is a

   FT_ENCODING_UNICODE Corresponds to the Unicode character set. This
   value covers all versions of the Unicode repertoire, including
   ASCII and Latin-1. Most fonts include a Unicode charmap, but not
   all of them.

Summary: in the call to FT_Get_Char_Index in SDL_ttf, the variable ch
is widened from 16 bits to an FT_ULong (at least 32 bits).

   THIS IS PLAIN WRONG

since there can be characters which consist of multiple 16-bit values
(so-called surrogates). Besides this, it doesn't do any endianness
handling at all. So an ordinary compiler cast is all that converts the
Unicode character encoded in UTF-16 into the unencoded Unicode
value. This is also plain wrong.
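
As a rough illustration of what a correct conversion would have to do, the sketch below decodes one Unicode code point from host-order UTF-16 code units and combines a surrogate pair where one is present. The function name `utf16_next_codepoint` and the choice of U+FFFD for unpaired surrogates are assumptions made for this example; nothing here is taken from SDL_ttf or FreeType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/*
 * Decode one code point from host-order UTF-16 code units.
 * Stores the code point in *cp and returns the number of units consumed
 * (1 or 2). An unpaired surrogate yields U+FFFD and consumes one unit.
 */
size_t utf16_next_codepoint(const uint16_t *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL && len > 0);

    uint16_t u = s[0];

    if (u >= 0xD800 && u <= 0xDBFF) {
        /* High surrogate: needs a low surrogate right after it. */
        if (len >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            *cp = 0x10000u
                + (((uint32_t)(u - 0xD800) << 10)
                   | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = REPLACEMENT_CHAR;
        return 1;
    }
    if (u >= 0xDC00 && u <= 0xDFFF) {
        /* Low surrogate with no preceding high surrogate. */
        *cp = REPLACEMENT_CHAR;
        return 1;
    }

    /* Basic Multilingual Plane: the unit is the code point. */
    *cp = u;
    return 1;
}
```

For example, the units 0xD834 0xDD1E decode to the single code point U+1D11E, whereas widening each unit separately would hand two meaningless values to a function such as FT_Get_Char_Index.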

Either we require a proper SDL_ttf or we have to do the conversion
ourselves, but not with iconv, and only in the SDL client.

        Raimar

-- 
 email: rf13@xxxxxxxxxxxxxxxxx
 "Last year, out in California, at a PC users group, there was a demo of
  smart speech recognition software. Before the demonstrator could begin
  his demo, a voice called out from the audience: "Format c, return. Yes,
  return." Damned short demo, it was.