[Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets

To: Kenn.Munro@xxxxxxxxxxxxxx, jdwheeler42@xxxxxxxxx, jrg45@xxxxxxxxxxxxxxxxx, pawel@xxxxxxxxxxxxxxx, per@xxxxxxxxxxx
Cc: mrproper@xxxxxxxxxx, jlangley@xxxxxxx
Subject: [Freeciv-Dev] (PR#1824) ruleset data is in incompatible charsets
From: "Jason Short" <jdorje@xxxxxxxxxxxxxxxxxxxxx>
Date: Sun, 25 Apr 2004 20:18:09 -0700
Reply-to: rt@xxxxxxxxxxx

<URL: http://rt.freeciv.org/Ticket/Display.html?id=1824 >

So here's an update of the 1824 patch.  It's still far from workable.  I
made some gtk2 changes (enough to get it to compile, although this
breaks the gtk2 client) and dropped sdl support (which should be easy
enough to add back, but hard to test since gui-sdl won't compile).  I
also changed the internal encoding to latin1 (after the patch is applied
we should convert all data and change it to utf-8).

The basic idea is that there are three different encodings.

The data encoding (or internal encoding, or network encoding) is used as
a global encoding for strings.  This encoding will be the same between
server and client.  All ruleset data (city names, for example) must be in
this charset.  Non-ASCII strings sent across the network are in this
encoding as well.

The local encoding is the encoding of the terminal.  We have to convert
to this encoding before printing anything with printf and friends.
gettext will translate into this encoding unless we tell it to do
otherwise.  Server and client will usually, but not always, have the
same local encoding.

The display encoding is the encoding used by the GUI.  Often this is the
same as the local encoding.  But the gtk2 client uses UTF-8 here and the
SDL client uses UTF-16.
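
As a rough sketch of how the display encoding could be chosen: an
explicit override wins, then the GUI's own default, then the local
encoding.  This mirrors the fallback order the patch uses in
init_character_encodings, but the name pick_encoding and its parameters
are made up for illustration, and treating an empty override as unset is
my assumption (the patch only checks for NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the display encoding.
 *   override    - e.g. the value of $FREECIV_DISPLAY_ENCODING, or NULL
 *   gui_default - what the GUI toolkit wants (e.g. "UTF-8"), or NULL
 *   local       - the terminal's encoding; must not be NULL
 * Returns one of the three pointers unchanged; nothing is copied. */
const char *pick_encoding(const char *override, const char *gui_default,
                          const char *local)
{
  assert(local != NULL);

  /* Assumption: an empty override counts as "not set". */
  if (override != NULL && override[0] != '\0') {
    return override;
  }
  if (gui_default != NULL) {
    return gui_default;
  }
  return local;
}
```

With this, a gtk2-style client would pass "UTF-8" as gui_default, while
the server and the xaw client would pass NULL and end up with the local
encoding.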

A fourth encoding is ASCII.  Many strings are in ASCII.  Some are later
converted (at the translation step) into other encodings.  Others (like
sprite tags) are used as-is.  This encoding can basically be ignored,
since we don't need to convert these strings manually; however, we do
need to remember not to convert such strings by mistake (although in
most cases it wouldn't matter, since ASCII is a subset of the common
8-bit encodings and of UTF-8, though not of UTF-16).
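
To make the "ASCII strings need no conversion" point concrete, here is a
minimal check that a string is pure 7-bit ASCII.  The helper name
is_ascii_string is hypothetical (it is not in the patch); it could be
used in an assert to catch a non-ASCII string sneaking into a place that
expects something like a sprite tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if every byte of the NUL-terminated
 * string is 7-bit ASCII (below 0x80).  Such a string is byte-for-byte
 * identical in latin1 and in UTF-8, so it never needs converting
 * between those. */
bool is_ascii_string(const char *text)
{
  const unsigned char *p = (const unsigned char *) text;

  assert(text != NULL);

  for (; *p != '\0'; p++) {
    if (*p >= 0x80) {
      return false;
    }
  }
  return true;
}
```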

So the question is how to convert between the encodings without having
to make changes _everywhere_.  There are a couple of ways of doing this.

What I'm leaning toward now is keeping all strings in the data encoding
internally.  We then convert to and from the display and local encodings
at the input and output stages.
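
Here is a small, self-contained illustration of converting at the output
stage, assuming the data encoding is latin1 and the display encoding is
UTF-8.  It is hand-rolled only so it runs without iconv; the function
name latin1_to_utf8 is made up, and the patch itself goes through iconv
instead.  It works because the latin1 byte values are exactly the
Unicode code points U+0000 to U+00FF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iconv call: convert NUL-terminated
 * latin1 text to UTF-8 in a caller-supplied buffer.  Returns the number
 * of bytes written (not counting the terminator), or (size_t) -1 if the
 * buffer is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t outsz)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t n = 0;

  assert(in != NULL && out != NULL);

  for (; *p != '\0'; p++) {
    if (*p < 0x80) {
      /* ASCII: one byte, unchanged.  Leave room for the terminator. */
      if (n + 1 >= outsz) {
        return (size_t) -1;
      }
      out[n++] = (char) *p;
    } else {
      /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx. */
      if (n + 2 >= outsz) {
        return (size_t) -1;
      }
      out[n++] = (char) (0xC0 | (*p >> 6));
      out[n++] = (char) (0x80 | (*p & 0x3F));
    }
  }

  if (n >= outsz) {
    return (size_t) -1;
  }
  out[n] = '\0';
  return n;
}
```

The point is only that the string stays in the data encoding everywhere
inside the program and is converted once, right before it is handed to
the GUI.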

We can tell gettext to put strings in the internal encoding instead of
the local encoding.  And any strings we receive over the network or from
the rulesets should already be in the internal encoding.  So any
freeciv-generated strings should be no problem.

For the local encoding the conversion is fairly easy.  We can copy what
Vasco has for the gtk2 client and write our own fc_fprintf function.  No
doubt the server needs to convert its command-prompt input as well.
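
A sketch of what such an fc_fprintf might look like: format into a
buffer in the data encoding, convert, then write.  Everything here is an
assumption for illustration.  The real function would presumably have
the plain fprintf signature and call iconv; in this sketch the
conversion step is passed in as a function pointer (fc_convert_fn, a
made-up type) so that the example does not depend on iconv, and
identity_convert is just a placeholder converter that copies its input.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical converter type: data encoding -> local encoding.
 * Returns bytes written (without terminator) or (size_t) -1. */
typedef size_t (*fc_convert_fn)(const char *in, char *out, size_t outsz);

/* Placeholder converter that copies the text unchanged. */
size_t identity_convert(const char *in, char *out, size_t outsz)
{
  int n = snprintf(out, outsz, "%s", in);

  return (n < 0 || (size_t) n >= outsz) ? (size_t) -1 : (size_t) n;
}

/* Sketch of fc_fprintf: like fprintf, but the formatted text is run
 * through to_local before being written.  Returns the number of bytes
 * written, or -1 on formatting, conversion or write failure. */
int fc_fprintf(FILE *stream, fc_convert_fn to_local,
               const char *format, ...)
{
  char data_buf[1024];   /* formatted text, data encoding */
  char local_buf[2048];  /* converted text, local encoding */
  va_list ap;
  size_t len;
  int n;

  assert(stream != NULL && to_local != NULL && format != NULL);

  va_start(ap, format);
  n = vsnprintf(data_buf, sizeof(data_buf), format, ap);
  va_end(ap);
  if (n < 0 || (size_t) n >= sizeof(data_buf)) {
    return -1;  /* formatting failed or output was truncated */
  }

  len = to_local(data_buf, local_buf, sizeof(local_buf));
  if (len == (size_t) -1) {
    return -1;
  }

  return fputs(local_buf, stream) == EOF ? -1 : (int) len;
}
```

The fixed buffer sizes are arbitrary; a real version would want to
handle longer messages rather than reject them.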

For the display encoding things are a bit harder.  Clients will need to
convert to and from the display encoding at every step (gui-sdl already
does this).  Some GUI libraries may allow this to be automated.  For
others it will be a lot of work!
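
To show why a 16-bit display encoding is more work than an 8-bit one,
here is a sketch that widens latin1 text to UTF-16 code units and counts
them.  The names latin1_to_utf16 and utf16_unit_count are hypothetical.
Note that every ASCII character now contains a zero byte, so strlen and
the other str* functions can no longer be used on the result, which is
why the patch needs its own char_strlen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen NUL-terminated latin1 text to UTF-16 code
 * units (host byte order).  Each latin1 byte is the code point of the
 * same value, so one unit per character is enough.  Returns the number
 * of units written (not counting the terminating 0 unit), or
 * (size_t) -1 if out_units is too small. */
size_t latin1_to_utf16(const char *in, uint16_t *out, size_t out_units)
{
  const unsigned char *p = (const unsigned char *) in;
  size_t n = 0;

  assert(in != NULL && out != NULL);

  for (; *p != '\0'; p++) {
    if (n + 1 >= out_units) {
      return (size_t) -1;  /* no room for this unit plus terminator */
    }
    out[n++] = (uint16_t) *p;
  }

  if (n >= out_units) {
    return (size_t) -1;
  }
  out[n] = 0;
  return n;
}

/* Length in code units of a 0-terminated UTF-16 string. */
size_t utf16_unit_count(const uint16_t *text)
{
  size_t n = 0;

  assert(text != NULL);

  while (text[n] != 0) {
    n++;
  }
  return n;
}
```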

jason

Index: configure.ac
===================================================================
RCS file: /home/freeciv/CVS/freeciv/configure.ac,v
retrieving revision 1.61
diff -u -r1.61 configure.ac
--- configure.ac        19 Apr 2004 17:24:16 -0000      1.61
+++ configure.ac        26 Apr 2004 03:01:14 -0000
@@ -305,6 +305,11 @@
   AC_MSG_ERROR([zlib found but not zlib.h.  
 You may need to install a zlib \"development\" package.]))
 
+dnl Check for libiconv (which is usually included in glibc, but may be
+dnl distributed separately).
+AM_ICONV
+LIBS="$LIBS $LIBICONV"
+
 dnl Check and choose clients
 if test x$client != xno; then
 
Index: configure.in
===================================================================
RCS file: /home/freeciv/CVS/freeciv/configure.in,v
retrieving revision 1.237
diff -u -r1.237 configure.in
--- configure.in        20 Apr 2004 17:26:13 -0000      1.237
+++ configure.in        26 Apr 2004 03:01:14 -0000
@@ -298,6 +298,11 @@
   AC_MSG_ERROR([zlib found but not zlib.h.  
 You may need to install a zlib \"development\" package.]))
 
+dnl Check for libiconv (which is usually included in glibc, but may be
+dnl distributed separately).
+AM_ICONV
+LIBS="$LIBS $LIBICONV"
+
 dnl Check and choose clients
 if test x$client != xno; then
 
Index: client/packhand.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/client/packhand.c,v
retrieving revision 1.362
diff -u -r1.362 packhand.c
--- client/packhand.c   14 Apr 2004 17:18:36 -0000      1.362
+++ client/packhand.c   26 Apr 2004 03:01:14 -0000
@@ -26,6 +26,7 @@
 #include "capability.h"
 #include "capstr.h"
 #include "events.h"
+#include "fciconv.h"
 #include "fcintl.h"
 #include "game.h"
 #include "government.h"
@@ -433,7 +434,8 @@
   pcity->owner=packet->owner;
   pcity->x=packet->x;
   pcity->y=packet->y;
-  sz_strlcpy(pcity->name, packet->name);
+  data_to_display_string_buffer(packet->name,
+                               pcity->name, sizeof(pcity->name));
   
   pcity->size=packet->size;
   for (i=0;i<5;i++) {
@@ -659,7 +661,8 @@
   pcity->owner=packet->owner;
   pcity->x=packet->x;
   pcity->y=packet->y;
-  sz_strlcpy(pcity->name, packet->name);
+  data_to_display_string_buffer(packet->name,
+                               pcity->name, sizeof(pcity->name));
   
   pcity->size=packet->size;
   pcity->tile_trade = packet->tile_trade;
@@ -2611,7 +2614,7 @@
   pl->leader_count = p->leader_count;
   pl->leaders = fc_malloc(sizeof(*pl->leaders) * pl->leader_count);
   for (i = 0; i < pl->leader_count; i++) {
-    pl->leaders[i].name = mystrdup(p->leader_name[i]);
+    pl->leaders[i].name = data_to_display_string_malloc(p->leader_name[i]);
     pl->leaders[i].is_male = p->leader_sex[i];
   }
   pl->city_style = p->city_style;
Index: client/gui-gtk/gui_main.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/client/gui-gtk/gui_main.c,v
retrieving revision 1.148
diff -u -r1.148 gui_main.c
--- client/gui-gtk/gui_main.c   15 Apr 2004 19:36:01 -0000      1.148
+++ client/gui-gtk/gui_main.c   26 Apr 2004 03:01:14 -0000
@@ -31,6 +31,7 @@
 #include <unistd.h>
 #endif
 
+#include "fciconv.h"
 #include "fcintl.h"
 #include "game.h"
 #include "government.h"
@@ -798,6 +799,7 @@
 **************************************************************************/
 void ui_init(void)
 {
+  init_character_encodings(NULL, 1);
 }
 
 /**************************************************************************
Index: client/gui-gtk-2.0/connectdlg.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/client/gui-gtk-2.0/connectdlg.c,v
retrieving revision 1.32
diff -u -r1.32 connectdlg.c
--- client/gui-gtk-2.0/connectdlg.c     23 Apr 2004 23:13:55 -0000      1.32
+++ client/gui-gtk-2.0/connectdlg.c     26 Apr 2004 03:01:15 -0000
@@ -260,12 +260,12 @@
       GtkTreeIter it;
       int i;
 
-      row[0] = ntoh_str(pserver->name);
-      row[1] = ntoh_str(pserver->port);
-      row[2] = ntoh_str(pserver->version);
+      row[0] = pserver->name;
+      row[1] = pserver->port;
+      row[2] = pserver->version;
       row[3] = _(pserver->status);
-      row[4] = ntoh_str(pserver->players);
-      row[5] = ntoh_str(pserver->metastring);
+      row[4] = pserver->players;
+      row[5] = pserver->metastring;
 
       gtk_list_store_append(storelan, &it);
       gtk_list_store_set(storelan, &it,
@@ -1135,12 +1135,12 @@
     GtkTreeIter it;
     int i;
 
-    row[0] = ntoh_str(pserver->name);
-    row[1] = ntoh_str(pserver->port);
-    row[2] = ntoh_str(pserver->version);
+    row[0] = pserver->name;
+    row[1] = pserver->port;
+    row[2] = pserver->version;
     row[3] = _(pserver->status);
-    row[4] = ntoh_str(pserver->players);
-    row[5] = ntoh_str(pserver->metastring);
+    row[4] = pserver->players;
+    row[5] = pserver->metastring;
 
     gtk_list_store_append(storemeta, &it);
     gtk_list_store_set(storemeta, &it,
Index: client/gui-gtk-2.0/gui_main.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/client/gui-gtk-2.0/gui_main.c,v
retrieving revision 1.71
diff -u -r1.71 gui_main.c
--- client/gui-gtk-2.0/gui_main.c       15 Apr 2004 19:36:01 -0000      1.71
+++ client/gui-gtk-2.0/gui_main.c       26 Apr 2004 03:01:15 -0000
@@ -33,6 +33,7 @@
 #include <gdk/gdkkeysyms.h>
 
 #include "dataio.h"
+#include "fciconv.h"
 #include "fcintl.h"
 #include "game.h"
 #include "government.h"
@@ -170,9 +171,9 @@
 static gint timer_callback(gpointer data);
 static gboolean show_conn_popup(GtkWidget *view, GdkEventButton *ev,
                                gpointer data);
-static char *network_charset = NULL;
 
 
+#if 0
 /**************************************************************************
 Network string charset conversion functions.
 **************************************************************************/
@@ -232,6 +233,7 @@
 
   return ret;
 }
+#endif
 
 /**************************************************************************
 Local log callback functions.
@@ -1012,14 +1014,18 @@
 void ui_init(void)
 {
   gchar *s;
+#if 0
   char *net_charset;
+#endif
 
 #ifdef ENABLE_NLS
   bind_textdomain_codeset(PACKAGE, "UTF-8");
 #endif
+  init_character_encodings("UTF-8", 1);
 
   log_set_callback(log_callback_utf8);
 
+#if 0
   /* set networking string conversion callbacks */
   if ((net_charset = getenv("FREECIV_NETWORK_ENCODING"))) {
     network_charset = mystrdup(net_charset);
@@ -1038,6 +1044,7 @@
 
   dio_set_put_conv_callback(put_conv);
   dio_set_get_conv_callback(get_conv);
+#endif
 
   /* convert inputs */
   s = g_locale_to_utf8(user_name, -1, NULL, NULL, NULL);
Index: client/gui-xaw/gui_main.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/client/gui-xaw/gui_main.c,v
retrieving revision 1.91
diff -u -r1.91 gui_main.c
--- client/gui-xaw/gui_main.c   15 Apr 2004 19:36:01 -0000      1.91
+++ client/gui-xaw/gui_main.c   26 Apr 2004 03:01:15 -0000
@@ -37,6 +37,7 @@
 #include "canvas.h"
 #include "pixcomm.h"
 
+#include "fciconv.h"
 #include "fcintl.h"
 #include "game.h"
 #include "government.h"
@@ -261,6 +262,7 @@
 **************************************************************************/
 void ui_init(void)
 {
+  init_character_encodings(NULL, 1);
 }
 
 /**************************************************************************
Index: common/Makefile.am
===================================================================
RCS file: /home/freeciv/CVS/freeciv/common/Makefile.am,v
retrieving revision 1.49
diff -u -r1.49 Makefile.am
--- common/Makefile.am  13 Feb 2004 07:57:58 -0000      1.49
+++ common/Makefile.am  26 Apr 2004 03:01:15 -0000
@@ -28,6 +28,8 @@
                effects.c       \
                effects.h       \
                events.h        \
+               fciconv.c       \
+               fciconv.h       \
                fcintl.c        \
                fcintl.h        \
                game.c          \
Index: common/fciconv.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/common/fciconv.c,v
retrieving revision 1.1
diff -u -r1.1 fciconv.c
--- common/fciconv.c    26 Apr 2004 02:13:30 -0000      1.1
+++ common/fciconv.c    26 Apr 2004 03:01:15 -0000
@@ -0,0 +1,289 @@
+/********************************************************************** 
+ Freeciv - Copyright (C) 2003 - The Freeciv Project
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+***********************************************************************/
+
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#include <assert.h>
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+
+#ifdef HAVE_ICONV
+#include <iconv.h>
+#endif
+
+#ifdef HAVE_LANGINFO_CODESET
+#include <langinfo.h>
+#endif
+
+#include "fciconv.h"
+#include "fcintl.h"
+#include "mem.h"
+#include "support.h"
+
+#ifdef HAVE_ICONV
+#include "log.h"
+#endif
+
+#define DEFAULT_DATA_ENCODING "ISO-8859-1"
+
+static bool is_init = FALSE;
+static char convert_buffer[4096];
+
+#ifdef HAVE_ICONV
+static char *local_encoding, *data_encoding, *display_encoding;
+static const size_t local_encoding_size = 1, data_encoding_size = 1;
+static size_t display_encoding_size = 1;
+#endif
+
+/***************************************************************************
+  Must be called during the initialization phase of server and client to
+  initialize the character encodings to be used.
+***************************************************************************/
+void init_character_encodings(char *my_display_encoding,
+                             size_t encoding_size)
+{
+#ifdef HAVE_ICONV
+  static char local[128];
+
+  /* Set the data encoding - first check $FREECIV_DATA_ENCODING,
+   * then fall back to the default. */
+  data_encoding = getenv("FREECIV_DATA_ENCODING");
+  if (!data_encoding) {
+    /* Currently the rulesets are in latin1 (ISO-8859-1). */
+    data_encoding = DEFAULT_DATA_ENCODING;
+  }
+
+  /* Set the local encoding - first check $FREECIV_LOCAL_ENCODING,
+   * then ask the system. */
+  local_encoding = getenv("FREECIV_LOCAL_ENCODING");
+  if (!local_encoding) {
+#ifdef HAVE_LIBCHARSET
+    local_encoding = locale_charset();
+#else
+#ifdef HAVE_LANGINFO_CODESET
+    local_encoding = nl_langinfo(CODESET);
+#else
+    local_encoding = "";
+#endif
+#endif
+    if (strcasecmp(local_encoding, "ANSI_X3.4-1968") == 0
+       || strcasecmp(local_encoding, "ASCII") == 0) {
+      /* HACK: use latin1 instead of ascii in typical cases when the
+       * encoding is unconfigured. */
+      local_encoding = "ISO-8859-1";
+    }
+
+
+    my_snprintf(local, sizeof(local), "%s//TRANSLIT", local_encoding);
+    local_encoding = local;
+  }
+
+  /* Set the display encoding - first check $FREECIV_DISPLAY_ENCODING,
+   * then check the passed-in default value, then fall back to the local
+   * encoding. */
+  display_encoding = getenv("FREECIV_DISPLAY_ENCODING");
+  if (!display_encoding) {
+    display_encoding = my_display_encoding;
+
+    if (!display_encoding) {
+      display_encoding = local_encoding;
+    }
+  }
+  display_encoding_size = encoding_size;
+
+  fprintf(stderr, "Data=%s, Local=%s, Display=%s\n",
+        data_encoding, local_encoding, display_encoding);
+#else
+   /* freelog may not work at this point. */
+   fprintf(stderr,
+           _("You are running Freeciv without using iconv.  Unless\n"
+           "you are using the latin1 character set, some characters\n"
+           "may not be displayed properly.  You can download iconv\n"
+           "at http://gnu.org/.\n"));
+   assert(encoding_size == 1);
+#endif
+
+   is_init = TRUE;
+}
+
+#ifdef HAVE_ICONV
+/***************************************************************************
+  Return the number of characters in the string.  from_sz is the character
+  encoding size (currently 1 or 2).
+***************************************************************************/
+static size_t char_strlen(const char *text, size_t from_sz)
+{
+  size_t length = 0;
+
+  do {
+    size_t i;
+
+    for (i = 0; i < from_sz; i++) {
+      if (text[length * from_sz + i] != 0) {
+       break;
+      }
+    }
+
+    if (i == from_sz) {
+      return length;
+    }
+
+    length++;
+  } while (TRUE);
+}
+
+/***************************************************************************
+  Convert the text.  This assumes 'from' is an 8-bit charset.  The result
+  will be put into the buf buffer unless it is NULL, in which case it
+  will be allocated on demand.
+***************************************************************************/
+static char *convert_string(const char *text,
+                           const char *from, size_t from_sz,
+                           const char *to,
+                           char *buf, size_t bufsz)
+{
+  iconv_t cd = iconv_open(to, from);
+  size_t from_len = (char_strlen(text, from_sz) + 1) * from_sz, to_len;
+  bool alloc = (buf == NULL);
+
+  assert(is_init && from != NULL && to != NULL);
+  assert(text != NULL);
+
+  if (cd == (iconv_t) (-1)) {
+    freelog(LOG_ERROR,
+           _("Could not convert text from %s to %s: %s"),
+           from, to, strerror(errno));
+    /* The best we can do? */
+    if (alloc) {
+      return mystrdup(text);
+    } else {
+      my_snprintf(buf, bufsz, "%s", text);
+      return buf;
+    }
+  }
+
+  if (alloc) {
+    to_len = from_len;
+  } else {
+    to_len = bufsz;
+  }
+
+  do {
+    size_t flen = from_len, tlen = to_len, res;
+    const char *mytext = text;
+    char *myresult;
+
+    if (alloc) {
+      buf = fc_malloc(to_len);
+    }
+
+    myresult = buf;
+
+    /* Since we may do multiple translations, we may need to reset iconv
+     * in between. */
+    iconv(cd, NULL, NULL, NULL, NULL);
+
+    res = iconv(cd, (char**)&mytext, &flen, &myresult, &tlen);
+    if (res == (size_t) (-1)) {
+      if (errno != E2BIG) {
+       /* Invalid input. */
+       freelog(LOG_ERROR, _("The string '%s' is not valid in %s: %s"),
+               text, from, strerror(errno));
+       iconv_close(cd);
+       if (alloc) {
+         free(buf);
+         return mystrdup(text); /* The best we can do? */
+       } else {
+         my_snprintf(buf, bufsz, "%s", text);
+         return buf;
+       }
+      }
+    } else {
+      /* Success. */
+      iconv_close(cd);
+
+      /* There may be wasted space here, but there's nothing we can do
+       * about it. */
+      return buf;
+    }
+
+    if (alloc) {
+      /* Not enough space; try again. */
+      buf[to_len - 1] = 0;
+      freelog(LOG_NORMAL, "   Result was '%s'.", buf);
+
+      free(buf);
+      to_len *= 2;
+    }
+  } while (alloc);
+  iconv_close(cd); /* Reached only when a fixed buffer was too small. */
+  return buf;
+}
+
+#endif
+
+#ifdef HAVE_ICONV
+
+#define CONV_FUNC_MALLOC(src, dst)                                          \
+char *src ## _to_ ## dst ## _string_malloc(const char *text)                \
+{                                                                           \
+  return convert_string(text, (src ## _encoding), (src ## _encoding_size),  \
+                       (dst ## _encoding), NULL, 0);                       \
+}
+
+#define CONV_FUNC_BUFFER(src, dst)                                          \
+char *src ## _to_ ## dst ## _string_buffer(const char *text,                \
+                                          char *buf, size_t bufsz)         \
+{                                                                           \
+  return convert_string(text, (src ## _encoding), (src ## _encoding_size),  \
+                        (dst ## _encoding), buf, bufsz);                    \
+}
+
+#else /* HAVE_ICONV */
+
+#define CONV_FUNC_MALLOC(src, dst)                                          \
+char *src ## _to_ ## dst ## _string_malloc(const char *text)                \
+{                                                                           \
+  return mystrdup(text);                                                    \
+}
+
+#define CONV_FUNC_BUFFER(src, dst)                                          \
+char *src ## _to_ ## dst ## _string_buffer(const char *text,                \
+                                          char *buf, size_t bufsz)         \
+{                                                                           \
+  my_snprintf(buf, bufsz, "%s", text);                                      \
+  return buf;                                                               \
+}
+
+#endif /* HAVE_ICONV */
+
+#define CONV_FUNC_STATIC(src, dst)                                          \
+char *src ## _to_ ## dst ## _string_static(const char *text)                \
+{                                                                           \
+  (src ## _to_ ## dst ## _string_buffer)(text,                              \
+                                       convert_buffer,                     \
+                                       sizeof(convert_buffer));            \
+  return convert_buffer;                                                    \
+}
+
+CONV_FUNC_MALLOC(data, display)
+CONV_FUNC_MALLOC(display, data)
+
+CONV_FUNC_STATIC(data, display)
+CONV_FUNC_STATIC(display, data)
+
+CONV_FUNC_BUFFER(data, display)
+CONV_FUNC_BUFFER(display, data)
Index: common/fciconv.h
===================================================================
RCS file: /home/freeciv/CVS/freeciv/common/fciconv.h,v
retrieving revision 1.1
diff -u -r1.1 fciconv.h
--- common/fciconv.h    26 Apr 2004 02:13:30 -0000      1.1
+++ common/fciconv.h    26 Apr 2004 03:01:15 -0000
@@ -0,0 +1,29 @@
+/********************************************************************** 
+ Freeciv - Copyright (C) 2003 - The Freeciv Project
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+***********************************************************************/
+#ifndef FC__FCICONV_H
+#define FC__FCICONV_H
+
+void init_character_encodings(char *my_display_encoding,
+                             size_t encoding_size);
+
+char *data_to_display_string_malloc(const char *text);
+char *data_to_display_string_static(const char *text);
+char *data_to_display_string_buffer(const char *text,
+                                   char *buf, size_t bufsz);
+
+char *display_to_data_string_malloc(const char *text);
+char *display_to_data_string_static(const char *text);
+char *display_to_data_string_buffer(const char *text,
+                                   char *buf, size_t bufsz);
+
+#endif /* FC__FCICONV_H */
Index: server/srv_main.c
===================================================================
RCS file: /home/freeciv/CVS/freeciv/server/srv_main.c,v
retrieving revision 1.159
diff -u -r1.159 srv_main.c
--- server/srv_main.c   25 Apr 2004 19:03:40 -0000      1.159
+++ server/srv_main.c   26 Apr 2004 03:01:16 -0000
@@ -49,6 +49,7 @@
 #include "city.h"
 #include "dataio.h"
 #include "events.h"
+#include "fciconv.h"
 #include "fcintl.h"
 #include "game.h"
 #include "log.h"
@@ -179,6 +180,9 @@
 
   /* mark as initialized */
   has_been_srv_init = TRUE;
+
+  /* init character encodings. */
+  init_character_encodings(NULL, 1);
 
   /* done */
   return;
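
For comparison, here is a standalone sketch of the grow-and-retry iconv
loop that convert_string uses in its malloc mode.  It is not a
replacement for the patch: convert_alloc is a made-up name, it only
handles sources where a single NUL byte ends the string (so not UTF-16),
and it returns NULL on failure instead of falling back to a copy of the
input.  It does close the conversion descriptor on every path.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated text between charsets.
 * Returns a malloc'd string that the caller must free, or NULL if the
 * charsets are unknown, the input is invalid, or memory runs out. */
char *convert_alloc(const char *text, const char *from, const char *to)
{
  iconv_t cd;
  size_t in_len, out_size;
  char *out = NULL;

  assert(text != NULL && from != NULL && to != NULL);

  cd = iconv_open(to, from);
  if (cd == (iconv_t) -1) {
    return NULL;
  }

  in_len = strlen(text) + 1;  /* convert the terminator as well */
  out_size = in_len;          /* first guess; doubled on E2BIG */

  for (;;) {
    char *in_ptr = (char *) text;  /* iconv does not modify the input */
    size_t in_left = in_len;
    size_t out_left = out_size;
    char *out_ptr;
    char *tmp = realloc(out, out_size);

    if (tmp == NULL) {
      free(out);
      out = NULL;
      break;
    }
    out = tmp;
    out_ptr = out;

    /* Reset the conversion state, then restart from the beginning. */
    iconv(cd, NULL, NULL, NULL, NULL);
    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left)
        != (size_t) -1) {
      break;  /* success; the output is NUL-terminated */
    }
    if (errno != E2BIG) {
      free(out);  /* invalid or incomplete input */
      out = NULL;
      break;
    }
    out_size *= 2;  /* output buffer too small; try again */
  }

  iconv_close(cd);
  return out;
}
```

On some platforms iconv takes a const char ** for the input pointer, so
the cast may need adjusting there.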
