Thread: Database design problem: multilingual strings

Database design problem: multilingual strings

From
Antonios Christofides
Date:
Hi,

I'm designing a database with a web interface, which will be
accessed by international users. The French may be requesting/entering
information in French, the Greeks in Greek, and the Japanese in
Japanese. I want every string in the database to be multilingual.
Let's use a hypothetical example:

  simple lookup table cutlery_types:

  id   description
  ----------------
  1    Spoon
  2    Fork
  3    Knife
  4    Teaspoon

'description' is no longer enough; it must be possible to add
translations to _any_ language and to any number of languages.
I've thought of a number of solutions, but none satisfies me to the
point that I'd feel ready to die :-) I'd much appreciate
comments/experience from anyone. I include the solutions I've thought
of below, but you don't need to read them if you have a good
pointer in hand.

Thanks a lot!




Solution 1
----------
  table cutlery_types_description_translations
  id   language  translation
  --------------------------
  1      fr      Cuilliere
  1      el      Koutali
  2      fr      Forchette
  2      es      Tenedor
        (or language can be id fk to languages table)

Clean solution, but... an additional table for each string in the
database?! The 50 tables will quickly become 300 :-(


Solution 2
----------

  translations
  id      language  translation
  -----------------------------
  Spoon      fr     Cuilliere
  Spoon      el     Koutali
  Fork       fr     Forchette
  Fork       es     Tenedor

Not possible, because it uses the English version of the string as an
id. What if the English version is a 300-word essay? What if the
English version changes? What if no English version exists for that
particular string?


Solution 3
----------

  cutlery_types
  id     description
  ------------------
  1      { "Spoon", "Cuilliere", "", "Koutali" }
  2      { "Fork",  "Forchette", "Tenedor", "" }

Where, obviously, a languages table tells that 1 is English, 2 is
French, 3 is Spanish and 4 is Greek. One of the problems with this
solution is that if I want to add a translation for language 45, I
need to insert an empty string for the previous 44 languages.


Solution 4
----------

  cutlery_types
  id      description
  -------------------
  1       Some way to represent a hash: 'en' => 'Spoon', 'fr' => 'Cuilliere' etc.
  2       'en' => 'Fork', 'fr' => 'Forchette', 'es' => 'Tenedor'

The description could be, for example, a TEXT containing all
translations separated by some kind of separator, or an array whose
odd elements may be the hash keys and the even elements the
translations. In any case,
  SELECT id, getstring(description, 'el') FROM cutlery_types
would use the user-defined function getstring to retrieve the needed
translation. Far from certain on how efficient it can be done.

Re: Database design problem: multilingual strings

From
Russ Brown
Date:
You might want to try abstracting the languages further, and have a central
store of strings for everything.

E.g. something like:

languages
id   handle   ------------
1    en
2    fr
3    es

strings
id   handle
-------------
1    FORK
2    SPOON
3    LARGE_ESSAY

string_translations
id    string_id     language_id     translation
--------------------------------------------------
1     1             1               Fork
2     1             2               Forchette
3     2             1               Spoon
4     2             2               Cuilliere
5     3             1               This is a large essay in Engligh. Same
would apply in any other language.

cuttlery_types
id    description_string_id
-----------------------------
1     1
2     2


Then you can use a query like this to get an array of cuttlery types in any
language:

SELECT cuttlery_types.id FROM cuttlery_types,
       string_translations,
       languages
 WHERE cuttlery_types.description_string_id=string_translations.string_id
   AND string_translations.language_id=languages.language_id
   AND languages.handle='fr';

Just swap the 'fr' for 'es' or whatever for a different language. Use the
same central store for all other text fields in the database that needs to
be language-independant. If you want to allow for missing values just use a
LEFT JOIN or similar. You could also write a fairly simple query to give
you a list of missing translations, which could be handy.

HTH,

Russ.



warning: long, Re: Database design problem: multilingual strings

From
Karsten Hilbert
Date:
Hi !

We had this problem in GnuMed (www.gnumed.org). Eventually, we
decided that it is only really solvable automatically for "fixed"
strings. That is, strings that are known at database creation.
User supplied strings need user supplied translations as well.
The translation mechanism works for them just as well but you
depend on the user to supply a translation.

I am attaching the solution we use in GnuMed. The schema file
shows our table setup:

-----------------------------------------------------------
-- =============================================
-- GnuMed fixed string internationalisation
-- ========================================
-- $Source: /cvsroot/gnumed/gnumed/gnumed/server/sql/gmI18N.sql,v $
-- $Id: gmI18N.sql,v 1.14 2003/06/10 09:58:11 ncq Exp $
-- license: GPL
-- author: Karsten.Hilbert@gmx.net
-- =============================================
-- Import this script into any GnuMed database you create.

-- This will allow for transparent translation of 'fixed'
-- strings in the database. Simply switching the language in
-- i18n_curr_lang will enable the user to see another language.

-- For details please see the Developer's Guide.
-- =============================================
-- force terminate + exit(3) on errors if non-interactive
\set ON_ERROR_STOP 1
-- =============================================

create table i18n_curr_lang (
    id serial primary key,
    owner name default CURRENT_USER unique not null,
    lang varchar(15) not null
);

comment on table i18n_curr_lang is
    'holds the currently selected language per user for fixed strings in the database';

-- =============================================
create table i18n_keys (
    id serial primary key,
    orig text unique
);

comment on table i18n_keys is
    'this table holds all the original strings that need translation so give this to your language teams,
    the function i18n() will take care to enter relevant strings into this table,
    the table table does NOT play any role in runtime translation activity';

-- =============================================
create table i18n_translations (
    id serial primary key,
    lang varchar(10),
    orig text,
    trans text,
    unique (lang, orig)
);
create index idx_orig on i18n_translations(orig);

-- =============================================
create function i18n(text) returns text as '
DECLARE
    original ALIAS FOR $1;
BEGIN
    if not exists(select id from i18n_keys where orig = original) then
        insert into i18n_keys (orig) values (original);
    end if;
    return original;
END;
' language 'plpgsql';

comment on function i18n(text) is
    'insert original strings into i18n_keys for later translation';

-- =============================================
create function _(text) returns text as '
DECLARE
    orig_str ALIAS FOR $1;
    trans_str text;
    my_lang varchar(10);
BEGIN
    -- no translation available at all ?
    if not exists(select orig from i18n_translations where orig = orig_str) then
        return orig_str;
    end if;

    -- get language
    select into my_lang lang
        from i18n_curr_lang
    where
        owner = CURRENT_USER;
    if not found then
        return orig_str;
    end if;

    -- get translation
    select into trans_str trans
        from i18n_translations
    where
        lang = my_lang
            and
        orig = orig_str;
    if not found then
        return orig_str;
    end if;
    return trans_str;
END;
' language 'plpgsql';

comment on function _(text) is
    'will return either the input or the translation if it exists';

-- =============================================
create function set_curr_lang(text) returns unknown as '
DECLARE
    language ALIAS FOR $1;
BEGIN
    if exists(select id from i18n_translations where lang = language) then
        delete from i18n_curr_lang where owner = CURRENT_USER;
        insert into i18n_curr_lang (lang) values (language);

        delete from i18n_curr_lang where owner = (select trim(leading ''_'' from CURRENT_USER));
        insert into i18n_curr_lang (lang, owner) values (language, (select trim(leading ''_'' from CURRENT_USER)));

        return 1;
    else
        raise exception ''Cannot set current language to [%]. No translations available.'', language;
        return NULL;
    end if;
    return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text) is
    'set preferred language:
     - for "current user" and "_current_user"
     - only if translations for this language are available';

-- =============================================
create function set_curr_lang(text, name) returns unknown as '
DECLARE
    language ALIAS FOR $1;
    username ALIAS FOR $2;
BEGIN
    if exists(select id from i18n_translations where lang = language) then
        delete from i18n_curr_lang where owner = username;
        insert into i18n_curr_lang (owner, lang) values (username, language);
        return 1;
    else
        raise exception ''Cannot set current language to [%]. No translations available.'', language;
        return NULL;
    end if;
    return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text, name) is
    'set language to first argument for the user named in
     the second argument if translations are available';

-- =============================================
-- there's most likely no harm in granting select to all
GRANT SELECT on
    i18n_curr_lang,
    i18n_keys,
    i18n_translations
TO group "gm-public";

-- users need to be able to change this
-- FIXME: more groups need to have access here
GRANT SELECT, INSERT, UPDATE, DELETE on
    i18n_curr_lang,
    i18n_curr_lang_id_seq
TO group "_gm-doctors";

-- =============================================
-- do simple schema revision tracking
INSERT INTO gm_schema_revision (filename, version) VALUES('$RCSfile: gmI18N.sql,v $', '$Revision: 1.14 $');

-----------------------------------------------------------

  Then, there's the relevant part from our developer's guide:

-----------------------------------------------------------

                                                           GNUMed:
      
Prev                                             Chapter 3. Coding Guidelines
 Next 

-------------------------------------------------------------------------------------------------------------------------------

3.7. Backend I18N for non-dynamic ("fixed") strings in the backend.

3.7.1. Introduction

In enumerations we often see fixed strings being stored in the backend. There's no good way a client can translate
thoseto the 
local language. Nevertheless we need to provide a translation. Consider the following example:

We want a table that enumerates family relations. The obvious table design would be


+-----------------------------------------------------------------------------------------------------------------------------+
|create table member (
     | 
|  id serial primary key,
     | 
|  name varchar(20)
     | 
|);
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

Other tables will obviously reference table.id but we want the frontend to be able to show a spelled-out name for the
family
member type. A simple


+-----------------------------------------------------------------------------------------------------------------------------+
| select name from member where id='some ID';
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

will, however, always return the version that was put into the database in upon installation. Typically this would be
doneby 
statements such as


+-----------------------------------------------------------------------------------------------------------------------------+
| insert into member(name) values('sister');
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

Hence, queries would always return the English 'sister'.

PostgreSQL does not directly support localization of database content. Therefor the following scheme has been devised:

At the top of your psql script schema definition files include the file gnumed/server/gmI18N.sql which provides a
localization
infrastructure. For your convenience, just copy/paste the following two lines:


+-----------------------------------------------------------------------------------------------------------------------------+
|-- do fixed string i18n()ing
     | 
|\i gmI18N.sql
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

The database will then contain several new tables starting with i18n_* and a few functions.

3.7.1.1. i18n_curr_lang

Here you can/should set the currently preferred language on a per-user basis. Only one language per user is allowed at
anyone 
time. Switching the language here will enable the user to see another translation (if provided).

3.7.1.2. i18n_keys

This is just a convenience table listing all the strings that need translations. Dump this and give to translation
teams.A 
tool will be provided to make use of this table. It is of no importance to the actual online translation process.

3.7.1.3. i18n_translations

This is where translations actually live. As in gettext the original string is used as the key and the language code
(which
should correspond with those used in i18n_curr_lang) as a discrimator.

3.7.2. How to translate strings

Make your string insertions aware of i18n issues. This is what the function i18n(text) is for. Regarding the above
example
insertions need to be rewritten from


+-----------------------------------------------------------------------------------------------------------------------------+
| insert into member(name) values('sister');
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

to


+-----------------------------------------------------------------------------------------------------------------------------+
|
     | 
| insert into member(name) values(i18n('sister'));
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

The i18n() function will take care of inserting the string 'sister' into the i18n_keys table where translation teams
willfind 
it and provide a translation. Later on, when a translation is available it will be inserted into i18n_translations:


+-----------------------------------------------------------------------------------------------------------------------------+
| insert into i18n_translations(lang, orig, trans) values ('de_DE', 'sister', 'Schwester');
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

3.7.3. How to make your tables translate strings

Now that we have translations available in i18n_translations we can start making our tables aware of them.
Unfortunately,
PostgreSQL does not yet support column-level select rules. We therefor have to create views wrapping the original
tables.Note 
that the original table will still be useable. Original tables which have translated strings should be named
"_tablename"while 
views translating them should be named "v_i18n_tablename". Going back to our previous example, the table


+-----------------------------------------------------------------------------------------------------------------------------+
|create table member (
     | 
|  id serial primary key,
     | 
|  name varchar(20)
     | 
|);
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

should be renamed to "_member" and a view created on it:


+-----------------------------------------------------------------------------------------------------------------------------+
|
     | 
|create view v_i18n_member (id, name) as
     | 
|  select _member.id, _(_member.name)
     | 
|  from member;
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

By making sure to use the same column names in the view we minimize frontend coding changes.

You will notice how the function _() is used to access the translation for the attribute "name". This function is
providedby 
gmI18N.sql and provides nearly the same functionality as gettext.gettext() which is often aliased to _() in Python and
other
languages. It will return a translation based on the user's currently selected language in i18n_curr_lang and the
translation
for that language in i18n_translations using the original string as the key.

If no translation is available for a given string _() will return the original string. Also, if the user did not select
a
language in i18n_curr_lang the original is returned.

3.7.4. How to make the frontend use translated strings

All the backend infrastructure is in place now so we can make frontends aware of translated strings. The first step is
tomake 
frontends use the v_i18n_* views instead of the tables. If we fail to do that everything will still work. We just won't
get
translations :-)

The second step is to make sure the current user has a language selected in i18n_curr_lang. Use something like


+-----------------------------------------------------------------------------------------------------------------------------+
|insert into i18n_curr_lang(lang) values ('de_DE');
     | 
|
     | 

+-----------------------------------------------------------------------------------------------------------------------------+

This will default to the CURRENT_USER. The actual value need not conform to anything in particular. It can be "Klingon"
for
that matter. Make sure then to have "Klingon" translations available in i18n_translations.

This i18n technique does not take care of strings that are inserted into the database dynamically (at runtime). It only
makes
sense for strings that are inserted once. Such strings are often used for enumerations.

All this crap isn't necessary anymore once PostgreSQL supports native internationalization of 'fixed' strings.


-------------------------------------------------------------------------------------------------------------------------------
Prev                                                         Home
  Next 
Client Internationalization /                                 Up                                   Interacting with the
Backend
Localization
       

-----------------------------------------------------------

There are known drawbacks but this is what we currently use.
Hope that helps !

Karsten Hilbert, MD
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: Database design problem: multilingual strings

From
Ernest E Vogelsinger
Date:
At 19:15 24.06.2003, Antonios Christofides said:
--------------------[snip]--------------------
>'description' is no longer enough; it must be possible to add
>translations to _any_ language and to any number of languages.
>I've thought of a number of solutions, but none satisfies me to the
>point that I'd feel ready to die :-) I'd much appreciate
>comments/experience from anyone. I include the solutions I've thought
>of below, but you don't need to read them if you have a good
>pointer in hand.
--------------------[snip]--------------------

Taking off from this table:

table cutlery_types
id description
----------------
1 Spoon
2 Fork
3 Knife
4 Teaspoon

you might use a table set like this:

table lg_dependent
oid_table | column | id_row | language | text
---------------------------------------------------
######### | desc   |      1 | en       | Spoon
######### | desc   |      1 | ger      | Löffel
######### | desc   |      1 | fr      | Cuilliere
######### | desc   |      1 | el      | Koutali

Use a select statement like this:

select t1.id, t2.text /*, etc */
from cutlery_description t1
join lg_dependent t2 on t2.oid_table = (select oid from pg_class where
relname='cutlery_types') and t2.column='desc' and t2.language='en'

It might be better to _not_ use the table oid to be more portable across
databases, or pg_dumps without OID. You might either use the table name
then, or have another table mapping table names to unique numbers.






--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



Re: Database design problem: multilingual strings

From
Dennis Gearon
Date:
In looking at your ideas, a thought came to mind.
This issue is something I've been looking at neediing to address, so any comments are welcome.

Whatever the first entry of a string entered becomes the reference string. You could add a field for that.The design
belowneeds a unique index on: 

    Translations( string_id, lang_id );
    Translations( string_id, lang_id, lang_string );
    Languages ( lang_name_full_eng );
    Languages (iso_latin_abbrev );

CREATE TABLE StringIDs(
    string_id serial NOT NULL PRIMARY KEY
);

CREATE TABLE Languages(
    lang_id serial NOT NULL PRIMARY KEY,
    lang_name_full_eng varchar(30) NOT NULL,
    iso_latin_abbrev varchar(2) NOT NULL,
);

CREATE TABLE Translations(
    string_id INT4 NOT NULL,
    lang_id INT4 NOT NULL,
    lang_string BYTEA NOT NULL
);

ALTER TABLE Translations
    ADD CONSTRAINT FK_translations_string_id
    FOREIGN KEY (string_id)
    REFERENCES StringIDs (string_id);

ALTER TABLE Translations
    ADD CONSTRAINT FK_translations_lang_id
    FOREIGN KEY (lang_id)
    REFERENCES Languages (lang_id);

The design above needs a unique index on:

    Translations( string_id, lang_id );
    Translations( string_id, lang_id, lang_string );
    Languages ( lang_name_full_eng );
    Languages (iso_latin_abbrev );

Antonios Christofides wrote:

> Hi,
>
> I'm designing a database with a web interface, which will be
> accessed by international users. The French may be requesting/entering
> information in French, the Greeks in Greek, and the Japanese in
> Japanese. I want every string in the database to be multilingual.
> Let's use a hypothetical example:
>
>   simple lookup table cutlery_types:
>
>   id   description
>   ----------------
>   1    Spoon
>   2    Fork
>   3    Knife
>   4    Teaspoon
>
> 'description' is no longer enough; it must be possible to add
> translations to _any_ language and to any number of languages.
> I've thought of a number of solutions, but none satisfies me to the
> point that I'd feel ready to die :-) I'd much appreciate
> comments/experience from anyone. I include the solutions I've thought
> of below, but you don't need to read them if you have a good
> pointer in hand.
>
> Thanks a lot!
>
>
>
>
> Solution 1
> ----------
>   table cutlery_types_description_translations
>   id   language  translation
>   --------------------------
>   1      fr      Cuilliere
>   1      el      Koutali
>   2      fr      Forchette
>   2      es      Tenedor
>         (or language can be id fk to languages table)
>
> Clean solution, but... an additional table for each string in the
> database?! The 50 tables will quickly become 300 :-(
>
>
> Solution 2
> ----------
>
>   translations
>   id      language  translation
>   -----------------------------
>   Spoon      fr     Cuilliere
>   Spoon      el     Koutali
>   Fork       fr     Forchette
>   Fork       es     Tenedor
>
> Not possible, because it uses the English version of the string as an
> id. What if the English version is a 300-word essay? What if the
> English version changes? What if no English version exists for that
> particular string?
>
>
> Solution 3
> ----------
>
>   cutlery_types
>   id     description
>   ------------------
>   1      { "Spoon", "Cuilliere", "", "Koutali" }
>   2      { "Fork",  "Forchette", "Tenedor", "" }
>
> Where, obviously, a languages table tells that 1 is English, 2 is
> French, 3 is Spanish and 4 is Greek. One of the problems with this
> solution is that if I want to add a translation for language 45, I
> need to insert an empty string for the previous 44 languages.
>
>
> Solution 4
> ----------
>
>   cutlery_types
>   id      description
>   -------------------
>   1       Some way to represent a hash: 'en' => 'Spoon', 'fr' => 'Cuilliere' etc.
>   2       'en' => 'Fork', 'fr' => 'Forchette', 'es' => 'Tenedor'
>
> The description could be, for example, a TEXT containing all
> translations separated by some kind of separator, or an array whose
> odd elements may be the hash keys and the even elements the
> translations. In any case,
>   SELECT id, getstring(description, 'el') FROM cutlery_types
> would use the user-defined function getstring to retrieve the needed
> translation. Far from certain on how efficient it can be done.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>