warning: long, Re: Database design problem: multilingual strings - Mailing list pgsql-general
From | Karsten Hilbert |
---|---|
Subject | warning: long, Re: Database design problem: multilingual strings |
Date | |
Msg-id | 20030624195441.L9075@hermes.hilbert.loc |
In response to | Database design problem: multilingual strings (Antonios Christofides <A.Christofides@itia.ntua.gr>) |
List | pgsql-general |
Hi !

We had this problem in GnuMed (www.gnumed.org). Eventually, we decided that it is only really solvable automatically for "fixed" strings. That is, strings that are known at database creation. User supplied strings need user supplied translations as well. The translation mechanism works for them just as well but you depend on the user to supply a translation.

I am attaching the solution we use in GnuMed. The schema file shows our table setup:

-----------------------------------------------------------
-- =============================================
-- GnuMed fixed string internationalisation
-- ========================================
-- $Source: /cvsroot/gnumed/gnumed/gnumed/server/sql/gmI18N.sql,v $
-- $Id: gmI18N.sql,v 1.14 2003/06/10 09:58:11 ncq Exp $
-- license: GPL
-- author: Karsten.Hilbert@gmx.net
-- =============================================
-- Import this script into any GnuMed database you create.

-- This will allow for transparent translation of 'fixed'
-- strings in the database. Simply switching the language in
-- i18n_curr_lang will enable the user to see another language.

-- For details please see the Developer's Guide.
-- =============================================
-- force terminate + exit(3) on errors if non-interactive
\set ON_ERROR_STOP 1

-- =============================================
create table i18n_curr_lang (
    id serial primary key,
    owner name default CURRENT_USER unique not null,
    lang varchar(15) not null
);

comment on table i18n_curr_lang is
    'holds the currently selected language per user for fixed strings in the database';

-- =============================================
create table i18n_keys (
    id serial primary key,
    orig text unique
);

comment on table i18n_keys is
    'this table holds all the original strings that need translation so give this to your language teams, the function i18n() will take care to enter relevant strings into this table, the table does NOT play any role in runtime translation activity';

-- =============================================
create table i18n_translations (
    id serial primary key,
    lang varchar(10),
    orig text,
    trans text,
    unique (lang, orig)
);
create index idx_orig on i18n_translations(orig);

-- =============================================
create function i18n(text) returns text as '
DECLARE
    original ALIAS FOR $1;
BEGIN
    if not exists(select id from i18n_keys where orig = original) then
        insert into i18n_keys (orig) values (original);
    end if;
    return original;
END;
' language 'plpgsql';

comment on function i18n(text) is
    'insert original strings into i18n_keys for later translation';

-- =============================================
create function _(text) returns text as '
DECLARE
    orig_str ALIAS FOR $1;
    trans_str text;
    my_lang varchar(10);
BEGIN
    -- no translation available at all ?
    if not exists(select orig from i18n_translations where orig = orig_str) then
        return orig_str;
    end if;
    -- get language
    select into my_lang lang from i18n_curr_lang where owner = CURRENT_USER;
    if not found then
        return orig_str;
    end if;
    -- get translation
    select into trans_str trans from i18n_translations where lang = my_lang and orig = orig_str;
    if not found then
        return orig_str;
    end if;
    return trans_str;
END;
' language 'plpgsql';

comment on function _(text) is
    'will return either the input or the translation if it exists';

-- =============================================
create function set_curr_lang(text) returns unknown as '
DECLARE
    language ALIAS FOR $1;
BEGIN
    if exists(select id from i18n_translations where lang = language) then
        delete from i18n_curr_lang where owner = CURRENT_USER;
        insert into i18n_curr_lang (lang) values (language);
        delete from i18n_curr_lang where owner = (select trim(leading ''_'' from CURRENT_USER));
        insert into i18n_curr_lang (lang, owner) values (language, (select trim(leading ''_'' from CURRENT_USER)));
        return 1;
    else
        raise exception ''Cannot set current language to [%]. No translations available.'', language;
        return NULL;
    end if;
    return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text) is
    'set preferred language:
     - for "current user" and "_current_user"
     - only if translations for this language are available';

-- =============================================
create function set_curr_lang(text, name) returns unknown as '
DECLARE
    language ALIAS FOR $1;
    username ALIAS FOR $2;
BEGIN
    if exists(select id from i18n_translations where lang = language) then
        delete from i18n_curr_lang where owner = username;
        insert into i18n_curr_lang (owner, lang) values (username, language);
        return 1;
    else
        raise exception ''Cannot set current language to [%]. No translations available.'', language;
        return NULL;
    end if;
    return NULL;
END;
' language 'plpgsql';

comment on function set_curr_lang(text, name) is
    'set language to first argument for the user named in the second argument if translations are available';

-- =============================================
-- there's most likely no harm in granting select to all
GRANT SELECT on
    i18n_curr_lang,
    i18n_keys,
    i18n_translations
TO group "gm-public";

-- users need to be able to change this
-- FIXME: more groups need to have access here
GRANT SELECT, INSERT, UPDATE, DELETE on
    i18n_curr_lang,
    i18n_curr_lang_id_seq
TO group "_gm-doctors";

-- =============================================
-- do simple schema revision tracking
INSERT INTO gm_schema_revision (filename, version) VALUES('$RCSfile: gmI18N.sql,v $', '$Revision: 1.14 $');
-----------------------------------------------------------

Then, there's the relevant part from our developer's guide:

-----------------------------------------------------------
GNUMed: Chapter 3. Coding Guidelines
-----------------------------------------------------------

3.7. Backend I18N for non-dynamic ("fixed") strings in the backend.

3.7.1. Introduction

In enumerations we often see fixed strings being stored in the backend. There's no good way a client can translate those to the local language. Nevertheless we need to provide a translation.

Consider the following example: We want a table that enumerates family relations.
The obvious table design would be

    create table member (
        id serial primary key,
        name varchar(20)
    );

Other tables will obviously reference member.id but we want the frontend to be able to show a spelled-out name for the family member type. A simple

    select name from member where id='some ID';

will, however, always return the version that was put into the database upon installation. Typically this would be done by statements such as

    insert into member(name) values('sister');

Hence, queries would always return the English 'sister'.

PostgreSQL does not directly support localization of database content. Therefore the following scheme has been devised:

At the top of your psql script schema definition files include the file gnumed/server/gmI18N.sql which provides a localization infrastructure.
For your convenience, just copy/paste the following two lines:

    -- do fixed string i18n()ing
    \i gmI18N.sql

The database will then contain several new tables starting with i18n_* and a few functions.

3.7.1.1. i18n_curr_lang

Here you can/should set the currently preferred language on a per-user basis. Only one language per user is allowed at any one time. Switching the language here will enable the user to see another translation (if provided).

3.7.1.2. i18n_keys

This is just a convenience table listing all the strings that need translations. Dump this and give it to translation teams. A tool will be provided to make use of this table. It is of no importance to the actual online translation process.

3.7.1.3. i18n_translations

This is where translations actually live. As in gettext the original string is used as the key and the language code (which should correspond with those used in i18n_curr_lang) as a discriminator.

3.7.2. How to translate strings

Make your string insertions aware of i18n issues. This is what the function i18n(text) is for.
Regarding the above example, insertions need to be rewritten from

    insert into member(name) values('sister');

to

    insert into member(name) values(i18n('sister'));

The i18n() function will take care of inserting the string 'sister' into the i18n_keys table where translation teams will find it and provide a translation. Later on, when a translation is available it will be inserted into i18n_translations:

    insert into i18n_translations(lang, orig, trans) values ('de_DE', 'sister', 'Schwester');

3.7.3. How to make your tables translate strings

Now that we have translations available in i18n_translations we can start making our tables aware of them. Unfortunately, PostgreSQL does not yet support column-level select rules. We therefore have to create views wrapping the original tables. Note that the original table will still be usable. Original tables which have translated strings should be named "_tablename" while views translating them should be named "v_i18n_tablename".
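As a rough illustration of the workflow from section 3.7.2, the following Python sketch models what i18n() and a later translation insert do to the two tables. It is only a model under stated assumptions: a plain list and dict stand in for i18n_keys and i18n_translations, and the names register_key and add_translation are made up for this example, they are not part of gmI18N.sql.

```python
# In-memory stand-ins for the tables (assumption: plain Python containers
# instead of a PostgreSQL database).
i18n_keys = []            # models i18n_keys.orig (unique)
i18n_translations = {}    # models (lang, orig) -> trans


def register_key(original):
    """Model of i18n(text): remember the string once, return it unchanged."""
    if original not in i18n_keys:
        i18n_keys.append(original)
    return original


def add_translation(lang, orig, trans):
    """Model of: insert into i18n_translations(lang, orig, trans) values (...)."""
    if (lang, orig) in i18n_translations:
        # mirrors the unique (lang, orig) constraint on the real table
        raise ValueError("translation already exists for %s/%s" % (lang, orig))
    i18n_translations[(lang, orig)] = trans


# models: insert into member(name) values(i18n('sister'));
member = [register_key("sister")]
# registering the same string again must not duplicate the key
register_key("sister")
# models the translation team delivering a German translation later on
add_translation("de_DE", "sister", "Schwester")
```

The point of the sketch is that registration leaves the stored value untouched ('sister' still ends up in member) and only records it for translators, while translations arrive separately and are keyed by language plus original string.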
Going back to our previous example, the table

    create table member (
        id serial primary key,
        name varchar(20)
    );

should be renamed to "_member" and a view created on it:

    create view v_i18n_member (id, name) as
        select _member.id, _(_member.name)
        from _member;

By making sure to use the same column names in the view we minimize frontend coding changes.

You will notice how the function _() is used to access the translation for the attribute "name". This function is provided by gmI18N.sql and provides nearly the same functionality as gettext.gettext() which is often aliased to _() in Python and other languages. It will return a translation based on the user's currently selected language in i18n_curr_lang and the translation for that language in i18n_translations using the original string as the key. If no translation is available for a given string _() will return the original string. Also, if the user did not select a language in i18n_curr_lang the original is returned.

3.7.4. How to make the frontend use translated strings

All the backend infrastructure is in place now so we can make frontends aware of translated strings. The first step is to make frontends use the v_i18n_* views instead of the tables. If we fail to do that everything will still work. We just won't get translations :-)

The second step is to make sure the current user has a language selected in i18n_curr_lang.
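Why the language selection matters can be seen in a small Python model of the lookup order that _() follows. This is a sketch, not the real function: the dicts stand in for i18n_curr_lang and i18n_translations, the function name translate is made up for this example, and the user names "alice" and "bob" are hypothetical.

```python
def translate(orig_str, current_user, curr_lang, translations):
    """Model of _(text): return the translation or fall back to the input.

    curr_lang    models i18n_curr_lang as {owner: lang}
    translations models i18n_translations as {(lang, orig): trans}
    """
    # no translation available at all ?
    if not any(orig == orig_str for (_lang, orig) in translations):
        return orig_str
    # get language
    my_lang = curr_lang.get(current_user)
    if my_lang is None:
        return orig_str
    # get translation, falling back to the original for this language
    return translations.get((my_lang, orig_str), orig_str)


translations = {("de_DE", "sister"): "Schwester"}

# user without a row in i18n_curr_lang: the original comes back
print(translate("sister", "alice", {}, translations))                   # sister
# user who selected de_DE gets the German string
print(translate("sister", "bob", {"bob": "de_DE"}, translations))       # Schwester
# string that was never translated: the original comes back
print(translate("brother", "bob", {"bob": "de_DE"}, translations))      # brother
```

In every failure case the caller still gets a usable string, which is why frontends keep working even before any language has been selected.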
To select a language, use something like

    insert into i18n_curr_lang(lang) values ('de_DE');

The owner column will default to CURRENT_USER. The actual value need not conform to anything in particular. It can be "Klingon" for that matter. Make sure then to have "Klingon" translations available in i18n_translations.

This i18n technique does not take care of strings that are inserted into the database dynamically (at runtime). It only makes sense for strings that are inserted once. Such strings are often used for enumerations.

All this crap won't be necessary anymore once PostgreSQL supports native internationalization of 'fixed' strings.
-----------------------------------------------------------

There are known drawbacks but this is what we currently use.

Hope that helps !

Karsten Hilbert, MD
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346