advice on schema for multilingual text - Mailing list pgsql-general

From Daniel McBrearty
Subject advice on schema for multilingual text
Date
Msg-id 9cf113670604080831s706f3e34y5577183a612218db@mail.gmail.com
Whole thread Raw
Responses Re: advice on schema for multilingual text  (Michael Glaesemann <grzm@myrealbox.com>)
Re: advice on schema for multilingual text  (Tino Wildenhain <tino@wildenhain.de>)
List pgsql-general
Hi

I have a website that has multilingual text stored in the database. Currently I just have a flat table (lets called it "translations"), one row per text item, one column per language. This works OK, for now, but I am looking at a redesign. Mostly I want to keep information about the languages in teh db as well, so that look like an extra table, one row per lang.

The problem that now arises is that there is an expected correlation between the "languages" and "translations" tables - there should be a row in languages for each column of translations. AFAIK (could well be wrong, I am no expert in db theory) there is no real way to express in the ddl. Not ideal.

An alternative layout would now be to lose the "translations" table, and have two tables in place; one called "base_text" containing the text to be translated, and another called, say, "tx_text" which contains the translations. Each row of "tx_text" references both "base_text" and also "languages".

This looks like a nice layout, as there is an abstract rep of the languages, and we lose the "translations" table which can get very wide. It's nice that the schema doesn't actually change to add a new language.

BUT there are certain invariants that need to be enforced. The main one is this:

There must only be one row in "site_text" for any given language referencing a given row of "base_text".

How can I enforce this? Also have to bear in mind that there COULD easily be two translators working on the same language. It is imperative that they are not able to simultaneously save a translation of the same base text.

I would also like to have a column in "languages", type boolean, called "is_base" - this says what the base language is. Here, only ONE row can have a true value. (Obviously it has default value of false and is not null).

Another invariant now comes in - the language referenced by every row of "site_text" MUST have "is_base" set to FALSE.

If anyone can tell me how best to express to handle this stuff in postgresql, I'd be grateful. Also general comments on whether this is a good schema or not are welcome.

regards, and thanks in advance

Daniel



--
Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131

pgsql-general by date:

Previous
From: Michael Fuhr
Date:
Subject: Re: Expression matching related question
Next
From: Tom Lane
Date:
Subject: Re: Expression matching related question