Thread: advice on setting up schema sought
Hi I have a website that has multilingual text stored in the database. Currently I just have a flat table (lets called it "translations"), one row per text item, one column per language. This works OK, for now, but I am looking at a redesign. Mostly I want to keep information about the languages in teh db as well, so that look like an extra table, one row per lang. The problem that now arises is that there is an expected correlation between the "languages" and "translations" tables - there should be a row in languages for each column of translations. AFAIK (could well be wrong, I am no expert in db theory) there is no real way to express in the ddl. Not ideal. An alternative layout would now be to lose the "translations" table, and have two tables in place; one called "base_text" containing the text to be translated, and another called, say, "tx_text" which contains the translations. Each row of "tx_text" references both "base_text" and also "languages". This looks like a nice layout, as there is an abstract rep of the languages, and we lose the "translations" table which can get very wide. It's nice that the schema doesn't actually change to add a new language. BUT there are certain invariants that need to be enforced. The main one is this: There must only be one row in "site_text" for any given language referencing a given row of "base_text". How can I enforce this? Also have to bear in mind that there COULD easily be two translators working on the same language. It is imperative that they are not able to simultaneously save a translation of the same base text. I would also like to have a column in "languages", type boolean, called "is_base" - this says what the base language is. Here, only ONE row can have a true value. (Obviously it has default value of false and is not null). Another invariant now comes in - the language referenced by every row of "site_text" MUST have "is_base" set to FALSE. If anyone can tell me how best to express to handle this stuff in postgresql, I'd be grateful. Also general comments on whether this is a good schema or not are welcome. regards, and thanks in advance Daniel
Hi (see below), > Hi > > I have a website that has multilingual text stored > in the database. > Currently I just have a flat table (lets called it > "translations"), one > row per text item, one column per language. This > works OK, for now, but > I am looking at a redesign. Mostly I want to keep > information about the > languages in teh db as well, so that look like an > extra table, one row > per lang. > > The problem that now arises is that there is an > expected correlation > between the "languages" and "translations" tables - > there should be a > row in languages for each column of translations. > AFAIK (could well be > wrong, I am no expert in db theory) there is no real > way to express in > the ddl. Not ideal. > > An alternative layout would now be to lose the > "translations" table, > and have two tables in place; one called "base_text" > containing the > text to be translated, and another called, say, > "tx_text" which > contains the translations. Each row of "tx_text" > references both > "base_text" and also "languages". > > This looks like a nice layout, as there is an > abstract rep of the > languages, and we lose the "translations" table > which can get very > wide. It's nice that the schema doesn't actually > change to add a new > language. > > BUT there are certain invariants that need to be > enforced. The main one > is this: > > There must only be one row in "site_text" for any > given language > referencing a given row of "base_text". > > How can I enforce this? Also have to bear in mind > that there COULD > easily be two translators working on the same > language. It is > imperative that they are not able to simultaneously > save a translation > of the same base text. > > I would also like to have a column in "languages", > type boolean, called > "is_base" - this says what the base language is. > Here, only ONE row can > have a true value. (Obviously it has default value > of false and is not > null). > > Another invariant now comes in - the language > referenced by every row > of "site_text" MUST have "is_base" set to FALSE. > > If anyone can tell me how best to express to handle > this stuff in > postgresql, I'd be grateful. Also general comments > on whether this is a > good schema or not are welcome. > > regards, and thanks in advance > > Daniel Daniel, i'm not sure i totally understand your request. from what i could make of it, though, would this workd? t_text text_id (pkey) language_id (fkey) text t_language language_id (pkey) language the foreign key of the t_text table would reference the the pkey (primary key) of the t_language. every row of multi-lingual text would have an associated language. is this a solution that can help you or did i misunderstand your question? ps - you lost me at site_text and base_text. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Mon, Apr 10, 2006 at 22:30:47 -0700, operationsengineer1@yahoo.com wrote: > > Daniel, i'm not sure i totally understand your > request. from what i could make of it, though, would > this workd? > > t_text > text_id (pkey) > language_id (fkey) > text > > t_language > language_id (pkey) > language > > the foreign key of the t_text table would reference > the the pkey (primary key) of the t_language. In the translation table, you need to make the primary key a composite of text_id and language_id so that only one unique combination of the two is allowed.
--- Bruno Wolff III <bruno@wolff.to> wrote: > On Mon, Apr 10, 2006 at 22:30:47 -0700, > operationsengineer1@yahoo.com wrote: > > > > Daniel, i'm not sure i totally understand your > > request. from what i could make of it, though, > would > > this workd? > > > > t_text > > text_id (pkey) > > language_id (fkey) > > text > > > > t_language > > language_id (pkey) > > language > > > > the foreign key of the t_text table would > reference > > the the pkey (primary key) of the t_language. > > In the translation table, you need to make the > primary key a composite of > text_id and language_id so that only one unique > combination of the two is > allowed. something like so... t_base_text text_id (pkey) language_id (fkey) base_text (text) t_language language_id (pkey) language (text) t_translations translations_id (pkey - composite of text_id and language_id with a marker such as "_" inbetween. otherwise 1 and 11 ("111") can be confused with 11 and 1 ("111"). unless someone has a better idea (which they likely do). translation (text) would this be an option? Bruno, do you even need a translations table? couldn't you just use... t_text text_id (pkey) is_base (bool) text (text) if the base text is all in one language, then you wouldn't even need the boolean (as long as that rule sticks, anyway). is there a reason one setup is more compelling than another? tia... __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Tue, Apr 11, 2006 at 12:07:54 -0700, operationsengineer1@yahoo.com wrote: > > > t_translations > translations_id (pkey - composite of text_id and > language_id with a marker such as "_" inbetween. > otherwise 1 and 11 ("111") can be confused with 11 and > 1 ("111"). unless someone has a better idea (which > they likely do). > translation (text) > > would this be an option? No, you want to use the primary key declaration on the table, not the column. Done that way you can specify more than one column as forming the primary key. > Bruno, do you even need a translations table? > couldn't you just use... > > t_text > text_id (pkey) > is_base (bool) > text (text) > > if the base text is all in one language, then you > wouldn't even need the boolean (as long as that rule > sticks, anyway). This wouldn't solve the primary key issue, but it wouldn't unreasonable to combine the primary text and translations together especially if the primary text was always in a specific language.
> > t_translations > > translations_id (pkey - composite of text_id and > > language_id with a marker such as "_" inbetween. > > otherwise 1 and 11 ("111") can be confused with 11 > and > > 1 ("111"). unless someone has a better idea > (which > > they likely do). > > translation (text) > > > > would this be an option? > > No, you want to use the primary key declaration on > the table, not the > column. Done that way you can specify more than one > column as forming > the primary key. like so? t_translations language_id (used for table level pkey) base_text_id (used for table level pkey) translation (text) i haven't worked with table level pkeys before. next time i get pgadmin3 fired up, i'll look into it. thanks, oe1 __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
On Tue, Apr 11, 2006 at 14:01:08 -0700, operationsengineer1@yahoo.com wrote: > like so? > > t_translations > language_id (used for table level pkey) > base_text_id (used for table level pkey) > translation (text) > > i haven't worked with table level pkeys before. next > time i get pgadmin3 fired up, i'll look into it. Here is an example of a table constraint right from the documentation: CREATE TABLE films ( code char(5), title varchar(40), did integer, date_prod date, kind varchar(10), len interval hour to minute, CONSTRAINT code_title PRIMARY KEY(code,title) );
--- Bruno Wolff III <bruno@wolff.to> wrote: > On Tue, Apr 11, 2006 at 14:01:08 -0700, > operationsengineer1@yahoo.com wrote: > > like so? > > > > t_translations > > language_id (used for table level pkey) > > base_text_id (used for table level pkey) > > translation (text) > > > > i haven't worked with table level pkeys before. > next > > time i get pgadmin3 fired up, i'll look into it. > > Here is an example of a table constraint right from > the documentation: > CREATE TABLE films ( > code char(5), > title varchar(40), > did integer, > date_prod date, > kind varchar(10), > len interval hour to minute, > CONSTRAINT code_title PRIMARY KEY(code,title) > ); > cool beans! but my mysql friends tell me pgsql is slow... is it? juuuuuust kidding! __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
thanks people for your input.
I also posted this over on pgsql-general and got a pretty thorough reply from one Michael Glaesemann, which seems to have pretty much sorted it for me.
cheers
Daniel
--
Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131
I also posted this over on pgsql-general and got a pretty thorough reply from one Michael Glaesemann, which seems to have pretty much sorted it for me.
cheers
Daniel
--
Daniel McBrearty
email : danielmcbrearty at gmail.com
www.engoi.com : the multi - language vocab trainer
BTW : 0873928131