Re: Use arrays to store multilanguage texts - Mailing list pgsql-general

From Greg Stark
Subject Re: Use arrays to store multilanguage texts
Date
Msg-id 87k6yv9ujt.fsf@stark.xeocode.com
In response to Use arrays to store multilanguage texts  (Michal Táborský <michal@taborsky.cz>)
Responses Re: Use arrays to store multilanguage texts
List pgsql-general
Michal Táborský <michal@taborsky.cz> writes:

> I am wondering if it's effective to use text arrays to store
> multilanguage information.
>
> We used to do it like this:

I went through almost exactly the same design evolution. My initial design had
dozens of "text_id" fields littered throughout the database and a
"localized_text" table with the expansion of every text_id for every language.

It was a total headache.
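Roughly, the old layout was something like this (column names here are just
illustrative, not my real schema):

    -- the old, join-based layout: every localized string lives in one table
    CREATE TABLE localized_text (
        text_id integer,
        lang    char(2),
        content text,
        PRIMARY KEY (text_id, lang)
    );

    CREATE TABLE product (
        id                  serial PRIMARY KEY,
        price               numeric,
        name_text_id        integer,   -- points at localized_text.text_id
        description_text_id integer    -- points at localized_text.text_id
    );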

>     name text[],
>     description text[]

That's exactly what I have now.

> Then I'd just do:
>
> SELECT id, price, name[1], description[1] FROM product

Well, I defined functions lang_?? for each language code, so that the index of
each language wasn't hard-coded into the application.
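
As a minimal sketch, something like this (which array slot maps to which
language is just a convention you pick once):

    -- one tiny accessor per language; the array position is fixed here only
    CREATE FUNCTION lang_en(text[]) RETURNS text
        AS 'SELECT $1[1]' LANGUAGE sql IMMUTABLE;

    CREATE FUNCTION lang_cs(text[]) RETURNS text
        AS 'SELECT $1[2]' LANGUAGE sql IMMUTABLE;

    SELECT id, price, lang_en(name), lang_en(description) FROM product;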

> The only question which remains unanswered is how this works with
> indexes. I mean, most of the queries on such a table are ordered by
> name, for example. Therefore I'd do this:

Well, in my case I was never using an index on the localized text columns
anyway.

> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index which will be used in such a query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but maybe
> the syntax is just somehow different.

It is in 7.4. You might need another set of parentheses to make it parse, like
"ON product ((r[1]))". In 7.3 you would need to make a function for each
language and index on lang_en(r). As I said, I think that's not a bad idea
anyway.
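
Something along these lines, from memory:

    -- 7.4: expression index, note the extra parentheses
    CREATE INDEX product_name1 ON product ((name[1]));

    -- 7.3: functional index on the per-language accessor
    CREATE INDEX product_name_en ON product (lang_en(name));
    SELECT id, price, lang_en(name) FROM product ORDER BY lang_en(name);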

> Are there any more drawbacks or limitations that I may not be aware
> of which would discourage you from doing multilanguage support this
> way? Should we consider this, or stick to the old ways?

My main worry is what happens if I have to add a new language. I'll have to
update every record of every table to add a new element to the arrays. There's
no centralized place that "knows" about all the translated texts; I have to
know all the tables and columns that will have to be fixed.
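
In 7.4 you could at least script the migration with array concatenation, but
you'd still have to repeat it for every column in every table (a sketch only,
assuming the new language takes the next free slot):

    -- append an empty placeholder for the new language to every row
    UPDATE product SET name        = name        || ARRAY[''],
                       description = description || ARRAY[''];
    -- ...and repeat for every other table/column holding localized text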

Similarly it's impossible to query the database for "tell me all the strings
that haven't been translated yet". I would have to do an individual query for
every column of this type.
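
That is, the best you can do is something like this, once per column (slot 2
standing in for the second language):

    -- products whose second-language name is missing or still empty
    SELECT id FROM product WHERE name[2] IS NULL OR name[2] = '';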

Note also that in my case I'm entering the data for both languages (we only
have two so far) at the same time in the same form. Most forms in the
application just have two columns and the user has to fill out the same
information in each language.

If you wanted a more traditional l10n interface where the user has to enter
translations for every string in the application, then again the inability to
pull out all the strings might be a problem.

But in my model, the convenience of being able to update both languages in the
same statement, without worrying about the order in which inserts are done or
keeping track of ids to maintain relationships, is huge. I can't imagine
having continued down the other track.
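
For instance, a single statement covers both languages (the values here are
made up; the plain string array literal works in both 7.3 and 7.4):

    -- both translations land in one insert, no ids to juggle
    INSERT INTO product (price, name, description)
    VALUES (9.95, '{"Widget", "Vercajk"}', '{"A widget.", "Vercajk."}');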

The benefit of having the translation available immediately, without doing an
extra join, depends a lot on the application. In my case I had dozens of
fields to look up, often in the midst of fairly complex queries. I was worried
about performance, and about raising the likelihood of the optimizer getting
confused. The extra joins also made utter messes of the queries.
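
To give an idea, compare the two shapes of the same lookup (using the
illustrative old schema above):

    -- old design: one extra join per localized field
    SELECT p.id, p.price, n.content AS name, d.content AS description
      FROM product p
      JOIN localized_text n ON n.text_id = p.name_text_id        AND n.lang = 'en'
      JOIN localized_text d ON d.text_id = p.description_text_id AND d.lang = 'en';

    -- array design: no joins at all
    SELECT id, price, name[1], description[1] FROM product;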

--
greg
