Re: Use arrays to store multilanguage texts - Mailing list pgsql-general
From: Greg Stark
Subject: Re: Use arrays to store multilanguage texts
Msg-id: 87k6yv9ujt.fsf@stark.xeocode.com
In response to: Use arrays to store multilanguage texts (Michal Táborský <michal@taborsky.cz>)
List: pgsql-general
Michal Táborský <michal@taborsky.cz> writes:

> I am wondering, if it's effective to use text arrays to store
> multilanguage information.
>
> We used to do it like this:

I went through almost exactly the same design evolution. My initial
design had dozens of "text_id" fields littered throughout the database
and had a "localized_text" table with the expansion of every text_id
for every language. It was a total headache.

> name text[],
> description text[]

That's exactly what I have now.

> Then I'd just do:
>
> SELECT id, price, name[1], description[1] FROM product

Well I defined functions lang_?? for each language code, so that the
index of each language wasn't hard coded into the application.

> The only question, which remains unanswered is how is it with
> indexes. I mean--most of the queries on such table are ordered by
> name, for example. Therefore I'd do this:

Well in my case I was never using an index on the localized text
columns anyways.

> SELECT id, price, name[1], description[1] FROM product ORDER BY name[1]
>
> Is it possible to build an index, which will be used in such query? I
> had no luck with CREATE INDEX product_name1 ON product (r[1]), but
> maybe the syntax is just somehow different.

It is in 7.4. You might need another set of parentheses to make it
parse, like "ON product ((r[1]))". In 7.3 you would need to make a
function for each language and index on lang_en(r). As I said I think
that's not a bad idea anyways.

> Are there any more drawbacks or limitations, that maybe I am not
> aware of, which would discourage you from doing the multilanguage
> support this way? Should we consider this or stick to the old ways?

My main worry is what happens if I have to add a new language. I'll
have to update every record of every table to add a new element to the
arrays. There's no centralized place that "knows" about all the
translated texts; I have to know all the tables and columns that will
have to be fixed.
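[For the archive, a minimal sketch of the layout being discussed. The
table and function names are illustrative, and the convention that
index 1 = English and index 2 = Czech is an assumption, not something
fixed by the thread:]

```sql
-- Products with per-language text stored in arrays
-- (assumed convention: [1] = English, [2] = Czech).
CREATE TABLE product (
    id          serial PRIMARY KEY,
    price       numeric,
    name        text[],
    description text[]
);

-- One accessor function per language code, so the array index
-- is not hard-coded into the application:
CREATE FUNCTION lang_en(text[]) RETURNS text AS
    'SELECT $1[1]' LANGUAGE sql IMMUTABLE;
CREATE FUNCTION lang_cz(text[]) RETURNS text AS
    'SELECT $1[2]' LANGUAGE sql IMMUTABLE;

-- 7.4: an index on an array element needs the extra parentheses
-- to parse as an expression index:
CREATE INDEX product_name1 ON product ((name[1]));

-- 7.3: no expression indexes on arbitrary expressions, so index
-- on the per-language function instead:
CREATE INDEX product_name_en ON product (lang_en(name));

-- Queries then go through the accessor rather than a literal index:
SELECT id, price, lang_en(name), lang_en(description)
  FROM product
 ORDER BY lang_en(name);
```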
Similarly, it's impossible to query the database for "tell me all the
strings that haven't been translated yet". I would have to do an
individual query for every column of this type.

Note also that in my case I'm entering the data for both languages (we
only have two so far) at the same time in the same form. Most forms in
the application just have two columns and the user has to fill out the
same information in each language. If you wanted a more traditional
l10n interface where the user has to enter translations for every
string in the application, then again the inability to pull out all
the strings might be a problem.

But in my model the convenience of being able to update both languages
in the same update, not having to worry about the order in which
inserts are done, and not keeping track of ids to ensure relationships
is huge. I can't imagine having continued in the other track.

The benefit of having the translation available immediately, without
having to do an extra join, depends a lot on the application. In my
case I had dozens of fields to look up, often in the midst of fairly
complex queries. I was worried about performance, and also about it
raising the likelihood of the optimizer getting confused. Also it made
utter messes of the queries.

-- 
greg