Thread: Transparent i18n?
I've recently been trying to implement some i18n functionality as simply as possible in my application. I have a lot of lookup values and such in the DB that need to be translated, and I would rather not do it in the calling client. A friend and I put our heads together and came up with what seemed like a ridiculously elegant way of doing it using PG's SELECT rules and some kind of per-connection "session" information. I got the connection session part figured out, thanks to Richard Huxton. Apparently, though, SELECT rules can only be used to create views and cannot actually modify the data returned by a table?

I was trying to do something along the lines of a translations table that houses all translations for a given string. My lookup tables would then reference the specific strings they need. I was then going to place a SELECT rule on the lookup table that would perform a join (based on the lookup word and the current session language) and return the translation in the field of the lookup value.

Is my only option to use a separate view? What techniques do the rest of you use for storing translation info in the DB?

Thanks,
Steve
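[Editor's sketch] The join-on-session-language idea Steve describes can indeed be done with a view rather than a SELECT rule. Here is a minimal runnable sketch using Python's built-in sqlite3 as a stand-in for Postgres, with a one-row session_lang table faking the per-connection language setting; all table and column names here are hypothetical, not Steve's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE session_lang (lang TEXT);   -- one row: the "session" language
INSERT INTO session_lang VALUES ('fr');

CREATE TABLE translations (
    orig TEXT, lang TEXT, trans TEXT,
    PRIMARY KEY (orig, lang)
);
INSERT INTO translations VALUES
    ('open',   'fr', 'ouvert'),
    ('closed', 'fr', 'ferme');

CREATE TABLE status_lookup (id INTEGER PRIMARY KEY, status TEXT);
INSERT INTO status_lookup VALUES (1, 'open'), (2, 'closed');

-- The view stands in for the SELECT rule: it joins each lookup value
-- to its translation for the current session language, falling back
-- to the untranslated string when none exists.
CREATE VIEW status_i18n AS
SELECT l.id, COALESCE(t.trans, l.status) AS status
FROM status_lookup l
LEFT JOIN translations t
       ON t.orig = l.status
      AND t.lang = (SELECT lang FROM session_lang);
""")
rows = cur.execute("SELECT id, status FROM status_i18n ORDER BY id").fetchall()
print(rows)  # [(1, 'ouvert'), (2, 'ferme')]
```

In real Postgres the single-row session table could instead be a custom GUC or a function reading per-connection state, which is essentially the direction the GNUmed code discussed later in the thread takes.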
Hi Steve. I have been puzzling over a similar issue - not i18n for the interface, but for text data - and trying to sort out a solution, so I will be interested to hear additional advice as well. When I wrote to the list a couple of weeks back (look for my posting around the 17th) I was looking at doing something with a normalized structure, but now I don't think that is going to work that well. It was suggested that I look at an array, and I am now looking at a multidimensional array to do this. I am just reading up on Postgres support for arrays. I think my table will be pretty simple:

CREATE TABLE multi_language (
  id SERIAL,
  lang_code_and_text TEXT[][]
);

So records would look like:

1, {{'en','the brown cow'},{'fr','la vache brun'}}
2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}

I have another table with language codes, i.e. en, fr, etc. When languages are added, I would just append to the array for the whole table. The trouble for me is more about getting the data out of Postgres, because the raw array syntax is incompatible with Python and I would have to manipulate the results. Quite frankly, I want this to be done in Postgres so that I only have to retrieve query results. If I can't, it will be a pain unless I can think of something else, because the issue is going to be keeping the keys and values in my languages table in sync with the array. For example, if I have a serial table containing my languages and add two entries, English and French, I would then have two elements in my array, and it wouldn't be so bad because I could use the id as a key to get the value back out through a query. But say I delete French (and remove the second element in each entry of my table) and then add Spanish: now I have a language id of 3 but only two elements in my array, so the ids and array positions no longer line up. In Python, associative arrays are called dictionaries, and you can easily grab an element by doing something like lang_code_and_text['en'] to get the value of the en (English) key.
I was hoping you could pull the multi-language text out of the array with a text key instead of a numeric index, but it appears Postgres will only let you do it positionally, or get results from slices, as far as I can tell. Maybe someone else on the list has some advice to offer here. i.e.

SELECT language_text[1][1] AS language_code,
       language_text[1][2] AS text
FROM language_text;

Regards, David
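[Editor's sketch] The key-based lookup David wants is easy to recover on the client side by parsing the array literal into a Python dict. A minimal sketch, following the single-quoted form used in the examples above (actual Postgres output quotes elements with double quotes and allows escapes, so the pattern would need adjusting for real server output):

```python
import re

def array_to_dict(raw):
    """Parse a two-dimensional array literal such as
    {{'en','the brown cow'},{'fr','la vache brun'}} into a dict
    keyed by language code. Handles only simple quoted pairs
    with no embedded quotes."""
    pairs = re.findall(r"\{'([^']*)','([^']*)'\}", raw)
    return dict(pairs)

row = "{{'en','the brown cow'},{'fr','la vache brun'}}"
translations = array_to_dict(row)
print(translations['en'])  # the brown cow
print(translations['fr'])  # la vache brun
```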
David Pratt <fairwinds@eastlink.ca> writes:

> It was suggested that I look at an array.

I think that was me. I tried not to say there's only one way to do it - only that I chose to go this way, and I think it has worked a lot better for me. Having the text right there in the column saves a *lot* of work dealing with the tables, especially since many tables would have multiple localized strings.

> I think my table will be pretty simple:
> CREATE TABLE multi_language (
>   id SERIAL,
>   lang_code_and_text TEXT[][]
> );
>
> So records would look like:
>
> 1, {{'en','the brown cow'},{'fr','la vache brun'}}
> 2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}

That's a lot more complicated than my model. Postgres doesn't have any functions for handling arrays like these as associative arrays the way you might want, and as you've discovered it's not so easy to ship the whole array to your client where it might be easier to work with.

I just have things like (hypothetically):

CREATE TABLE states (
  abbrev text,
  state_name text[],
  state_capitol text[]
)

And then in my application code's data layer I mark all "internationalized columns", and the object that builds the actual SELECT automatically appends "[$lang_id]" to every column in that list. The list of languages supported, and the mapping of languages to array positions, is fixed. I can grow it later, but I can't reorganize it. This is fine for me, since pretty much everything has exactly two languages.

-- 
greg
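[Editor's sketch] Greg's data-layer trick - appending a fixed array index to each internationalized column while building the SELECT - can be sketched in a few lines of client code. The table, column list, and language map below are made up for illustration, not Greg's actual implementation:

```python
# Fixed mapping of language code to array position (1-based, as in Postgres).
LANG_INDEX = {"en": 1, "fr": 2}

# Columns flagged as internationalized for the hypothetical "states" table.
I18N_COLUMNS = {"state_name", "state_capitol"}

def build_select(table, columns, lang):
    """Build a SELECT that appends [lang_id] to each internationalized
    column, leaving ordinary columns untouched."""
    idx = LANG_INDEX[lang]
    parts = [
        f"{col}[{idx}] AS {col}" if col in I18N_COLUMNS else col
        for col in columns
    ]
    return f"SELECT {', '.join(parts)} FROM {table}"

sql = build_select("states", ["abbrev", "state_name", "state_capitol"], "fr")
print(sql)
# SELECT abbrev, state_name[2] AS state_name, state_capitol[2] AS state_capitol FROM states
```

The cost of this design, as Greg notes, is that array positions are permanent: languages can be appended but never reordered or removed.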
> SELECT language_text[1][1] AS language_code,
>        language_text[1][2] AS text
> FROM language_text;

The way we do that in GNUmed:

select lookup_val, _(lookup_val) from lookup_table where ...;

If you want to know how, see here:

http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup

Feel free to ask for clarification.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Many thanks Karsten for some insight into how you are handling this.

Regards, David

On Saturday, July 2, 2005, at 06:08 AM, Karsten Hilbert wrote:

>> SELECT language_text[1][1] AS language_code,
>> language_text[1][2] AS text
>> FROM language_text;
>
> The way we do that in GNUmed:
>
> select lookup_val, _(lookup_val) from lookup_table where ...;
>
> If you want to know how, see here:
>
> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
>
> Feel free to ask for clarification.
>
> Karsten
Hi Greg. Well, I'm about half way there, but I think what I am doing could work out. I have an iso_languages table, a languages table for the languages in use, and a multi_language table for storing the values of my text fields. I choose my languages from iso_languages. Any table that needs a multi-language field gets one by id, with referential integrity to a multi_language table id, since this is a direct relationship. Thanks for the idea of using an array, BTW - referential integrity could not work with my first model.

I am taking the array text and parsing the result in Python to get the key positions. This is possible with a query using the array_to_string function to get the text of any multi_language field. Then I put the result into a dictionary, get its length, and add one to get the new key value when a new language is added. Using this key, an array is appended to the existing array in each row of the multi_language table (in the lang_code_and_text field). So the length of the outer array of the multidimensional array grows by one, for the new language, in each record of the multi_language table. I can also look up the English (en) value, so that I can use the English text as the default for the new language when inserting its array into lang_code_and_text. For example, if Spanish (es) is added, the new key is 3, so after the insert each record looks something like this:

1, {{'en','the brown cow'},{'fr','la vache brun'},{'es','the brown cow'}}
2, {{'en','the blue turkey'},{'fr','la dandon bleu'},{'es','the blue turkey'}}

In my forms, I am using a template to display entry fields for each language in use. English will be the default for newly added languages, so there is something in these fields to start with, and it should update properly based on the correct key values. In my languages table, I am storing the current key positions for each language used in my app.
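[Editor's sketch] The add-a-language step David describes - compute the next key, then append a copy of the English text under the new code to every row - can be sketched client-side like this. Everything here is hypothetical: rows are shown as already-parsed Python lists rather than raw Postgres array literals, and the real version would issue UPDATEs back to the multi_language table:

```python
def add_language(rows, new_code, default_code="en"):
    """Append [new_code, default text] to each row's language array,
    using the default_code entry as the starting text. Returns the
    new key (1-based array position) for the added language."""
    for pairs in rows:
        default_text = next(t for code, t in pairs if code == default_code)
        pairs.append([new_code, default_text])
    return len(rows[0]) if rows else 1

table = [
    [["en", "the brown cow"], ["fr", "la vache brun"]],
    [["en", "the blue turkey"], ["fr", "la dandon bleu"]],
]
new_key = add_language(table, "es")
print(new_key)      # 3
print(table[0][2])  # ['es', 'the brown cow']
```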
I have an i18n layer for Zope, and based on the language code I will pass the language id, so you see the right language in both the interface and the data. When updating or deleting records, I will be making a trigger to remove the array that represents a translation after the update. It then has to update my languages table to provide the updated key values for my languages. I am working on my first functions and triggers in PL/pgSQL. This is where I may need help from the list if I get stuck, but so far so good - though I'm not finished yet.

Does anyone have any comments on scalability? I don't really see a problem, since there is no real risk of my needing more than 10-15 languages or so, max, out of the maybe 300 languages in the world. I think 15 entries in an array is very small, so I can't see any reason for this not to work well.

>> I think my table will be pretty simple:
>> CREATE TABLE multi_language (
>>   id SERIAL,
>>   lang_code_and_text TEXT[][]
>> );
>>
>> So records would look like:
>>
>> 1, {{'en','the brown cow'},{'fr','la vache brun'}}
>> 2, {{'en','the blue turkey'},{'fr','la dandon bleu'}}
>
> That's a lot more complicated than my model.
>
> Postgres doesn't have any functions for handling arrays like these as
> associative arrays like you might want. And as you've discovered it's not so
> easy to ship the whole array to your client where it might be easier to work
> with.

Yes. This is a bit of a complication, since if those functions existed it would be really nice to work with arrays.

> I just have things like (hypothetically):
>
> CREATE TABLE states (
>   abbrev text,
>   state_name text[],
>   state_capitol text[]
> )
>
> And then in my application code data layer I mark all "internationalized
> columns" and the object that handles creating the actual select automatically
> includes a "[$lang_id]" after every column in that list.
>
> The list of languages supported and the mapping of languages to array
> positions is fixed.
> I can grow it later but I can't reorganize them. This is
> fine for me since pretty much everything has exactly two languages.

That is pretty cool. The one advantage of what I am doing is that you can add languages at any time, and there should be no huge load on Postgres as far as I can tell, since the multi_language table has only two fields and one record for each multi-language field referenced from my other tables, and calls to it are direct by id. I think this should work, but it is a puzzler for sure!

Regards, David
On Sat, Jul 02, 2005 at 05:00:50PM -0300, David Pratt wrote:

>> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
> Many thanks Karsten for some insight into how you are handling this.

David, if you go to the Developers Corner in our Wiki at

http://salaam.homeunix.com/twiki/bin/view/Gnumed/WebHome

you'll find an explanation of how we use this. Feel free to ask for comments if that doesn't suffice. (I am offline, so I can't give the precise URL.)

Karsten
Many thanks, Karsten. I am going to look at your example closely.

Regards, David

On Sunday, July 3, 2005, at 09:50 AM, Karsten Hilbert wrote:

> On Sat, Jul 02, 2005 at 05:00:50PM -0300, David Pratt wrote:
>
>>> http://savannah.gnu.org/cgi-bin/viewcvs/gnumed/gnumed/gnumed/server/sql/gmI18N.sql?rev=1.20&content-type=text/vnd.viewcvs-markup
>> Many thanks Karsten for some insight into how you are handling this.
>
> David, if you go to the Developers Corner in our Wiki at
>
> http://salaam.homeunix.com/twiki/bin/view/Gnumed/WebHome
>
> you'll find an explanation of how we use this. Feel free to
> ask for comments if that doesn't suffice.
>
> (I am offline so can't give the precise URL.)
>
> Karsten
I wonder if you could make an SQL type that used text[] as its storage format but had an output function that displayed the correct text for the "current locale" - where "current locale" could be something you set by calling a function at the beginning of the transaction.

Do pg_dump and all the important things use the send/receive functions rather than the input/output functions, so that even though this output function loses information it wouldn't cause serious problems?

You would still need a way to retrieve all the languages for cases like administrative interfaces for updating the information. I'm not entirely convinced this would be any better than the alternative of retrieving all of them by default and having a function to retrieve only the correct language.

-- 
greg
Hi Greg. I'm not sure about this one, since I have never made my own type. Do you mean something like an ip-to-country lookup to guess the locale? If so, I am using an ip-to-country table to look up the IP from the request and get the country, so the proper language can be selected automatically for display (but I need some translation work done first before I can activate this). I will also use this table for blacklisting purposes and other things, so it is multi-purpose.

I have got a good part of what I wanted working so far. I am just working on the language update/delete trigger, since there does not appear to be a direct way of surgically removing a specific element from an array in Postgres, unless I have missed something. For example, if I knew Spanish was the 3rd sub-array in a multidimensional array of, say, 10 lang/translation pairs, there is no way to remove just that one without rewriting the whole array and updating the field (which is what I am hoping to complete today).

So my language update/delete trigger needs to scan the array for the lang/translation pair to delete, update the language key for each remaining language in a reference field (other than for the language being deleted), rewrite the array without the deleted pair, and then update the field with the rewritten array. It sounds worse than it really is, since the multidimensional array containing the lang/translation pairs is the same length in every row, and you perform this by iterating with a loop through the records of the multi_language table. Further, each translation can be compared by key (for me this is the ISO language code). Also, realistically, how many times do you need to add and drop languages? The number of languages in use for me will likely never exceed, say, 20. So this process, even with large numbers of multi-language fields, should not be that problematic, even if you had a few thousand text fields you wanted translations available for. I think you would still be looking at milliseconds to perform this.
This will be an AFTER trigger (after deletion). I guess I will see what the performance is like when I am finished - so far it is pretty fast for adding.

You also get a sensible structure for multi-language fields, where each one is referenced to the multi_language table by id (normalized) with referential integrity (something I was seeking). The only thing not normalized is the translations, which is okay with me, since the array structure is dynamic yet the keys give you exactly what you want. I am also going to look at Karsten's material shortly to see how his system works, but I am interested in following through with the arrays approach I started, since I am happy with what I am seeing.

Regards, David

On Monday, July 4, 2005, at 12:06 PM, Greg Stark wrote:

> I wonder if you could make an SQL type that used text[] as its storage format
> but had an output function that displayed the correct text for the "current
> locale". Where "current locale" could be something you set by calling a
> function at the beginning of the transaction.
>
> Do pg_dump and all the important things use the send/receive functions not the
> input/output functions? so even though this output function loses information
> it wouldn't cause serious problems?
>
> You would still need a way to retrieve all the languages for the cases like
> administrative interfaces for updating the information. I'm not entirely
> convinced this would be any better than the alternative of retrieving all of
> them by default and having a function to retrieve only the correct language.
>
> -- 
> greg
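[Editor's sketch] The delete step David describes - rewrite each row's array without the dropped language, then recompute the key positions stored in the languages table - can be sketched client-side. This is hypothetical: rows are shown as parsed Python lists, and David's real version would run inside a PL/pgSQL trigger:

```python
def drop_language(rows, code):
    """Rewrite each row's language array without the given language
    code, and return the new 1-based key positions for the codes
    that remain (what David keeps in his languages table)."""
    for i, pairs in enumerate(rows):
        rows[i] = [p for p in pairs if p[0] != code]
    remaining = [p[0] for p in rows[0]] if rows else []
    return {c: k for k, c in enumerate(remaining, start=1)}

table = [
    [["en", "the brown cow"], ["fr", "la vache brun"], ["es", "the brown cow"]],
    [["en", "the blue turkey"], ["fr", "la dandon bleu"], ["es", "the blue turkey"]],
]
keys = drop_language(table, "fr")
print(table[0])  # [['en', 'the brown cow'], ['es', 'the brown cow']]
print(keys)      # {'en': 1, 'es': 2}
```

Recomputing the keys after every drop is exactly the bookkeeping that a key=>value store such as hstore (mentioned below in the thread) avoids.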
Hi there. Sorry if I'm just misunderstanding, but we have contrib/hstore available from

http://www.sai.msu.su/~megera/postgres/gist/

which could be used for storing as many languages as you need. It's sort of a Perl hash.

On Mon, 4 Jul 2005, David Pratt wrote:

> I have got a good part of what I wanted working so far. I am just working on
> language update delete trigger since there does not appear to be a direct way
> of surgically removing a specific element from an array in postgres unless I
> have missed something.

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
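[Editor's sketch] For contrast with the positional-array schemes above: hstore's key=>value model makes the key-position bookkeeping David is writing triggers for disappear, because the language code itself is the key. The shape of the data can be modelled with a plain dict (a sketch with invented data; in hstore SQL the lookup would be something along the lines of lang_code_and_text -> 'fr'):

```python
# One row's translations as a flat code => text mapping, which is the
# shape hstore stores, instead of a 2-D positional array.
row = {"en": "the brown cow", "fr": "la vache brun"}

# Lookup by language code, falling back to English -- no positional
# keys to maintain anywhere:
print(row.get("fr", row["en"]))  # la vache brun

# Adding or dropping a language is a plain key operation; the other
# languages' "positions" are unaffected:
row["es"] = row["en"]
del row["fr"]
print(sorted(row))  # ['en', 'es']
```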
Oleg Bartunov <oleg@sai.msu.su> writes:

> sorry if just misunderstanding but we have contrib/hstore available from
> http://www.sai.msu.su/~megera/postgres/gist/
> which could be used for storing as many languages as you need.
> It's sort of perl hash.

Huh. That's pretty neat. I don't really need it, since I can just assign fixed array indexes for each locale and use arrays. But for someone who has to support lots of different sets of locales it could be useful. Or for someone who has to index these columns using GiST.

-- 
greg
On Mon, Jul 04, 2005 at 03:27:59PM -0300, David Pratt wrote:

> I am also going to look at Karsten's material shortly to see how his system works

I am still away from the net, but here is how to find the description in our Wiki: go to user support, user guide, scroll down to developers guide, go to backend I18N.

Please point out anything you find difficult to figure out.

Karsten
Many thanks Karsten. I got a system working with arrays yesterday, but I will still be examining your code. I guess the next challenge is to see how well the multidimensional array can be searched. I could make indexes on an expression to retrieve the language for a specific key, since each element array of the multidimensional array is a translation that includes the ISO code and the text of the translation. It is pretty light and quick. I am open to examining anything that will help me learn more about doing this well.

Regards, David

On Wednesday, July 6, 2005, at 11:19 AM, Karsten Hilbert wrote:

> On Mon, Jul 04, 2005 at 03:27:59PM -0300, David Pratt wrote:
>
>> I am also going to look at Karsten's material shortly to see how his
>> system works
>
> I am still away from the net but here is how to find the
> description in our Wiki:
>
> Go to user support, user guide, scroll down to developers
> guide, go to backend I18N.
>
> Please point out anything you find difficult to figure out.
>
> Karsten