Thread: TOAST & performance with lots of big columns in a table
I've got an articles table where I want to store texts, of which several translations exist. Thanks to TOAST I can now store texts of arbitrary length directly in the table, which is already a big advantage over stuffing them into the file system and trying to keep the database and file system in sync. What I am wondering is: From a conceptual point of view, it appears better to keep all translations in one table. I forget what exactly the argument is; it has something to do with normalization theory. Anyway I've already got meta information about articles that applies to all translations - such as author, position within the overall structure, related articles etc.; so if I were to have a table for every language, then every article row in any language-table corresponding to a particular article would have to link with the same row in authors, index, etc., and the structure would get more complicated than it needs to be. However, with a long article of maybe several 100 K, and translations in 6 languages (this is theoretical, actually I have only 2 at the moment), the row size would increase accordingly. Does this pose a problem for TOAST? Is it a better plan to have a separate table for each language? - Frank
Uh, I think I was wired rather the wrong way up, this question is confused. What a little fresh air can do. Cycling home from the office cleared the confusion in my head: It is of course nonsense to store all translations in a single row, also to have different tables for different languages. You have one table with a 'language' field that stores the information as to whether this is English, French, etc.; and then another table for the meta stuff, that also links to the authors table etc.. So simple. I am a little embarassed. Frank Joerdens wrote: > > I've got an articles table where I want to store texts, of which several translations > exist. Thanks to TOAST I can now store texts of arbitrary length directly in the table, > which is already a big advantage over stuffing them into the file system and trying to > keep the database and file system in sync. What I am wondering is: > > >From a conceptual point of view, it appears better to keep all translations in one table. > I forget what exactly the argument is; it has something to do with normalization theory. > Anyway I've already got meta information about articles that applies to all translations - > such as author, position within the overall structure, related articles etc.; so if I were > to have a table for every language, then every article row in any language-table > corresponding to a particular article would have to link with the same row in authors, > index, etc., and the structure would get more complicated than it needs to be. However, > with a long article of maybe several 100 K, and translations in 6 languages (this is > theoretical, actually I have only 2 at the moment), the row size would increase > accordingly. Does this pose a problem for TOAST? Is it a better plan to have a separate > table for each language? > > - Frank
Frank Joerdens wrote: > Uh, I think I was wired rather the wrong way up, this question is > confused. What a little fresh air can do. Cycling home from the office > cleared the confusion in my head: It is of course nonsense to store all > translations in a single row, also to have different tables for > different languages. You have one table with a 'language' field that > stores the information as to whether this is English, French, etc.; and > then another table for the meta stuff, that also links to the authors > table etc.. So simple. I am a little embarassed. That'd be my suggestion too, because it makes it alot easier to add a 7th and 8th language later. Anyway, having many big columns in one table will make it more likely that the toaster is invoked. Actually, it only does some work until the main tuple fits into BLKSZ/4, which is a little less than 2K in the default setup. This will cause more data to get toasted. The application would need to only select those columns that are actually required to avoid detoasting of all the stuff it doesn't use, so this approach is a little more complicated at the client side. If the application selects all translations every time anyway, there shouldn't be much of a difference. But if it first selects the original and just the available translations, it only needs to fetch the original text plus the language codes of all others with a second query, avoiding the detoasting and data transfer for all the available translations, so your new schema is definitely better. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #