Don Baccus wrote:
>
> ...
> I expect TOAST to work even better). Users will still be able change to
> larger blocksizes (perhaps a wise thing to do if a large percentage of their
> data won't fit into a single PG block). Users using the default will
> be able to store rows of *awesome* length, efficiently.
Depends...
Actually the toaster already jumps in if your tuples exceed BLKSZ/4, so with the default of 8K blocks it tries
to keep all tuples smaller than 2K. The reasons behind that are:
1. An average tuple size of 8K means an average of 4K unused space at the end of each block. Wasting space
means to waste IO bandwidth.
2. Since big items are unlikely to be search criteria, needing to read them into memory for every
chech for a match on other columns is a waste again. So the more big items are off from the main tuple,
thesmaller the main table becomes, the more likely it is that the main tuples (holding the keys) are
cached and the cheaper a sequential scan becomes.
Of course, especially for 2. there is a break even point. That is when the extra fetches to send toast values
to the client cost more than there was saved from not doing it during the main scan already. A full
table SELECT * definitely costs more if TOAST is involved. But who does unqualified SELECT * from a
multi-gigtable without problems anyway? Usually you pick a single or a few based on some other key attributes
-don't you?
Let's make an example. You have a forum server that displays one article plus the date and sender of all
follow-ups.The article bodies are usually big (1-10K). So you do a SELECT * to fetch the actually displayed
article,and another SELECT sender, date_sent just to get the info for the follow-ups. If we assume a uniform
distributionof body size and an average of 10 follow-ups, that'd mean that we save 52K of IO and cache usage
foreach article displayed.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #