TOASTing smaller things - Mailing list pgsql-hackers

From Chris Browne
Subject TOASTing smaller things
Date
Msg-id 60slbyk0xb.fsf@dba2.int.libertyrms.com
Whole thread Raw
In response to HELP all women were raped during the May riots in Jakarta  (adm@pu.go.id)
Responses Re: TOASTing smaller things  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: TOASTing smaller things  ("Jim C. Nasby" <jim@nasby.net>)
List pgsql-hackers
In some of our applications, we have cases where it would be very nice
if we could activate TOAST at some sort of lower threshold than the
usual 2K that is true now.  Let me note the current code that controls
the threshold:

/** These symbols control toaster activation.  If a tuple is larger than* TOAST_TUPLE_THRESHOLD, we will try to toast
itdown to no more than* TOAST_TUPLE_TARGET bytes.  Both numbers include all tuple header overhead* and between-fields
alignmentpadding, but we do *not* consider any* end-of-tuple alignment padding; hence the values can be compared
directly*to a tuple's t_len field.  We choose TOAST_TUPLE_THRESHOLD with the* knowledge that toast-table tuples will be
exactlythat size, and we'd* like to fit four of them per page with minimal space wastage.** The numbers need not be the
same,though they currently are.** Note: sizeof(PageHeaderData) includes the first ItemId, but we have* to allow for 3
more,if we want to fit 4 tuples on a page.*/
 
#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I   MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I  / 4)

We have cases where we're storing XML message information which is
near the 0.5K mark, that being the case, tuples virtually never get
TOASTed.

somesystem=# select min(length(xml)), max(length(xml)), avg(length(xml)), stddev(length(xml)) from table_with_xml;min |
max |         avg          |      stddev      
 
-----+------+----------------------+------------------244 | 2883 | 651.6900720788174376 | 191.602077911138
(1 row)

I can see four controls as being pretty plausible:

1.  Compile time...

#define TOAST_DENOMINATOR 17    /* Use this as the divisor; current default behaviour falls from TOAST_DENOMINATOR = 4
*/

#define TOAST_TUPLE_THRESHOLD^I\
^IMAXALIGN_DOWN((BLCKSZ - \
^I^I^I^I   MAXALIGN(sizeof(PageHeaderData) + 3 * sizeof(ItemIdData))) \
^I^I^I^I  / TOAST_DENOMINATOR)

That's obviously cheapest to the DB engine.

I just made this change to a checkout of CVS HEAD, and it readily
survived a regression test.

2.  GUC value for TOAST_DENOMINATOR

Do the above, but with the added detail that TOAST_DENOMINATOR refers
to a GUC value.

I think I could probably make this change; the principle remains much
the same as with #1.

3.  GUC value for TOAST_TUPLE_THRESHOLD

This probably has to get modified to the nearest feasible value,
modulo alignment; it's not all that different from #1 or #2.

4.  A different mechanism would be to add a fifth storage column
strategy (the present four are PLAIN, EXTENDED, EXTERNAL, MAIN), let's
say, TOAST.

At present, the 4 values are essentially advisory; columns get TOASTed
if the column permits EXTENDED storage, but that only occurs if the
size is greater than TOAST_TUPLE_THRESHOLD.

If the new value was chosen, the column would *always* get stored as
TOAST.

Presumably #1 or #2 could readily get into 8.3 as they're pretty easy;
#3 is a bit trickier, whilst #4 is probably not "8.3-fittable".

Question:

Which of these sounds preferable?
-- 
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/finances.html
Where do you  *not* want to go today?  "Confutatis maledictis, flammis
acribus addictis" (<http://www.hex.net/~cbbrowne/msprobs.html>


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: CREATE INDEX and HOT - revised design
Next
From: Gregory Stark
Date:
Subject: Re: CREATE INDEX and HOT - revised design