Thread: AW: update on TOAST status'

AW: update on TOAST status'

From

Zeugswetter Andreas SB

Date:

12 July 2000, 06:32:12

> > I don't like that --- seems it would put a definite crimp in the
> > whole point of TOAST, which is not to have arbitrary limits on field
> > sizes.
> 
>     If we can solve it, let's do so. If we cannot, let's restrict
>     it for 7.1.

How are you doing the index toasting currently? Is it along the same
lines as table toasting? That is: toast some index column values if
the key exceeds 2k?

Andreas

Re: AW: update on TOAST status'

From

JanWieck@t-online.de (Jan Wieck)

Date:

12 July 2000, 09:07:50

Zeugswetter Andreas SB wrote:
>
> > > I don't like that --- seems it would put a definite crimp in the
> > > whole point of TOAST, which is not to have arbitrary limits on field
> > > sizes.
> >
> >     If we can solve it, let's do so. If we cannot, let's restrict
> >     it for 7.1.
>
> How are you doing the index toasting currently? Is it along the same
> lines as table toasting? That is: toast some index column values if
> the key exceeds 2k?

    The current CVS is broken in that area. You'll notice as soon as
    you have many huge "text" values in an index, update them, vacuum
    and continue to update.

    The actual behaviour of the toaster is to toast each tuple until
    it has a delicious looking, brown and crispy surface. The
    indicator for being delicious is that it shrank below
    MaxTupleSize/4 - that's a little less than 2K in a default 8K
    blocksize setup.
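
The stopping rule can be sketched numerically. This is a minimal illustration, not the backend code: `PAGE_OVERHEAD` is an assumed placeholder for the header space subtracted from the block size to get MaxTupleSize, and both function names are hypothetical.

```python
BLCKSZ = 8192        # default block size mentioned in the text
PAGE_OVERHEAD = 40   # assumption: placeholder, the real value depends on header layouts


def toast_target(blcksz: int = BLCKSZ, overhead: int = PAGE_OVERHEAD) -> int:
    """Size a tuple must shrink below before the toaster stops."""
    max_tuple_size = blcksz - overhead
    return max_tuple_size // 4


def still_needs_toasting(tuple_size: int) -> bool:
    """True while the tuple has not yet shrunk below the target."""
    return tuple_size >= toast_target()
```

With any small positive overhead the target lands just under 2048 bytes, which matches the "a little less than 2K" figure.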

    It then sticks the new tuple into the HeapTuple's t_data pointer.

    Index inserts are always done after heap_insert() or
    heap_update(). At that time, the index tuples will be built from
    the values found in the now replaced heap tuple. And since the
    heap tuple found now is always smaller than 2K, any combination of
    attributes out of it must be too (it's impossible to specify one
    and the same attribute multiple times in one index).

    So the indices simply inherit the toasting result. If a value got
    compressed, the index will store the compressed format. If it got
    moved off, the index will hold the toast entry reference for it.
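
A sketch of that inheritance, under loud assumptions: the three datum classes and `build_index_tuple` are hypothetical names, not backend structures. The only point is that the index tuple is assembled from the already-toasted heap values, so it carries the same compressed or external forms.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple, Union


@dataclass(frozen=True)
class Plain:
    data: bytes


@dataclass(frozen=True)
class Compressed:
    data: bytes      # compressed bytes; the original is recoverable from these alone


@dataclass(frozen=True)
class External:
    value_id: int    # stands in for the toast entry reference


Datum = Union[Plain, Compressed, External]


def build_index_tuple(heap_tuple: Dict[str, Datum],
                      key_columns: Sequence[str]) -> Tuple[Datum, ...]:
    """Build index keys from the toasted heap tuple, without re-toasting."""
    # One attribute may appear only once per index, as noted in the text.
    if len(set(key_columns)) != len(key_columns):
        raise ValueError("an attribute may appear only once in an index")
    return tuple(heap_tuple[col] for col in key_columns)


toasted_heap_tuple: Dict[str, Datum] = {
    "id": Plain(b"\x00\x00\x00\x01"),
    "title": Compressed(b"...compressed bytes..."),
    "body": External(value_id=4711),
}
index_tuple = build_index_tuple(toasted_heap_tuple, ["title", "body"])
```

Because the keys are a subset of the heap tuple's attributes, the index tuple cannot be larger than the heap tuple it was built from.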

    One of the biggest advantages is this: In the old system, an
    indexed column of 2K caused 2K to be stored in the heap plus 2K
    stored in the index. Plus all the 2K instances in upper index
    block range specs. Now, the heap and the index will only hold
    references or compressed items.

    Absolutely no problem for compressed items. All information to
    recreate the original value is in the Datum itself.

    For externally stored ones, the reference tells the OIDs of the
    secondary relation and its index (where to find the data of this
    entry), a unique identifier of the item (another OID) and some
    other info. So the reference contains all the information required
    to fetch the data just by looking at the reference. And since the
    detoaster scans the secondary relation with a visibility of
    SnapshotAny, it'll succeed in finding them even if they've been
    deleted long ago by another committed transaction. So index
    traversal will succeed on that in any case.
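
The self-describing reference and the SnapshotAny lookup can be modelled roughly as follows. All field and function names are illustrative assumptions, the "other info" is left out, and the toy lookup goes straight to a dictionary instead of scanning through the secondary relation's index.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToastReference:
    toast_rel_oid: int   # OID of the secondary relation
    toast_idx_oid: int   # OID of its index (unused in this toy lookup)
    value_id: int        # unique identifier of the item (another OID)


@dataclass
class ToastRow:
    data: bytes
    deleted: bool = False   # deleted by a committed transaction, not yet vacuumed


# Toy storage: relation OID -> {value_id -> row}
ToastRelations = Dict[int, Dict[int, ToastRow]]


def detoast(ref: ToastReference, relations: ToastRelations,
            snapshot_any: bool = True) -> Optional[bytes]:
    """Fetch the value using nothing but the reference itself."""
    row = relations[ref.toast_rel_oid].get(ref.value_id)
    if row is None:
        return None              # physically gone
    if row.deleted and not snapshot_any:
        return None              # an ordinary snapshot would not see it
    return row.data


relations: ToastRelations = {16500: {4711: ToastRow(b"big value", deleted=True)}}
ref = ToastReference(toast_rel_oid=16500, toast_idx_oid=16501, value_id=4711)
```

Here `detoast(ref, relations)` still returns the data for a deleted row, while passing `snapshot_any=False` shows what a stricter visibility rule would do.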

    What I didn't know at the time of implementation is that btree
    indices can keep such a reference in upper level blocks' range
    specifications even after a vacuum successfully deleted the index
    tuple holding the reference itself. That's the current pity.

    Thus, if vacuum has finally removed the deleted tuples from the
    secondary relations (after the heap and index have been vacuumed),
    the detoaster cannot find those entries, still referenced by upper
    index blocks, any more.
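
The failure sequence can be replayed in a toy model. This is a simplified sketch, not the btree or vacuum code: the three module-level containers and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Toy state: one externally stored value referenced from a btree.
toast_rows: Dict[int, dict] = {4711: {"data": b"big value", "deleted": False}}
leaf_keys: List[int] = [4711]          # references held by leaf index tuples
upper_range_keys: List[int] = [4711]   # copies kept in upper-level range specs


def fetch_snapshot_any(value_id: int) -> Optional[bytes]:
    """Return the data if the row physically exists, deleted or not."""
    row = toast_rows.get(value_id)
    return None if row is None else row["data"]


def delete_value(value_id: int) -> None:
    """A committed delete or update: the toast row is only marked deleted."""
    toast_rows[value_id]["deleted"] = True


def vacuum_index() -> None:
    """Remove leaf tuples of deleted values; upper blocks are left alone."""
    leaf_keys[:] = [k for k in leaf_keys if not toast_rows[k]["deleted"]]


def vacuum_secondary_relation() -> None:
    """Physically remove deleted toast rows."""
    for value_id in [v for v, r in toast_rows.items() if r["deleted"]]:
        del toast_rows[value_id]


delete_value(4711)
readable_before_vacuum = fetch_snapshot_any(4711)   # still found at this point
vacuum_index()
vacuum_secondary_relation()
# References that upper index blocks still hold but that can no longer be fetched.
dangling = [k for k in upper_range_keys if fetch_snapshot_any(k) is None]
```

After both vacuum steps the leaf level is clean, yet the upper-level copy points at a toast entry that no longer exists.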

    Maybe we could propagate key range changes into upper blocks at
    index_delete() time. Will look at the btree code now.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: AW: update on TOAST status'

From

JanWieck@t-online.de (Jan Wieck)

Date:

12 July 2000, 15:16:47

I wrote:
>
>     Maybe we could propagate key range changes into upper blocks
>     at index_delete() time. Will look at the btree code now.

    After looking at the vacuum code it doesn't seem to be a good
    idea. Doing so would require traversing the btree first, while
    the current implementation just grabs the block by index ctid and
    pulls out the tuple. I would expect it to significantly slow down
    vacuum again - which none of us want.

    So the only way left is recreating the indices from scratch and
    moving the new ones into place.

    But in contrast to things like column dropping, this would have to
    happen on every vacuum run for a lot of tables.

    Isn't it appropriate to have a specialized version of it for this
    case instead of waiting for a general relation versioning?

Jan


Re: AW: update on TOAST status'

From

Tom Lane

Date:

12 July 2000, 16:01:48

JanWieck@t-online.de (Jan Wieck) writes:
>     So the only way left is recreating the indices from scratch
>     and moving the new ones into place.
>
>     But in contrast to things like column dropping, this would
>     have to happen on every vacuum run for a lot of tables.
>
>     Isn't it appropriate to have a specialized version of it for
>     this case instead of waiting for a general relation
>     versioning?

I don't see a "specialized" way that would be any different in
performance from a "generalized" solution.  The hard part AFAICT is how
does a newly-started backend discover the current version numbers for
the critical system tables and indexes.  To do versioning of system
indexes at all, we need a full-fledged solution.

But as you pointed out before, none of the system indexes are on
toastable datatypes.  (I just checked --- the only index opclasses used
in template1 are: int2_ops int4_ops oid_ops char_ops oidvector_ops
name_ops.)  Maybe we could have an interim solution using the old method
for system indexes and a drop-and-rebuild approach for user indexes.
A crash partway through rebuild would leave you with a busted index,
but maybe WAL could take care of redoing the index build after restart.
(Of course, if the index build failure is reproducible, you're in
big trouble...)
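
The interim split could be expressed as a simple dispatch. This is only a sketch of the idea as stated: the function name and the strategy labels are made up for illustration, and the opclass set is the list quoted from template1 in the paragraph above.

```python
from typing import Iterable

# Opclasses reported as the only ones used by system indexes in template1.
SYSTEM_INDEX_OPCLASSES = frozenset({
    "int2_ops", "int4_ops", "oid_ops", "char_ops", "oidvector_ops", "name_ops",
})


def index_vacuum_strategy(is_system_index: bool, opclasses: Iterable[str]) -> str:
    """Pick the vacuum approach for one index (labels are hypothetical)."""
    if is_system_index:
        # The old method is only safe while system indexes stay on
        # non-toastable datatypes, so refuse anything outside the known set.
        unexpected = set(opclasses) - SYSTEM_INDEX_OPCLASSES
        if unexpected:
            raise ValueError(f"unexpected system index opclasses: {sorted(unexpected)}")
        return "old-method"
    return "drop-and-rebuild"
```

For example, `index_vacuum_strategy(True, ["oid_ops"])` selects the old method, while any user index falls through to drop-and-rebuild.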

I don't *like* that approach a whole lot; it's ugly and doesn't sound
all that reliable.  But if we don't want to deal with relation
versioning for 7.1, maybe it's the only way for now.

        regards, tom lane