Thread: TOAST on indices

TOAST on indices

From
JanWieck@t-online.de (Jan Wieck)
Date:
For discussion:
   First  what the current implementation and the yet to be done   proposals do.
       All varlena data types (text, char, varchar, arrays) will       finally  be  toastable.  Every table that uses
suchtypes       will have a secondary relation to move off attributes.
 
       The toaster allways tries to  keep  a  main  tuple  small       enough  so that at minimum 4 tuples fit into a
block.One       had complained about, and I explain  later  why  I  think       it's a good decision anyway.
 
   This strategy already covers most possible index problems. If   the main tuple fits into 2K after toasting,  any
combination  of  attributes out of it will too. The only thing not covered   are functional indices.
 
   In real world scenarios, indices are usually set up on  small   key  values.   These  are very likely to be kept
plainin the   main tuple by the toaster, becuase it looks  at  the  biggest   values  first.   So  an index (built out
ofthe values in the   main tuple  after  toasting)  will  also  contain  the  plain   values.  Thus,  index scans will
notrequire toast fetches in   turn. Except the indexed attribute had at some point  a  huge   value.
 
   The  current  TOAST implementation hooks into the heap access   methods only.  Automagically covering the index
issuesdue to   the  2K approach. Fact is, that if more toast entries can get   produced during index inserts, we need
totake care for  them   during vacuum (the only place where index items get removed).   Alot of work just to support
hugefunctional indices  -  IMHO   not  worth  the  efford  right  now.  Let's  better  get some   experience with the
entirething before going too far.
 
   Why is it good to keep the main tuple below 2K? First because   of the above side effects for indices. Second,
becausein the   most likely case  of  small  indexed  attributes,  more  main   tuples  (that  must  be  fetched  for
the visibility checks   anyway) will fit into one block. That'll cause more blocks of   the relation to fit into the
givenshared memory buffer cache   and avoids I/O during index scans.
 
   My latest tests load a 1.1M tree full of .html files  into  a   database.   The  result  is  a  140K  heap  plus
300K toast   relation. Without that 2K approach, the result is a 640K heap   plus  90K  toastrel  only.   Since all
compressionis done on   single entries, it scales linear, so that a  1.1G  tree  will   result in a 140M heap plus 300M
toastrelvs. a 640M heap plus   90M toastrel. No need to bechmark it - I know which  strategy   wins.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: TOAST on indices

From
Philip Warner
Date:
At 20:42 4/07/00 +0200, Jan Wieck wrote:
>
>    So  an index (built out of the values in the
>    main tuple  after  toasting)  will  also  contain  the  plain
>    values.  Thus,  index scans will not require toast fetches in
>    turn. Except the indexed attribute had at some point  a  huge
>    value.

So that, for toasted attrs, the indexes will be no use for sorting. I agree
that in the majority of cases this is not a problem, but if the entire
tuple gets toasted because it is too large it becomes a problem.

I agree that this is only a problem if indexes are used in sorting, and may
not be a problem if one builds an index on 'substr(toastable-field,1,20)',
but I think you are suggesting not supporting functional indexes,
below...but maybe I've missed the point.



>    The  current  TOAST implementation hooks into the heap access
>    methods only.  Automagically covering the index issues due to
>    the  2K approach. Fact is, that if more toast entries can get
>    produced during index inserts, we need to take care for  them
>    during vacuum (the only place where index items get removed).
>    Alot of work just to support huge functional indices  -  IMHO
>    not  worth  the  efford  right  now.  Let's  better  get some
>    experience with the entire thing before going too far.
>


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: TOAST on indices

From
JanWieck@t-online.de (Jan Wieck)
Date:
Philip Warner wrote:
> At 20:42 4/07/00 +0200, Jan Wieck wrote:
> >
> >    So  an index (built out of the values in the
> >    main tuple  after  toasting)  will  also  contain  the  plain
> >    values.  Thus,  index scans will not require toast fetches in
> >    turn. Except the indexed attribute had at some point  a  huge
> >    value.
>
> So that, for toasted attrs, the indexes will be no use for sorting. I agree
> that in the majority of cases this is not a problem, but if the entire
> tuple gets toasted because it is too large it becomes a problem.
   That   ain't  true,  entirely.  First  of  all,  only  single   attributes get toasted.  Never a complete tuple but
maybeall   of  it's  attributes  (if  it  is  a  table  with  many, many   attributes or all of them are big).
 
   If so, well, the sort might become damned slow. Assuming  all   the rows selected have the attribute to sort on
toasted,each   comparision will  require  two  index  scans  (plus  possibly   decompression) during the sort.
 
   But tell me, do you know of real world DB installations where   indices on fields likely to be >1K exist?  What  is
such an   index good for? Fast reverse lookup of 65536-bit RSA keys?
 
   The  system  won't  complain,  nor will it bail out in such a   situation.  That it won't behave as good as  it
could is  a   con.  Maybe  we should tell on our web pages that someone who   wants to create indices on multi-K
attributes should  better   look for another DB, because Postgres is slow in that case?
 

> I agree that this is only a problem if indexes are used in sorting, and may
> not be a problem if one builds an index on 'substr(toastable-field,1,20)',
> but I think you are suggesting not supporting functional indexes,
> below...but maybe I've missed the point.
>
> >    The  current  TOAST implementation hooks into the heap access
> >    methods only.  Automagically covering the index issues due to
> >    the  2K approach. Fact is, that if more toast entries can get
> >    produced during index inserts, we need to take care for  them
> >    during vacuum (the only place where index items get removed).
> >    Alot of work just to support huge functional indices  -  IMHO
> >    not  worth  the  efford  right  now.  Let's  better  get some
> >    experience with the entire thing before going too far.
   Yeah - you missed me here.
   In  the case of a functional index, the function would seldom   return one of the original tuples attribute  values.
Usually   those  functions manipulate one or more attributes to compute   a completely new value (like your  substr()
example above).   In  the TOAST world, any such function returns a plain, fully   expanded, in memory value.
 
   So even if the toaster had  worked  on  the  main  tuple  and   compressed/moved  off  some  attributes,  the  value
that is   computed during index tuple creation is of full size.  Having   a  char(20000)  attribute, the toaster will
shrinkit down so   the  main  tuple  will  fit.  But  a  functional  index  like   "substr(att, 1, 10000)" must fail,
becauseduring index tuple   creation the funtion is evalueated and creates a  10000  byte   value.
 
   In the current implementation, non-functional indices on huge   fields should be supported (there still are bugs
because it   doesn't  work  right  now).  For  functional  ones,  the  old   restriction of "index-tuple must be
smaller than  supported   tuple size of index method" applies.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #