Re: update on TOAST status' - Mailing list pgsql-hackers

From JanWieck@t-online.de (Jan Wieck)
Subject Re: update on TOAST status'
Date
Msg-id 200007071130.NAA24645@hot.jw.home
Whole thread Raw
In response to Re: update on TOAST status'  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: update on TOAST status'  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> One simple answer that might help for other scenarios too is to keep
> >> a small cache of the last few values that had to be untoasted.  Maybe
> >> we only need it for moved-off values --- it could be that decompression
> >> is fast enough that we should just do it over rather than trying to
> >> cache.
>
> >     I'm  still argueing that indexing huge values is a hint for a
> >     misleading schema.  If  this  is  true,  propagating  toasted
> >     attributes  into indices is a dead end street and I'd have to
> >     change  the  heap-access<->toaster  interface  so  that   the
> >     modified  (stored)  main tuple isn't visible to the following
> >     code (that does the index inserts).
>
> But you'll notice that is *not* what I suggested.  A detoasted-value
   Haven't missed it in the first read - of course.

> cache could be useful in more situations than just an index lookup.
> I don't necessarily say we've got to have it in 7.1, but let's keep
> the idea in mind in case we start finding there is a bottleneck here.
>
> >     What is the value of supporting index tuples >2K?
>
> If you're toasting the whole main tuple down to <2K, you might find
> yourself toasting individual fields that are a good bit less than
> that.  So I don't think indexing a toasted value will be all that
> unusual.
   Exactly that's why I'm asking if we wouldn't be better off by   limiting index tuples to  (blocksize  -  overhead)
/ 4  and   allways store plain, untoasted values in indices.
 
   I've asked now a couple of times "who really has the need for   indexing huge values"? All responses I got so  far
where of   the  kind  "would  be nice if we support it" or "I don't like   such restrictions".  But noone really said
"Ineed it".
 

> But this is all speculation for now.  Let's get it working bulletproof
> for 7.1, and then worry about speedups after we know they are needed.
   Let me speculate too a little.
   The experience I have up to now is that the saved  time  from   requiring less blocks in the buffer cache outweights
thecost   of decompression.  Especially with our algorithm, because  it   is byte oriented (instead of huffman coding
beeingbased on a   bit  stream),  causing   it   to   be   extremely   fast   on   decompression.  And  the  technique
ofmoving off values from   the main heap causes the main tuples to be much  smaller.  As   long  as  the  toasted
valuesaren't used in qualification or   joining,  only  their  references  move  around  through  the   various
executor steps, and only those values that are part   of the final result set need to be fetched when sending  them
tothe client.
 
   Given  a  limited  amount  of  total memory available for one   running postmaster, we save alot of disk I/O  and
hold more   values in their compressed format in the shared buffers. With   the limit on total memory, the size of the
buffercache  must   be  lowered  by the size of the new detoasted cache, and that   only if we make it shared too.
Givenfurther an  average  of   50% compression ratio (what's not unlikely with typical input   like html pages), one
cacheddetoasted  value  would  require   two compressed ones to go away.
 
   Wouldn't  really  surprise me if we gain speed from it in the   average query.  Even  if  some  operations  might
slow down   (sorting on maybe toasted fields).
 
   We  need  to  see some results and wait for reports for this.   But we know already that it can cause  trouble  with
indexed   fields,  because  these are likely to be used for comparision   during scans. So do we want to  have  indices
storing  plain   values allways and limit them in the index-tuple size or not?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




pgsql-hackers by date:

Previous
From: eisentrp@csis.gvsu.edu
Date:
Subject: Re: PostgreSQL & the BSD License
Next
From: eisentrp@csis.gvsu.edu
Date:
Subject: Re: Re: [SQL] Re: [GENERAL] lztext and compression ratios...