Thread: update on TOAST status

update on TOAST status

From
JanWieck@t-online.de (Jan Wieck)
Date:
FYI,
   during the day I committed a couple of changes to TOAST.
   -   Secondary  relations for the toaster (to move off values)       are now automatically created during CREATE
TABLE, ALTER       TABLE  ...  ADD  COLUMN and SELECT ... INTO, whenever the       first toastable attribute appears in
thetable schema.
 
   -   The TOAST tables are now of kind RELKIND_TOASTVALUE.
   -   TOAST tables cannot  be  vacuumed  separately.  They  are       allways  vacuumend if their master table is,
whileVACUUM       still holds the lock on the master table.
 
   -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.       Statistics for them are needless because the
toastaccess       is allways hardcoded indexed.
 
   -   TOAST tables are protected against manual INSERT,  UPDATE       and  DELETE  operations.  SELECT  is  still
possiblefor       debugging purposes.  The  name  of  the  TOAST  table  is       pg_toast_<oid-of-master>.
 
   -   The  chunk_data attribute has been changed to type bytea.
   For now, "lztext" is the  only  test  candidate  datatype  to   invoke  the  toaster.  It can hold up to
multi-megabytesnow.   But be warned, this datatype will disappear as soon as "text"   is toastable.
 
   Next I'll make pg_dump TOAST-safe. Will only take a couple of   minutes I think.
   Toast  tables  aren't  automatically   created   for   system   catalogs.  Thus I'll add
       ALTER TABLE pg_rewrite CREATE TOAST TABLE
   to  initdb.  So we'll get unlimited view complexity for free.   As soon as  arrays  are  toastable,  we  might  want
to  add   pg_class because of relacl too.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status

From
Jeffery Collins
Date:
Jan Wieck wrote:

> FYI,
>
>     For now, "lztext" is the  only  test  candidate  datatype  to
>     invoke  the  toaster.  It can hold up to multi-megabytes now.
>     But be warned, this datatype will disappear as soon as "text"
>     is toastable.
>

I have not been following the TOAST discussion, but why would lztext
disappear?  It seems like a useful datatype independent of TOAST or not
TOAST?

Jeff




Re: update on TOAST status

From
JanWieck@t-online.de (Jan Wieck)
Date:
Jeffery Collins wrote:
> Jan Wieck wrote:
>
> > FYI,
> >
> >     For now, "lztext" is the  only  test  candidate  datatype  to
> >     invoke  the  toaster.  It can hold up to multi-megabytes now.
> >     But be warned, this datatype will disappear as soon as "text"
> >     is toastable.
> >
>
> I have not been following the TOAST discussion, but why would lztext
> disappear?  It seems like a useful datatype independent of TOAST or not
> TOAST?
   The  "lztext" type was something I developed before TOAST was   born. It's was a "text" type that tried to compress
thevalue   at input time.
 
   In the TOAST world, each input value will be passed around as   is.  Only when it gets down to be stored in a table
and the   resulting  heap  tuple  exceeds  2K,  the toaster will try to   compress toastable attributes and/or move off
attributes.The   behaviour  will  be  configurable  on  a per tables attribute   base. So someone can specify "don't
trycompression", "ignore   this  attribute until all others are toasted" or "never toast   this, instead fail and abort
-unwise but possible".
 
   In the current CVS sources,  "lztext"  already  doesn't  know   anything   about  compression  anymore.  It's  more
or less   equivalent to "text"  now,  where  it's  lztextin()  function   produces  a plain varlena structure like
textin()does.  Only   that all it's other functions are aware that the values  they   recieve might be toasted ones.
It'sthe toaster that does the   compression/move-off for it now.
 
   So as soon as "text" is toastable,  there  is  absolutely  no   need for "lztext" anymore. We will add an alias to
theparser   for  7.1,  which  will  disappear  in  7.2  again.   If   you   pg_dump/restore  your  databases during the
7.0->7.1upgrade,   all your table schemas will  automatically  be  changed  from   "lztext" to "text".
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status

From
Chris Bitmead
Date:
Jan Wieck wrote:

>     Toast  tables  aren't  automatically   created   for   system
>     catalogs.  Thus I'll add
> 
>         ALTER TABLE pg_rewrite CREATE TOAST TABLE
> 
>     to  initdb.  So we'll get unlimited view complexity for free.
>     As soon as  arrays  are  toastable,  we  might  want  to  add
>     pg_class because of relacl too.

Why would we want system catalogs toastable?


Re: update on TOAST status

From
Mike Mascari
Date:
Chris Bitmead wrote:
> 
> Jan Wieck wrote:
> 
> >     Toast  tables  aren't  automatically   created   for   system
> >     catalogs.  Thus I'll add
> >
> >         ALTER TABLE pg_rewrite CREATE TOAST TABLE
> >
> >     to  initdb.  So we'll get unlimited view complexity for free.
> >     As soon as  arrays  are  toastable,  we  might  want  to  add
> >     pg_class because of relacl too.
> 
> Why would we want system catalogs toastable?

I assume this will allow for Views with large rewrite rules which
currently are limited in size.

Mike Mascari


Re: update on TOAST status'

From
Bruce Momjian
Date:
>     -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.
>         Statistics for them are needless because the toast access
>         is allways hardcoded indexed.

I don't think statistics are insignificant for TOASTed columns.  If I
say col=3, the optimizer uses that information for estimating the number
of rows returned, and figuring out the type of join and order of join to
perform, not just for "use index, don't use index" decisions.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Bruce Momjian wrote:
> >     -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.
> >         Statistics for them are needless because the toast access
> >         is allways hardcoded indexed.
>
> I don't think statistics are insignificant for TOASTed columns.  If I
> say col=3, the optimizer uses that information for estimating the number
> of rows returned, and figuring out the type of join and order of join to
> perform, not just for "use index, don't use index" decisions.
   Ask  your  boys  to give you a training session for "reading"   when they go to bed tonight - and greet them from
the"police   officer" :-)
 
   I said "to TOAST tables", not "TOASTed columns".
   Their   master  tables  will  allways  have  the  statistics,   including those for toasted columns, if you ask for
them via   ANALYZE.
 
   In  normal  operation, noone would ever know if a TOAST table   is accessed during his  query  -  not  even  the
planner or   optimmizer.   It's  totally  transparent  and  the  only  one   accessing the TOAST tables is the toaster
himself -  and  he   knows what he does.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.
>> Statistics for them are needless because the toast access
>> is allways hardcoded indexed.

> I don't think statistics are insignificant for TOASTed columns.

He didn't say that!  I think what he meant is that there's no need for
statistics associated with the TOAST table itself, and AFAICS that's OK.

BTW, I have thought of a potential problem with indexes on toasted
columns.  As I understand Jan's current thinking, the idea is

1. During storage of the tuple in the main table, any oversize fields
get compressed/moved off.

2. The toasted item in the finished main tuple gets handed to the index
routines to be stored in the index.

Now, storing the toasted item in the index tuple seems fine, but what
I do not like here is the implication that all the comparisons needed
to find where to *put* the index tuple are done using a pretoasted
value.  That seems to imply dozens of redundant decompressions/fetches,
another one for each key comparison we have to do.

Jan, do you have a way around this that I missed?

One simple answer that might help for other scenarios too is to keep
a small cache of the last few values that had to be untoasted.  Maybe
we only need it for moved-off values --- it could be that decompression
is fast enough that we should just do it over rather than trying to
cache.
        regards, tom lane


Re: update on TOAST status

From
JanWieck@t-online.de (Jan Wieck)
Date:
Mike Mascari wrote:
> Chris Bitmead wrote:
> >
> > Jan Wieck wrote:
> >
> > >     Toast  tables  aren't  automatically   created   for   system
> > >     catalogs.  Thus I'll add
> > >
> > >         ALTER TABLE pg_rewrite CREATE TOAST TABLE
> > >
> > >     to  initdb.  So we'll get unlimited view complexity for free.
> > >     As soon as  arrays  are  toastable,  we  might  want  to  add
> > >     pg_class because of relacl too.
> >
> > Why would we want system catalogs toastable?
>
> I assume this will allow for Views with large rewrite rules which
> currently are limited in size.
   Absolutely correnct.
   With the code in place (after a few more fixes) I was able to   create a "SELECT *" view from  a  681  attribute
table. The   resulting  rule  is  about  170K! And more complex things are   possible too now,  because  the  rewrite
rule size  is  not   limited  any  longer (as long as you have enough CPU, ram and   disk space).
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.
> >> Statistics for them are needless because the toast access
> >> is allways hardcoded indexed.
>
> > I don't think statistics are insignificant for TOASTed columns.
>
> He didn't say that!  I think what he meant is that there's no need for
> statistics associated with the TOAST table itself, and AFAICS that's OK.
>
> BTW, I have thought of a potential problem with indexes on toasted
> columns.  As I understand Jan's current thinking, the idea is
>
> 1. During storage of the tuple in the main table, any oversize fields
> get compressed/moved off.
>
> 2. The toasted item in the finished main tuple gets handed to the index
> routines to be stored in the index.
   Right.

> Now, storing the toasted item in the index tuple seems fine, but what
> I do not like here is the implication that all the comparisons needed
> to find where to *put* the index tuple are done using a pretoasted
> value.  That seems to imply dozens of redundant decompressions/fetches,
> another one for each key comparison we have to do.
   Dozens - right.
   I just did a little gdb session tracing a
       SELECT ... WHERE toasted = 'xxx'
   The table has 151 rows and an index on 'toasted'. It needed 6   fetches of the attribute. Better than good,  because
2^6  is   only 64, so btree did a perfect job. Anyhow, in the case of a   real TOASTed (read burned) value, it'd mean 6
indexscans  to   recreate   the   on   disk   stored   representation  plus  6   decompression loops to get the plain
oneto compare  against.   What the hell would an "IN (SELECT ...)" cause?
 

> Jan, do you have a way around this that I missed?
>
> One simple answer that might help for other scenarios too is to keep
> a small cache of the last few values that had to be untoasted.  Maybe
> we only need it for moved-off values --- it could be that decompression
> is fast enough that we should just do it over rather than trying to
> cache.
   I'm  still argueing that indexing huge values is a hint for a   misleading schema.  If  this  is  true,  propagating
toasted   attributes  into indices is a dead end street and I'd have to   change  the  heap-access<->toaster  interface
so  that   the   modified  (stored)  main tuple isn't visible to the following   code (that does the index inserts).
 
   What is the value of supporting index tuples >2K? Support  of   braindead schemas? I can live withoout it!


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> > >     -   VACUUM  doesn't  propagate  ANALYZE  to   TOAST   tables.
> > >         Statistics for them are needless because the toast access
> > >         is allways hardcoded indexed.
> >
> > I don't think statistics are insignificant for TOASTed columns.  If I
> > say col=3, the optimizer uses that information for estimating the number
> > of rows returned, and figuring out the type of join and order of join to
> > perform, not just for "use index, don't use index" decisions.
> 
>     Ask  your  boys  to give you a training session for "reading"
>     when they go to bed tonight - and greet them from the "police
>     officer" :-)

Sure.

> 
>     I said "to TOAST tables", not "TOASTed columns".
> 
>     Their   master  tables  will  allways  have  the  statistics,
>     including those for toasted columns, if you ask for them  via
>     ANALYZE.
> 
>     In  normal  operation, noone would ever know if a TOAST table
>     is accessed during his  query  -  not  even  the  planner  or
>     optimmizer.   It's  totally  transparent  and  the  only  one
>     accessing the TOAST tables is the toaster himself  -  and  he
>     knows what he does.
> 

Oh, sure, got it.  It is the toast table that doesn't need stats.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
> Tom Lane wrote:
>> One simple answer that might help for other scenarios too is to keep
>> a small cache of the last few values that had to be untoasted.  Maybe
>> we only need it for moved-off values --- it could be that decompression
>> is fast enough that we should just do it over rather than trying to
>> cache.

>     I'm  still argueing that indexing huge values is a hint for a
>     misleading schema.  If  this  is  true,  propagating  toasted
>     attributes  into indices is a dead end street and I'd have to
>     change  the  heap-access<->toaster  interface  so  that   the
>     modified  (stored)  main tuple isn't visible to the following
>     code (that does the index inserts).

But you'll notice that is *not* what I suggested.  A detoasted-value
cache could be useful in more situations than just an index lookup.
I don't necessarily say we've got to have it in 7.1, but let's keep
the idea in mind in case we start finding there is a bottleneck here.

>     What is the value of supporting index tuples >2K?

If you're toasting the whole main tuple down to <2K, you might find
yourself toasting individual fields that are a good bit less than
that.  So I don't think indexing a toasted value will be all that
unusual.

But this is all speculation for now.  Let's get it working bulletproof
for 7.1, and then worry about speedups after we know they are needed.
        regards, tom lane


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> One simple answer that might help for other scenarios too is to keep
> >> a small cache of the last few values that had to be untoasted.  Maybe
> >> we only need it for moved-off values --- it could be that decompression
> >> is fast enough that we should just do it over rather than trying to
> >> cache.
>
> >     I'm  still argueing that indexing huge values is a hint for a
> >     misleading schema.  If  this  is  true,  propagating  toasted
> >     attributes  into indices is a dead end street and I'd have to
> >     change  the  heap-access<->toaster  interface  so  that   the
> >     modified  (stored)  main tuple isn't visible to the following
> >     code (that does the index inserts).
>
> But you'll notice that is *not* what I suggested.  A detoasted-value
   Haven't missed it in the first read - of course.

> cache could be useful in more situations than just an index lookup.
> I don't necessarily say we've got to have it in 7.1, but let's keep
> the idea in mind in case we start finding there is a bottleneck here.
>
> >     What is the value of supporting index tuples >2K?
>
> If you're toasting the whole main tuple down to <2K, you might find
> yourself toasting individual fields that are a good bit less than
> that.  So I don't think indexing a toasted value will be all that
> unusual.
   Exactly that's why I'm asking if we wouldn't be better off by   limiting index tuples to  (blocksize  -  overhead)
/ 4  and   allways store plain, untoasted values in indices.
 
   I've asked now a couple of times "who really has the need for   indexing huge values"? All responses I got so  far
where of   the  kind  "would  be nice if we support it" or "I don't like   such restrictions".  But noone really said
"Ineed it".
 

> But this is all speculation for now.  Let's get it working bulletproof
> for 7.1, and then worry about speedups after we know they are needed.
   Let me speculate too a little.
   The experience I have up to now is that the saved  time  from   requiring less blocks in the buffer cache outweights
thecost   of decompression.  Especially with our algorithm, because  it   is byte oriented (instead of huffman coding
beeingbased on a   bit  stream),  causing   it   to   be   extremely   fast   on   decompression.  And  the  technique
ofmoving off values from   the main heap causes the main tuples to be much  smaller.  As   long  as  the  toasted
valuesaren't used in qualification or   joining,  only  their  references  move  around  through  the   various
executor steps, and only those values that are part   of the final result set need to be fetched when sending  them
tothe client.
 
   Given  a  limited  amount  of  total memory available for one   running postmaster, we save alot of disk I/O  and
hold more   values in their compressed format in the shared buffers. With   the limit on total memory, the size of the
buffercache  must   be  lowered  by the size of the new detoasted cache, and that   only if we make it shared too.
Givenfurther an  average  of   50% compression ratio (what's not unlikely with typical input   like html pages), one
cacheddetoasted  value  would  require   two compressed ones to go away.
 
   Wouldn't  really  surprise me if we gain speed from it in the   average query.  Even  if  some  operations  might
slow down   (sorting on maybe toasted fields).
 
   We  need  to  see some results and wait for reports for this.   But we know already that it can cause  trouble  with
indexed   fields,  because  these are likely to be used for comparision   during scans. So do we want to  have  indices
storing  plain   values allways and limit them in the index-tuple size or not?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     ... So do we want to  have  indices  storing  plain
>     values allways and limit them in the index-tuple size or not?

I think not: it will be seen as a robustness failure, even (or
especially) if it doesn't happen often.  I can see the bug reports now:
"Hey!  I tried to insert a long value in my field, and it didn't work!
I thought you'd fixed this bug?"

You make good arguments that we shouldn't be too concerned about the
speed of access to toasted index values, and I'm willing to accept
that point of view (at least till we have hard evidence about it).
But when I say "it should be bulletproof" I mean it should *work*,
without imposing arbitrary limits on the user.  Arbitrary limits are
exactly what we are trying to eliminate.
        regards, tom lane


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >     ... So do we want to  have  indices  storing  plain
> >     values allways and limit them in the index-tuple size or not?
>
> I think not: it will be seen as a robustness failure, even (or
> especially) if it doesn't happen often.  I can see the bug reports now:
> "Hey!  I tried to insert a long value in my field, and it didn't work!
> I thought you'd fixed this bug?"
>
> You make good arguments that we shouldn't be too concerned about the
> speed of access to toasted index values, and I'm willing to accept
> that point of view (at least till we have hard evidence about it).
> But when I say "it should be bulletproof" I mean it should *work*,
> without imposing arbitrary limits on the user.  Arbitrary limits are
> exactly what we are trying to eliminate.
   After debugging something I thought was a bug in the toaster,   I've found something really causing headaches.
   TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES!
   It appears that brtee indices (at least) can keep  references   to  old  toast  values  that  survive  a  VACUUM!
Seemsthese   references live in nodes actually not  referring  to  a  heap   tuple   any   more,   but   used  during
tree traversal  in   comparisions.  As if an index tuple delete from a  btree  not   necessarily  causes  the  index
value to disappear from the   btree completely.  It'll never be returned by an index  scan,   but the value is still
theresomewhere.
 
   Everything  is  OK  with this up to a VACUUM run. The toaster   uses SnapShotAny to fetch toast values. So an
external value   can  be  fetched by the toaster even if it is already deleted   and committed. If  he  has  a
reference somewhere,  he  has   allways   a  share  or  higher  lock  on  the  main  relation   preventing VACUUM to
mangleup the toast  relation  (I  moved   toast relation vacuuming into the lock time of the main table   recently).
 
   But in the above case it is already vacuumed and not  present   any  more. Now the btree traversal needs to compare
againsta   value, long gone to the bit heaven, and that cannot work with   the toast architecture.
 
   Seems  the  designs of btree and toast are colliding. As soon   as "text" is toastable, this'll hurt - be warned.
   AFAICS, we need to detoast values for index  inserts  allways   and  have  another toaster inside the index access
methodsat   some day.  In the meantime we should decide  a  safe  maximum   index tuple size and emit an explanative
errormessage on the   attempt to insert oversized index entries instead of possibly   corrupting the index.
 
   Comment!


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Philip Warner
Date:
At 14:02 11/07/00 +0200, Jan Wieck wrote:
>    AFAICS, we need to detoast values for index  inserts  allways
>    and  have  another toaster inside the index access methods at
>    some day.

We might not need it...at least not in the furst pass.


>    In the meantime we should decide  a  safe  maximum
>    index tuple size and emit an explanative error message on the
>    attempt to insert oversized index entries instead of possibly
>    corrupting the index.

Can I suggest that we also put out a warning when defining an index using a
field with a (potentially) unlimited size? Indexing a text field will
mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a
text field (via substr) might not be.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Philip Warner wrote:
> At 14:02 11/07/00 +0200, Jan Wieck wrote:
> >    AFAICS, we need to detoast values for index  inserts  allways
> >    and  have  another toaster inside the index access methods at
> >    some day.
>
> We might not need it...at least not in the furst pass.
   The  thing  is  actually  broken  and needs a fix. As soon as   "text" is toastable, it can happen everywhere  that
text is   toasted  even  if it's actual plain value would perfectly fit   into an index tuple. Think of a table with 20
text  columns,   where  the  indexed  one  has  a  1024 bytes value, while all   others hold 512 bytes. In that case,
theindexed one  is  the   biggest  and  get's  toasted first. And if all the data is of   nature that compression
doesn'tgain enough, it  might  still   be the biggest one after that step and will be considered for   move off ...
boom.
   We can't let this in in the first pass!

> >    In the meantime we should decide  a  safe  maximum
> >    index tuple size and emit an explanative error message on the
> >    attempt to insert oversized index entries instead of possibly
> >    corrupting the index.
>
> Can I suggest that we also put out a warning when defining an index using a
> field with a (potentially) unlimited size? Indexing a text field will
> mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a
> text field (via substr) might not be.
   Marking it BOLD somewhere in the release  notes,  the  CREATE   INDEX  doc  and  some  other  places should be
enough.Such a   message at every CREATE INDEX is annoying.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Philip Warner
Date:
At 14:38 11/07/00 +0200, Jan Wieck wrote:
>> Can I suggest that we also put out a warning when defining an index using a
>> field with a (potentially) unlimited size? Indexing a text field will
>> mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a
>> text field (via substr) might not be.
>
>    Marking it BOLD somewhere in the release  notes,  the  CREATE
>    INDEX  doc  and  some  other  places should be enough. Such a
>    message at every CREATE INDEX is annoying.

The suggestion was only if the index contained a text, lztext etc field,
but no problem. The way I read your suggestion was that I'd get a real
error when doing an insert if the text was too large.




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Philip Warner wrote:
> At 14:38 11/07/00 +0200, Jan Wieck wrote:
> >> Can I suggest that we also put out a warning when defining an index using a
> >> field with a (potentially) unlimited size? Indexing a text field will
> >> mostly be a bizarre thing to do, but, eg, indexing the first 255 chars of a
> >> text field (via substr) might not be.
> >
> >    Marking it BOLD somewhere in the release  notes,  the  CREATE
> >    INDEX  doc  and  some  other  places should be enough. Such a
> >    message at every CREATE INDEX is annoying.
>
> The suggestion was only if the index contained a text, lztext etc field,
> but no problem. The way I read your suggestion was that I'd get a real
> error when doing an insert if the text was too large.
   Yes, that's what I'm after. It's too fragile IMHO to check on   multi column indices with char(n) or so  if
resulting index   tuples will fit in the future.
 
   The  atttypmod  field  on NUMERIC columns for example doesn't   tell the easy way how big the internal
representation might   grow.  And  what  about variable size user defined types that   are marked toastable?  Can you
estimatethe maximum  internal   storage size for them?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Philip Warner
Date:
At 15:08 11/07/00 +0200, Jan Wieck wrote:
>
>    The  atttypmod  field  on NUMERIC columns for example doesn't
>    tell the easy way how big the internal  representation  might
>    grow.  And  what  about variable size user defined types that
>    are marked toastable?  Can you estimate the maximum  internal
>    storage size for them?
>

Well, uncompressed size would be a good upper estimate, since you may be
passed already compressed data...



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     After debugging something I thought was a bug in the toaster,
>     I've found something really causing headaches.
>     TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES!
>     It appears that brtee indices (at least) can keep  references
>     to  old  toast  values  that  survive  a  VACUUM! Seems these
>     references live in nodes actually not  referring  to  a  heap
>     tuple   any   more,   but   used  during  tree  traversal  in
>     comparisions.  As if an index tuple delete from a  btree  not
>     necessarily  causes  the  index  value  to disappear from the
>     btree completely.  It'll never be returned by an index  scan,
>     but the value is still there somewhere.

Oooh, nasty.  Probably the keys you are looking at are in upper-
level btree pages and indicate the ranges of keys found in lower
pages, rather than being pointers to real tuples.

One answer is to rebuild indexes from scratch during VACUUM,
before we vacuum the TOAST relation.  We've been talking about
doing that for a long time.  Maybe it's time to bite the bullet
and do it.  (Of course that means fixing the relation-versioning
problem, which it seems we don't have a consensus on yet...)

>     Seems  the  designs of btree and toast are colliding. As soon
>     as "text" is toastable, this'll hurt - be warned.

Text *is* marked toastable in current CVS...

>     AFAICS, we need to detoast values for index  inserts  allways
>     and  have  another toaster inside the index access methods at
>     some day.  In the meantime we should decide  a  safe  maximum
>     index tuple size and emit an explanative error message on the
>     attempt to insert oversized index entries instead of possibly
>     corrupting the index.

I don't like that --- seems it would put a definite crimp in the
whole point of TOAST, which is not to have arbitrary limits on field
sizes.
        regards, tom lane


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >     After debugging something I thought was a bug in the toaster,
> >     I've found something really causing headaches.
> >     TOAST AS IS IS NOT CAPABLE OF HOLDING INDEXED VALUES!
> >     It appears that brtee indices (at least) can keep  references
> >     to  old  toast  values  that  survive  a  VACUUM! Seems these
> >     references live in nodes actually not  referring  to  a  heap
> >     tuple   any   more,   but   used  during  tree  traversal  in
> >     comparisions.  As if an index tuple delete from a  btree  not
> >     necessarily  causes  the  index  value  to disappear from the
> >     btree completely.  It'll never be returned by an index  scan,
> >     but the value is still there somewhere.
>
> Oooh, nasty.  Probably the keys you are looking at are in upper-
> level btree pages and indicate the ranges of keys found in lower
> pages, rather than being pointers to real tuples.
   So  our  btree  implementation  is  closer  to  an  ISAM file   organization than to a real tree? Anyway, either one
or  the   other  is  the  reason  that an attempt to insert a new value   results in an lztext_cmp() call that cannot
be resolved  due   to a missing toast value.
 
   I  added  some  checks  to  the  detoaster  just  to throw an   elog(ERROR) instead of a coredump  in  such  a  case
earlier   today.
 

> One answer is to rebuild indexes from scratch during VACUUM,
> before we vacuum the TOAST relation.  We've been talking about
> doing that for a long time.  Maybe it's time to bite the bullet
> and do it.  (Of course that means fixing the relation-versioning
> problem, which it seems we don't have a consensus on yet...)
   Doesn't  matter if we do it before or after, because the main   heap shouldn't contain any more toast references  to
deleted   (later to be vacuumed) toast entries at that time.
 
   Anyway,  it's  a nice idea that should solve the problem. For   indices, which can allways be rebuilt from the heap
data, I   don't  see  such  a  big need for the versioning. Only that a   partially rebuilt index (rebuild crashed in
themiddle) needs   another  vacuum  before  the  the DB is accessible again. How   often does that happen?
 
   So why not having vacuum truncating the index  file  to  zero   and rebuilding it from scratch in place? Can anyone
accessan   index while vacuum has a lock on it's heap?
 

>
> >     Seems  the  designs of btree and toast are colliding. As soon
> >     as "text" is toastable, this'll hurt - be warned.
>
> Text *is* marked toastable in current CVS...
   Whow - haven't noticed.
   Will run my tests against text ... parallel. Does it have any   impact  on the regression test execution time? Does
anytoast   table (that should now be there in the regression  DB)  loose   it's zero size during the tests?
 

>
> >     AFAICS, we need to detoast values for index  inserts  allways
> >     and  have  another toaster inside the index access methods at
> >     some day.  In the meantime we should decide  a  safe  maximum
> >     index tuple size and emit an explanative error message on the
> >     attempt to insert oversized index entries instead of possibly
> >     corrupting the index.
>
> I don't like that --- seems it would put a definite crimp in the
> whole point of TOAST, which is not to have arbitrary limits on field
> sizes.
   If we can solve it, let's do so. If we cannot, let's restrict   it for 7.1.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
> Tom Lane wrote:
>> One answer is to rebuild indexes from scratch during VACUUM,
>> before we vacuum the TOAST relation.  We've been talking about
>> doing that for a long time.  Maybe it's time to bite the bullet
>> and do it.  (Of course that means fixing the relation-versioning
>> problem, which it seems we don't have a consensus on yet...)

>     Doesn't  matter if we do it before or after, because the main
>     heap shouldn't contain any more toast references  to  deleted
>     (later to be vacuumed) toast entries at that time.

No, we must fix the indexes first, so that they contain no bogus
values if we fail while vacuuming the TOAST relation.

>     Anyway,  it's  a nice idea that should solve the problem. For
>     indices, which can allways be rebuilt from the heap  data,  I
>     don't  see  such  a  big need for the versioning. Only that a
>     partially rebuilt index (rebuild crashed in the middle) needs
>     another  vacuum  before  the  the DB is accessible again. How
>     often does that happen?

If it happens just once on one of your system-table indices, you
won't be happy.  We've sweated hard to make VACUUM crash-safe,
and I don't want to throw that away because of TOAST.

>> Text *is* marked toastable in current CVS...

>     Whow - haven't noticed.

>     Will run my tests against text ... parallel. Does it have any
>     impact  on the regression test execution time? Does any toast
>     table (that should now be there in the regression  DB)  loose
>     it's zero size during the tests?

Yes, there are some nonzero-size toast files in there.  Haven't
tried to run any timing tests...
        regards, tom lane


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
tOM lANE wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> One answer is to rebuild indexes from scratch during VACUUM,
> >> before we vacuum the TOAST relation.  We've been talking about
> >> doing that for a long time.  Maybe it's time to bite the bullet
> >> and do it.  (Of course that means fixing the relation-versioning
> >> problem, which it seems we don't have a consensus on yet...)
>
> >     Doesn't  matter if we do it before or after, because the main
> >     heap shouldn't contain any more toast references  to  deleted
> >     (later to be vacuumed) toast entries at that time.
>
> No, we must fix the indexes first, so that they contain no bogus
> values if we fail while vacuuming the TOAST relation.
   Got me.

> >     Anyway,  it's  a nice idea that should solve the problem. For
> >     indices, which can allways be rebuilt from the heap  data,  I
> >     don't  see  such  a  big need for the versioning. Only that a
> >     partially rebuilt index (rebuild crashed in the middle) needs
> >     another  vacuum  before  the  the DB is accessible again. How
> >     often does that happen?
>
> If it happens just once on one of your system-table indices, you
> won't be happy.  We've sweated hard to make VACUUM crash-safe,
> and I don't want to throw that away because of TOAST.
   Alternatively  we  could go for both methods. Does any system   catalog have an index on a varlena field? If not, we
coulddo   the  classic vacuum on anything that is either a catalog or a   table that doesn't have a toast relation.
Then do  the  lazy   reindex from scratch on anything left.
 

>
> >> Text *is* marked toastable in current CVS...
>
> >     Whow - haven't noticed.
>
> >     Will run my tests against text ... parallel. Does it have any
> >     impact  on the regression test execution time? Does any toast
> >     table (that should now be there in the regression  DB)  loose
> >     it's zero size during the tests?
>
> Yes, there are some nonzero-size toast files in there.  Haven't
> tried to run any timing tests...
   No,  there  aren't.  All you've seen are their indices of 16K   each.  But my tests, formerly using lztext, ran
smooth with   text.
 
   I've  looked at textout() and, well, your style of detoasting   arguments looks alot better and easier.  From  the
way it's   implemented  I  assume  the  per tuple memory context is done   too, no?
 


Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     I've  looked at textout() and, well, your style of detoasting
>     arguments looks alot better and easier.  From  the  way  it's
>     implemented  I  assume  the  per tuple memory context is done
>     too, no?

Not yet --- I'm running regress tests on it right now, though.
You're right that I'm assuming the function routines can leak
memory without trouble.

(We might need to avoid leaks in the comparison routines that are used
for indexes, but otherwise I think this scheme will work comfortably.)
        regards, tom lane


Re: update on TOAST status'

From
JanWieck@t-online.de (Jan Wieck)
Date:
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> >     I've  looked at textout() and, well, your style of detoasting
> >     arguments looks alot better and easier.  From  the  way  it's
> >     implemented  I  assume  the  per tuple memory context is done
> >     too, no?
>
> Not yet --- I'm running regress tests on it right now, though.
> You're right that I'm assuming the function routines can leak
> memory without trouble.
>
> (We might need to avoid leaks in the comparison routines that are used
> for indexes, but otherwise I think this scheme will work comfortably.)
   That sounds bad. At least not very good.
   So we better add a PG_FREEARG_xxx(ptr, argno) macro that does   the pfree if the pointer is different from  the  one
in  the   argument.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: update on TOAST status'

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>> (We might need to avoid leaks in the comparison routines that are used
>> for indexes, but otherwise I think this scheme will work comfortably.)

>     That sounds bad. At least not very good.

>     So we better add a PG_FREEARG_xxx(ptr, argno) macro that does
>     the pfree if the pointer is different from  the  one  in  the
>     argument.

Yes, I already borrowed that idea from your original code.  I don't
like it a whole lot, but as long as the need for it is confined to
the indexable comparison operators I think we can tolerate it.

The alternative is to hack up the index search routines (and also
tuplesort.c, and perhaps other places?) to maintain a short-term memory
context for evaluating comparison operators, and reset said context
fairly frequently.  That might be doable but I haven't yet looked into
what it would take.

I'm hoping to commit what I have this evening...
        regards, tom lane