Thread: Applying TOAST to CURRENT

Applying TOAST to CURRENT

From
JanWieck@t-online.de (Jan Wieck)
Date:
Hi,
   now  that we have the branch for 7.0, I could apply my actual   work on TOAST to the CURRENT development tree.
Before doing   so, I'd like to discuss some related details.
 
   1.  In  the  actual  version, the lztext datatype is stripped       down to something more similar to text (does not
compress      on input). So it is kinda toastable base type for testing       purposes created at initdb time.
 
       The pg_rules catalog still uses it, just that the toaster       is  now  responsible  to  do  the  compression
work. No       problems so far with that.
 
       In the long run I think lztext will disappear  completely       again (it was supposed to be). Does anybody see
aproblem       with abuse of this type during development?
 
   2.  I've added another ALTER  TABLE  command  to  create  the       external storage table for a relation. The
syntaxis
 
           ALTER TABLE tablename CREATE TOAST TABLE;
       Up  to  that,  toastable  types (lztext only yet) will be       compressed, but the INSERT  still  fails  if
compression      isn't enough to make a tuple fit.
 
       We  haven't  decided yet how/when to create the secondary       relation and it's index. Since we  intend  to
make base       types like text and varchar by default toastable, I don't       think that "if a tables schema contains
toastable types"       is  a  good  enough reason to create them silently. There       might exists tons of  tables  in
a  schema,  that  don't       require it.
 
       OTOH  I  don't  think  it's  a good thing to try creating       these things on  the  fly  the  first  time
needed. The       required catalog changes and file creations introduce all       kinds of possible rollback/crash
problems,that we  don't       want to have here - do we?
 
   3.  Tom,  we  don't  have  a consensus how to merge the TOAST       related function changes with the fmgr changes
upto now.       Which base type specific functions will be touched due to       fmgr changes right now?
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Applying TOAST to CURRENT

From
Bruce Momjian
Date:
> Hi,
> 
>     now  that we have the branch for 7.0, I could apply my actual
>     work on TOAST to the CURRENT development tree.  Before  doing
>     so, I'd like to discuss some related details.
> 
>     1.  In  the  actual  version, the lztext datatype is stripped
>         down to something more similar to text (does not compress
>         on input). So it is kinda toastable base type for testing
>         purposes created at initdb time.
> 
>         The pg_rules catalog still uses it, just that the toaster
>         is  now  responsible  to  do  the  compression  work.  No
>         problems so far with that.
> 
>         In the long run I think lztext will disappear  completely
>         again (it was supposed to be). Does anybody see a problem
>         with abuse of this type during development?

Sounds fine.

>     2.  I've added another ALTER  TABLE  command  to  create  the
>         external storage table for a relation. The syntax is
> 
>             ALTER TABLE tablename CREATE TOAST TABLE;
> 
>         Up  to  that,  toastable  types (lztext only yet) will be
>         compressed, but the INSERT  still  fails  if  compression
>         isn't enough to make a tuple fit.
> 
>         We  haven't  decided yet how/when to create the secondary
>         relation and it's index. Since we  intend  to  make  base
>         types like text and varchar by default toastable, I don't
>         think that "if a tables schema contains toastable  types"
>         is  a  good  enough reason to create them silently. There
>         might exists tons of  tables  in  a  schema,  that  don't
>         require it.
> 
>         OTOH  I  don't  think  it's  a good thing to try creating
>         these things on  the  fly  the  first  time  needed.  The
>         required catalog changes and file creations introduce all
>         kinds of possible rollback/crash problems, that we  don't
>         want to have here - do we?

Well, we could print the message suggesing ALTER TABLE when printing
tuple too large.  Frankly, I don't see a problem in creating the backup
table automatically.  If you are worried about performance, how about
putting it in a subdirectory.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Applying TOAST to CURRENT

From
JanWieck@t-online.de (Jan Wieck)
Date:
Bruce Momjian wrote:
> >         OTOH  I  don't  think  it's  a good thing to try creating
> >         these things on  the  fly  the  first  time  needed.  The
> >         required catalog changes and file creations introduce all
> >         kinds of possible rollback/crash problems, that we  don't
> >         want to have here - do we?
>
> Well, we could print the message suggesing ALTER TABLE when printing
> tuple too large.  Frankly, I don't see a problem in creating the backup
> table automatically.  If you are worried about performance, how about
> putting it in a subdirectory.
   It's  the toast-table and the index. So it's 2 Inodes and 16K   per table.  If the backend is compiled with -g,
someoneneeds   to create about 500 tables to waste the same amount of space.
 
   Well, I like the subdirectory idea. I only  wonder  how  that   should be implemented (actually the tablename is the
filename  - and that doesn't allow / in it).
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Applying TOAST to CURRENT

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     3.  Tom,  we  don't  have  a consensus how to merge the TOAST
>         related function changes with the fmgr changes up to now.
>         Which base type specific functions will be touched due to
>         fmgr changes right now?

For functions that need their inputs de-toasted, I think that the
changes you need should be a free byproduct of the fmgr changes.
I'd recommend we make those changes first, and then in a cleanup pass
you can modify anything that is able to work on still-toasted input.

I can't really do much with updating any varlena datatypes until
there's a version of heap_tuple_untoast_attr() somewhere in the
system --- if you look at src/include/fmgr.h, you'll see the call
is already there:

/* use this if you want the raw, possibly-toasted input datum: */
#define PG_GETARG_RAW_VARLENA_P(n)  ((struct varlena *) PG_GETARG_POINTER(n))
/* use this if you want the input datum de-toasted: */
#define PG_GETARG_VARLENA_P(n)  \(VARATT_IS_EXTENDED(PG_GETARG_RAW_VARLENA_P(n)) ?  \ (struct varlena *)
heap_tuple_untoast_attr((varattrib*) PG_GETARG_RAW_VARLENA_P(n)) :  \ PG_GETARG_RAW_VARLENA_P(n))
 
/* GETARG macros for varlena types will typically look like this: */
#define PG_GETARG_TEXT_P(n) ((text *) PG_GETARG_VARLENA_P(n))

BTW, it would save some casting if heap_tuple_untoast_attr were declared
to accept and return "struct varlena *" ...

Anyway, as soon as that code links to something that works, let me know
and I'll make a pass over the "text" functions.  That should give you
something to test with.
        regards, tom lane


Re: Applying TOAST to CURRENT

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> > >         OTOH  I  don't  think  it's  a good thing to try creating
> > >         these things on  the  fly  the  first  time  needed.  The
> > >         required catalog changes and file creations introduce all
> > >         kinds of possible rollback/crash problems, that we  don't
> > >         want to have here - do we?
> >
> > Well, we could print the message suggesing ALTER TABLE when printing
> > tuple too large.  Frankly, I don't see a problem in creating the backup
> > table automatically.  If you are worried about performance, how about
> > putting it in a subdirectory.
> 
>     It's  the toast-table and the index. So it's 2 Inodes and 16K
>     per table.  If the backend is compiled with -g, someone needs
>     to create about 500 tables to waste the same amount of space.
> 
>     Well, I like the subdirectory idea. I only  wonder  how  that
>     should be implemented (actually the tablename is the filename
>     - and that doesn't allow / in it).

Not sure.  It will take some tricks, I am sure.   How about if we add
some TOAST option to CREATE TABLE, so they can create with TOAST support
rather than having to use ALTER every time.  Maybe that would work.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Applying TOAST to CURRENT

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>>>> OTOH  I  don't  think  it's  a good thing to try creating
>>>> these things on  the  fly  the  first  time  needed.  The
>>>> required catalog changes and file creations introduce all
>>>> kinds of possible rollback/crash problems, that we  don't
>>>> want to have here - do we?

AFAIK we are pretty solid on rolling back table creation, it's just
rename/drop that have problems.  A worse problem is what if two
backends both decide they need to create the toast table at the same
time.  That might be fixable with appropriate locking but it seems
like there'd be potential for deadlocks.

> Bruce Momjian wrote:
>> Well, we could print the message suggesing ALTER TABLE when printing
>> tuple too large.  Frankly, I don't see a problem in creating the backup
>> table automatically.  If you are worried about performance, how about
>> putting it in a subdirectory.

I agree with Bruce --- the toast table should be created automatically,
at least if the table contains any potentially-toastable columns.  We
want this to be as transparent as possible.  I'd rather have auto create
on-the-fly when first needed, but if that seems too risky then let's
just make the table when its owning table is created.

If you want to control it with an ALTER TABLE function, let's add ALTER
TABLE DROP TOAST so that admins who don't like the excess space usage
can get rid of it.  (Of course that should only succeed after verifying
the toast table is empty...)
        regards, tom lane


Re: Applying TOAST to CURRENT

From
The Hermit Hacker
Date:
On Tue, 30 May 2000, Tom Lane wrote:

> JanWieck@t-online.de (Jan Wieck) writes:
> >>>> OTOH  I  don't  think  it's  a good thing to try creating
> >>>> these things on  the  fly  the  first  time  needed.  The
> >>>> required catalog changes and file creations introduce all
> >>>> kinds of possible rollback/crash problems, that we  don't
> >>>> want to have here - do we?
> 
> AFAIK we are pretty solid on rolling back table creation, it's just
> rename/drop that have problems.  A worse problem is what if two
> backends both decide they need to create the toast table at the same
> time.  That might be fixable with appropriate locking but it seems
> like there'd be potential for deadlocks.
> 
> > Bruce Momjian wrote:
> >> Well, we could print the message suggesing ALTER TABLE when printing
> >> tuple too large.  Frankly, I don't see a problem in creating the backup
> >> table automatically.  If you are worried about performance, how about
> >> putting it in a subdirectory.
> 
> I agree with Bruce --- the toast table should be created automatically,
> at least if the table contains any potentially-toastable columns.  We
> want this to be as transparent as possible.  I'd rather have auto create
> on-the-fly when first needed, but if that seems too risky then let's
> just make the table when its owning table is created.

have to third this one ... I think it should be totally transparent to the
admin/user ... just create it when the table is created, what's the worst
case scenario?  it never gets used and you waste 16k of disk space?



Re: Applying TOAST to CURRENT

From
Bruce Momjian
Date:
> I agree with Bruce --- the toast table should be created automatically,
> at least if the table contains any potentially-toastable columns.  We
> want this to be as transparent as possible.  I'd rather have auto create
> on-the-fly when first needed, but if that seems too risky then let's
> just make the table when its owning table is created.
> 
> If you want to control it with an ALTER TABLE function, let's add ALTER
> TABLE DROP TOAST so that admins who don't like the excess space usage
> can get rid of it.  (Of course that should only succeed after verifying
> the toast table is empty...)

But when you vacuum a table, doesn't it get zero size?  Sure works here:
#$ cd /u/pg/data/base/test#$ ls -l kkk*-rw-------  1 postgres  postgres  0 May 30 12:20 kkk#$ 

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Applying TOAST to CURRENT

From
Hannu Krosing
Date:
The Hermit Hacker wrote:
> 
> On Tue, 30 May 2000, Tom Lane wrote:
> >
> > I agree with Bruce --- the toast table should be created automatically,
> > at least if the table contains any potentially-toastable columns.  We
> > want this to be as transparent as possible.  I'd rather have auto create
> > on-the-fly when first needed, but if that seems too risky then let's
> > just make the table when its owning table is created.
> 
> have to third this one ... I think it should be totally transparent to the
> admin/user ... just create it when the table is created, what's the worst
> case scenario?  it never gets used and you waste 16k of disk space?

You dont even use 16k if toast tables are like ordinary tables (which I 
guess they are). New empty tables seem to occupy 0k.

So I'm also for immediate creation of tost tables for all tables that 
require them, either at create (if there are any toastable columns in 
the create clause) or at alter table time if first toestable column is 
added after initial create.

The only drawback is bloating directories, but it was already suggested
that 
TOAST tables could/should be kept in subdirectory toast (as should
indexes 
too, imho).

And the most widespread database in the world does it too ;) 
(dBASE and its derivates)

--------
Hannu


Re: Applying TOAST to CURRENT

From
The Hermit Hacker
Date:
On Tue, 30 May 2000, Hannu Krosing wrote:

> The Hermit Hacker wrote:
> > 
> > On Tue, 30 May 2000, Tom Lane wrote:
> > >
> > > I agree with Bruce --- the toast table should be created automatically,
> > > at least if the table contains any potentially-toastable columns.  We
> > > want this to be as transparent as possible.  I'd rather have auto create
> > > on-the-fly when first needed, but if that seems too risky then let's
> > > just make the table when its owning table is created.
> > 
> > have to third this one ... I think it should be totally transparent to the
> > admin/user ... just create it when the table is created, what's the worst
> > case scenario?  it never gets used and you waste 16k of disk space?
> 
> You dont even use 16k if toast tables are like ordinary tables (which I 
> guess they are). New empty tables seem to occupy 0k.
> 
> So I'm also for immediate creation of tost tables for all tables that 
> require them, either at create (if there are any toastable columns in 
> the create clause) or at alter table time if first toestable column is 
> added after initial create.
> 
> The only drawback is bloating directories, but it was already suggested
> that 
> TOAST tables could/should be kept in subdirectory toast (as should
> indexes 
> too, imho).

still say, simplest "fix":
<dbname>/{system,db,toast,index}




Re: Applying TOAST to CURRENT

From
Hannu Krosing
Date:
The Hermit Hacker wrote:
> 
> On Tue, 30 May 2000, Hannu Krosing wrote:
> 
> > The only drawback is bloating directories, but it was already suggested
> > that
> > TOAST tables could/should be kept in subdirectory toast (as should
> > indexes
> > too, imho).
> 
> still say, simplest "fix":
> 
>         <dbname>/{system,db,toast,index}

Why can't we just add a column named "tablepath" to pg_table, that can
either be 
a simple filename, or relative path with a filename or even full path 
(if we don't worry too much for security ;)

That has came up before when discussing ways to make rename table
rollbackable
but it could be handy here two. 

AFAIK it has been a general principle in programming to keep separate
things 
separate unless a very good reason not to do so is present.

-----------
Hannu


Re: Applying TOAST to CURRENT

From
JanWieck@t-online.de (Jan Wieck)
Date:
The Hermit Hacker wrote:
>
> have to third this one ... I think it should be totally transparent to the
> admin/user ... just create it when the table is created, what's the worst
> case scenario?  it never gets used and you waste 16k of disk space?
>
   Not exactly.
   I've  made  some  good  experiences  with  having the toaster   trying to keep the main tuple size below 1/4 of
MaxTupleSize  (BLKSIZE  -  block  header).   Remember  that external stored   attributes are only fetched from the
secondary relation  if   really  needed  (when the result set is sent to the client or   if explicitly used in the
query).So in a usual case, where a   relatively  small  amount of the entire data is retrieved and   key attributes are
small,it's a win. With this  config  more   main tuples fit into one block, and if the attributes used in   the WHERE
clause aren't  stored  external,  the  result  set   (including  sort  and  group  actions)  can be collected with
fewerblock reads. Only those big  values,  that  the  client   really wanted, have to be fetched at send time.
 
   If  no  external  table  exists, the toaster will try the <2K   thing by compression only. If the resulting tuple
fits into   the  8K  limit,  it's OK. But if a secondary relation exists,   it'll store external to make the tuple <2K.
Thus, a 4K or 6K   tuple,  that  actually  fits  and would be stored in the main   table, will cause the toaster to
jumpin if we allways create   the secondary table.
 
   Hmmm - thinking about that it doesn't sound bad if we allways   create a secondary relation at CREATE TABLE time,
butNOT the   index  for  it.  And at VACUUM time we create the index if it   doesn't exist AND there is external stored
data.
   The table is prepared for external  storage  allways  and  we   avoid  the  risks  from  creating  tables  in
possiblylater   aborting transactions or due to concurrency  issues.  But  we   don't  waste  the  index space for
reallyallways-small-tuple   tables.
 
   Another benefit would  be,  that  reloads  should  be  faster   because  with  this  technique,  the  toaster
doesn'tneed to   insert index tuples during the load. The indices are  created   later at VACUUM after reload.
 
   The  toaster  needs  to  use sequential scans on the external   table until the next vacuum  run,  but  index  usage
allways   depends on vacuum so that's not a real issue from my PoV.
 
   At least a transparent compromise - isn't it?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




Re: Applying TOAST to CURRENT

From
Tom Lane
Date:
JanWieck@t-online.de (Jan Wieck) writes:
>     Hmmm - thinking about that it doesn't sound bad if we allways
>     create a secondary relation at CREATE TABLE time, but NOT the
>     index  for  it.  And at VACUUM time we create the index if it
>     doesn't exist AND there is external stored data.

Don't much like that --- what if the user doesn't run vacuum for
a good long while?  Could be doing a lot of sequential scans over
a pretty large toast file...

If the 16K for an empty btree index really bothers you, let's
attack that head-on.  I don't see why a freshly created index
couldn't be zero bytes, and the metadata page gets created on
first store into the index.

>     The  toaster  needs  to  use sequential scans on the external
>     table until the next vacuum  run,  but  index  usage  allways
>     depends on vacuum so that's not a real issue from my PoV.

What makes you say that?  Indexes will be used on a never-vacuumed
table with the current planner.
        regards, tom lane


Re: Applying TOAST to CURRENT

From
Hannu Krosing
Date:
Jan Wieck wrote:
> 
> The Hermit Hacker wrote:
> >
> > have to third this one ... I think it should be totally transparent to the
> > admin/user ... just create it when the table is created, what's the worst
> > case scenario?  it never gets used and you waste 16k of disk space?
> >
> 
>     Not exactly.
> 
>     I've  made  some  good  experiences  with  having the toaster
>     trying to keep the main tuple size below 1/4 of  MaxTupleSize
>     (BLKSIZE  -  block  header). 

Can't _that_ behaviour be made modifyable by some setting ?

>     Remember  that external stored
>     attributes are only fetched from the  secondary  relation  if
>     really  needed  (when the result set is sent to the client or
>     if explicitly used in the query). So in a usual case, where a
>     relatively  small  amount of the entire data is retrieved and
>     key attributes are small, it's a win. With this  config  more
>     main tuples fit into one block, and if the attributes used in
>     the WHERE clause  aren't  stored  external,  the  result  set
>     (including  sort  and  group  actions)  can be collected with
>     fewer block reads. Only those big  values,  that  the  client
>     really wanted, have to be fetched at send time.

What is the priority of checks on indexed fetch?

I mean if we do "SELECT * FROM ttable WHERE toasted LIKE 'ab%' "

DO we first scan by index to 'ab%', then check if tuple is live and 
after that to the LIKE comparison ?

Would it not be faster in toast case to use the already retrieved 
index data and check that first, before going to main table (not to 
mention the TOAST table)

>     If  no  external  table  exists, the toaster will try the <2K
>     thing by compression only. If the resulting tuple  fits  into
>     the  8K  limit,  it's OK. 

Would it not be faster/cleaner to check some configuration variable 
than the existance of toest table ?

>     But if a secondary relation exists,
>     it'll store external to make the tuple <2K.  Thus, a 4K or 6K
>     tuple,  that  actually  fits  and would be stored in the main
>     table, will cause the toaster to jump in if we allways create
>     the secondary table.

Do our current (btree/hash) indexes support toast ?

If not, will they ?

> 
>     Hmmm - thinking about that it doesn't sound bad if we allways
>     create a secondary relation at CREATE TABLE time, but NOT the
>     index  for  it.  And at VACUUM time we create the index if it
>     doesn't exist AND there is external stored data.

Is there a plan to migrate to some combined index/database table for 
at least toast tables later ? 

For at least toast tables it seems feasible to start using the 
originally planned tuple-spanning mechanisms, unless we plan 
migrating LOs to toast table at some point which would make index-less 
tuple chaining a bad idea as it would make seeking on really large 
LOs slow. 

>     The table is prepared for external  storage  allways  and  we
>     avoid  the  risks  from  creating  tables  in  possibly later
>     aborting transactions or due to concurrency  issues.  But  we
>     don't  waste  the  index space for really allways-small-tuple
>     tables.

That could perhaps be done for other tables too, ie CREATE INDEX 
would not actually create index until VACUUM notices that table is 
big enough to make use of that index ?

On second thought that seems not a good idea to me ;(

> 
>     Another benefit would  be,  that  reloads  should  be  faster
>     because  with  this  technique,  the  toaster doesn't need to
>     insert index tuples during the load. The indices are  created
>     later at VACUUM after reload.

AFAIK reloads (from pg_dump at least) create indexes after LOAD'ing data

>     The  toaster  needs  to  use sequential scans on the external
>     table until the next vacuum  run,  but  index  usage  allways
>     depends on vacuum so that's not a real issue from my PoV.
> 
>     At least a transparent compromise - isn't it?

But do we need it ?

I suspect there are other issues that need your attention more than 
complicating table creation to save a few kb ;)

Creating toast tables still wastes only 1MB per 64 tables _that have 
toastable columns_, which seems real cheap considering today's HD
prices.

You would need 6400 toast tables to consume 1% of the smallest currently 
available (10GB) disk.

If that is a concern this can probably be cured by good docs that say 
in detail which datatypes cause toast tables an which don't.

-----------
Hannu


Re: Applying TOAST to CURRENT

From
JanWieck@t-online.de (Jan Wieck)
Date:
Hannu Krosing wrote:
> >     I've  made  some  good  experiences  with  having the toaster
> >     trying to keep the main tuple size below 1/4 of  MaxTupleSize
> >     (BLKSIZE  -  block  header).
>
> Can't _that_ behaviour be made modifyable by some setting ?
   Good point.
   There  is  already  a fine tuning option per table attribute,   where someone can tell things like "forget about
compression  for  this  attribute" or "try keeping in main tuple and toast   others first".  Theres  no  utility
command up  to  now  to   customize them, but an UPDATE pg_attribute does it already.
 
   Seems another value in pg_class, telling the toaster what max   size to try, would be a good idea.

> What is the priority of checks on indexed fetch?
>
> I mean if we do "SELECT * FROM ttable WHERE toasted LIKE 'ab%' "
>
> DO we first scan by index to 'ab%', then check if tuple is live and
> after that to the LIKE comparison ?
   That's the current behaviour, and TOAST doesn't change it.
   There was discussion  already  about  index  tuple  toasting.   Indices have different size constraints and other
featuresso   they cannot share exactly the same toasting  scheme  as  heap   tuples.
 
   I'm  still  not  sure if supporting indices on huge values is   worth the efford. Many databases have some limit on
the size   of index entries, and noone seems to really care for that.
 

> >     If  no  external  table  exists, the toaster will try the <2K
> >     thing by compression only. If the resulting tuple  fits  into
> >     the  8K  limit,  it's OK.
>
> Would it not be faster/cleaner to check some configuration variable
> than the existance of toest table ?
   The  toast  tables and indexes OID are stored in pg_class. An   open Relation has reference to  the  pg_class  row,
so it's   simply comparing that to INVALID_OID. No wasted time here.
 

> Do our current (btree/hash) indexes support toast ?
   Not  hard  tested  yet.  At  least,  they don't support it if   toasting would be required to make the index tuple
fit, but   the heap toaster is already happy with it.
 
   The tuple is modified in place at heap_insert(). So the later   index_insert() will use the Datums found there to
build the   index  tuples,  either plain or toast reference, whatever the   toaster left.
 

>
> If not, will they ?
   Not planned for 7.1. Maybe we  can  workout  a  solution  for   unlimited index entries after that.

> >     Hmmm - thinking about that it doesn't sound bad if we allways
> >     create a secondary relation at CREATE TABLE time, but NOT the
> >     index  for  it.  And at VACUUM time we create the index if it
> >     doesn't exist AND there is external stored data.
>
> Is there a plan to migrate to some combined index/database table for
> at least toast tables later ?
   No.  But  we  plan  a general overwriting storage manager, so   that might not be an issue at all.

> For at least toast tables it seems feasible to start using the
> originally planned tuple-spanning mechanisms, unless we plan
> migrating LOs to toast table at some point which would make index-less
> tuple chaining a bad idea as it would make seeking on really large
> LOs slow.
   I've never seen a complete proposal for  tuple-spanning.  The   toaster  breaks  up  the  large Datum into chunks.
Thereis a   chunk number, so modifying the index to be a  multi-attribute   one  would  gain  direct access to a chunk.
That should make   seeks reasonably fast.
 

> >     Another benefit would  be,  that  reloads  should  be  faster
> >     because  with  this  technique,  the  toaster doesn't need to
> >     insert index tuples during the load. The indices are  created
> >     later at VACUUM after reload.
>
> AFAIK reloads (from pg_dump at least) create indexes after LOAD'ing data
   Finally the toast table will have another relkind,  so  it'll   not  be  accessible  by normal SQL. The toaster acts
onthese   tables more hardwired like on system catalogs. It  expects  a   fixed  schema  and  uses  direct  heap
access. Due  to  the   different relkind, a dump wouldn't be able to delay the index   creation.
 

> But do we need it ?
>
> [...]
>
> You would need 6400 toast tables to consume 1% of the smallest currently
> available (10GB) disk.
>
> If that is a concern this can probably be cured by good docs that say
> in detail which datatypes cause toast tables an which don't.
   We plan to make ALL variable size builtin types toastable. So   this list would name them all :-).
   But this 6400 = 1% really is the point.  Let's  forget  about   the  16K  and  create the toast table allways (as
soonas the   main table has toastable attributes).
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #