Thread: Planned changes to pg_am catalog

Planned changes to pg_am catalog

From
Tom Lane
Date:
Since I am about to add a "bulk delete" routine to the index access
method APIs for concurrent VACUUM, I need to add a column to pg_am
to define the associated procedure for each index AM.  This seems like
a fine time to clean up some of the other outstanding TODO items for
pg_am:

1. Add boolean columns that indicate the following for each AM:* Does it support UNIQUE indexes?* Does it support
multicolumnindexes?* Does it handle its own locking (as opposed to expecting  the executor to obtain an index-wide
lock)?
This will eliminate ugly hardcoded tests on index AM oid's in various
places.

2. Remove the "deprecated" columns, which aren't doing anything except
wasting space.

3. Alter the index_build code so that we don't have duplicate code in
each index AM for scanning the parent relation.  I'm envisioning that
index.c would provide a routine IndexBuildHeapScan() that does the basic
heap scan, testing of partial-index predicate, etc, and the calls back
an index-AM-specific routine (which it's handed as a function pointer)
for each tuple that should be added to the index.  A void pointer would
also be passed through to let the callback routine have access to
working state of the AM-specific index_build procedure.
(IndexBuildHeapScan would replace the currently-unused DefaultBuild
routine in index.c, which is mostly the same code it needs anyway.)
The index AM's index_build procedure would do initial setup, call
IndexBuildHeapScan, and then do any finishing-up processing needed.


Note that this doesn't address Oleg's concerns about haskeytype,
lossiness, etc.  AFAICS those issues are not related to the contents
of pg_am.  Later on, I am going to have some proposals for altering
pg_opclass and related tables to deal with those issues...

Comments?  Any other festering problems in this immediate area?
        regards, tom lane


OID question

From
Naomi Walker
Date:
Is there any way to backtrack from an OID to tell what table included that 
row (like some secret incantation from the system tables)?
--
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
602-604-3100  ext 242 



Re: OID question

From
Tom Lane
Date:
Naomi Walker <nwalker@eldocomp.com> writes:
> Is there any way to backtrack from an OID to tell what table included that 
> row (like some secret incantation from the system tables)?

Nope, sorry.  There's very little magic about OIDs at all; they're just
values from a sequence.
        regards, tom lane


Re: Planned changes to pg_am catalog

From
Oleg Bartunov
Date:
On Fri, 13 Jul 2001, Tom Lane wrote:

> Since I am about to add a "bulk delete" routine to the index access
> method APIs for concurrent VACUUM, I need to add a column to pg_am
> to define the associated procedure for each index AM.  This seems like
> a fine time to clean up some of the other outstanding TODO items for
> pg_am:
>
> Note that this doesn't address Oleg's concerns about haskeytype,
> lossiness, etc.  AFAICS those issues are not related to the contents
> of pg_am.  Later on, I am going to have some proposals for altering
> pg_opclass and related tables to deal with those issues...

Any chance you'd untie a knot for our development in 7.2 development
cycle ? Our code for multikey GiST, Btree is more or less complete
and work with ugly workaround, and the only thing we need is a
solution of the problem with index_formtuple.

>
> Comments?  Any other festering problems in this immediate area?
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Planned changes to pg_am catalog

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> Any chance you'd untie a knot for our development in 7.2 development
> cycle ?

I am trying to focus on getting concurrent VACUUM done, because I think
that's a "must do" for 7.2.  I hope to have some time during August to
deal with your GIST issues, but they are definitely lower down on the
priority list for me.
        regards, tom lane


Re: Planned changes to pg_am catalog

From
Tom Lane
Date:
... however, if you want to do some of the legwork yourself, here are
the ideas I had about what to do:

pg_opclass should have, not just one row for each distinct opclass name,
but one row for each supported combination of index AM and opclass name.
Doing it this way would allow us to put additional info in pg_opclass
rows --- right now, they're not really able to carry much information.
The particular bit of info I want to add is a "keytype" column.  If this
is not InvalidOid then it gives the OID of the index column datatype to
be used when this opclass is selected.  For keytype to be different from
data type, the amproc entries associated with the opclass would need to
include a conversion routine to produce the index value given the input
data columns --- ie, what the GIST code calls a compression routine.
(In essence, this would be a form of functional index, no?)  Possibly
pg_opclass should also include the amprocnum of the conversion routine;
not sure how that ought to be handled.

Note that this change would have a number of implications for the
indexing of not only pg_opclass, but pg_amop and pg_amproc as well.
In particular, pg_amop could lose its amopid column, and pg_amproc
its amid column, since the opclass OID would be sufficient to indicate
which index AM is meant for any row in these tables.  I have not worked
out all the details, but I believe that these tables would become a lot
more understandable this way.

As for lossiness, I'm inclined to remove that column from pg_index
altogether.  Instead, it should be a column in pg_amop, indicating that
an index must be treated as lossy *for a particular operator in a
particular opclass*.  Per previous discussion, this is the right level
for the concept.  AFAIR, we could drop the WITH clause from CREATE INDEX
altogether if we did this, which I think is the right thing --- the user
should not be responsible for telling the system the properties of an
index type and opclass.

If you have time to start working out the details, that'd be great.
I won't have time for it before mid-August probably.
        regards, tom lane


RE: Planned changes to pg_am catalog

From
"Christopher Kings-Lynne"
Date:
> Note that this doesn't address Oleg's concerns about haskeytype,
> lossiness, etc.  AFAICS those issues are not related to the contents
> of pg_am.  Later on, I am going to have some proposals for altering
> pg_opclass and related tables to deal with those issues...
>
> Comments?  Any other festering problems in this immediate area?

As part of my DROP CONSTRAINT stuff I've been fiddling with, I've found it
necessary to write an 'IsIndex' function.  At the moment, all it does is
return true if the named index exists on the named relation and is unique
(or primary, or neither, or any).

I think it would be very nice to have an all-purpose function with a
definition something like this:

bool IsIndex(Relation rel, const char *indname, int type, List attrs);

Where type could be:

0 - any
1 - normal
2 - unique
3 - primary

And attrs, if not null, indicates that true should only be returned if the
index is over the given list of attributes (in the given order).

I guess the function would assume that the necessary lock is acquired on the
relation from outside the function.

I think there's _lots_ of places in the code where index existence checks
are performed and this could prevent vast code-duplication...

Chris



Re: Planned changes to pg_am catalog

From
Oleg Bartunov
Date:
On Sat, 14 Jul 2001, Tom Lane wrote:

> ... however, if you want to do some of the legwork yourself, here are
> the ideas I had about what to do:

OK. We'll dig into problem in august. At least we'll try.
How many possible problems would arise after changing of pg_opclass ?
Does existing code will handle this change somewhat automagically
or we have to find and modify relevant code ?

>
> pg_opclass should have, not just one row for each distinct opclass name,
> but one row for each supported combination of index AM and opclass name.
> Doing it this way would allow us to put additional info in pg_opclass
> rows --- right now, they're not really able to carry much information.
> The particular bit of info I want to add is a "keytype" column.  If this
> is not InvalidOid then it gives the OID of the index column datatype to
> be used when this opclass is selected.  For keytype to be different from
> data type, the amproc entries associated with the opclass would need to
> include a conversion routine to produce the index value given the input
> data columns --- ie, what the GIST code calls a compression routine.
> (In essence, this would be a form of functional index, no?)  Possibly
> pg_opclass should also include the amprocnum of the conversion routine;
> not sure how that ought to be handled.

compress/decompress isn't a type conversion. for example,
gist__int*_ops. indexed values and keytype are both int4 one dimensional
arrays and compress/decompress in this case do some real work.


>
> Note that this change would have a number of implications for the
> indexing of not only pg_opclass, but pg_amop and pg_amproc as well.
> In particular, pg_amop could lose its amopid column, and pg_amproc
> its amid column, since the opclass OID would be sufficient to indicate
> which index AM is meant for any row in these tables.  I have not worked
> out all the details, but I believe that these tables would become a lot
> more understandable this way.
>
> As for lossiness, I'm inclined to remove that column from pg_index
> altogether.  Instead, it should be a column in pg_amop, indicating that
> an index must be treated as lossy *for a particular operator in a
> particular opclass*.  Per previous discussion, this is the right level
> for the concept.  AFAIR, we could drop the WITH clause from CREATE INDEX
> altogether if we did this, which I think is the right thing --- the user
> should not be responsible for telling the system the properties of an
> index type and opclass.
>
> If you have time to start working out the details, that'd be great.
> I won't have time for it before mid-August probably.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: Planned changes to pg_am catalog

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> How many possible problems would arise after changing of pg_opclass ?
> Does existing code will handle this change somewhat automagically
> or we have to find and modify relevant code ?

There's a fair amount of code that would need to be touched.  One thing
I realized just last night is that some routines use the tables to ask
questions like "is this operator OID a member of any btree opclass, and
if so which opclass and strategy number?"  This is a relatively simple
sequential scan over the pg_amop table at the moment.  But if the amid
column were removed, it'd require a join with pg_opclass, which might be
good from the point of view of normalization theory but is a bit of a
pain in the neck to program in low-level code.  It might also be nice if
we could use an index instead of a seq scan (although pg_amop is not so
large that this is essential).  So all the places that touch these
tables need to be identified, and a design invented that doesn't make
any of them unreasonably complex.

Possibly we should leave the amid column in pg_amop, ie, deliberately
keep the tables unnormalized, to make some of these lookups easier.

> compress/decompress isn't a type conversion. for example,
> gist__int*_ops. indexed values and keytype are both int4 one dimensional
> arrays and compress/decompress in this case do some real work.

Okay, so the presence of a non-null keytype field should indicate that a
conversion routine is to be invoked, even if it's the same type as the
underlying datatype.
        regards, tom lane


Re: Planned changes to pg_am catalog

From
Bruce Momjian
Date:
Is this all addresssed?

> On Sat, 14 Jul 2001, Tom Lane wrote:
> 
> > ... however, if you want to do some of the legwork yourself, here are
> > the ideas I had about what to do:
> 
> OK. We'll dig into problem in august. At least we'll try.
> How many possible problems would arise after changing of pg_opclass ?
> Does existing code will handle this change somewhat automagically
> or we have to find and modify relevant code ?
> 
> >
> > pg_opclass should have, not just one row for each distinct opclass name,
> > but one row for each supported combination of index AM and opclass name.
> > Doing it this way would allow us to put additional info in pg_opclass
> > rows --- right now, they're not really able to carry much information.
> > The particular bit of info I want to add is a "keytype" column.  If this
> > is not InvalidOid then it gives the OID of the index column datatype to
> > be used when this opclass is selected.  For keytype to be different from
> > data type, the amproc entries associated with the opclass would need to
> > include a conversion routine to produce the index value given the input
> > data columns --- ie, what the GIST code calls a compression routine.
> > (In essence, this would be a form of functional index, no?)  Possibly
> > pg_opclass should also include the amprocnum of the conversion routine;
> > not sure how that ought to be handled.
> 
> compress/decompress isn't a type conversion. for example,
> gist__int*_ops. indexed values and keytype are both int4 one dimensional
> arrays and compress/decompress in this case do some real work.
> 
> 
> >
> > Note that this change would have a number of implications for the
> > indexing of not only pg_opclass, but pg_amop and pg_amproc as well.
> > In particular, pg_amop could lose its amopid column, and pg_amproc
> > its amid column, since the opclass OID would be sufficient to indicate
> > which index AM is meant for any row in these tables.  I have not worked
> > out all the details, but I believe that these tables would become a lot
> > more understandable this way.
> >
> > As for lossiness, I'm inclined to remove that column from pg_index
> > altogether.  Instead, it should be a column in pg_amop, indicating that
> > an index must be treated as lossy *for a particular operator in a
> > particular opclass*.  Per previous discussion, this is the right level
> > for the concept.  AFAIR, we could drop the WITH clause from CREATE INDEX
> > altogether if we did this, which I think is the right thing --- the user
> > should not be responsible for telling the system the properties of an
> > index type and opclass.
> >
> > If you have time to start working out the details, that'd be great.
> > I won't have time for it before mid-August probably.
> >
> >             regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
> 
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026