Thread: Planned changes to pg_am catalog
Since I am about to add a "bulk delete" routine to the index access method APIs for concurrent VACUUM, I need to add a column to pg_am to define the associated procedure for each index AM. This seems like a fine time to clean up some of the other outstanding TODO items for pg_am: 1. Add boolean columns that indicate the following for each AM:* Does it support UNIQUE indexes?* Does it support multicolumnindexes?* Does it handle its own locking (as opposed to expecting the executor to obtain an index-wide lock)? This will eliminate ugly hardcoded tests on index AM oid's in various places. 2. Remove the "deprecated" columns, which aren't doing anything except wasting space. 3. Alter the index_build code so that we don't have duplicate code in each index AM for scanning the parent relation. I'm envisioning that index.c would provide a routine IndexBuildHeapScan() that does the basic heap scan, testing of partial-index predicate, etc, and the calls back an index-AM-specific routine (which it's handed as a function pointer) for each tuple that should be added to the index. A void pointer would also be passed through to let the callback routine have access to working state of the AM-specific index_build procedure. (IndexBuildHeapScan would replace the currently-unused DefaultBuild routine in index.c, which is mostly the same code it needs anyway.) The index AM's index_build procedure would do initial setup, call IndexBuildHeapScan, and then do any finishing-up processing needed. Note that this doesn't address Oleg's concerns about haskeytype, lossiness, etc. AFAICS those issues are not related to the contents of pg_am. Later on, I am going to have some proposals for altering pg_opclass and related tables to deal with those issues... Comments? Any other festering problems in this immediate area? regards, tom lane
Is there any way to backtrack from an OID to tell what table included that row (like some secret incantation from the system tables)? -- Naomi Walker Chief Information Officer Eldorado Computing, Inc. 602-604-3100 ext 242
Naomi Walker <nwalker@eldocomp.com> writes: > Is there any way to backtrack from an OID to tell what table included that > row (like some secret incantation from the system tables)? Nope, sorry. There's very little magic about OIDs at all; they're just values from a sequence. regards, tom lane
On Fri, 13 Jul 2001, Tom Lane wrote: > Since I am about to add a "bulk delete" routine to the index access > method APIs for concurrent VACUUM, I need to add a column to pg_am > to define the associated procedure for each index AM. This seems like > a fine time to clean up some of the other outstanding TODO items for > pg_am: > > Note that this doesn't address Oleg's concerns about haskeytype, > lossiness, etc. AFAICS those issues are not related to the contents > of pg_am. Later on, I am going to have some proposals for altering > pg_opclass and related tables to deal with those issues... Any chance you'd untie a knot for our development in 7.2 development cycle ? Our code for multikey GiST, Btree is more or less complete and work with ugly workaround, and the only thing we need is a solution of the problem with index_formtuple. > > Comments? Any other festering problems in this immediate area? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > Any chance you'd untie a knot for our development in 7.2 development > cycle ? I am trying to focus on getting concurrent VACUUM done, because I think that's a "must do" for 7.2. I hope to have some time during August to deal with your GIST issues, but they are definitely lower down on the priority list for me. regards, tom lane
... however, if you want to do some of the legwork yourself, here are the ideas I had about what to do: pg_opclass should have, not just one row for each distinct opclass name, but one row for each supported combination of index AM and opclass name. Doing it this way would allow us to put additional info in pg_opclass rows --- right now, they're not really able to carry much information. The particular bit of info I want to add is a "keytype" column. If this is not InvalidOid then it gives the OID of the index column datatype to be used when this opclass is selected. For keytype to be different from data type, the amproc entries associated with the opclass would need to include a conversion routine to produce the index value given the input data columns --- ie, what the GIST code calls a compression routine. (In essence, this would be a form of functional index, no?) Possibly pg_opclass should also include the amprocnum of the conversion routine; not sure how that ought to be handled. Note that this change would have a number of implications for the indexing of not only pg_opclass, but pg_amop and pg_amproc as well. In particular, pg_amop could lose its amopid column, and pg_amproc its amid column, since the opclass OID would be sufficient to indicate which index AM is meant for any row in these tables. I have not worked out all the details, but I believe that these tables would become a lot more understandable this way. As for lossiness, I'm inclined to remove that column from pg_index altogether. Instead, it should be a column in pg_amop, indicating that an index must be treated as lossy *for a particular operator in a particular opclass*. Per previous discussion, this is the right level for the concept. AFAIR, we could drop the WITH clause from CREATE INDEX altogether if we did this, which I think is the right thing --- the user should not be responsible for telling the system the properties of an index type and opclass. If you have time to start working out the details, that'd be great. I won't have time for it before mid-August probably. regards, tom lane
> Note that this doesn't address Oleg's concerns about haskeytype, > lossiness, etc. AFAICS those issues are not related to the contents > of pg_am. Later on, I am going to have some proposals for altering > pg_opclass and related tables to deal with those issues... > > Comments? Any other festering problems in this immediate area? As part of my DROP CONSTRAINT stuff I've been fiddling with, I've found it necessary to write an 'IsIndex' function. At the moment, all it does is return true if the named index exists on the named relation and is unique (or primary, or neither, or any). I think it would be very nice to have an all-purpose function with a definition something like this: bool IsIndex(Relation rel, const char *indname, int type, List attrs); Where type could be: 0 - any 1 - normal 2 - unique 3 - primary And attrs, if not null, indicates that true should only be returned if the index is over the given list of attributes (in the given order). I guess the function would assume that the necessary lock is acquired on the relation from outside the function. I think there's _lots_ of places in the code where index existence checks are performed and this could prevent vast code-duplication... Chris
On Sat, 14 Jul 2001, Tom Lane wrote: > ... however, if you want to do some of the legwork yourself, here are > the ideas I had about what to do: OK. We'll dig into problem in august. At least we'll try. How many possible problems would arise after changing of pg_opclass ? Does existing code will handle this change somewhat automagically or we have to find and modify relevant code ? > > pg_opclass should have, not just one row for each distinct opclass name, > but one row for each supported combination of index AM and opclass name. > Doing it this way would allow us to put additional info in pg_opclass > rows --- right now, they're not really able to carry much information. > The particular bit of info I want to add is a "keytype" column. If this > is not InvalidOid then it gives the OID of the index column datatype to > be used when this opclass is selected. For keytype to be different from > data type, the amproc entries associated with the opclass would need to > include a conversion routine to produce the index value given the input > data columns --- ie, what the GIST code calls a compression routine. > (In essence, this would be a form of functional index, no?) Possibly > pg_opclass should also include the amprocnum of the conversion routine; > not sure how that ought to be handled. compress/decompress isn't a type conversion. for example, gist__int*_ops. indexed values and keytype are both int4 one dimensional arrays and compress/decompress in this case do some real work. > > Note that this change would have a number of implications for the > indexing of not only pg_opclass, but pg_amop and pg_amproc as well. > In particular, pg_amop could lose its amopid column, and pg_amproc > its amid column, since the opclass OID would be sufficient to indicate > which index AM is meant for any row in these tables. I have not worked > out all the details, but I believe that these tables would become a lot > more understandable this way. > > As for lossiness, I'm inclined to remove that column from pg_index > altogether. Instead, it should be a column in pg_amop, indicating that > an index must be treated as lossy *for a particular operator in a > particular opclass*. Per previous discussion, this is the right level > for the concept. AFAIR, we could drop the WITH clause from CREATE INDEX > altogether if we did this, which I think is the right thing --- the user > should not be responsible for telling the system the properties of an > index type and opclass. > > If you have time to start working out the details, that'd be great. > I won't have time for it before mid-August probably. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > How many possible problems would arise after changing of pg_opclass ? > Does existing code will handle this change somewhat automagically > or we have to find and modify relevant code ? There's a fair amount of code that would need to be touched. One thing I realized just last night is that some routines use the tables to ask questions like "is this operator OID a member of any btree opclass, and if so which opclass and strategy number?" This is a relatively simple sequential scan over the pg_amop table at the moment. But if the amid column were removed, it'd require a join with pg_opclass, which might be good from the point of view of normalization theory but is a bit of a pain in the neck to program in low-level code. It might also be nice if we could use an index instead of a seq scan (although pg_amop is not so large that this is essential). So all the places that touch these tables need to be identified, and a design invented that doesn't make any of them unreasonably complex. Possibly we should leave the amid column in pg_amop, ie, deliberately keep the tables unnormalized, to make some of these lookups easier. > compress/decompress isn't a type conversion. for example, > gist__int*_ops. indexed values and keytype are both int4 one dimensional > arrays and compress/decompress in this case do some real work. Okay, so the presence of a non-null keytype field should indicate that a conversion routine is to be invoked, even if it's the same type as the underlying datatype. regards, tom lane
Is this all addresssed? > On Sat, 14 Jul 2001, Tom Lane wrote: > > > ... however, if you want to do some of the legwork yourself, here are > > the ideas I had about what to do: > > OK. We'll dig into problem in august. At least we'll try. > How many possible problems would arise after changing of pg_opclass ? > Does existing code will handle this change somewhat automagically > or we have to find and modify relevant code ? > > > > > pg_opclass should have, not just one row for each distinct opclass name, > > but one row for each supported combination of index AM and opclass name. > > Doing it this way would allow us to put additional info in pg_opclass > > rows --- right now, they're not really able to carry much information. > > The particular bit of info I want to add is a "keytype" column. If this > > is not InvalidOid then it gives the OID of the index column datatype to > > be used when this opclass is selected. For keytype to be different from > > data type, the amproc entries associated with the opclass would need to > > include a conversion routine to produce the index value given the input > > data columns --- ie, what the GIST code calls a compression routine. > > (In essence, this would be a form of functional index, no?) Possibly > > pg_opclass should also include the amprocnum of the conversion routine; > > not sure how that ought to be handled. > > compress/decompress isn't a type conversion. for example, > gist__int*_ops. indexed values and keytype are both int4 one dimensional > arrays and compress/decompress in this case do some real work. > > > > > > Note that this change would have a number of implications for the > > indexing of not only pg_opclass, but pg_amop and pg_amproc as well. > > In particular, pg_amop could lose its amopid column, and pg_amproc > > its amid column, since the opclass OID would be sufficient to indicate > > which index AM is meant for any row in these tables. I have not worked > > out all the details, but I believe that these tables would become a lot > > more understandable this way. > > > > As for lossiness, I'm inclined to remove that column from pg_index > > altogether. Instead, it should be a column in pg_amop, indicating that > > an index must be treated as lossy *for a particular operator in a > > particular opclass*. Per previous discussion, this is the right level > > for the concept. AFAIR, we could drop the WITH clause from CREATE INDEX > > altogether if we did this, which I think is the right thing --- the user > > should not be responsible for telling the system the properties of an > > index type and opclass. > > > > If you have time to start working out the details, that'd be great. > > I won't have time for it before mid-August probably. > > > > regards, tom lane > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 4: Don't 'kill -9' the postmaster > > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026