Thread: Improving REINDEX for system indexes (long)
I've been looking at the issues involved in reindexing system tables, and I now have what I think is a fairly defensible set of proposals. We should whenever possible use the same reindexing technique used by CLUSTER: assign a new relfilenode number, build the new index in that file, and apply an ordinary heap_update operation to update the index's pg_class row with the new relfilenode value. This technique is fully crash-safe and transaction-safe (meaning it can be rolled back even after completion, in case of failure later in the same transaction). However, there are several pitfalls for applying it to system indexes: 1. There is a problem for a shared index, because we have no way to update pg_class.relfilenode in all the other databases besides ours. I see no good way around this problem. 2. There is a problem for a nailed-in-cache index, because the relcache isn't prepared to cope with relfilenode updates for such indexes. However, that is fixable. 3. There is a problem for an index on pg_class itself: doing heap_update on a pg_class row must make new index entries. We have to be careful that the new row does appear in the updated index, while not making entries in old-and-possibly-broken indexes. This is doable. 4. There is a problem with updating indexes that might be used for catalog fetches executed during the index_build process: if we try to use the partially-rebuilt index for such a fetch we will fail. In the case of disaster recovery the problem doesn't exist because we'll have IsIgnoringSystemIndexes true, but for an ordinary on-line indexing operation there's definitely a risk. Presently the code relies on "deactivating indexes" to prevent this, but I think it can be done more simply, because we only need to suppress access to the target index locally in the reindexing backend. (Other backends will be locked out by means of an AccessExclusiveLock on the system catalog in question.) In short then, we can fix things so that the new-relfilenode path can be taken in all cases except shared indexes. For shared indexes, we have little choice but to truncate and rebuild the index in-place. (There is then no need to update its pg_class row at all.) This is of course horribly crash-unsafe. The code presently uses a "deactivated indexes" flag (namely, setting pg_class.relhasindex false) to provide some protection against a crash midway through an in-place reindex. However, this concept is really quite useless for a shared index, because only one database could contain the deactivation flag. Accesses from any other database would still try to use the broken index. Based on this analysis, I feel that the "deactivated indexes" mechanism is of no further value and should be retired. We should instead simply acknowledge that reindexing shared indexes is dangerous. I propose that it be allowed only from a standalone backend. Specifically I think that REINDEX INDEX and REINDEX TABLE should error out if pointed to a shared index or table, unless running in a standalone backend; REINDEX DATABASE should skip over the shared indexes unless running in a standalone backend. (I'm not convinced that either -O or -P needs to be insisted on by REINDEX, btw, but we can discuss that separately.) We'll have to warn users not to try to restart the database when reindexing of a shared table wasn't successfully completed. Details to back up the above claims: Part of the code needed to fix the relcache restriction on nailed indexes is already there, but ifdef'd out; that's the part that re-reads the index's pg_class row after receiving SI inval for it. There are some cases not yet handled though. In the first place, if the nailed index being modified is pg_class_oid_index, ScanPgRelation will try to use that same index to load the updated row, and will fail because it'll be trying to use the old relfilenode. We can easily force a seqscan in that case, however. A more subtle problem is that if an SI buffer overflow occurs, RelationCacheInvalidate walks through the relation cache in a random order. I believe that the correct update order has to be first pg_class_oid_index, then the other nailed indexes, and finally all other relation cache entries. pg_class_oid_index has to be updated first (with the aforementioned seqscan), since it will be used by ScanPgRelation to reread the pg_class rows for the other nailed indexes. Only when we are sure the nailed indexes are up-to-date can we safely rebuild other relcache entries. Assigning a new relfilenode to indexes of pg_class is not a big deal when we don't have an index-corruption problem. We simply have to make the new heap_updated row before we start index_build (which indeed the code does already); both old and new rows will be indexed in the new index, and all is well. However, if we suspect index corruption then it is important not to try to make entries into the old indexes, because we might crash trying to update a corrupt index. I propose the following behavior: REINDEX INDEX: no special action; this is not the thing to usewhen index corruption is suspected anyway. REINDEX TABLE: while operating on pg_class, arrange for therelcache's list of known indexes of pg_class to contain only theindexesalready reindexed. In this way, only those indexes willbe updated by CatalogUpdateIndexes(), and so no untrustworthyindexeswill be touched while making new pg_class rows. REINDEX DATABASE: take care to process pg_class first. To avoid catalog accesses via an index that's being rebuilt, we can simply generalize the SetReindexProcessing() state to include the OID of the index currently being rebuilt. Places that presently make checks for IsIgnoringSystemIndexes can check this as well and fall back to seqscans when the targeted index would have been used. One other point: IsIgnoringSystemIndexes seems presently to be taken to mean not only that we don't *read* the system indexes, but that we don't *write* them either. I think this is horribly dangerous; it effectively means that the only thing you can safely do in a backend started with -P is REINDEX. If you make any other changes to the system catalogs then you've re-corrupted the indexes. Also, you don't have any protection against duplicate entries getting made, since the unique indexes are disabled. It would be a lot safer to define IsIgnoringSystemIndexes as preventing the system indexes from being used for lookups, while still updating the indexes on any catalog modification. This will not affect the safety of REINDEX and will make other operations much less prone to pilot error. Comments? regards, tom lane
First it should have been discussed before your commitment or at least it should be discussed after reversing your change. I require you to explain me why you committed the change with no discussion and little investigation. I also noticed that your change for catalog/index.c Revision 1.200 / (download) - annotate - [select for diffs] , Mon Sep23 00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl Changes since 1.199: +124 -161 lines Diff to previous 1.199 Getrid of bogus use of heap_mark4update in reindex operations (cf. recent bug report). Fix processing of nailed-in-cacheindexes; it appears that REINDEX DATABASE has been broken for months :-(. removed an #ifndef ENABLE_REINDEX_NAILED_RELATIONS trial stuff. I'm very surprised to see it now. regards, Hiroshi Inoue > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > I've been looking at the issues involved in reindexing system tables, > and I now have what I think is a fairly defensible set of proposals. > > We should whenever possible use the same reindexing technique used by > CLUSTER: assign a new relfilenode number, build the new index in that > file, and apply an ordinary heap_update operation to update > the index's > pg_class row with the new relfilenode value. This technique is fully > crash-safe and transaction-safe (meaning it can be rolled > back even after > completion, in case of failure later in the same > transaction). However, > there are several pitfalls for applying it to system indexes: > > 1. There is a problem for a shared index, because we have no way to > update pg_class.relfilenode in all the other databases besides ours. > I see no good way around this problem. > > 2. There is a problem for a nailed-in-cache index, because > the relcache > isn't prepared to cope with relfilenode updates for such indexes. > However, that is fixable. > > 3. There is a problem for an index on pg_class itself: doing > heap_update > on a pg_class row must make new index entries. We have to be careful > that the new row does appear in the updated index, while not making > entries in old-and-possibly-broken indexes. This is doable. > > 4. There is a problem with updating indexes that might be > used for catalog > fetches executed during the index_build process: if we try to use the > partially-rebuilt index for such a fetch we will fail. In the case of > disaster recovery the problem doesn't exist because we'll have > IsIgnoringSystemIndexes true, but for an ordinary on-line indexing > operation there's definitely a risk. Presently the code relies on > "deactivating indexes" to prevent this, but I think it can be > done more > simply, because we only need to suppress access to the target index > locally in the reindexing backend. (Other backends will be > locked out by > means of an AccessExclusiveLock on the system catalog in question.) > > In short then, we can fix things so that the new-relfilenode > path can be > taken in all cases except shared indexes. For shared indexes, we have > little choice but to truncate and rebuild the index in-place. > (There is > then no need to update its pg_class row at all.) This is of course > horribly crash-unsafe. > > The code presently uses a "deactivated indexes" flag (namely, setting > pg_class.relhasindex false) to provide some protection against a crash > midway through an in-place reindex. However, this concept is > really quite > useless for a shared index, because only one database could > contain the > deactivation flag. Accesses from any other database would > still try to > use the broken index. > > Based on this analysis, I feel that the "deactivated indexes" > mechanism > is of no further value and should be retired. We should > instead simply > acknowledge that reindexing shared indexes is dangerous. I > propose that > it be allowed only from a standalone backend. Specifically I > think that > REINDEX INDEX and REINDEX TABLE should error out if pointed > to a shared > index or table, unless running in a standalone backend; > REINDEX DATABASE > should skip over the shared indexes unless running in a standalone > backend. (I'm not convinced that either -O or -P needs to be > insisted on > by REINDEX, btw, but we can discuss that separately.) We'll > have to warn > users not to try to restart the database when reindexing of a > shared table > wasn't successfully completed. > > Details to back up the above claims: > > Part of the code needed to fix the relcache restriction on > nailed indexes > is already there, but ifdef'd out; that's the part that re-reads the > index's pg_class row after receiving SI inval for it. There are some > cases not yet handled though. In the first place, if the nailed index > being modified is pg_class_oid_index, ScanPgRelation will try > to use that > same index to load the updated row, and will fail because > it'll be trying > to use the old relfilenode. We can easily force a seqscan in > that case, > however. A more subtle problem is that if an SI buffer > overflow occurs, > RelationCacheInvalidate walks through the relation cache in a random > order. I believe that the correct update order has to be first > pg_class_oid_index, then the other nailed indexes, and > finally all other > relation cache entries. pg_class_oid_index has to be updated > first (with > the aforementioned seqscan), since it will be used by > ScanPgRelation to > reread the pg_class rows for the other nailed indexes. Only > when we are > sure the nailed indexes are up-to-date can we safely rebuild other > relcache entries. > > Assigning a new relfilenode to indexes of pg_class is not a > big deal when > we don't have an index-corruption problem. We simply have to > make the new > heap_updated row before we start index_build (which indeed > the code does > already); both old and new rows will be indexed in the new > index, and all > is well. However, if we suspect index corruption then it is > important not > to try to make entries into the old indexes, because we might > crash trying > to update a corrupt index. I propose the following behavior: > > REINDEX INDEX: no special action; this is not the thing to use > when index corruption is suspected anyway. > > REINDEX TABLE: while operating on pg_class, arrange for the > relcache's list of known indexes of pg_class to contain only the > indexes already reindexed. In this way, only those indexes will > be updated by CatalogUpdateIndexes(), and so no untrustworthy > indexes will be touched while making new pg_class rows. > > REINDEX DATABASE: take care to process pg_class first. > > To avoid catalog accesses via an index that's being rebuilt, > we can simply > generalize the SetReindexProcessing() state to include the OID of the > index currently being rebuilt. Places that presently make checks for > IsIgnoringSystemIndexes can check this as well and fall back > to seqscans > when the targeted index would have been used. > > One other point: IsIgnoringSystemIndexes seems presently to > be taken to > mean not only that we don't *read* the system indexes, but > that we don't > *write* them either. I think this is horribly dangerous; it > effectively > means that the only thing you can safely do in a backend > started with -P > is REINDEX. If you make any other changes to the system catalogs then > you've re-corrupted the indexes. Also, you don't have any protection > against duplicate entries getting made, since the unique indexes are > disabled. It would be a lot safer to define > IsIgnoringSystemIndexes as > preventing the system indexes from being used for lookups, while still > updating the indexes on any catalog modification. This will > not affect > the safety of REINDEX and will make other operations much > less prone to > pilot error. > > Comments? > > regards, tom lane >
"Hiroshi Inoue" <inoue@tpf.co.jp> writes: > I require you to explain me why you committed the change > with no discussion and little investigation. If you want an apology for not having discussed it in advance, I'll gladly offer one. It was poorly done. I do, however, think that the reindexing code needs work. Can we get on with discussing the technical issues? regards, tom lane
On Mon, Sep 22, 2003 at 04:56:35AM +0900, Hiroshi Inoue wrote: > First it should have been discussed before your commitment or at least > it should be discussed after reversing your change. > > I require you to explain me why you committed the change > with no discussion and little investigation. > > I also noticed that your change for catalog/index.c > > Revision 1.200 / (download) - annotate - [select for diffs] , Mon Sep 23 > 00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl You know you're talking about a commit of 1 year ago? Kurt
Kurt Roeckx wrote: > > On Mon, Sep 22, 2003 at 04:56:35AM +0900, Hiroshi Inoue wrote: > > First it should have been discussed before your commitment or at least > > it should be discussed after reversing your change. > > > > I require you to explain me why you committed the change > > with no discussion and little investigation. > > > > I also noticed that your change for catalog/index.c > > > > Revision 1.200 / (download) - annotate - [select for diffs] , Mon Sep 23 > > 00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl > > You know you're talking about a commit of 1 year ago? Yes of cource. Unfortunately I haven't had enough time to check all the changes since long. This time I was lucky to find a change to spoil my effort. Is it strange for me to doubt other places ? I looked again the current code about REINDEX command and found the above change which is closely related to this topic. regards, Hiroshi Inouehttp://www.geocities.jp/inocchichichi/psqlodbc/
Tom Lane wrote: > > "Hiroshi Inoue" <inoue@tpf.co.jp> writes: > > I require you to explain me why you committed the change > > with no discussion and little investigation. > > If you want an apology for not having discussed it in advance, I'll > gladly offer one. It was poorly done. Thanks. > I do, however, think that the reindexing code needs work. Can we > get on with discussing the technical issues? OK but I would need some time to remember the code. regards, Hiroshi Inouehttp://www.geocities.jp/inocchichichi/psqlodbc/
I've just put back your previous change, sorry. As I already mentioned many times it must be the first thing. Though I don't remenber my code completely yet, I would reply to some points. Unfortunately REINDEX wasn't a eagerly wanted command when I implemented it. Though I wanted to introduce a per index flag, I unwillingly used an existent per table flag(relhasindex) instead. Because it was impossible to make REINDEX transaction-safe then, such flag was needed to suppress inconsistency as less as possible. I also unwillingly introduced the ReindexProcessing mode(flag) because I didn't think of other quick solutions. IIRC the main reason why I gave up the REINDEX functionality on nailed relations was the difficulty of reindexing pg_class and the handling of relcache overflow. I didn't have much time to test it. In addtion REINDEX wasn't a recognized command then and I found no one to discuss the situation. Tom Lane wrote: > > I've been looking at the issues involved in reindexing system tables, > and I now have what I think is a fairly defensible set of proposals. > > We should whenever possible use the same reindexing technique used by > CLUSTER: REINDEX was the first command which used the pg_class.relfilenode functionality. The pg_class.relfilenode was essentially my proposal. > 1. There is a problem for a shared index, because we have no way to > update pg_class.relfilenode in all the other databases besides ours. > I see no good way around this problem. So the current REINDEX disallows on-line reindex on shared relations. > 2. There is a problem for a nailed-in-cache index, because the relcache > isn't prepared to cope with relfilenode updates for such indexes. > However, that is fixable. My code works pretty good with nailed relations except pg_class #if defined (ENABLE_REINDEX_NAILED_RELATIONS). > 3. There is a problem for an index on pg_class itself: doing heap_update > on a pg_class row must make new index entries. We have to be careful > that the new row does appear in the updated index, while not making > entries in old-and-possibly-broken indexes. This is doable. Yes. Sorry I have no time to continue the discussion now. regards, Hiroshi Inouehttp://www.geocities.jp/inocchichichi/psqlodbc/
Hiroshi Inoue wrote: > instead. Because it was impossible to make REINDEX transaction-safe > then, such flag was needed to suppress inconsistency as less > as possible. This mean that the actual REINDEX is not transaction-safe ? Regards Gaetano Mendola
> -----Original Message----- > From: Gaetano Mendola [mailto:mendola@bigfoot.com] > > Hiroshi Inoue wrote: > > instead. Because it was impossible to make REINDEX transaction-safe > > then, such flag was needed to suppress inconsistency as less > > as possible. > > This mean that the actual REINDEX is not transaction-safe ? No. It was not transaction-safe long time ago. regards, Hiroshi Inoue
Hiroshi Inoue wrote: Gaetano Mendola [mailto:mendola@bigfoot.com] wrote: >> >>Hiroshi Inoue wrote: >> >>>instead. Because it was impossible to make REINDEX transaction-safe >>>then, such flag was needed to suppress inconsistency as less >>>as possible. >> >>This mean that the actual REINDEX is not transaction-safe ? > > > No. > It was not transaction-safe long time ago. Anyway I think there is a nasty bug somewhere, see my last posts about "duplication primary key". Regards Gaetano Mendola
Tom, would you summarize what REINDEX currently _doesn't_ do? --------------------------------------------------------------------------- Tom Lane wrote: > I've been looking at the issues involved in reindexing system tables, > and I now have what I think is a fairly defensible set of proposals. > > We should whenever possible use the same reindexing technique used by > CLUSTER: assign a new relfilenode number, build the new index in that > file, and apply an ordinary heap_update operation to update the index's > pg_class row with the new relfilenode value. This technique is fully > crash-safe and transaction-safe (meaning it can be rolled back even after > completion, in case of failure later in the same transaction). However, > there are several pitfalls for applying it to system indexes: > > 1. There is a problem for a shared index, because we have no way to > update pg_class.relfilenode in all the other databases besides ours. > I see no good way around this problem. > > 2. There is a problem for a nailed-in-cache index, because the relcache > isn't prepared to cope with relfilenode updates for such indexes. > However, that is fixable. > > 3. There is a problem for an index on pg_class itself: doing heap_update > on a pg_class row must make new index entries. We have to be careful > that the new row does appear in the updated index, while not making > entries in old-and-possibly-broken indexes. This is doable. > > 4. There is a problem with updating indexes that might be used for catalog > fetches executed during the index_build process: if we try to use the > partially-rebuilt index for such a fetch we will fail. In the case of > disaster recovery the problem doesn't exist because we'll have > IsIgnoringSystemIndexes true, but for an ordinary on-line indexing > operation there's definitely a risk. Presently the code relies on > "deactivating indexes" to prevent this, but I think it can be done more > simply, because we only need to suppress access to the target index > locally in the reindexing backend. (Other backends will be locked out by > means of an AccessExclusiveLock on the system catalog in question.) > > In short then, we can fix things so that the new-relfilenode path can be > taken in all cases except shared indexes. For shared indexes, we have > little choice but to truncate and rebuild the index in-place. (There is > then no need to update its pg_class row at all.) This is of course > horribly crash-unsafe. > > The code presently uses a "deactivated indexes" flag (namely, setting > pg_class.relhasindex false) to provide some protection against a crash > midway through an in-place reindex. However, this concept is really quite > useless for a shared index, because only one database could contain the > deactivation flag. Accesses from any other database would still try to > use the broken index. > > Based on this analysis, I feel that the "deactivated indexes" mechanism > is of no further value and should be retired. We should instead simply > acknowledge that reindexing shared indexes is dangerous. I propose that > it be allowed only from a standalone backend. Specifically I think that > REINDEX INDEX and REINDEX TABLE should error out if pointed to a shared > index or table, unless running in a standalone backend; REINDEX DATABASE > should skip over the shared indexes unless running in a standalone > backend. (I'm not convinced that either -O or -P needs to be insisted on > by REINDEX, btw, but we can discuss that separately.) We'll have to warn > users not to try to restart the database when reindexing of a shared table > wasn't successfully completed. > > Details to back up the above claims: > > Part of the code needed to fix the relcache restriction on nailed indexes > is already there, but ifdef'd out; that's the part that re-reads the > index's pg_class row after receiving SI inval for it. There are some > cases not yet handled though. In the first place, if the nailed index > being modified is pg_class_oid_index, ScanPgRelation will try to use that > same index to load the updated row, and will fail because it'll be trying > to use the old relfilenode. We can easily force a seqscan in that case, > however. A more subtle problem is that if an SI buffer overflow occurs, > RelationCacheInvalidate walks through the relation cache in a random > order. I believe that the correct update order has to be first > pg_class_oid_index, then the other nailed indexes, and finally all other > relation cache entries. pg_class_oid_index has to be updated first (with > the aforementioned seqscan), since it will be used by ScanPgRelation to > reread the pg_class rows for the other nailed indexes. Only when we are > sure the nailed indexes are up-to-date can we safely rebuild other > relcache entries. > > Assigning a new relfilenode to indexes of pg_class is not a big deal when > we don't have an index-corruption problem. We simply have to make the new > heap_updated row before we start index_build (which indeed the code does > already); both old and new rows will be indexed in the new index, and all > is well. However, if we suspect index corruption then it is important not > to try to make entries into the old indexes, because we might crash trying > to update a corrupt index. I propose the following behavior: > > REINDEX INDEX: no special action; this is not the thing to use > when index corruption is suspected anyway. > > REINDEX TABLE: while operating on pg_class, arrange for the > relcache's list of known indexes of pg_class to contain only the > indexes already reindexed. In this way, only those indexes will > be updated by CatalogUpdateIndexes(), and so no untrustworthy > indexes will be touched while making new pg_class rows. > > REINDEX DATABASE: take care to process pg_class first. > > To avoid catalog accesses via an index that's being rebuilt, we can simply > generalize the SetReindexProcessing() state to include the OID of the > index currently being rebuilt. Places that presently make checks for > IsIgnoringSystemIndexes can check this as well and fall back to seqscans > when the targeted index would have been used. > > One other point: IsIgnoringSystemIndexes seems presently to be taken to > mean not only that we don't *read* the system indexes, but that we don't > *write* them either. I think this is horribly dangerous; it effectively > means that the only thing you can safely do in a backend started with -P > is REINDEX. If you make any other changes to the system catalogs then > you've re-corrupted the indexes. Also, you don't have any protection > against duplicate entries getting made, since the unique indexes are > disabled. It would be a lot safer to define IsIgnoringSystemIndexes as > preventing the system indexes from being used for lookups, while still > updating the indexes on any catalog modification. This will not affect > the safety of REINDEX and will make other operations much less prone to > pilot error. > > Comments? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom, would you summarize what REINDEX currently _doesn't_ do? As of CVS tip I think the only deficiency is that indexes on the shared catalogs (pg_database, pg_shadow, pg_group) have to be reindexed in place, rather than being rebuilt with a new relfilenode as is done for CLUSTER or TRUNCATE. In-place reindexing isn't crash-safe, since if you fail you're left with a half-built (effectively corrupt) index. I don't see any way to avoid that, though, since we cannot change the relfilenode value for a shared index. I was toying with the notion of changing btree index build to not write the metapage until the index is fully built; in this way, at least the corrupted state of the index would be obvious. (You'd get "not a btree" failures.) regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom, would you summarize what REINDEX currently _doesn't_ do? > > As of CVS tip I think the only deficiency is that indexes on the shared > catalogs (pg_database, pg_shadow, pg_group) have to be reindexed in > place, rather than being rebuilt with a new relfilenode as is done for > CLUSTER or TRUNCATE. In-place reindexing isn't crash-safe, since if > you fail you're left with a half-built (effectively corrupt) index. > > I don't see any way to avoid that, though, since we cannot change the > relfilenode value for a shared index. > > I was toying with the notion of changing btree index build to not write > the metapage until the index is fully built; in this way, at least the > corrupted state of the index would be obvious. (You'd get "not a btree" > failures.) Oh, that's great. I can't imagine a lot of traffic in those shared tables anyway. So you implemented all the ideas in the email --- great. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sat, Sep 27, 2003 at 06:37:22PM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom, would you summarize what REINDEX currently _doesn't_ do? > > As of CVS tip I think the only deficiency is that indexes on the shared > catalogs (pg_database, pg_shadow, pg_group) have to be reindexed in > place, rather than being rebuilt with a new relfilenode as is done for > CLUSTER or TRUNCATE. In-place reindexing isn't crash-safe, since if > you fail you're left with a half-built (effectively corrupt) index. > > I don't see any way to avoid that, though, since we cannot change the > relfilenode value for a shared index. What about creating a separate filenode anyway and renaming the files afterwards? It would not be an atomic operation anyway, but it would be better than the current setup IMHO. (It seems unlikely that one would have > 1GB indexes for shared relations...) -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La espina, desde que nace, ya pincha" (Proverbio africano)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Sat, Sep 27, 2003 at 06:37:22PM -0400, Tom Lane wrote: >> I don't see any way to avoid that, though, since we cannot change the >> relfilenode value for a shared index. > What about creating a separate filenode anyway and renaming the files > afterwards? It would not be an atomic operation anyway, but it would be > better than the current setup IMHO. I think it would be difficult to persuade the buffer manager and storage manager to work with this; from their point of view you'd be moving a relation underneath them. I doubt it's really worth the trouble; how often do you need to reindex a shared catalog? regards, tom lane
Tom Lane wrote: > > What about creating a separate filenode anyway and renaming the files > > afterwards? It would not be an atomic operation anyway, but it would be > > better than the current setup IMHO. > > I think it would be difficult to persuade the buffer manager and storage > manager to work with this; from their point of view you'd be moving a > relation underneath them. I doubt it's really worth the trouble; how > often do you need to reindex a shared catalog? The point I missed originally is that he is talking about shared catalogs, not system catalogs, which work fine in CVS. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sat, Sep 27, 2003 at 08:18:07PM -0400, Bruce Momjian wrote: > Tom Lane wrote: > > > What about creating a separate filenode anyway and renaming the files > > > afterwards? It would not be an atomic operation anyway, but it would be > > > better than the current setup IMHO. > > > > I think it would be difficult to persuade the buffer manager and storage > > manager to work with this; from their point of view you'd be moving a > > relation underneath them. I doubt it's really worth the trouble; how > > often do you need to reindex a shared catalog? > > The point I missed originally is that he is talking about shared > catalogs, not system catalogs, which work fine in CVS. Well, my idea was to reduce the window of time during which the index would be corrupt, i.e. not completely rebuilt. This is only an issue with shared indexes, because other system indexes do use the changing relfilenode thingie already. However, the main reason for reindexing is a corrupt index, so if for some reason the new index is also corrupt (e.g. the machine crashes midway) there's no point in having a separate index filenode anyway. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake)