Re: Improving REINDEX for system indexes (long) - Mailing list pgsql-hackers
From | Hiroshi Inoue |
---|---|
Subject | Re: Improving REINDEX for system indexes (long) |
Date | |
Msg-id | 000d01c3807a$763228b0$3d283ddb@PbgX Whole thread Raw |
In response to | Improving REINDEX for system indexes (long) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving REINDEX for system indexes (long)
Re: Improving REINDEX for system indexes (long) |
List | pgsql-hackers |
First it should have been discussed before your commitment or at least it should be discussed after reversing your change. I require you to explain me why you committed the change with no discussion and little investigation. I also noticed that your change for catalog/index.c Revision 1.200 / (download) - annotate - [select for diffs] , Mon Sep23 00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl Changes since 1.199: +124 -161 lines Diff to previous 1.199 Getrid of bogus use of heap_mark4update in reindex operations (cf. recent bug report). Fix processing of nailed-in-cacheindexes; it appears that REINDEX DATABASE has been broken for months :-(. removed an #ifndef ENABLE_REINDEX_NAILED_RELATIONS trial stuff. I'm very surprised to see it now. regards, Hiroshi Inoue > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > I've been looking at the issues involved in reindexing system tables, > and I now have what I think is a fairly defensible set of proposals. > > We should whenever possible use the same reindexing technique used by > CLUSTER: assign a new relfilenode number, build the new index in that > file, and apply an ordinary heap_update operation to update > the index's > pg_class row with the new relfilenode value. This technique is fully > crash-safe and transaction-safe (meaning it can be rolled > back even after > completion, in case of failure later in the same > transaction). However, > there are several pitfalls for applying it to system indexes: > > 1. There is a problem for a shared index, because we have no way to > update pg_class.relfilenode in all the other databases besides ours. > I see no good way around this problem. > > 2. There is a problem for a nailed-in-cache index, because > the relcache > isn't prepared to cope with relfilenode updates for such indexes. > However, that is fixable. > > 3. There is a problem for an index on pg_class itself: doing > heap_update > on a pg_class row must make new index entries. We have to be careful > that the new row does appear in the updated index, while not making > entries in old-and-possibly-broken indexes. This is doable. > > 4. There is a problem with updating indexes that might be > used for catalog > fetches executed during the index_build process: if we try to use the > partially-rebuilt index for such a fetch we will fail. In the case of > disaster recovery the problem doesn't exist because we'll have > IsIgnoringSystemIndexes true, but for an ordinary on-line indexing > operation there's definitely a risk. Presently the code relies on > "deactivating indexes" to prevent this, but I think it can be > done more > simply, because we only need to suppress access to the target index > locally in the reindexing backend. (Other backends will be > locked out by > means of an AccessExclusiveLock on the system catalog in question.) > > In short then, we can fix things so that the new-relfilenode > path can be > taken in all cases except shared indexes. For shared indexes, we have > little choice but to truncate and rebuild the index in-place. > (There is > then no need to update its pg_class row at all.) This is of course > horribly crash-unsafe. > > The code presently uses a "deactivated indexes" flag (namely, setting > pg_class.relhasindex false) to provide some protection against a crash > midway through an in-place reindex. However, this concept is > really quite > useless for a shared index, because only one database could > contain the > deactivation flag. Accesses from any other database would > still try to > use the broken index. > > Based on this analysis, I feel that the "deactivated indexes" > mechanism > is of no further value and should be retired. We should > instead simply > acknowledge that reindexing shared indexes is dangerous. I > propose that > it be allowed only from a standalone backend. Specifically I > think that > REINDEX INDEX and REINDEX TABLE should error out if pointed > to a shared > index or table, unless running in a standalone backend; > REINDEX DATABASE > should skip over the shared indexes unless running in a standalone > backend. (I'm not convinced that either -O or -P needs to be > insisted on > by REINDEX, btw, but we can discuss that separately.) We'll > have to warn > users not to try to restart the database when reindexing of a > shared table > wasn't successfully completed. > > Details to back up the above claims: > > Part of the code needed to fix the relcache restriction on > nailed indexes > is already there, but ifdef'd out; that's the part that re-reads the > index's pg_class row after receiving SI inval for it. There are some > cases not yet handled though. In the first place, if the nailed index > being modified is pg_class_oid_index, ScanPgRelation will try > to use that > same index to load the updated row, and will fail because > it'll be trying > to use the old relfilenode. We can easily force a seqscan in > that case, > however. A more subtle problem is that if an SI buffer > overflow occurs, > RelationCacheInvalidate walks through the relation cache in a random > order. I believe that the correct update order has to be first > pg_class_oid_index, then the other nailed indexes, and > finally all other > relation cache entries. pg_class_oid_index has to be updated > first (with > the aforementioned seqscan), since it will be used by > ScanPgRelation to > reread the pg_class rows for the other nailed indexes. Only > when we are > sure the nailed indexes are up-to-date can we safely rebuild other > relcache entries. > > Assigning a new relfilenode to indexes of pg_class is not a > big deal when > we don't have an index-corruption problem. We simply have to > make the new > heap_updated row before we start index_build (which indeed > the code does > already); both old and new rows will be indexed in the new > index, and all > is well. However, if we suspect index corruption then it is > important not > to try to make entries into the old indexes, because we might > crash trying > to update a corrupt index. I propose the following behavior: > > REINDEX INDEX: no special action; this is not the thing to use > when index corruption is suspected anyway. > > REINDEX TABLE: while operating on pg_class, arrange for the > relcache's list of known indexes of pg_class to contain only the > indexes already reindexed. In this way, only those indexes will > be updated by CatalogUpdateIndexes(), and so no untrustworthy > indexes will be touched while making new pg_class rows. > > REINDEX DATABASE: take care to process pg_class first. > > To avoid catalog accesses via an index that's being rebuilt, > we can simply > generalize the SetReindexProcessing() state to include the OID of the > index currently being rebuilt. Places that presently make checks for > IsIgnoringSystemIndexes can check this as well and fall back > to seqscans > when the targeted index would have been used. > > One other point: IsIgnoringSystemIndexes seems presently to > be taken to > mean not only that we don't *read* the system indexes, but > that we don't > *write* them either. I think this is horribly dangerous; it > effectively > means that the only thing you can safely do in a backend > started with -P > is REINDEX. If you make any other changes to the system catalogs then > you've re-corrupted the indexes. Also, you don't have any protection > against duplicate entries getting made, since the unique indexes are > disabled. It would be a lot safer to define > IsIgnoringSystemIndexes as > preventing the system indexes from being used for lookups, while still > updating the indexes on any catalog modification. This will > not affect > the safety of REINDEX and will make other operations much > less prone to > pilot error. > > Comments? > > regards, tom lane >
pgsql-hackers by date: