Re: Improving REINDEX for system indexes (long) - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Improving REINDEX for system indexes (long) |
Date | |
Msg-id | 200309272205.h8RM5Dp16195@candle.pha.pa.us Whole thread Raw |
In response to | Improving REINDEX for system indexes (long) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Improving REINDEX for system indexes (long)
|
List | pgsql-hackers |
Tom, would you summarize what REINDEX currently _doesn't_ do? --------------------------------------------------------------------------- Tom Lane wrote: > I've been looking at the issues involved in reindexing system tables, > and I now have what I think is a fairly defensible set of proposals. > > We should whenever possible use the same reindexing technique used by > CLUSTER: assign a new relfilenode number, build the new index in that > file, and apply an ordinary heap_update operation to update the index's > pg_class row with the new relfilenode value. This technique is fully > crash-safe and transaction-safe (meaning it can be rolled back even after > completion, in case of failure later in the same transaction). However, > there are several pitfalls for applying it to system indexes: > > 1. There is a problem for a shared index, because we have no way to > update pg_class.relfilenode in all the other databases besides ours. > I see no good way around this problem. > > 2. There is a problem for a nailed-in-cache index, because the relcache > isn't prepared to cope with relfilenode updates for such indexes. > However, that is fixable. > > 3. There is a problem for an index on pg_class itself: doing heap_update > on a pg_class row must make new index entries. We have to be careful > that the new row does appear in the updated index, while not making > entries in old-and-possibly-broken indexes. This is doable. > > 4. There is a problem with updating indexes that might be used for catalog > fetches executed during the index_build process: if we try to use the > partially-rebuilt index for such a fetch we will fail. In the case of > disaster recovery the problem doesn't exist because we'll have > IsIgnoringSystemIndexes true, but for an ordinary on-line indexing > operation there's definitely a risk. Presently the code relies on > "deactivating indexes" to prevent this, but I think it can be done more > simply, because we only need to suppress access to the target index > locally in the reindexing backend. (Other backends will be locked out by > means of an AccessExclusiveLock on the system catalog in question.) > > In short then, we can fix things so that the new-relfilenode path can be > taken in all cases except shared indexes. For shared indexes, we have > little choice but to truncate and rebuild the index in-place. (There is > then no need to update its pg_class row at all.) This is of course > horribly crash-unsafe. > > The code presently uses a "deactivated indexes" flag (namely, setting > pg_class.relhasindex false) to provide some protection against a crash > midway through an in-place reindex. However, this concept is really quite > useless for a shared index, because only one database could contain the > deactivation flag. Accesses from any other database would still try to > use the broken index. > > Based on this analysis, I feel that the "deactivated indexes" mechanism > is of no further value and should be retired. We should instead simply > acknowledge that reindexing shared indexes is dangerous. I propose that > it be allowed only from a standalone backend. Specifically I think that > REINDEX INDEX and REINDEX TABLE should error out if pointed to a shared > index or table, unless running in a standalone backend; REINDEX DATABASE > should skip over the shared indexes unless running in a standalone > backend. (I'm not convinced that either -O or -P needs to be insisted on > by REINDEX, btw, but we can discuss that separately.) We'll have to warn > users not to try to restart the database when reindexing of a shared table > wasn't successfully completed. > > Details to back up the above claims: > > Part of the code needed to fix the relcache restriction on nailed indexes > is already there, but ifdef'd out; that's the part that re-reads the > index's pg_class row after receiving SI inval for it. There are some > cases not yet handled though. In the first place, if the nailed index > being modified is pg_class_oid_index, ScanPgRelation will try to use that > same index to load the updated row, and will fail because it'll be trying > to use the old relfilenode. We can easily force a seqscan in that case, > however. A more subtle problem is that if an SI buffer overflow occurs, > RelationCacheInvalidate walks through the relation cache in a random > order. I believe that the correct update order has to be first > pg_class_oid_index, then the other nailed indexes, and finally all other > relation cache entries. pg_class_oid_index has to be updated first (with > the aforementioned seqscan), since it will be used by ScanPgRelation to > reread the pg_class rows for the other nailed indexes. Only when we are > sure the nailed indexes are up-to-date can we safely rebuild other > relcache entries. > > Assigning a new relfilenode to indexes of pg_class is not a big deal when > we don't have an index-corruption problem. We simply have to make the new > heap_updated row before we start index_build (which indeed the code does > already); both old and new rows will be indexed in the new index, and all > is well. However, if we suspect index corruption then it is important not > to try to make entries into the old indexes, because we might crash trying > to update a corrupt index. I propose the following behavior: > > REINDEX INDEX: no special action; this is not the thing to use > when index corruption is suspected anyway. > > REINDEX TABLE: while operating on pg_class, arrange for the > relcache's list of known indexes of pg_class to contain only the > indexes already reindexed. In this way, only those indexes will > be updated by CatalogUpdateIndexes(), and so no untrustworthy > indexes will be touched while making new pg_class rows. > > REINDEX DATABASE: take care to process pg_class first. > > To avoid catalog accesses via an index that's being rebuilt, we can simply > generalize the SetReindexProcessing() state to include the OID of the > index currently being rebuilt. Places that presently make checks for > IsIgnoringSystemIndexes can check this as well and fall back to seqscans > when the targeted index would have been used. > > One other point: IsIgnoringSystemIndexes seems presently to be taken to > mean not only that we don't *read* the system indexes, but that we don't > *write* them either. I think this is horribly dangerous; it effectively > means that the only thing you can safely do in a backend started with -P > is REINDEX. If you make any other changes to the system catalogs then > you've re-corrupted the indexes. Also, you don't have any protection > against duplicate entries getting made, since the unique indexes are > disabled. It would be a lot safer to define IsIgnoringSystemIndexes as > preventing the system indexes from being used for lookups, while still > updating the indexes on any catalog modification. This will not affect > the safety of REINDEX and will make other operations much less prone to > pilot error. > > Comments? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgsql-hackers by date: