Re: Improving REINDEX for system indexes (long) - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Improving REINDEX for system indexes (long)
Date
Msg-id 200309272205.h8RM5Dp16195@candle.pha.pa.us
Whole thread Raw
In response to Improving REINDEX for system indexes (long)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Improving REINDEX for system indexes (long)
List pgsql-hackers
Tom, would you summarize what REINDEX currently _doesn't_ do?

---------------------------------------------------------------------------

Tom Lane wrote:
> I've been looking at the issues involved in reindexing system tables,
> and I now have what I think is a fairly defensible set of proposals.
> 
> We should whenever possible use the same reindexing technique used by
> CLUSTER: assign a new relfilenode number, build the new index in that
> file, and apply an ordinary heap_update operation to update the index's
> pg_class row with the new relfilenode value.  This technique is fully
> crash-safe and transaction-safe (meaning it can be rolled back even after
> completion, in case of failure later in the same transaction).  However,
> there are several pitfalls for applying it to system indexes:
> 
> 1. There is a problem for a shared index, because we have no way to
> update pg_class.relfilenode in all the other databases besides ours.
> I see no good way around this problem.
> 
> 2. There is a problem for a nailed-in-cache index, because the relcache
> isn't prepared to cope with relfilenode updates for such indexes.
> However, that is fixable.
> 
> 3. There is a problem for an index on pg_class itself: doing heap_update
> on a pg_class row must make new index entries.  We have to be careful
> that the new row does appear in the updated index, while not making
> entries in old-and-possibly-broken indexes.  This is doable.
> 
> 4. There is a problem with updating indexes that might be used for catalog
> fetches executed during the index_build process: if we try to use the
> partially-rebuilt index for such a fetch we will fail.  In the case of
> disaster recovery the problem doesn't exist because we'll have
> IsIgnoringSystemIndexes true, but for an ordinary on-line indexing
> operation there's definitely a risk.  Presently the code relies on
> "deactivating indexes" to prevent this, but I think it can be done more
> simply, because we only need to suppress access to the target index
> locally in the reindexing backend.  (Other backends will be locked out by
> means of an AccessExclusiveLock on the system catalog in question.)
> 
> In short then, we can fix things so that the new-relfilenode path can be
> taken in all cases except shared indexes.  For shared indexes, we have
> little choice but to truncate and rebuild the index in-place.  (There is
> then no need to update its pg_class row at all.)  This is of course
> horribly crash-unsafe.
> 
> The code presently uses a "deactivated indexes" flag (namely, setting
> pg_class.relhasindex false) to provide some protection against a crash
> midway through an in-place reindex.  However, this concept is really quite
> useless for a shared index, because only one database could contain the
> deactivation flag.  Accesses from any other database would still try to
> use the broken index.
> 
> Based on this analysis, I feel that the "deactivated indexes" mechanism
> is of no further value and should be retired.  We should instead simply
> acknowledge that reindexing shared indexes is dangerous.  I propose that
> it be allowed only from a standalone backend.  Specifically I think that
> REINDEX INDEX and REINDEX TABLE should error out if pointed to a shared
> index or table, unless running in a standalone backend; REINDEX DATABASE
> should skip over the shared indexes unless running in a standalone
> backend.  (I'm not convinced that either -O or -P needs to be insisted on
> by REINDEX, btw, but we can discuss that separately.)  We'll have to warn
> users not to try to restart the database when reindexing of a shared table
> wasn't successfully completed.
> 
> Details to back up the above claims:
> 
> Part of the code needed to fix the relcache restriction on nailed indexes
> is already there, but ifdef'd out; that's the part that re-reads the
> index's pg_class row after receiving SI inval for it.  There are some
> cases not yet handled though.  In the first place, if the nailed index
> being modified is pg_class_oid_index, ScanPgRelation will try to use that
> same index to load the updated row, and will fail because it'll be trying
> to use the old relfilenode.  We can easily force a seqscan in that case,
> however.  A more subtle problem is that if an SI buffer overflow occurs, 
> RelationCacheInvalidate walks through the relation cache in a random
> order.  I believe that the correct update order has to be first
> pg_class_oid_index, then the other nailed indexes, and finally all other
> relation cache entries.  pg_class_oid_index has to be updated first (with
> the aforementioned seqscan), since it will be used by ScanPgRelation to
> reread the pg_class rows for the other nailed indexes.  Only when we are
> sure the nailed indexes are up-to-date can we safely rebuild other
> relcache entries.
> 
> Assigning a new relfilenode to indexes of pg_class is not a big deal when
> we don't have an index-corruption problem.  We simply have to make the new
> heap_updated row before we start index_build (which indeed the code does
> already); both old and new rows will be indexed in the new index, and all
> is well.  However, if we suspect index corruption then it is important not
> to try to make entries into the old indexes, because we might crash trying
> to update a corrupt index.  I propose the following behavior:
> 
>     REINDEX INDEX: no special action; this is not the thing to use
>     when index corruption is suspected anyway.
> 
>     REINDEX TABLE: while operating on pg_class, arrange for the
>     relcache's list of known indexes of pg_class to contain only the
>     indexes already reindexed.  In this way, only those indexes will
>     be updated by CatalogUpdateIndexes(), and so no untrustworthy
>     indexes will be touched while making new pg_class rows.
> 
>     REINDEX DATABASE: take care to process pg_class first.
> 
> To avoid catalog accesses via an index that's being rebuilt, we can simply
> generalize the SetReindexProcessing() state to include the OID of the
> index currently being rebuilt.  Places that presently make checks for
> IsIgnoringSystemIndexes can check this as well and fall back to seqscans
> when the targeted index would have been used.
> 
> One other point: IsIgnoringSystemIndexes seems presently to be taken to
> mean not only that we don't *read* the system indexes, but that we don't
> *write* them either.  I think this is horribly dangerous; it effectively
> means that the only thing you can safely do in a backend started with -P
> is REINDEX.  If you make any other changes to the system catalogs then
> you've re-corrupted the indexes.  Also, you don't have any protection
> against duplicate entries getting made, since the unique indexes are
> disabled.  It would be a lot safer to define IsIgnoringSystemIndexes as
> preventing the system indexes from being used for lookups, while still
> updating the indexes on any catalog modification.  This will not affect
> the safety of REINDEX and will make other operations much less prone to
> pilot error.
> 
> Comments?
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] pgindented tsearch2 for 7.3.4
Next
From: "Hiroshi Inoue"
Date:
Subject: Re: 2-phase commit