Re: Improving REINDEX for system indexes (long) - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject Re: Improving REINDEX for system indexes (long)
Date
Msg-id 000d01c3807a$763228b0$3d283ddb@PbgX
Whole thread Raw
In response to Improving REINDEX for system indexes (long)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Improving REINDEX for system indexes (long)
Re: Improving REINDEX for system indexes (long)
List pgsql-hackers
First it should have been discussed before your commitment or at least
it should be discussed after reversing your change.

I require you to explain me why you committed the change
with no discussion and little investigation.

I also noticed that your change for catalog/index.c   Revision 1.200 / (download) - annotate - [select for diffs] , Mon
Sep23 
00:42:48 2002 UTC (11 months, 4 weeks ago) by tgl    Changes since 1.199: +124 -161 lines   Diff to previous 1.199
Getrid of bogus use of heap_mark4update in reindex operations (cf.   recent bug report).  Fix processing of
nailed-in-cacheindexes;   it appears that REINDEX DATABASE has been broken for months :-(. 

removed an #ifndef ENABLE_REINDEX_NAILED_RELATIONS
trial stuff. I'm very surprised to see it now.

regards,
Hiroshi Inoue

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> I've been looking at the issues involved in reindexing system tables,
> and I now have what I think is a fairly defensible set of proposals.
>
> We should whenever possible use the same reindexing technique used by
> CLUSTER: assign a new relfilenode number, build the new index in that
> file, and apply an ordinary heap_update operation to update
> the index's
> pg_class row with the new relfilenode value.  This technique is fully
> crash-safe and transaction-safe (meaning it can be rolled
> back even after
> completion, in case of failure later in the same
> transaction).  However,
> there are several pitfalls for applying it to system indexes:
>
> 1. There is a problem for a shared index, because we have no way to
> update pg_class.relfilenode in all the other databases besides ours.
> I see no good way around this problem.
>
> 2. There is a problem for a nailed-in-cache index, because
> the relcache
> isn't prepared to cope with relfilenode updates for such indexes.
> However, that is fixable.
>
> 3. There is a problem for an index on pg_class itself: doing
> heap_update
> on a pg_class row must make new index entries.  We have to be careful
> that the new row does appear in the updated index, while not making
> entries in old-and-possibly-broken indexes.  This is doable.
>
> 4. There is a problem with updating indexes that might be
> used for catalog
> fetches executed during the index_build process: if we try to use the
> partially-rebuilt index for such a fetch we will fail.  In the case of
> disaster recovery the problem doesn't exist because we'll have
> IsIgnoringSystemIndexes true, but for an ordinary on-line indexing
> operation there's definitely a risk.  Presently the code relies on
> "deactivating indexes" to prevent this, but I think it can be
> done more
> simply, because we only need to suppress access to the target index
> locally in the reindexing backend.  (Other backends will be
> locked out by
> means of an AccessExclusiveLock on the system catalog in question.)
>
> In short then, we can fix things so that the new-relfilenode
> path can be
> taken in all cases except shared indexes.  For shared indexes, we have
> little choice but to truncate and rebuild the index in-place.
>  (There is
> then no need to update its pg_class row at all.)  This is of course
> horribly crash-unsafe.
>
> The code presently uses a "deactivated indexes" flag (namely, setting
> pg_class.relhasindex false) to provide some protection against a crash
> midway through an in-place reindex.  However, this concept is
> really quite
> useless for a shared index, because only one database could
> contain the
> deactivation flag.  Accesses from any other database would
> still try to
> use the broken index.
>
> Based on this analysis, I feel that the "deactivated indexes"
> mechanism
> is of no further value and should be retired.  We should
> instead simply
> acknowledge that reindexing shared indexes is dangerous.  I
> propose that
> it be allowed only from a standalone backend.  Specifically I
> think that
> REINDEX INDEX and REINDEX TABLE should error out if pointed
> to a shared
> index or table, unless running in a standalone backend;
> REINDEX DATABASE
> should skip over the shared indexes unless running in a standalone
> backend.  (I'm not convinced that either -O or -P needs to be
> insisted on
> by REINDEX, btw, but we can discuss that separately.)  We'll
> have to warn
> users not to try to restart the database when reindexing of a
> shared table
> wasn't successfully completed.
>
> Details to back up the above claims:
>
> Part of the code needed to fix the relcache restriction on
> nailed indexes
> is already there, but ifdef'd out; that's the part that re-reads the
> index's pg_class row after receiving SI inval for it.  There are some
> cases not yet handled though.  In the first place, if the nailed index
> being modified is pg_class_oid_index, ScanPgRelation will try
> to use that
> same index to load the updated row, and will fail because
> it'll be trying
> to use the old relfilenode.  We can easily force a seqscan in
> that case,
> however.  A more subtle problem is that if an SI buffer
> overflow occurs,
> RelationCacheInvalidate walks through the relation cache in a random
> order.  I believe that the correct update order has to be first
> pg_class_oid_index, then the other nailed indexes, and
> finally all other
> relation cache entries.  pg_class_oid_index has to be updated
> first (with
> the aforementioned seqscan), since it will be used by
> ScanPgRelation to
> reread the pg_class rows for the other nailed indexes.  Only
> when we are
> sure the nailed indexes are up-to-date can we safely rebuild other
> relcache entries.
>
> Assigning a new relfilenode to indexes of pg_class is not a
> big deal when
> we don't have an index-corruption problem.  We simply have to
> make the new
> heap_updated row before we start index_build (which indeed
> the code does
> already); both old and new rows will be indexed in the new
> index, and all
> is well.  However, if we suspect index corruption then it is
> important not
> to try to make entries into the old indexes, because we might
> crash trying
> to update a corrupt index.  I propose the following behavior:
>
>     REINDEX INDEX: no special action; this is not the thing to use
>     when index corruption is suspected anyway.
>
>     REINDEX TABLE: while operating on pg_class, arrange for the
>     relcache's list of known indexes of pg_class to contain only the
>     indexes already reindexed.  In this way, only those indexes will
>     be updated by CatalogUpdateIndexes(), and so no untrustworthy
>     indexes will be touched while making new pg_class rows.
>
>     REINDEX DATABASE: take care to process pg_class first.
>
> To avoid catalog accesses via an index that's being rebuilt,
> we can simply
> generalize the SetReindexProcessing() state to include the OID of the
> index currently being rebuilt.  Places that presently make checks for
> IsIgnoringSystemIndexes can check this as well and fall back
> to seqscans
> when the targeted index would have been used.
>
> One other point: IsIgnoringSystemIndexes seems presently to
> be taken to
> mean not only that we don't *read* the system indexes, but
> that we don't
> *write* them either.  I think this is horribly dangerous; it
> effectively
> means that the only thing you can safely do in a backend
> started with -P
> is REINDEX.  If you make any other changes to the system catalogs then
> you've re-corrupted the indexes.  Also, you don't have any protection
> against duplicate entries getting made, since the unique indexes are
> disabled.  It would be a lot safer to define
> IsIgnoringSystemIndexes as
> preventing the system indexes from being used for lookups, while still
> updating the indexes on any catalog modification.  This will
> not affect
> the safety of REINDEX and will make other operations much
> less prone to
> pilot error.
>
> Comments?
>
>             regards, tom lane
>



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [GENERAL] Can't Build 7.3.4 on OS X
Next
From: Tom Lane
Date:
Subject: Re: Improving REINDEX for system indexes (long)