Re: REINDEX INDEX results in a crash for an index of pg_class since9.6 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: REINDEX INDEX results in a crash for an index of pg_class since9.6
Date
Msg-id 20190425215624.t724lzphoinzgnrb@alap3.anarazel.de
Whole thread Raw
In response to Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2019-04-25 17:12:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > My point was that given the current coding the code in
> > ATExecSetTableSpace() would make changes to the *old* relfilenode, after
> > having already copied the contents to the new relfilenode. Which means
> > that if ATExecSetTableSpace() is ever used on pg_class or one of it's
> > indexes, it'd just loose those changes, afaict.
> 
> Hmm.

I think there's no a live bug in because we a) require
allow_system_table_mods to modify system tables, and then b) have
another check

    /*
     * We cannot support moving mapped relations into different tablespaces.
     * (In particular this eliminates all shared catalogs.)
     */
    if (RelationIsMapped(rel))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot move system relation \"%s\"",
                        RelationGetRelationName(rel))));

that triggers even when allow_system_table_mods is off.


> There's another reason why we'd like the relcache contents to only change
> at CCI, which is that if we get a relcache invalidation somewhere before
> we get to the CCI, relcache.c would proceed to reload it based on the
> *current* catalog contents (ie, pre-update, thanks to the magic of MVCC),
> so that the entry would revert back to its previous state until we did get
> to CCI.  I wonder whether there's any demonstrable bug there.  Though
> you'd think the CLOBBER_CACHE_ALWAYS animals would've found it if so.

I think we basically assume that there's nothing triggering relcache
invals here inbetween updating the relcache entry, and doing the actual
catalog modification. That looks like it's currently somewhat OK in the
places we've talked about so far.

This made me look at cluster.c and damn, I'd forgotten about that.
Look at the following code in copy_table_data():

        /*
         * When doing swap by content, any toast pointers written into NewHeap
         * must use the old toast table's OID, because that's where the toast
         * data will eventually be found.  Set this up by setting rd_toastoid.
         * This also tells toast_save_datum() to preserve the toast value
         * OIDs, which we want so as not to invalidate toast pointers in
         * system catalog caches, and to avoid making multiple copies of a
         * single toast value.
         *
         * Note that we must hold NewHeap open until we are done writing data,
         * since the relcache will not guarantee to remember this setting once
         * the relation is closed.  Also, this technique depends on the fact
         * that no one will try to read from the NewHeap until after we've
         * finished writing it and swapping the rels --- otherwise they could
         * follow the toast pointers to the wrong place.  (It would actually
         * work for values copied over from the old toast table, but not for
         * any values that we toast which were previously not toasted.)
         */
        NewHeap->rd_toastoid = OldHeap->rd_rel->reltoastrelid;
    }
    else
        *pSwapToastByContent = false;

which then goes on to do things like a full blown sort or index
scan. Sure, that's for the old relation, but that's so ugly. It works
because RelationClearRelation() copies over rd_toastoid :/.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: TRACE_SORT defined by default
Next
From: Andres Freund
Date:
Subject: Re: Pluggable Storage - Andres's take