Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Date
Msg-id 4541.1556736252@sss.pgh.pa.us
Whole thread Raw
In response to Re: Commits 8de72b and 5457a1 (COPY FREEZE)  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
List pgsql-hackers
[ blast-from-the-past department ]

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 December 2012 03:01, Noah Misch <noah@leadboat.com> wrote:
>> Until these threads, I did not know that a relcache invalidation could trip up
>> the WAL avoidance optimization, and now this.  I poked at the relevant
>> relcache.c code, and it already takes pains to preserve the needed facts.  The
>> header comment of RelationCacheInvalidate() indicates that entries bearing an
>> rd_newRelfilenodeSubid can safely survive the invalidation, but the code does
>> not implement that.  I think the comment is right, and this is just an
>> oversight in the code going back to its beginning (fba8113c).

> I think the comment is right also and so the patch is good. I will
> apply, barring objections.

I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found
while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've
gotten a failure, your session is hosed:

regression=# select 1;
 ?column?
----------
        1
(1 row)

regression=# reindex index pg_class_oid_index;
psql: ERROR:  could not read block 0 in file "base/16384/35344": read only 0 of 8192 bytes
regression=# select 1;
psql: ERROR:  could not open file "base/16384/35344": No such file or directory


The problem is that the nailed relcache entry for pg_class_oid_index got
updated to point to the new relfilenode, and that change did not get
undone by transaction abort.  There seem to be several bugs contributing
to that, but the one I'm asking about right now is that commit ae9aba69a
caused RelationCacheInvalidate to skip over relations that have
rd_newRelfilenodeSubid set, as indeed pg_class_oid_index will have at the
instant of the failure.  This seems quite wrong, because it prevents us
from rebuilding the entry with corrected values.  In particular notice
that the change causes us to skip the RelationInitPhysicalAddr call that
would normally be applied to a nailed mapped index in that loop.  That's
completely fatal in this case --- it keeps us from restoring the correct
relfilenode that the mapper would now tell us, if we only bothered to ask.

I think perhaps what we should do is revert ae9aba69a and instead do

        relcacheInvalsReceived++;

-        if (RelationHasReferenceCountZero(relation))
+        if (RelationHasReferenceCountZero(relation) &&
+            relation->rd_newRelfilenodeSubid == InvalidSubTransactionId)
        {
            /* Delete this entry immediately */
            Assert(!relation->rd_isnailed);

so that entries with rd_newRelfilenodeSubid nonzero are preserved but are
rebuilt.

There might be an argument for treating rd_createSubid the same way,
not sure.  That field wouldn't ever be set on a system catalog, so
it's less of a hazard, but there's something to be said for treating
the two fields alike.

Thoughts?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Adding a test for speculative insert abort case
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)