Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Date
Msg-id 20190501190550.jigb3tem3xxzspad@alap3.anarazel.de
Whole thread Raw
In response to Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
List pgsql-hackers
Hi,

On 2019-05-01 14:44:12 -0400, Tom Lane wrote:
> I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found
> while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've
> gotten a failure, your session is hosed:
> 
> regression=# select 1;
>  ?column? 
> ----------
>         1
> (1 row)
> 
> regression=# reindex index pg_class_oid_index;
> psql: ERROR:  could not read block 0 in file "base/16384/35344": read only 0 of 8192 bytes
> regression=# select 1;
> psql: ERROR:  could not open file "base/16384/35344": No such file or directory

Yea, I noticed that too. Lead me astray for a while, because it
triggered apparent REINDEX failures for pg_index too, even though not
actually related to the reindex.


> The problem is that the nailed relcache entry for pg_class_oid_index got
> updated to point to the new relfilenode, and that change did not get
> undone by transaction abort.  There seem to be several bugs contributing
> to that, but the one I'm asking about right now is that commit ae9aba69a
> caused RelationCacheInvalidate to skip over relations that have
> rd_newRelfilenodeSubid set, as indeed pg_class_oid_index will have at the
> instant of the failure.

That commit message is quite unhelpful about what exactly this was
intended to fix. I assume it was intended to make COPY FREEZE usage
predictable, but...


> This seems quite wrong, because it prevents us from rebuilding the
> entry with corrected values.  In particular notice that the change
> causes us to skip the RelationInitPhysicalAddr call that would
> normally be applied to a nailed mapped index in that loop.  That's
> completely fatal in this case --- it keeps us from restoring the
> correct relfilenode that the mapper would now tell us, if we only
> bothered to ask.

Indeed. I'm a bit surprised that doesn't lead to more problems.

Not sure I understand where the RelationCacheInvalidate() call is coming
from in this case though. Shouldn't the entry have been invalidated
individually through ATEOXact_Inval(false)?


> I think perhaps what we should do is revert ae9aba69a and instead do
> 
>         relcacheInvalsReceived++;
> 
> -        if (RelationHasReferenceCountZero(relation))
> +        if (RelationHasReferenceCountZero(relation) &&
> +            relation->rd_newRelfilenodeSubid == InvalidSubTransactionId)
>         {
>             /* Delete this entry immediately */
>             Assert(!relation->rd_isnailed);
> 
> so that entries with rd_newRelfilenodeSubid nonzero are preserved but are
> rebuilt.

Seems like a reasonable approach.


> There might be an argument for treating rd_createSubid the same way,
> not sure.  That field wouldn't ever be set on a system catalog, so
> it's less of a hazard, but there's something to be said for treating
> the two fields alike.

Seems sensible to treat it the same way. Not sure if there's an actual
problem where the current treatment could cause a problem, but seems
worth improving.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
Next
From: Tom Lane
Date:
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6