Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE) |
Date | |
Msg-id | 4541.1556736252@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Commits 8de72b and 5457a1 (COPY FREEZE) (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: [HACKERS] Commits 8de72b and 5457a1 (COPY FREEZE)
|
List | pgsql-hackers |
[ blast-from-the-past department ] Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 December 2012 03:01, Noah Misch <noah@leadboat.com> wrote: >> Until these threads, I did not know that a relcache invalidation could trip up >> the WAL avoidance optimization, and now this. I poked at the relevant >> relcache.c code, and it already takes pains to preserve the needed facts. The >> header comment of RelationCacheInvalidate() indicates that entries bearing an >> rd_newRelfilenodeSubid can safely survive the invalidation, but the code does >> not implement that. I think the comment is right, and this is just an >> oversight in the code going back to its beginning (fba8113c). > I think the comment is right also and so the patch is good. I will > apply, barring objections. I'm busy looking into the REINDEX-on-pg_class mess, and one thing I found while testing HEAD is that with CLOBBER_CACHE_ALWAYS on, once you've gotten a failure, your session is hosed: regression=# select 1; ?column? ---------- 1 (1 row) regression=# reindex index pg_class_oid_index; psql: ERROR: could not read block 0 in file "base/16384/35344": read only 0 of 8192 bytes regression=# select 1; psql: ERROR: could not open file "base/16384/35344": No such file or directory The problem is that the nailed relcache entry for pg_class_oid_index got updated to point to the new relfilenode, and that change did not get undone by transaction abort. There seem to be several bugs contributing to that, but the one I'm asking about right now is that commit ae9aba69a caused RelationCacheInvalidate to skip over relations that have rd_newRelfilenodeSubid set, as indeed pg_class_oid_index will have at the instant of the failure. This seems quite wrong, because it prevents us from rebuilding the entry with corrected values. In particular notice that the change causes us to skip the RelationInitPhysicalAddr call that would normally be applied to a nailed mapped index in that loop. That's completely fatal in this case --- it keeps us from restoring the correct relfilenode that the mapper would now tell us, if we only bothered to ask. I think perhaps what we should do is revert ae9aba69a and instead do relcacheInvalsReceived++; - if (RelationHasReferenceCountZero(relation)) + if (RelationHasReferenceCountZero(relation) && + relation->rd_newRelfilenodeSubid == InvalidSubTransactionId) { /* Delete this entry immediately */ Assert(!relation->rd_isnailed); so that entries with rd_newRelfilenodeSubid nonzero are preserved but are rebuilt. There might be an argument for treating rd_createSubid the same way, not sure. That field wouldn't ever be set on a system catalog, so it's less of a hazard, but there's something to be said for treating the two fields alike. Thoughts? regards, tom lane
pgsql-hackers by date: