Re: Inval reliability, especially for inplace updates - Mailing list pgsql-hackers

From: Noah Misch
Subject: Re: Inval reliability, especially for inplace updates
Msg-id: 20240615223718.42.nmisch@google.com
In response to: Inval reliability, especially for inplace updates  (Noah Misch <noah@leadboat.com>)
Responses: Re: Inval reliability, especially for inplace updates
           Re: Inval reliability, especially for inplace updates
List: pgsql-hackers
On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> https://postgr.es/m/20240512232923.aa.nmisch@google.com wrote:
> > Separable, nontrivial things not fixed in the attached patch stack:
> > 
> > - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK of
> >   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
> >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this right.
> 
> I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
> inside the critical section.  Send it in heap_xlog_inplace(), too.

> a. Within logical decoding, cease processing invalidations for inplace

I'm attaching the implementation.  This applies atop the v3 patch stack from
https://postgr.es/m/20240614003549.c2.nmisch@google.com, but the threads are
mostly orthogonal and intended for independent review.  Translating a tuple
into inval messages uses more infrastructure than relmapper, which needs just
a database ID.  Hence, this ended up more like a miniature of inval.c's
participation in the transaction commit sequence.
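
For readers without the tree at hand, the relmapper precedent looks like this
(abridged from src/backend/utils/cache/inval.c, quoted from memory): the
message is built and sent on the spot, leaving nothing in the transactional
queue for ROLLBACK to discard.  The inplace case differs in that building the
messages from the tuple takes catcache/relcache infrastructure rather than
just a database ID:

	void
	CacheInvalidateRelmap(Oid databaseId)
	{
		SharedInvalidationMessage msg;

		msg.rm.id = SHAREDINVALRELMAP_ID;
		msg.rm.dbId = databaseId;

		SendSharedInvalidMessages(&msg, 1);
	}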

I waffled on whether to back-patch inplace150-inval-durability-atcommit.  The
consequences of that bug are plenty bad, but reaching them requires an error
between TransactionIdCommitTree() and AtEOXact_Inval().  I've not heard
reports of that, and I don't have a recipe for making it happen on demand.
For now, I'm leaning toward back-patch.  The main risk would be me overlooking
an LWLock deadlock scenario reachable from the new, earlier RelCacheInitLock
timing.  Alternatives for RelCacheInitLock:

- RelCacheInitLock before PreCommit_Notify(), because notify concurrency
  matters more than init file concurrency.  I chose this; a sketch of the
  resulting ordering follows this list.
- RelCacheInitLock after PreCommit_Notify(), because PreCommit_Notify() uses a
  heavyweight lock, giving it less risk of undetected deadlock.
- Replace RelCacheInitLock with a heavyweight lock, and keep it before
  PreCommit_Notify().
- Fold PreCommit_Inval() back into AtCommit_Inval(), accepting that EIO in
  unlink_initfile() will PANIC.
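
For concreteness, the ordering I chose looks roughly like this.
PreCommit_Inval() and AtCommit_Inval() are the patch's names; the rest
paraphrases CommitTransaction(), so read it as an illustration, not as the
patch itself:

	PreCommit_Inval();         /* acquire RelCacheInitLock and run
	                            * unlink_initfile(); ERROR is still safe */
	PreCommit_Notify();        /* heavyweight-locked; runs while we hold
	                            * the LWLock, per the first option above */
	RecordTransactionCommit(); /* TransactionIdCommitTree() et al. */
	AtCommit_Inval();          /* send the queued invals and release
	                            * RelCacheInitLock; errors here PANIC */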

Opinions on that?

The patch changes xl_heap_inplace, the record body of XLOG_HEAP_INPLACE (a
sketch of the likely shape follows this list).  For back branches, we could
choose among:

- Same change, no WAL version bump.  Standby must update before primary.  This
  is best long-term, but the transition is more disruptive.  I'm leaning
  toward this one, though the second option isn't bad.

- heap_xlog_inplace() could set the shared-inval-queue overflow signal on
  every backend.  This is more wasteful, but inplace updates might be rare
  enough (~once per VACUUM) to make it tolerable.

- Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This isn't
  correct if recovery ends between the two records, but you'd need to be
  unlucky to notice.  Noticing would take a procedure like the following: a
  hot standby backend populates a relcache entry, recovery ends between the
  two records (losing the inval), and the backend then does DDL on the rel
  using its stale entry.
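
Since the patch itself isn't inlined here, a hedged sketch of what "changes
xl_heap_inplace" plausibly means.  Today the record body carries only the
tuple's offset (per src/include/access/heapam_xlog.h); carrying the invals
would mean adding fields along the lines of what commit records already
serialize.  The added field names below are my guesses, not the patch's:

	typedef struct xl_heap_inplace
	{
		OffsetNumber offnum;	/* updated tuple's offset on page */
		/* hypothetical additions, modeled on commit-record invals: */
		bool		relcacheInitFileInval;	/* invalidate relcache init files? */
		int			nmsgs;		/* number of shared inval msgs */
		SharedInvalidationMessage msgs[FLEXIBLE_ARRAY_MEMBER];
	} xl_heap_inplace;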

Future cleanup work could eliminate LogStandbyInvalidations() and the case of
!markXidCommitted && nmsgs != 0.  Currently, the src/test/regress suite still
reaches that case:

- AlterDomainDropConstraint() queues an inval even if !found; it can stop
  that.

- ON COMMIT DELETE ROWS nontransactionally rebuilds an index, which sends a
  relcache inval.  The point of that inval is, I think, to force access
  methods like btree and hash to reload the metapage copy that they store in
  rd_amcache.  Since no assigned XID implies no changes to the temp index, the
  no-XID case could simply skip the index rebuild; see the sketch after this
  list.  (temp.sql reaches this
  with a read-only transaction that selects from an ON COMMIT DELETE ROWS
  table.  Realistic usage will tend not to do that.)  ON COMMIT DELETE ROWS
  has another preexisting problem for indexes, mentioned in a code comment.
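
To make that skip concrete, a hypothetical sketch (mine, not the patch's):
ON COMMIT DELETE ROWS tables are gathered in PreCommit_on_commit_actions(),
which already skips them when the transaction never accessed the temp
namespace.  An additional no-XID test could look roughly like this:

	case ONCOMMIT_DELETE_ROWS:
		/*
		 * Truncate (and thus rebuild indexes, sending a relcache inval)
		 * only if this transaction both touched the temp namespace and
		 * assigned an XID; with no XID, the table and its indexes cannot
		 * have changed since the last commit emptied them.
		 */
		if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE) &&
			TransactionIdIsValid(GetCurrentTransactionIdIfAny()))
			oids_to_truncate = lappend_oid(oids_to_truncate, oc->relid);
		break;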

Thanks,
nm

