Inval reliability, especially for inplace updates - Mailing list pgsql-hackers

From Noah Misch
Subject Inval reliability, especially for inplace updates
Date
Msg-id 20240523000548.58.nmisch@google.com
Whole thread Raw
Responses Re: Inval reliability, especially for inplace updates
List pgsql-hackers
https://postgr.es/m/20240512232923.aa.nmisch@google.com wrote:
> Separable, nontrivial things not fixed in the attached patch stack:
> 
> - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK of
>   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
>   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this right.

I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
inside the critical section.  Send it in heap_xlog_inplace(), too.  The
interesting decision is how to handle RelationCacheInitFilePreInvalidate(),
which has an unlink_initfile() that can fail with e.g. EIO.  Options:

1. Unlink during critical section, and accept that EIO becomes PANIC.  Replay
   may reach the same EIO, and the system won't reopen to connections until
   the storage starts cooperating.a  Interaction with checkpoints is not ideal.
   If we checkpoint and then crash between inplace XLogInsert() and inval,
   we'd be relying on StartupXLOG() -> RelationCacheInitFileRemove().  That
   uses elevel==LOG, so replay would neglect to PANIC on EIO.

2. Unlink before critical section, so normal xact abort suffices.  This would
   hold RelCacheInitLock and a buffer content lock at the same time.  In
   RecordTransactionCommit(), it would hold RelCacheInitLock and e.g. slru
   locks at the same time.

The PANIC risk of (1) seems similar to the risk of PANIC at
RecordTransactionCommit() -> XLogFlush(), which hasn't been a problem.  The
checkpoint-related risk bothers me more, and (1) generally makes it harder to
reason about checkpoint interactions.  The lock order risk of (2) feels
tolerable.  I'm leaning toward (2), but that might change.  Other preferences?

Another decision is what to do about LogLogicalInvalidations().  Currently,
inplace update invalidations do reach WAL via LogLogicalInvalidations() at the
next CCI.  Options:

a. Within logical decoding, cease processing invalidations for inplace
   updates.  Inplace updates don't affect storage or interpretation of table
   rows, so they don't affect logicalrep_write_tuple() outcomes.  If they did,
   invalidations wouldn't make it work.  Decoding has no way to retrieve a
   snapshot-appropriate version of the inplace-updated value.

b. Make heap_decode() of XLOG_HEAP_INPLACE recreate the invalidation.  This
   would be, essentially, cheap insurance against invalidations having a
   benefit I missed in (a).

I plan to pick (a).

> - AtEOXact_Inval(true) is outside the RecordTransactionCommit() critical
>   section, but it is critical.  We must not commit transactional DDL without
>   other backends receiving an inval.  (When the inplace inval becomes
>   nontransactional, it will face the same threat.)

This faces the same RelationCacheInitFilePreInvalidate() decision, and I think
the conclusion should be the same as for inplace update.

Thanks,
nm



pgsql-hackers by date:

Previous
From: Martijn Wallet
Date:
Subject: Re: processes stuck in shutdown following OOM/recovery
Next
From: Michael Paquier
Date:
Subject: Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)