Re: Inval reliability, especially for inplace updates - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Inval reliability, especially for inplace updates
Date
Msg-id 20250824233927.89.nmisch@google.com
In response to Re: Inval reliability, especially for inplace updates  (Paul A Jungwirth <pj@illuminatedcomputing.com>)
List pgsql-hackers
On Thu, Jul 31, 2025 at 09:53:20AM -0700, Paul A Jungwirth wrote:
> Ian Ilyasov and I reviewed this patch. We think it is ready to commit
> to back branches.

Thank you!

> We assume you will also un-revert 8e7e672cda ("WAL-log inplace update
> before revealing it to other sessions.")? We didn't look closely at
> that patch, but it seems like there are no known problems with it. It
> was just reverted because it depends on this patch.

Right.  I'm fine self-certifying that one.

> Is there any way to add more testing around non-transactional
> invalidations? It is a new "feature" but it is not really tested
> anywhere. I don't think we could do this with regress tests, but
> perhaps isolation tests would be suitable.

I think src/test/isolation/specs/inplace-inval.spec is testing it.

> Some of the comments felt a bit compressed. They make sense in the
> context of this fix, but reading them cold seems like it will be
> challenging. For example this took a lot of thinking to follow:
> 
>      * Construct shared cache inval if necessary.  Because we pass a tuple
>      * version without our own inplace changes or inplace changes other
>      * sessions complete while we wait for locks, inplace update mustn't
>      * change catcache lookup keys.  But we aren't bothering with index
>      * updates either, so that's true a fortiori.

What do you think of the attached rewrite?  I also removed this part:

-     * If we're mutating a tuple visible only to this transaction, there's an
-     * equivalent transactional inval from the action that created the tuple,
-     * and this inval is superfluous.

That would have needed s/this transaction/this command/ to be correct, and I
didn't feel it was saying something important enough to keep.  There are
plenty of ways for invals to be redundant.

> Or this:
> 
>      * WAL contains likely-unnecessary commit-time invals from the
>      * CacheInvalidateHeapTuple() call in heap_inplace_update().
> 
> Why likely-unnecessary? I know you explain it at that callsite, but
> some hint might help here.

On further study, I think one could construct a logical decoding output plugin
for which it's necessary.  I've detailed that in the attached edits.  This was
in the heap_decode() changes, which is the part of the patch I understand the
least.  I likely should have made it a separate patch.  Here's the surrounding
change I made in 2024, in context diff format:

*** 508,530 **** heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
  
              /*
               * Inplace updates are only ever performed on catalog tuples and
!              * can, per definition, not change tuple visibility.  Since we
!              * don't decode catalog tuples, we're not interested in the
!              * record's contents.
               *
!              * In-place updates can be used either by XID-bearing transactions
!              * (e.g.  in CREATE INDEX CONCURRENTLY) or by XID-less
!              * transactions (e.g.  VACUUM).  In the former case, the commit
!              * record will include cache invalidations, so we mark the
!              * transaction as catalog modifying here. Currently that's
!              * redundant because the commit will do that as well, but once we
!              * support decoding in-progress relations, this will be important.
               */
-             if (!TransactionIdIsValid(xid))
-                 break;
- 
-             (void) SnapBuildProcessChange(builder, xid, buf->origptr);
-             ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
              break;
  
          case XLOG_HEAP_CONFIRM:
--- 508,526 ----
  
              /*
               * Inplace updates are only ever performed on catalog tuples and
!              * can, per definition, not change tuple visibility.  Inplace
!              * updates don't affect storage or interpretation of table rows,
!              * so they don't affect logicalrep_write_tuple() outcomes.  Hence,
!              * we don't process invalidations from the original operation.  If
!              * inplace updates did affect those things, invalidations wouldn't
!              * make it work, since there are no snapshot-specific versions of
!              * inplace-updated values.  Since we also don't decode catalog
!              * tuples, we're not interested in the record's contents.
               *
!              * WAL contains likely-unnecessary commit-time invals from the
!              * CacheInvalidateHeapTuple() call in heap_inplace_update().
!              * Excess invalidation is safe.
               */
              break;
  
          case XLOG_HEAP_CONFIRM:


This code had been unchanged since commit b89e1510 (v9.4) introduced logical
decoding.  I'm making the following assumptions about the old code:

- I guessed "decoding in-progress relations" was a typo for "decoding
  in-progress transactions", something we've supported since at least commit
  4648243 "Add support for streaming to built-in logical replication"
  (2020-09, v14).  If it's not a typo, I don't know what "in-progress
  relations" denoted here.

- It said "redundant because the commit will do that as well", but I didn't
  find such code.  I bet that it referenced the DecodeCommit() lines removed
  in commit c55040c "WAL Log invalidations at command end with
  wal_level=logical" (2020-07, v14).  The same commit likely made the
  ReorderBufferXidSetCatalogChanges() call obsolete.

- I had no idea why we call SnapBuildProcessChange().  Every other caller uses
  its return value.  I guess there was a desire for its snapshot side effects,
  but I didn't follow that.  Nothing snapshot-relevant happens at
  XLOG_HEAP_INPLACE.  Removing this call doesn't break any test on v13 or on
  v9.4.  Similarly, no test fails after removing both this and the
  ReorderBufferXidSetCatalogChanges() call.

- In v9.4, this area of code (per-heapam-record-type decoding) had inval
  responsibilities.  That went away in v14, so there's no need for the comment
  here to keep discussing invals.


I now think it would be prudent to omit the non-comment heap_decode() changes
from the back-patch.  While the above assumptions argue against needing the
removed ReorderBufferXidSetCatalogChanges(), that's the sort of speculation I
should keep out of back-branch changes.

> It's a bit surprising that wrongly leaving relhasindex=t is safe (for
> example after BEGIN; CREATE INDEX; ROLLBACK;). I guess this column is
> just to save us a lookup for tables with no index, and no harm is done
> if we do the lookup needlessly but find no indexes. And vacuum can
> repair it later. Still it's a little unnerving.

DROP INDEX doesn't clear it, either.  That's longstanding, and it doesn't
involve narrow race conditions.  Hence, I'm not worrying about it.  If it were
broken, we'd have heard by now.
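For illustration, the longstanding behavior looks like this (a sketch, not run here; VACUUM clears relhasindex when it finds no indexes):

```
-- illustrative transcript, not executed
CREATE TABLE t (c int);
CREATE INDEX t_i ON t (c);
DROP INDEX t_i;
SELECT relhasindex FROM pg_class WHERE relname = 't';  -- still true
VACUUM t;
SELECT relhasindex FROM pg_class WHERE relname = 't';  -- false again
```

The only cost of a stale relhasindex=t is a needless index-list lookup that finds nothing.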

> On Thu, Oct 31, 2024 at 09:20:52PM -0700, Noah Misch wrote:
> > Here, one of the autovacuum workers had the guilty stack trace, appearing at
> > the end of this message.  heap_inplace_update_and_unlock() calls
> > CacheInvalidateHeapTupleInplace() while holding BUFFER_LOCK_EXCLUSIVE on a
> > buffer of pg_class.  CacheInvalidateHeapTupleInplace() may call
> > CatalogCacheInitializeCache(), which opens the cache's rel.  If there's not a
> > valid relcache entry for the catcache's rel, we scan pg_class to make a valid
> > relcache entry.  The ensuing hang makes sense.
> 
> Personally I never expected that catcache could depend on relcache,
> since it seems lower-level. But it makes sense that you need a
> relcache of pg_class at least, so their relationship is more
> complicated than just layers.

Yes, relcache.c is indeed ... eclectic.
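Restating the hang as a call chain (simplified from the stack trace discussed upthread; indentation shows callees):

```
heap_inplace_update_and_unlock()    -- holds BUFFER_LOCK_EXCLUSIVE on a pg_class buffer
  CacheInvalidateHeapTupleInplace()
    CatalogCacheInitializeCache()   -- catcache's rel not yet opened
      relation_open()
        -- no valid relcache entry, so build one:
        scan pg_class               -- needs the buffer lock we already hold
          => undetected self-deadlock
```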

>  PrepareToInvalidateCacheTuple(Relation relation,
>                                HeapTuple tuple,
>                                HeapTuple newtuple,
> -                              void (*function) (int, uint32, Oid))
> +                              void (*function) (int, uint32, Oid, void *),
> +                              void *context)
> 
> It's a little odd that PrepareToInvalidateCacheTuple takes a callback
> function when it only has one caller, so it always calls
> RegisterCatcacheInvalidation. Is it just to avoid adding dependencies
> to inval.c? But it already #includes catcache.h and contains lots of
> knowledge about catcache specifics. Maybe originally
> PrepareToInvalidateCacheTuple was built to take
> RegisterRelcacheInvalidation as well? Is it worth still passing the
> callback?

Looking at the history, it did have two callbacks in Postgres95 1.01.  It was
down to one callback when it got the name PrepareToInvalidateCacheTuple() in
commit 81d08fc of 2001-01.  I think the main alternative to today's callback
would be to make RegisterCatcacheInvalidation() an extern for catcache.c to
call.  All the other Register* functions would remain static.  I left things
unchanged since that would be cleaner in one way and dirtier in another.

> @@ -6511,6 +6544,7 @@ heap_inplace_unlock(Relation relation,
>  {
>      LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>      UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
> +    ForgetInplace_Inval();
>  }
> 
> Is this the right place to add this? We think on balance yes, but the
> question crossed my mind: Clearing the invals seems like a separate
> responsibility from unlocking the buffer & tuple. After this patch,
> our only remaining caller of heap_inplace_unlock is
> systable_inplace_update_cancel, so perhaps it should call
> ForgetInplace_Inval itself? OTOH we like that putting it here
> guarantees it gets called, as a complement to building the invals in
> heap_inplace_lock.

Style principles behind the placement:

- Inplace update puts some responsibilities in genam.c and others in heapam.c.
  A given task, e.g. draining the inval queue, should consistently appear on
  the same side of that boundary.

- The heapam.c side of inplace update should cover roughly as much as the
  heapam.c side of heap_update().  Since heap_update() handles invals and the
  critical section, so should the heapam.c side of inplace update.

It wouldn't take a strong reason to override those principles.

postgr.es/m/20240831011043.2b.nmisch@google.com has an inventory of the ways
to assign inval responsibility to heapam.c vs. genam.c.
postgr.es/m/20241013004511.9c.nmisch@google.com has a related discussion,
about the need for AtInplace_Inval() to be in the critical section.
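Sketching where the pieces land under those principles (function names as in this patch series, except that I'm abbreviating the genam.c entry points; not actual source):

```
genam.c:   systable_inplace_update_begin()
heapam.c:    heap_inplace_lock()            -- tuple + buffer locks; build invals
           ... caller inspects the tuple, then either ...
heapam.c:  heap_inplace_update_and_unlock() -- critical section: WAL, AtInplace_Inval()
           ... or ...
genam.c:   systable_inplace_update_cancel()
heapam.c:    heap_inplace_unlock()          -- unlock; ForgetInplace_Inval()
```

Building in heap_inplace_lock() and forgetting in heap_inplace_unlock() keeps both ends of the inval lifecycle on the heapam.c side, mirroring heap_update().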

On Sun, Aug 03, 2025 at 12:32:31PM -0700, Paul A Jungwirth wrote:

[details of the hang]

> The concurrency, then, is analyzing pg_attribute in one worker and
> pg_class in another. No locks would prevent that. It doesn't have to
> be pg_attribute; it could be some user table. But since the repro
> script adds & drops tables, pg_attribute is getting churned.

Thanks for that detailed write-up!  I agree with your findings.

