Re: Inval reliability, especially for inplace updates - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Inval reliability, especially for inplace updates
Date
Msg-id 20251205001902.7a.nmisch@google.com
Whole thread Raw
In response to Re: Inval reliability, especially for inplace updates  (Paul A Jungwirth <pj@illuminatedcomputing.com>)
List pgsql-hackers
Thanks for the review.

On Wed, Dec 03, 2025 at 12:45:58PM -0800, Paul A Jungwirth wrote:
> Surya Poondla (cc'd) and I took another look at this as part of the
> November Patch Review Workshop.
> 
> We think it looks good. But I couldn't get the latest patch to apply
> on top of REL_17_STABLE until I did this:
> 
> ```
> git am inplace160-inval-durability-inplace-v7.patch_v17
> git revert bc6bad8857 # revert the revert of "WAL-log inplace update"
> git am inplace280-comment-fix-v1.patch.nocfbot # attached
> git am inplace290-comments202508-v1.patch
> ```
> 
> The inplace280 step adds a small comment change that seems to be in
> your git history, but I couldn't find it in the email chain.

inplace290 targets master.  inplace280-comment-fix-v1 is doing
s/heap_inplace_update/heap_inplace_update_and_unlock/ on decode.c, like
11012c5 did on master.  I'll incorporate inplace280-comment-fix-v1 when
back-patching.

> Also the
> 290 patch has context from reverting the WAL-log revert.

That makes sense.  I've not yet probed that level of detail.  FYI, if it ends
up making the back-patch clearer, I may split inplace290 into two patches, one
for the inplace160 bits and one for the inplace180 ("WAL-log inplace update")
bits.

> The patch avoids deadlocks by reordering invalidation prep before
> buffer locking. While no explicit assertion exists to detect future
> violations, would it be helpful to add a helper or macro that enforces
> this lock ordering rule more visibly? Probably not for a backpatch,
> but in master?

I think f4ece89 added that.  I plan to back-patch it on the same day as
$SUBJECT.

> On Sun, Aug 24, 2025 at 4:39 PM Noah Misch <noah@leadboat.com> wrote:
> > > Some of the comments felt a bit compressed.

> Thanks for expanding on this! Here are some thoughts about the new comment:
> 
> +    * Register shared cache invals if necessary.  Our input to inval can be
> +    * weaker than heap_update() input to inval in these ways:
> 
> Perhaps "than the heap_update() input" or "than heap_update()'s input"?

I think adding "the" would make it less correct, since we're contemplating a
class of inputs, not a specific population.  I think it's fine with or without
"'s", because there's no rule about treating a C function as a noun adjunct.

> +    * - This passes only the old version of the tuple.  Inval reacts only to
> +    * catcache lookup key columns and pg_class.oid values stored in
> +    * relcache-relevant catalog columns.  All of those columns are indexed.
> +    * Inplace update mustn't be used for any operations that could change
> +    * those.  Hence, the new tuple would provide no additional inval-relevant
> +    * information.  Those facts also make it fine to skip updating indexes.
> 
> This is confusing to me. "Inval only reacts": who is inval? Are you
> talking about the other backends when they receive the message? After
> spending a lot of time, I think you mean
> CacheInvalidateHeapTupleCommon, how it decides whether to invalidate
> first the catcache and then the relcache.

Right, the last interpretation.

> Also I wondered, if an
> inplace update never changes index keys, and [something] only cares
> about inval messages that change index keys, why are we sending an
> inval message at all?

An invalidation message contains only the key, not the value to cache.  Hence,
any change to a cacheable tuple must queue an invalidation message, but the
keys suffice to compute which invalidation message to insert.

The cache value is the whole tuple, but the cache keys are a function of:
- list of columns in syscache_info.h
- indexed columns that reference pg_class.oid, e.g. pg_constraint.conrelid

> After a few months away from this patch, it was
> hard for me to remember. I think the comment is a bit misleading. We
> *do* send catcache invalidations if non-key columns change. What about
> this?:
> 
> +    * - This passes only the old version of the tuple. Catcache
> invalidation doesn't need newtuple because inplace updates never
> change key columns, so it only needs to invalidate one hash value, not
> two. [For the same reason, we don't need to update indexes.] Relcache
> invalidation (in CacheInvalidateHeapTupleCommon) ignores newtuple
> altogether, even for regular updates, because an update can never move
> a tuple from one relcache entry to another.

This influenced some of my new wording.

> I bracketed the line about indexes, because I don't understand why
> we're talking about updating indexes here. I don't see anything about
> that in CacheInvalidateHeapTupleCommon or PrepareInvalidationState
> (which doesn't have access to newtuple anyway).

It's basically saying "don't worry about cache key inplace updates until we
solve inplace updates of indexed cols".  If we ever wanted to inplace-update a
cache key column, we'd first need to solve inplace-update of indexed columns,
since all cache key columns are indexed.  Solving indexed columns is hard with
the buffer locks involved, and there's no reason to expect a wish to do it.

> Also it feels like this comment (or something similar) really belongs
> on CacheInvalidateHeapTupleInplace. And that function doesn't need its
> newtuple parameter.

I like that, so I've made edits along those lines.

> +    * - The xwait found below may COMMIT between now and this function
> +    * returning, making the tuple dead.  That can change inval decisions, so
> 
> Is this third bullet point still explaining why an inplace update can
> be looser about invalidating caches than heap_update? Or is it making
> a separate point? It seems like it should be a paragraph, not a bullet
> point.

It's a separate point.  Fixed.

> Incidentally, this comment on heap_inplace_lock looks suspicious:
> 
>  * One could modify this to return true for tuples with delete in progress,
>  * All inplace updaters take a lock that conflicts with DROP.  If explicit
>  * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
>  * update.
> 
> I think it should be "While one could modify this . . . , all inplace
> updaters . . . ." Something to consider for a non-backpatch commit
> anyway.

I did break English grammar there.  But I now find the paragraph's argument
faulty, so I've rewritten it:

 * heap_delete() is a rarer source of blocking transactions (xwait).  We'll
 * wait for such a transaction just like for the normal heap_update() case.
 * Normal concurrent DROP commands won't cause that, because all inplace
 * updaters take some lock that conflicts with DROP.  An explicit SQL "DELETE
 * FROM pg_class" can cause it.  By waiting, if the concurrent transaction
 * executed both "DELETE FROM pg_class" and "INSERT INTO pg_class", our caller
 * can find the successor tuple.

I considered just deleting the paragraph as being too esoteric to be worth
hackers reading.  But if you're reading heap_inplace_lock(), esoterica is
expected.

The attached version doesn't need a comprehensive re-review, but I'd
particularly value hearing about any places where you find it's reducing
comprehensibility rather than enhancing.

Attachment

pgsql-hackers by date:

Previous
From: Mihail Nikalayeu
Date:
Subject: Re: Adding REPACK [concurrently]
Next
From: Michael Paquier
Date:
Subject: Re: PG version is not seen in pg_upgrade test log