Re: Expanding HOT updates for expression and partial indexes - Mailing list pgsql-hackers

From Greg Burd
Subject Re: Expanding HOT updates for expression and partial indexes
Date
Msg-id 092E6AFE-81DA-4869-91C7-8F9F5A7541E5@greg.burd.me
Whole thread Raw
In response to Re: Expanding HOT updates for expression and partial indexes  (Greg Burd <greg@burd.me>)
List pgsql-hackers
On Nov 19 2025, at 1:00 pm, Greg Burd <greg@burd.me> wrote:

> 
> On Nov 16 2025, at 1:53 pm, Greg Burd <greg@burd.me> wrote:
> 
>> 0004 - Enable HOT updates for expression and partial indexes
>> 
>> This finally gets us back to where this project started, but on much
>> more firm ground than before because we're not going to
>> self-deadlock. 
>> The idea has grown from a small function into something larger, but only
>> out of necessity.
>> 
>> In this patch I add ExecWhichIndexesRequireUpdates() in execIndexing.c
>> which implements (c) finding the set of attributes that force new index
>> updates.  This set can be very different from the modified indexed
>> attributes.  We know that some attributes are not equal to their
>> previous versions, but does that mean that the index that references
>> that attribute needs a new index tuple?  It may, or it may not.  Here's
>> the comment on that function that explains:
>> 
>> /*
>> * ExecWhichIndexesRequireUpdates
>> *
>> * Determine which indexes need updating given modified indexed attributes.
>> * This function is a companion to ExecCheckIndexedAttrsForChanges(). 
>> On the
>> * surface, they appear similar but they are doing two very different things.
>> *
>> * For a standard index on a set of attributes this is the
>> intersection of
>> * the mix_attrs and the index attrs (key, expression, but not predicate).
>> *
>> * For expression indexes and indexes which implement the amcomparedatums()
>> * index AM API we'll need to form index datum and compare each
>> attribute to
>> * see if any actually changed.
>> *
>> * For expression indexes the result of the expression might not change
>> at all,
>> * this is common with JSONB columns which require expression indexes
>> and where
>> * it is commonplace to index a field within a document and have
>> updates that
>> * generally don't update that field.
>> *
>> * Partial indexes won't trigger index tuples when the old/new tuples
>> are both
>> * outside of the predicate range.
>> *
>> * For nbtree the amcomparedatums() API is critical as it requires
>> that key
>> * attributes are equal when they memcmp(), which might not be the
>> case when
>> * using type-specific comparison or factoring in collation which
>> might make
>> * an index case insensitive.
>> *
>> * All of this is to say that the goal is for the executor to know,
>> ahead of
>> * calling into the table AM for the update and before calling into
>> the index
>> * AM for inserting new index tuples, which attributes at a minimum will
>> * necessitate a new index tuple.
>> *
>> ...
>> */
> 
> Attached are rebased (d5b4f3a6d4e) patches with the only changes
> happening in the last patch in the series.
> 
> 0004 - Enable HOT updates for expression and partial indexes
> 
> I was never happy with the dual functions
> ExecCheckIndexedAttrsForChanges() and ExecWhichIndexesRequireUpdates(),
> it felt like too much overhead and duplication of effort.  While
> updating my tests, adding a few cases, I found that there was also a
> flaw in the logic.  So, time to rewrite and combine them.
> 
> What did I discover?  Before the logic was to find the set of modified
> indexed attributes then review all the indexes for changed attributes
> using FormIndexDatum() and comparing before/after to see if expressions
> really changed the value to be indexed or not.  The first pass didn't
> take into account expressions, the second did.  So, an expression index
> over JSONB data wouldn't extract and test the field within the document,
> it was just comparing the entire document before/after using the jsonb
> comparison function, no bueno.
> 
> This approach wraps both functions into one somewhat simplified
> function. The logic is basically, iterate over the indexes reviewing
> indexed attributes for changes.  Along the way we call into the new
> index AM's comparison function when present, otherwise we find and use
> the proper type-specific comparison function for the datum.  At the end
> of the function we have our Bitmapset of attributes that should trigger
> new index tuples.
> 
>> What's left undone?
>> 
>> * I need to check code coverage so that I might
> 
> I did this and it was quite good, I'll do it again for this new series
> but it's nice to see that the tests are exercising the vast majority of
> the code paths.
> 
>> * create tests covering all the new cases
> 
> I think the coverage is good, maybe even redundant or overly complex
> in places.
> 
>> * update the README.HOT documentation, wiki, etc.
> 
> Soon, I hope to have this approach solid and under review before
> solidifying the docs.
> 
>> * performance...
> 
> Still as yet unmeasured, I know that there is more work per-update to
> perform these checks, so some overhead, but I don't know if that
> overhead is more than before with HeapDetermineColumnsInfo() and
> index_unchanged_by_update().  Those two functions did essentially the
> same thing, only with binary comparison (datumIsEqual()). I need to
> measure that.  What about doing all this work outside of the buffer lock
> in heap_update()?  Surely that'll give back a bit or at least add to
> concurrency.  Forming index tuples a few extra times and evaluating the
> expressions 3 times rather than 1 is going to hurt, I think I can come
> up with a way to cache the formed datum and use it later on, but is that
> worth it?  Complex expressions, yes.  Also, what about expressions that
> expect to be executed once... and now are 3x?  That's what forced my
> update to the insert-conflict-specconflict.out test, but AFAICT there is
> no way to test if an expression's value is going to change on update
> without exercising it once for the old tuple and once for the new tuple.
> Even if it were possible for an index to provide the key it might have
> changed after the expression evaluation (as is the case in hash), so I
> don't think this is avoidable.  Maybe that's reason enough to add a
> reloption to disable the expression evaluation piece of this?  Given
> that it might create a logic or performance regression.  The flip side
> is the potential to use the HOT path, that's a real savings.
> 
> One concerning thing is that nbtree's assumption that key attributes for
> TIDs must use binary comparison for equality.  This means that for our
> common case (heap/btree) there is more work per-update than before,
> which is why I need to measure.  I could look into eliminating the
> nbtree requirement, I don't understand it too well as yet by I believe
> that on page split there is an attempt to deduplicate TIDs into a
> TIDBitmap and the test for when that's possible is datumIsEqual().  If
> that were the same as in this new code, possibly evening using
> tts_attr_equal(), then... I don't know, I'll have to investigate.  Chime
> in here if you can educate me on this one. :)
> 
> best.
> 
> -greg

Doh!

I forgot to commit the fixed regression test expected output before
formatting the patch set, here it is.

-greg

Attachment

pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: PRI?64 vs Visual Studio (2022)
Next
From: Antonin Houska
Date:
Subject: Avoid unnecessary processing of DISTINCT clause (Was: Unique Keys)