Re: Expanding HOT updates for expression and partial indexes - Mailing list pgsql-hackers
| From | Greg Burd |
|---|---|
| Subject | Re: Expanding HOT updates for expression and partial indexes |
| Date | |
| Msg-id | db765e0d-4674-4547-9a9d-6d4d9a0a123c@app.fastmail.com Whole thread Raw |
| In response to | Re: Expanding HOT updates for expression and partial indexes (Nathan Bossart <nathandbossart@gmail.com>) |
| Responses |
Re: Expanding HOT updates for expression and partial indexes
|
| List | pgsql-hackers |
On Mon, Mar 23, 2026, at 2:39 PM, Nathan Bossart wrote:
> Thanks for the new patch. As a general note, please be sure to run
> pgindent on patches. My review is still rather surface-level, sorry.
Hello Nathan,
Thanks for continuing to review my work. I appreciate your time. I do run pgindent on all patches, maybe something
slippedit. Apologies if that's the case. :)
> On Tue, Mar 17, 2026 at 02:04:11PM -0400, Greg Burd wrote:
>> - id_attrs = RelationGetIndexAttrBitmap(relation,
>> - INDEX_ATTR_BITMAP_IDENTITY_KEY);
>> [...]
>> + rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
>
> I'm nitpicking, but it took me a while to parse the
> replica-identity-related code in heap_update() until I discovered that this
> variable was renamed. I think we ought to leave the name alone.
Okay, reverted to "id_attrs".
>> /*
>> * At this point newbuf and buffer are both pinned and locked, and newbuf
>> - * has enough space for the new tuple. If they are the same buffer, only
>> - * one pin is held.
>> + * has enough space for the new tuple so we can use the HOT update path if
>> + * the caller determined that it is allowable.
>> + *
>> + * NOTE: If newbuf == buffer then only one pin is held.
>> */
>> -
>> if (newbuf == buffer)
>
> Sorry, more nitpicks. In addition to the unnecessary removal of the blank
> line, I'm not sure the changes to this comment are needed.
Okay, reverted to earlier comment and blank line re-inserted. :)
>> - /*
>> - * If it is a HOT update, the update may still need to update summarized
>> - * indexes, lest we fail to update those summaries and get incorrect
>> - * results (for example, minmax bounds of the block may change with this
>> - * update).
>> - */
>> - if (use_hot_update)
>> - {
>> - if (summarized_update)
>> - *update_indexes = TU_Summarizing;
>> - else
>> - *update_indexes = TU_None;
>> - }
>> - else
>> - *update_indexes = TU_All;
>
> So, the "HOT but still need to update summarized indexes" code has been
> moved from heap_update() to HeapUpdateHotAllowable(), which is called by
> heap_update()'s callers (i.e., simple_heap_update() and
> heapam_tuple_update()). That looks correct to me at a glance.
Yes, that's indeed what that is.
>> -simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
>> +simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
>
> nitpick: This variable name change looks unnecessary.
Okay, reverted to "tup".
>> @@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
>> if (rel->rd_rel->relispartition)
>> ExecPartitionCheck(resultRelInfo, slot, estate, true);
>>
>> + modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
>> + estate, searchslot, slot);
>> +
>> simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
>> - &update_indexes);
>> + modified_idx_attrs, &update_indexes);
>> + bms_free(modified_idx_attrs);
>
> I don't know how constructive of a comment this is, but this change in
> particular seems quite out of place. It feels odd to me that we expect
> callers of simple_table_tuple_update() to determine the
> modified-index-attributes. I guess I'm confused why this work doesn't
> belong one level down, i.e., in the tuple_update function.
Problem is that simple_table_tuple_update() has the old TID and the new slot, but not the old slot (searchslot), so I'd
haveto change the signature of that function either way. Passing modified_idx_attrs is the new pattern, so I am just
reusingthat here.
I could replicate what's in simple_heap_update() and call HeapUpdateModifiedIdxAttrs() after re-constructing the
HeapTuple,but that feels very ugly/unnecessary to me given that the caller has that information already in slot form.
I've left this as is, but I'm happy to continue discussing options.
>> - * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
>> + * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
>> + * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
>
>> - Bitmapset *summarizedattrs; /* columns with summarizing indexes */
>> + Bitmapset *indexedattrs; /* columns referenced by indexes */
>> + Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
>
> As before, the comment changes for the summarized-attr-related stuff seem
> unnecessary.
I disagree, the "only" is required to highlight the logic change here. Before this patch summarized attrs could overlap
withindexed attrs, now it should not. This makes the logic a bit easier later in HeapUpdateHotAllowable().
>> if (indexDesc->rd_indam->amsummarizing)
>> attrs = &summarizedattrs;
>> else
>> - attrs = &hotblockingattrs;
>> + attrs = &indexedattrs;
>
>> + /*
>> + * Record what attributes are only referenced by summarizing indexes. Then
>> + * add that into the other indexed attributes to track all referenced
>> + * attributes.
>> + */
>> + summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
>> + indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
>
> The difference between hotblockingattrs and indexedattrs seems quite
> subtle.
I feel it was *much* more subtle before and mis-named ("hot blocking"). But, let's review. On master today in
heapam.cheap_update() near the start and before the buffer lock there is the following:
hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_HOT_BLOCKING);
sum_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_SUMMARIZED);
key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
It turns out that hot_attrs includes all INDEX_ATTR_BITMAP_IDENTITY_KEY and INDEX_ATTR_BITMAP_IDENTITY_KEY except for
thosefound when scanning a summarizing index. So, what comes next is a bit wasteful.
interesting_attrs = NULL;
interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
interesting_attrs = bms_add_members(interesting_attrs, key_attrs); <- unnecessary
interesting_attrs = bms_add_members(interesting_attrs, id_attrs); <- unnecessary
And that in the end, what's passed to HeapDetermineColumnsInfo() is all indexed attributes, including summarized, and
thosefound in expressions. That function then reduces the set from "interesting" to "modified (and indexed is
implied)". It does this by testing before/after datum for equality (memcmp() via datumIsEqual()) and that becomes our
"modified_attrs"set used for lockmode and HOT eligibility tests.
When testing later on (on master) for the HOT or NOT decision the code:
if (newbuf == buffer)
{
// first test to see if any modified/indexed attributes are used
// by non-summarizing indexes
if (!bms_overlap(modified_attrs, hot_attrs))
{
// and if not, we're going HOT
use_hot_update = true;
// at this point if there is any overlap it means that the only
// attributes that might be referenced by an index and modified
// are summarizing, there can't be any non-summarizing attributes
// in the modified_attrs set otherwise our first test would have
// failed, so this tests for the "only summarizing" case
if (bms_overlap(modified_attrs, sum_attrs))
only_summarized = true;
}
}
My thinking was, why re-create this every update? Why not have the cached representation of these bitmaps have what's
needed?
Now, I've changed the logic in this patch. First in the executor nodeModifyTable.c ExecUpdateModifiedIdxAttrs()
identifywhich indexed attributes were modified (changed value):
// get all attributes indexed on a relation, including summarized
// note how we no longer construct "interesting_attrs" from a number
// of bitmaps, the map we want is the map we cached and the name matches
// the content, *all* indexed attributes (not indexed, but not summarized)
attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
// compare the old/new reducing the set to only those that changed
// as determined by datum_is_equal() to produce the modified/indexed
// attribute set
attrs = ExecCompareSlotAttrs(attrs, tupdesc, old_tts, new_tts);
Then in heapam_handler.c heapam_tuple_update():
// call our helper function
hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
HeapUpdateHotAllowable()
{
// if no indexed attributes were modified, we're done
if (bms_is_empty(modified_idx_attrs))
return true;
else
{
// now we need the *only* summarized attributes
Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_SUMMARIZED);
// if the modified set is a sumset of the summarized,
// we're only updating summarized
if (bms_is_subset(modified_idx_attrs, sum_attrs))
{
hot_allowed = true;
*summarized_only = true;
}
else
// at least one attribute is modified, referenced by an index
// that isn't summarizing, HOT isn't allowed
hot_allowed = false;
bms_free(sum_attrs);
}
}
So, we go from 3 calls to RelationGetIndexAttrBitmap() to 1, or at most 2 when there's a summarizing index (which is
frequentlythe case).
This feels more logical, cleaner, and has less overhead but supports the same HOT logic.
> Am I understanding correctly that indexedattrs is essentially just
> hotblockingattrs + summarizedattrs? And that this is all meant for
> INDEX_ATTR_BITMAP_INDEXED?
>
> - INJECTION_POINT("heap_update-before-pin", NULL);
> + INJECTION_POINT("simple_heap_update-before-pin", NULL);
>
> Why was this changed in heap_update()?
Oops, that's a mistake. Fixed it.
> --
> nathan
Thanks for your review, v38 attached.
best.
-greg
Attachment
pgsql-hackers by date: