Re: Expanding HOT updates for expression and partial indexes - Mailing list pgsql-hackers

From Greg Burd
Subject Re: Expanding HOT updates for expression and partial indexes
Date
Msg-id 97769442-2ed7-4fe0-a393-7e2b5ff7ff58@app.fastmail.com
Whole thread Raw
In response to Re: Expanding HOT updates for expression and partial indexes  (Greg Burd <greg@burd.me>)
Responses Re: Expanding HOT updates for expression and partial indexes
List pgsql-hackers
Hello,

TL;DR, I'm going to put a pin in this idea for now.

I'll net out where this patch set is quickly and leave the majority of the
detail for anyone curious in an addendum below.

The patch set enables HOT updates a few new cases:
* when expressions on indexes remain constant
* when updates fall outside partial index predicates
* when indexes report that their keys are unchanged by the update

The benefits are:
* more HOT updates
* reduced index bloat
* modified index attributes identified ahead of table AM update

The drawbacks are:
* the overhead of doing net-new work determining modified attributes
* higher concurrency causing retries, compounding that work
* index and predicate expressions executed more frequently

Performance:
------------

The good,
* HOT improvements: 95-100%
* Bloat reduction: 50-98%
* Update 100 rows at once: +27% throughput, -98% bloat

the bad,
* -2% to -6% throughput regression on updates across the board

and the ugly.
- net new work finding modified indexed attributes
- net new retries on update due to added concurrency (TU_Updated)
- retries then must re-identify modified indexed attributes


While I still believe in expanding HOT updates and that moving the logic for
finding the set of modified index attributes into the executor was a good idea,
it is late in the v19 cycle and at this stage the trade-offs feel unacceptable.

So I'm going to withdraw this patch and pause here on development and reconsider
my approach in the v20 cycle. I have some ideas on how to avoid the need to
evaluate these expressions entirely and I still have a fondness for the idea of
only updating those indexes that changed (PHOT/WARM/whatever) on update, maybe
I'll combine the ideas.

best.

-greg


ADDENDUM:

ExecUpdate (nodeModifyTable.c)
  -> [PATCHED] after the l2: label, ExecCheckIndexedAttrsForChanges()
  -> [UNPATCHED] (nothing at this point...)
  -> table_tuple_update()
    -> heap_update()
      -> LockBuffer(BUFFER_LOCK_EXCLUSIVE)
      -> [UNPATCHED] HeapDetermineColumnsInfo() // uses memcmp() for datum comparison
      -> returns TM_Updated immediately, frees modified_attrs
  -> EvalPlanQual()  // recheck with updated tuple
  -> ExecUpdate (retry)
    -> [PATCHED] ExecCheckIndexedAttrsForChanges()  // second evaluation
    -> [UNPATCHED] (nothing at this point...)
    -> table_tuple_update()
      -> heap_update()
        -> LockBuffer(BUFFER_LOCK_EXCLUSIVE)
        -> HeapDetermineColumnsInfo()  // second evaluation

As illustrated above, the patches move indexed attribute checking from
HeapDetermineColumnsInfo() (called once in heap_update under buffer lock
BUFFER_LOCK_EXCLUSIVE) to ExecCheckIndexedAttrsForChanges() (called in the
executor before calling table_tuple_update() when the buffer with the old tuple
is pinned, but not holding locked exclusively). In the unpatched code, when
heap_update() returns TM_Updated due to a concurrent update, it immediately
returns to the executor without retrying - the modified_attrs bitmapset is freed
and the executor performs an EPQ (EvalPlanQual) recheck before calling
table_tuple_update() again. On this second call, the unpatched code re-executes
HeapDetermineColumnsInfo() with the new tuple because the modified_attrs may
have changed. The patched code similarly calls ExecCheckIndexedAttrsForChanges()
again, as it will have jumped to the l2: label, before retrying
table_tuple_update().

Both versions re-evaluate to find the modified attributes bitmap on concurrent
conflicts, the key differences are:

1) the patched version performs richer comparisons (partial index predicates,
   index AM amcomparedatums(), expression evaluation) while the unpatched
   version only does binary datum comparison using memcmp() and,

2) under high contention running outside the buffer lock has increased the
   probability TM_Updated because he window for concurrent modifications has
   increased.

It may be that expanding the window for concurrency (2) during updates is a
mistake as it is impossible for the heap to update two different tuples on the
same page concurrently so this has no potential benefit and compounds the
expense.

However, it is entirely possible that other table AMs could have different
concurrency models and so in theory we should allow for that.  Consider how some
LSMs work, concurrent updates are just appends to the current log file. Maybe
the table AM should indicate a need for a buffer lock or not during updates?


PATCHES:

0001: Refactors heapam_tuple_update()/heap_update() by removing some code in the
      beginning of heap_update() into both heapam_tuple_update() and
      simple_heap_update() as a preface for removing the need for
      HeapDetermineColumnsInfo() when calling heapam_tuple_update().

0002: Moves the detection of modified indexed columns from within heap into the
      executor in a new function ExecWhichIndexesRequireUpdates() that replaces
      HeapDetermineColumnsInfo().

      This shift now detects the set of modified indexed columns earlier in the
      query execution process (just after the l2: label in nodeModifyTable.c)
      and crucially outside of an exclusive buffer page lock as was the case
      before in heap_update().

      ExecCheckIndexedAttrsForChanges() returns a Bitmapset, identical to
      HeapDetermineColumsInfo(), of the modified indexed columns. While similar
      in spirit to HeapDetermineColumsInfo(), this new function also evaluates
      expressions, and potentially calls into a new index AM API
      amcomparedatums(). This is the key change that enables HOT updates when
      expressions are unchanged on update.

      The new optional index AM function amcomparedatums() allows index
      implementations to define their own custom logic for determining if a
      datum has changed and if that change requires that the index be updated
      or not. This has been implemented for GIN and HASH.

      The patch also updates the replication logic in execReplication() to
      correctly provide the set of modified indexed attributes extracted from
      the replicated tuples to avoid overhead of rediscovery later on.

      Catalog tuple updates are unchanged and continue to use
      simple_heap_update() which in turn calls HeapDetermineColumsInfo(). Note
      that I proposed a separate patch set to remove HeapDetermineColumsInfo()
      from simple_heap_update() by changing the way catalog tuples are updated,
      but that has since been abandoned. See cf-6221 for more details on this.

0003: Removes the now-redundant index_unchanged_by_update() function. Its
      functionality is superseded by simply reusing the Bitmapset created by
      ExecWhichIndexesRequireUpdates() and stored in ri_ChangedIndexedCols.

0004: Moves the evaluation of partial index predicates from after the table
      update to before it, within the ExecWhichIndexesRequireUpdates(). If both
      the old and new tuples are outside the index's predicate, the index is not
      considered impacted, and so the attributes in the predicate are not
      recorded in the resulting Bitmapset potentially allowing for a HOT update.


PERFORMANCE TEST RESULTS:

Platform: FreeBSD 15 (Intel NUC)
Duration: 600 seconds (10 minutes) per-test, ~3.5 hours total
Baseline: 8ebdf41c262 (origin/master)
Patched:  bdfe58c4537 (origin/master + 4 patches)

HOT Update Percentages:
────────────────────────────────────────────────────────────────
Test                              Baseline    Patched        Δ
────────────────────────────────────────────────────────────────
JSONB BTREE expression index          0.0%      95.0%      95.0%
JSONB GIN expression index            0.0%      98.1%      98.1%
Partial index                         0.0%      96.3%      96.3%
3 expression indexes                  0.0%      98.2%      98.2%
6 expression indexes                  0.0%     100.0%     100.0%
Array expression index                0.0%      97.8%      97.8%
UPDATE 100 rows                       0.0%      99.7%      99.7%
Control: indexed field changes       47.7%      48.2%        .5%
Control: expensive expression         0.0%       0.0%         0%
Control: Induced TU_Updated           0.1%       0.1%         0%

Throughput (10 clients):
────────────────────────────────────────────────────────────────
Test                              Baseline    Patched        Δ
────────────────────────────────────────────────────────────────
JSONB BTREE expression index         563.6      543.2      -3.6%
JSONB GIN expression index           556.5      536.6      -3.5%
Partial index                        415.6      406.4      -2.2%
3 expression indexes                 553.6      535.1      -3.3%
6 expression indexes                 528.9      511.8      -3.2%
Array expression index               560.7      542.6      -3.2%
UPDATE 100 rows                      840.9     1066.7      26.8%
Control: indexed field changes       564.6      529.0      -6.3%
Control: expensive expression        563.4      527.3      -6.4%
Control: Induced TU_Updated         1677.5     1730.4       3.1%

Index Bloat After Updates (MB):
────────────────────────────────────────────────────────────────
Test                              Baseline    Patched  Reduction
────────────────────────────────────────────────────────────────
JSONB BTREE expression index           8.2        2.9      64.6%
JSONB GIN expression index             8.8        6.3      27.3%
Partial index                          4.3        2.1      49.8%
3 expression indexes                  13.4        4.1      68.7%
6 expression indexes                  26.7        5.9      77.7%
Array expression index                12.8        4.3      66.5%
UPDATE 100 rows                      453.5        6.9      98.4%
Control: indexed field changes         4.6        4.5        .6%
Control: expensive expression         27.2       25.9       4.8%
Control: Induced TU_Updated            7.9        7.9         0%

Attachment

pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum
Next
From: Tatsuya Kawata
Date:
Subject: Re: [PATCH] Add sampling statistics to autoanalyze log output