On Sat, Dec 6, 2025 at 3:07 AM Victor Yegorov <vyegorov@gmail.com> wrote:
> I like this change and I agree that it's both handy and gives an easy performance boost.
There are a number of things that I find counterintuitive about the
performance impact of this patch:
* It makes very little difference (under a 1% improvement) on
similar index-only scans, with queries such as "SELECT count(*) FROM
pgbench_accounts WHERE aid between :aid AND :endrange". One might
expect this case to be improved at least as much as the plain index
scan case, since the relative cost of _bt_readpage is higher there
(accessing the visibility map is much cheaper than accessing the
heap).
* The patch makes an even larger difference when I make pgbench use a
larger range of values in the "between". For example, if I increase
the number of values/tuples returned from 2,000 (the setting behind
the numbers in my first email) to 15,000, I find that the patch
increases TPS by as much as 5.5%.
* These queries make maximum use of the _bt_set_startikey optimization
-- most individual leaf pages don't need to evaluate any scan key
(after an initial page-level check within _bt_set_startikey). So the
patch really helps in exactly those cases where we don't truly need to
access the scan direction at all -- the loop inside _bt_check_compare
always has 0 iterations with these queries, which means that scan
direction doesn't actually ever need to be considered at that point.
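To make the zero-iteration point concrete, here's a toy sketch. It is not the real _bt_check_compare: the types and the function toy_check_compare are hypothetical stand-ins, keys are plain integer ranges, and startikey is assumed to have been computed up front by something like the page-level check in _bt_set_startikey. The direction is only consulted after a key fails, to decide whether that failure also ends the scan, so when startikey == nkeys the loop body never runs and dir is never read.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are NOT nbtree's real types. */
typedef enum { ToyBackwardScan = -1, ToyForwardScan = 1 } ToyScanDirection;

typedef struct
{
    int        nkeys;   /* number of "between"-style keys */
    const int *lower;   /* inclusive lower bound for each key */
    const int *upper;   /* inclusive upper bound for each key */
} ToyScanKeys;

/*
 * Evaluates keys [startikey, nkeys) against one tuple.  The scan direction
 * is only examined once a key fails.  When startikey == nkeys (every key
 * was already proven satisfied for the whole page), the loop runs zero
 * times and dir is never looked at.  *niterations reports how many keys
 * were evaluated.
 */
static bool
toy_check_compare(const ToyScanKeys *keys, const int *tuple, int startikey,
                  ToyScanDirection dir, bool *continuescan, int *niterations)
{
    *continuescan = true;
    *niterations = 0;

    for (int i = startikey; i < keys->nkeys; i++)
    {
        (*niterations)++;
        if (tuple[i] < keys->lower[i])
        {
            /* Below the lower bound: only a backward scan can stop here */
            if (dir == ToyBackwardScan)
                *continuescan = false;
            return false;
        }
        if (tuple[i] > keys->upper[i])
        {
            /* Above the upper bound: only a forward scan can stop here */
            if (dir == ToyForwardScan)
                *continuescan = false;
            return false;
        }
    }
    return true;
}
```

With startikey set to nkeys, every call returns true after zero iterations, which is the situation these pgbench queries hit on most leaf pages.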
My best guess is that the benefits I see come from eliminating a
dependent load. Without the second patch applied, I see this
disassembly for _bt_checkkeys:
mov rax,QWORD PTR [rdi+0x38] ; Load scan->opaque
mov r15d,DWORD PTR [rax+0x70] ; Load so->dir
A version with the second patch applied still loads the direction
through a pointer passed by the _bt_checkkeys caller (_bt_readpage),
but doesn't have to chase a second pointer to get to it. Maybe this
significantly ameliorates execution port pressure in the cases where
I see a speedup?
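Roughly, the difference between the two access patterns looks like this in C. This is only an illustrative sketch: the struct and function names are hypothetical, the field layouts are assumptions rather than nbtree's real definitions, and the machine code a compiler actually emits depends on the compiler, flags, and surrounding code (it may keep values in registers or hoist loads out of loops).

```c
/* Hypothetical stand-ins; field names and layouts are assumptions. */
typedef enum { ToyBackwardScan = -1, ToyForwardScan = 1 } ToyScanDirection;

typedef struct
{
    ToyScanDirection dir;      /* plays the role of so->dir */
} ToyScanOpaque;

typedef struct
{
    void *opaque;              /* plays the role of scan->opaque */
} ToyIndexScanDesc;

typedef struct
{
    ToyScanDirection dir;      /* direction kept in caller-owned state */
} ToyReadPageState;

/*
 * Before: two dependent loads.  The address of dir isn't known until
 * scan->opaque has been loaded, so the second load has to wait for the
 * first to complete.
 */
static ToyScanDirection
toy_dir_via_opaque(const ToyIndexScanDesc *scan)
{
    const ToyScanOpaque *so = (const ToyScanOpaque *) scan->opaque; /* load 1 */

    return so->dir;                                                 /* load 2 */
}

/*
 * After: the caller passes a pointer to its own state, so dir is a single
 * load from an address that is already in a register.
 */
static ToyScanDirection
toy_dir_via_state(const ToyReadPageState *pstate)
{
    return pstate->dir;                                             /* load 1 */
}
```

Both functions return the same value; the only difference is the length of the load chain on the path to it.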
> Patch applies and compiles cleanly. I can barely see a performance boost on my end (VM on a busy host), around 1%, but
> I still consider this change beneficial.
It seems to have no downsides, and some upside. I wouldn't be
surprised if the results I'm seeing are dependent on
microarchitectural details. I myself use a Zen 3 chip (a Ryzen 9
5950X).
--
Peter Geoghegan