Hi,
On 2025-07-16 15:39:58 -0400, Andres Freund wrote:
> Looking at the actual patches now.
I just did an initial, not particularly in-depth look. A few comments and
questions below.
For either patch, I think it's high time we split the index/table buffer stats
in index scans. It's really annoying not to be able to see whether IO time was
spent inside the index itself or in the table. What we're discussing here
obviously can never avoid stalls due to fetching index pages; but so far
neither patch is able to fully utilize the hardware when bound on heap fetches,
and that's harder to know without those stats.
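Roughly the kind of split I mean, sketched as a hypothetical set of counters
(all names here are invented, not what a patch would necessarily use):

/* hypothetical per-indexscan counters, purely illustrative */
typedef struct IndexScanBufferStats
{
    int64       index_blks_read;    /* index pages read from disk */
    int64       table_blks_read;    /* table pages read for TID fetches */
    double      index_blk_read_ms;  /* time spent reading index pages */
    double      table_blk_read_ms;  /* time spent reading table pages */
} IndexScanBufferStats;

With something along those lines surfaced in EXPLAIN (BUFFERS) it'd be much
easier to tell whether the remaining stalls are on the index or the heap side.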
The BufferMatches() that both patches add seems to check more than needed? It's
not as if the old buffer could have changed which relation it belongs to while
pinned. It seems like it'd be better to just keep track of what the prior block
was and not go into bufmgr.c at all.
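To illustrate, a minimal sketch of what I mean (field names are illustrative;
last_blkno in particular is invented, it's not in either patch): remember the
block number of the previous fetch and only touch bufmgr.c when it changes:

/* in the per-scan state; last_blkno is a hypothetical addition */
BlockNumber blkno = ItemPointerGetBlockNumber(tid);

if (scan->xs_cbuf == InvalidBuffer || blkno != scan->last_blkno)
{
    /* different (or no) block: swap the pin over to the new block */
    scan->xs_cbuf = ReleaseAndReadBuffer(scan->xs_cbuf, rel, blkno);
    scan->last_blkno = blkno;
}
/* else: same block as the previous tuple, keep using the existing pin */

That way the common "next TID is on the same page" case doesn't enter
bufmgr.c at all.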
WRT the complex patch:
Maybe I'm missing something, but the current interface doesn't seem to work
for AMs that don't have a 1:1 mapping between the block number portion of the
TID and the actual block number?
Currently the API wouldn't easily allow the table AM to do batched TID lookups:
if a query looks at a lot of table tuples in the same buffer consecutively, we
spend a lot of time locking/unlocking said buffer. We also spend a lot of time
dispatching from nodeIndexscan.c to the tableam layer in such queries.
I'm not suggesting increasing the scope to handle that, but it might be worth
keeping in mind.
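Just to illustrate the shape of what a batched path could look like, not a
concrete proposal (the callback name and signature are made up here):

/* hypothetical batched variant of index_fetch_tuple, names invented */
int     (*index_fetch_tuples) (struct IndexFetchTableData *scan,
                               ItemPointerData *tids, int ntids,
                               Snapshot snapshot,
                               TupleTableSlot **slots);

The AM could then group the TIDs by block internally, lock each heap buffer
once per group, and the executor would pay the tableam dispatch cost once per
batch instead of once per tuple.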
I think the potential gains here are really substantial. Even just not having
to lock/unlock the heap block for every tuple on the page would be a huge win;
a quick and incorrect hack suggests it's something like 25% faster. A batched
heap_hot_search_buffer() could be an even larger improvement; it's often bound
by memory latency and per-call overhead.
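For reference, the direction of that hack, very roughly (this is not the
actual code, just a sketch of taking the content lock once and probing all the
TIDs that fall on that page):

/* sketch: probe several TIDs on one heap page under a single content lock */
LockBuffer(buffer, BUFFER_LOCK_SHARE);
for (int i = 0; i < ntids; i++)
{
    HeapTupleData tup;
    bool          all_dead;

    Assert(ItemPointerGetBlockNumber(&tids[i]) == BufferGetBlockNumber(buffer));

    if (heap_hot_search_buffer(&tids[i], rel, buffer, snapshot,
                               &tup, &all_dead, true))
    {
        /* visible tuple found; a real version would store it into a slot */
    }
}
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

A properly batched heap_hot_search_buffer() could additionally, e.g., prefetch
the tuple headers for the whole batch before doing the visibility checks, to
hide some of that memory latency.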
I see some slowdown for well-cached queries with the patch; I've not dug into
why.
WRT the simple patch:
It seems to have the same issue, in that it assumes TID block numbers
correspond to actual disk locations?
Greetings,
Andres Freund