On Tue, Jul 22, 2025 at 6:53 PM Andres Freund <andres@anarazel.de> wrote:
> That may be true with local fast NVMe disks, but won't be true for networked
> storage like in common clouds. Latencies of 0.3 - 4ms leave a lot of CPU
> cycles for actual processing of the data.
I don't understand why it wouldn't be a problem for NVMe disks, too.
Take a range scan on pgbench_accounts_pkey, for example -- something
like your ORDER BY ... LIMIT N test case, but with pgbench data
instead of TPC-H data. There are 6 heap blocks per leaf page. As I
understand it, the simple patch will only be able to see up to 6 heap
blocks "into the future", at any given time. Why isn't that quite a
significant drawback, regardless of the underlying storage?
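Just to make the shape of that concrete, here's a minimal sketch of the
kind of query I mean (the range bounds and LIMIT are arbitrary, and I'm
assuming the stock pgbench schema at some modest scale factor):

-- forward scan on pgbench_accounts_pkey, returning heap tuples in aid order
SELECT aid, abalance
FROM pgbench_accounts
WHERE aid BETWEEN 1000 AND 200000
ORDER BY aid
LIMIT 10000;

With roughly 370 TIDs on each pristine pgbench_accounts_pkey leaf page,
and roughly 60 accounts tuples on each heap page, every leaf page points
into only about 6 distinct heap pages -- which is where the "6 heap
blocks per leaf page" figure comes from.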
> Also, plenty indexes are on multiple columns and/or wider datatypes, making
> bubbles triggered due to "crossing-the-leaf-page" more common.
I actually don't think that that's a significant factor. Even with
fairly wide tuples, we'll still tend to be able to fit about 200 index
tuples on each leaf page. For a variety of reasons that doesn't compare
too badly to simple indexes (like pgbench_accounts_pkey), which will
store about 370 index tuples per leaf page when the index is in a
pristine state.
It does matter, but in the grand scheme of things it's unlikely to be decisive.
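If anybody wants to double-check those tuples-per-leaf-page numbers on
their own build, pageinspect makes that easy. A rough sketch -- block 0
is the metapage, so block 1 is just an arbitrarily chosen block that
happens to be the leftmost leaf page of a freshly built index:

CREATE EXTENSION IF NOT EXISTS pageinspect;
-- live_items is the number of index tuples on the page; a pristine
-- pgbench_accounts_pkey leaf page should show something close to 370
SELECT live_items, avg_item_size, free_size
FROM bt_page_stats('pgbench_accounts_pkey', 1);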
--
Peter Geoghegan