Re: index prefetching - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: index prefetching
Date
Msg-id CAH2-Wz=xSEsrmL37JHMwCuQ=J0AuKbOdyuc6ODeP-53-4-Ykqw@mail.gmail.com
Whole thread Raw
In response to Re: index prefetching  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: index prefetching
List pgsql-hackers
On Tue, Jul 22, 2025 at 9:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'll get back to you on this soon. There are plenty of indexes that
> are not perfectly correlated (like pgbench_accounts_pkey is) that'll
> nevertheless benefit significantly from the approach taken by the
> complex patch.

I'll give you a few examples that look like this. I'm not necessarily
suggesting that you adopt these example indexes into your test suite
-- these are just to stimulate discussion.

* The TPC-C order line table primary key.

This is the single largest index used by TPC-C, by quite some margin.
This is the index that my Postgres 12 "split after new tuple"
optimization made about 40% smaller with retail inserts -- I've
already studied it in detail.

It's a composite index on 4 integer columns, where each leaf page
contains about 260 index tuples. Note that this is true regardless of
whether retail inserts or CREATE INDEX were used (thanks to the
"split after new tuple" thing). And yet I see that nhblks is even
*lower* than pgbench_accounts: the average per-leaf-page nhblks is about
4 or 5. While the odd leaf page has an nhblks of 7 or 8, some
individual leaf pages that are full, but have an nhblks of only 4.

I would expect such an index to benefit to the maximum possible extent
from the complex patch/eager leaf page reading. This is true in spite
of the fact that technically the overall correlation is weak.

Here's what a random leaf page looks like, in terms of TIDs:

┌────────────┬───────────┐
│ itemoffset │   htid    │
├────────────┼───────────┤
│          1 │ ∅         │
│          2 │ (1510,55) │
│          3 │ (1510,56) │
│          4 │ (1510,57) │
│          5 │ (1510,58) │
│          6 │ (1510,59) │
│          7 │ (1510,60) │
│          8 │ (1510,61) │
│          9 │ (1510,62) │
│         10 │ (1510,63) │
│         11 │ (1510,64) │
│         12 │ (1510,65) │
│         13 │ (1510,66) │
│         14 │ (1510,67) │
│         15 │ (1510,68) │
│         16 │ (1510,69) │
│         17 │ (1510,70) │
│         18 │ (1510,71) │
│         19 │ (1510,72) │
│         20 │ (1510,73) │
│         21 │ (1510,74) │
│         22 │ (1510,75) │
│         23 │ (1510,76) │
│         24 │ (1510,77) │
│         25 │ (1510,78) │
│         26 │ (1510,79) │
│         27 │ (1510,80) │
│         28 │ (1510,81) │
│         29 │ (1517,1)  │
│         30 │ (1517,2)  │
│         31 │ (1517,3)  │
│         32 │ (1517,4)  │
│         33 │ (1517,5)  │
│         34 │ (1517,6)  │
│         35 │ (1517,7)  │
│         36 │ (1517,8)  │
│         37 │ (1517,9)  │
│         38 │ (1517,10) │
│         39 │ (1517,11) │
│         40 │ (1517,12) │
│         41 │ (1517,13) │
│         42 │ (1517,14) │
│         43 │ (1517,15) │
│         44 │ (1517,16) │
│         45 │ (1517,17) │
│         46 │ (1517,18) │
│         47 │ (1517,19) │
│         48 │ (1517,20) │
│         49 │ (1517,21) │
│         50 │ (1517,22) │
│         51 │ (1517,23) │
│         52 │ (1517,24) │
│         53 │ (1517,25) │
│         54 │ (1517,26) │
│         55 │ (1517,27) │
│         56 │ (1517,28) │
│         57 │ (1517,29) │
│         58 │ (1517,30) │
│         59 │ (1517,31) │
│         60 │ (1517,32) │
│         61 │ (1517,33) │
│         62 │ (1517,34) │
│         63 │ (1517,35) │
│         64 │ (1517,36) │
│         65 │ (1517,37) │
│         66 │ (1517,38) │
│         67 │ (1517,39) │
│         68 │ (1517,40) │
│         69 │ (1517,41) │
│         70 │ (1517,42) │
│         71 │ (1517,43) │
│         72 │ (1517,44) │
│         73 │ (1517,45) │
│         74 │ (1517,46) │
│         75 │ (1517,47) │
│         76 │ (1517,48) │
│         77 │ (1517,49) │
│         78 │ (1517,50) │
│         79 │ (1517,51) │
│         80 │ (1517,52) │
│         81 │ (1517,53) │
│         82 │ (1517,54) │
│         83 │ (1517,55) │
│         84 │ (1517,56) │
│         85 │ (1517,57) │
│         86 │ (1517,58) │
│         87 │ (1517,59) │
│         88 │ (1517,60) │
│         89 │ (1517,62) │
│         90 │ (1523,1)  │
│         91 │ (1523,2)  │
│         92 │ (1523,3)  │
│         93 │ (1523,4)  │
│         94 │ (1523,5)  │
│         95 │ (1523,6)  │
│         96 │ (1523,7)  │
│         97 │ (1523,8)  │
│         98 │ (1523,9)  │
│         99 │ (1523,10) │
│        100 │ (1523,11) │
│        101 │ (1523,12) │
│        102 │ (1523,13) │
│        103 │ (1523,14) │
│        104 │ (1523,15) │
│        105 │ (1523,16) │
│        106 │ (1523,17) │
│        107 │ (1523,18) │
│        108 │ (1523,19) │
│        109 │ (1523,20) │
│        110 │ (1523,21) │
│        111 │ (1523,22) │
│        112 │ (1523,23) │
│        113 │ (1523,24) │
│        114 │ (1523,25) │
│        115 │ (1523,26) │
│        116 │ (1523,27) │
│        117 │ (1523,28) │
│        118 │ (1523,29) │
│        119 │ (1523,30) │
│        120 │ (1523,31) │
│        121 │ (1523,32) │
│        122 │ (1523,33) │
│        123 │ (1523,34) │
│        124 │ (1523,35) │
│        125 │ (1523,36) │
│        126 │ (1523,37) │
│        127 │ (1523,38) │
│        128 │ (1523,39) │
│        129 │ (1523,40) │
│        130 │ (1523,41) │
│        131 │ (1523,42) │
│        132 │ (1523,43) │
│        133 │ (1523,44) │
│        134 │ (1523,45) │
│        135 │ (1523,46) │
│        136 │ (1523,47) │
│        137 │ (1523,48) │
│        138 │ (1523,49) │
│        139 │ (1523,50) │
│        140 │ (1523,51) │
│        141 │ (1523,52) │
│        142 │ (1523,53) │
│        143 │ (1523,54) │
│        144 │ (1523,55) │
│        145 │ (1523,56) │
│        146 │ (1523,57) │
│        147 │ (1523,58) │
│        148 │ (1523,59) │
│        149 │ (1523,60) │
│        150 │ (1523,61) │
│        151 │ (1523,62) │
│        152 │ (1523,63) │
│        153 │ (1523,64) │
│        154 │ (1523,65) │
│        155 │ (1523,66) │
│        156 │ (1523,67) │
│        157 │ (1523,68) │
│        158 │ (1523,69) │
│        159 │ (1523,70) │
│        160 │ (1523,71) │
│        161 │ (1523,72) │
│        162 │ (1523,73) │
│        163 │ (1523,74) │
│        164 │ (1523,75) │
│        165 │ (1523,76) │
│        166 │ (1523,77) │
│        167 │ (1523,78) │
│        168 │ (1523,79) │
│        169 │ (1523,80) │
│        170 │ (1523,81) │
│        171 │ (1531,1)  │
│        172 │ (1531,2)  │
│        173 │ (1531,3)  │
│        174 │ (1531,4)  │
│        175 │ (1531,5)  │
│        176 │ (1531,6)  │
│        177 │ (1531,7)  │
│        178 │ (1531,8)  │
│        179 │ (1531,9)  │
│        180 │ (1531,10) │
│        181 │ (1531,11) │
│        182 │ (1531,12) │
│        183 │ (1531,13) │
│        184 │ (1531,14) │
│        185 │ (1531,15) │
│        186 │ (1531,16) │
│        187 │ (1531,17) │
│        188 │ (1531,18) │
│        189 │ (1531,19) │
│        190 │ (1531,20) │
│        191 │ (1531,21) │
│        192 │ (1531,22) │
│        193 │ (1531,23) │
│        194 │ (1531,24) │
│        195 │ (1531,25) │
│        196 │ (1531,26) │
│        197 │ (1531,27) │
│        198 │ (1531,28) │
│        199 │ (1531,29) │
│        200 │ (1531,30) │
│        201 │ (1531,31) │
│        202 │ (1531,32) │
│        203 │ (1531,33) │
│        204 │ (1531,34) │
│        205 │ (1531,35) │
│        206 │ (1531,36) │
│        207 │ (1531,37) │
│        208 │ (1531,38) │
│        209 │ (1531,39) │
│        210 │ (1531,40) │
│        211 │ (1531,41) │
│        212 │ (1531,42) │
│        213 │ (1531,43) │
│        214 │ (1531,44) │
│        215 │ (1531,45) │
│        216 │ (1531,46) │
│        217 │ (1531,47) │
│        218 │ (1531,48) │
│        219 │ (1531,49) │
│        220 │ (1531,50) │
│        221 │ (1531,51) │
│        222 │ (1531,52) │
│        223 │ (1531,53) │
│        224 │ (1531,54) │
│        225 │ (1531,55) │
│        226 │ (1531,56) │
│        227 │ (1531,57) │
│        228 │ (1531,58) │
│        229 │ (1531,59) │
│        230 │ (1531,60) │
│        231 │ (1531,61) │
│        232 │ (1531,62) │
│        233 │ (1531,63) │
│        234 │ (1531,64) │
│        235 │ (1531,65) │
│        236 │ (1531,66) │
│        237 │ (1531,67) │
│        238 │ (1531,68) │
│        239 │ (1531,69) │
│        240 │ (1531,70) │
│        241 │ (1531,71) │
│        242 │ (1531,72) │
│        243 │ (1531,73) │
│        244 │ (1531,74) │
│        245 │ (1531,75) │
│        246 │ (1531,76) │
│        247 │ (1531,77) │
│        248 │ (1531,78) │
│        249 │ (1531,79) │
│        250 │ (1531,80) │
│        251 │ (1531,81) │
│        252 │ (1539,1)  │
│        253 │ (1539,2)  │
│        254 │ (1539,3)  │
│        255 │ (1539,4)  │
│        256 │ (1539,5)  │
│        257 │ (1539,6)  │
│        258 │ (1539,7)  │
│        259 │ (1539,8)  │
│        260 │ (1539,9)  │
│        261 │ (1539,10) │
└────────────┴───────────┘
(261 rows)

Notice that there are contiguous groups of tuples that all point to
the same heap block. These groups are really groups of items (on
average 10 items) from a given order. Individual warehouses seem to
have a tendency to insert multiple orders together, which further
lowers nhtids.

You can tell that tuples aren't inserted in strict ascending order
because there are "heap TID discontinuities". For example, item 165
(which is the last item from a given order) points to (15377,81),
while item 166 (which is the first item from the next order made to
the same warehouse) points to (15385,1). There is a "heap block gap"
between index tuple item 165 and 166 -- these "missing" heap blocks
don't appear anywhere on the same leaf page.

Note also that many of the other TPC-C indexes have this same quality
to them. They also consist of groups of related tuples, that get
inserted together in ascending order -- and yet the *overall* pattern
for the index is pretty far from inserts happening in ascending key
space order.

* A low cardinality index.

In one way, this works against the complex patch: if there are ~1350
TIDs on every leaf page (thanks to nbtree deduplication), we're
presumably less likely to ever need to read very many leaf pages
eagerly. But in another way it favors the complex patch: each
individual distinct value will have its TIDs stored/read in TID order,
which can be enough of a factor to get us a low nhtids value for each
leaf page.

I see a nhtids of 5 - 7 for leaf pages from the following index:

create table low_cardinality(foo int4);
CREATE TABLE
create index on low_cardinality (foo);
CREATE INDEX
insert into low_cardinality select hashint4(j) from
generate_series(1,10_000) i, generate_series(1,100) j;
INSERT 0 1000000

This is actually kinda like the TPC-C index, in a way: "foo" column
values all look random. But within a given value, the TIDs are in
ascending order, which (at least here) is enough to get us a very low
nhtids -- even in spite of each leaf page storing more than 4x as many
TIDs than could be stored within each of the TPC-C index's pages.

Note that the number of CPU cycles needed within nbtree to read a leaf
page from a low cardinality index is probably *lower* than the typical
case for a unique index. This is due to a variety of factors -- the
main factor is that there aren't very many index tuples to evaluate on
the page. So the scan isn't bottlenecked at that (certainly not to an
extent that is commensurate with the overall number of TIDs).

The terminology in this area is tricky. We say "correlation", when
perhaps we should say something like "heap clustering factor" -- a
concept that seems hard to define precisely. It doesn't help that the
planner models all this using a correlation stat -- that encourages us
to reduce everything to a single scalar correlation number, which can
be quite misleading.

I could give more examples, if you want. But they'd all just be
variations of the same thing.

--
Peter Geoghegan

pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: pgbench - adding pl/pgsql versions of tests
Next
From: Álvaro Herrera
Date:
Subject: Re: trivial grammar refactor