Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-Wz=xSEsrmL37JHMwCuQ=J0AuKbOdyuc6ODeP-53-4-Ykqw@mail.gmail.com Whole thread Raw |
In response to | Re: index prefetching (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: index prefetching
|
List | pgsql-hackers |
On Tue, Jul 22, 2025 at 9:31 PM Peter Geoghegan <pg@bowt.ie> wrote: > I'll get back to you on this soon. There are plenty of indexes that > are not perfectly correlated (like pgbench_accounts_pkey is) that'll > nevertheless benefit significantly from the approach taken by the > complex patch. I'll give you a few examples that look like this. I'm not necessarily suggesting that you adopt these example indexes into your test suite -- these are just to stimulate discussion. * The TPC-C order line table primary key. This is the single largest index used by TPC-C, by quite some margin. This is the index that my Postgres 12 "split after new tuple" optimization made about 40% smaller with retail inserts -- I've already studied it in detail. It's a composite index on 4 integer columns, where each leaf page contains about 260 index tuples. Note that this is true regardless of whether retail inserts or CREATE INDEX were used (thanks to the "split after new tuple" thing). And yet I see that nhblks is even *lower* than pgbench_accounts: the average per-leaf-page nhblks is about 4 or 5. While the odd leaf page has an nhblks of 7 or 8, some individual leaf pages that are full, but have an nhblks of only 4. I would expect such an index to benefit to the maximum possible extent from the complex patch/eager leaf page reading. This is true in spite of the fact that technically the overall correlation is weak. Here's what a random leaf page looks like, in terms of TIDs: ┌────────────┬───────────┐ │ itemoffset │ htid │ ├────────────┼───────────┤ │ 1 │ ∅ │ │ 2 │ (1510,55) │ │ 3 │ (1510,56) │ │ 4 │ (1510,57) │ │ 5 │ (1510,58) │ │ 6 │ (1510,59) │ │ 7 │ (1510,60) │ │ 8 │ (1510,61) │ │ 9 │ (1510,62) │ │ 10 │ (1510,63) │ │ 11 │ (1510,64) │ │ 12 │ (1510,65) │ │ 13 │ (1510,66) │ │ 14 │ (1510,67) │ │ 15 │ (1510,68) │ │ 16 │ (1510,69) │ │ 17 │ (1510,70) │ │ 18 │ (1510,71) │ │ 19 │ (1510,72) │ │ 20 │ (1510,73) │ │ 21 │ (1510,74) │ │ 22 │ (1510,75) │ │ 23 │ (1510,76) │ │ 24 │ (1510,77) │ │ 25 │ (1510,78) │ │ 26 │ (1510,79) │ │ 27 │ (1510,80) │ │ 28 │ (1510,81) │ │ 29 │ (1517,1) │ │ 30 │ (1517,2) │ │ 31 │ (1517,3) │ │ 32 │ (1517,4) │ │ 33 │ (1517,5) │ │ 34 │ (1517,6) │ │ 35 │ (1517,7) │ │ 36 │ (1517,8) │ │ 37 │ (1517,9) │ │ 38 │ (1517,10) │ │ 39 │ (1517,11) │ │ 40 │ (1517,12) │ │ 41 │ (1517,13) │ │ 42 │ (1517,14) │ │ 43 │ (1517,15) │ │ 44 │ (1517,16) │ │ 45 │ (1517,17) │ │ 46 │ (1517,18) │ │ 47 │ (1517,19) │ │ 48 │ (1517,20) │ │ 49 │ (1517,21) │ │ 50 │ (1517,22) │ │ 51 │ (1517,23) │ │ 52 │ (1517,24) │ │ 53 │ (1517,25) │ │ 54 │ (1517,26) │ │ 55 │ (1517,27) │ │ 56 │ (1517,28) │ │ 57 │ (1517,29) │ │ 58 │ (1517,30) │ │ 59 │ (1517,31) │ │ 60 │ (1517,32) │ │ 61 │ (1517,33) │ │ 62 │ (1517,34) │ │ 63 │ (1517,35) │ │ 64 │ (1517,36) │ │ 65 │ (1517,37) │ │ 66 │ (1517,38) │ │ 67 │ (1517,39) │ │ 68 │ (1517,40) │ │ 69 │ (1517,41) │ │ 70 │ (1517,42) │ │ 71 │ (1517,43) │ │ 72 │ (1517,44) │ │ 73 │ (1517,45) │ │ 74 │ (1517,46) │ │ 75 │ (1517,47) │ │ 76 │ (1517,48) │ │ 77 │ (1517,49) │ │ 78 │ (1517,50) │ │ 79 │ (1517,51) │ │ 80 │ (1517,52) │ │ 81 │ (1517,53) │ │ 82 │ (1517,54) │ │ 83 │ (1517,55) │ │ 84 │ (1517,56) │ │ 85 │ (1517,57) │ │ 86 │ (1517,58) │ │ 87 │ (1517,59) │ │ 88 │ (1517,60) │ │ 89 │ (1517,62) │ │ 90 │ (1523,1) │ │ 91 │ (1523,2) │ │ 92 │ (1523,3) │ │ 93 │ (1523,4) │ │ 94 │ (1523,5) │ │ 95 │ (1523,6) │ │ 96 │ (1523,7) │ │ 97 │ (1523,8) │ │ 98 │ (1523,9) │ │ 99 │ (1523,10) │ │ 100 │ (1523,11) │ │ 101 │ (1523,12) │ │ 102 │ (1523,13) │ │ 103 │ (1523,14) │ │ 104 │ (1523,15) │ │ 105 │ (1523,16) │ │ 106 │ (1523,17) │ │ 107 │ (1523,18) │ │ 108 │ (1523,19) │ │ 109 │ (1523,20) │ │ 110 │ (1523,21) │ │ 111 │ (1523,22) │ │ 112 │ (1523,23) │ │ 113 │ (1523,24) │ │ 114 │ (1523,25) │ │ 115 │ (1523,26) │ │ 116 │ (1523,27) │ │ 117 │ (1523,28) │ │ 118 │ (1523,29) │ │ 119 │ (1523,30) │ │ 120 │ (1523,31) │ │ 121 │ (1523,32) │ │ 122 │ (1523,33) │ │ 123 │ (1523,34) │ │ 124 │ (1523,35) │ │ 125 │ (1523,36) │ │ 126 │ (1523,37) │ │ 127 │ (1523,38) │ │ 128 │ (1523,39) │ │ 129 │ (1523,40) │ │ 130 │ (1523,41) │ │ 131 │ (1523,42) │ │ 132 │ (1523,43) │ │ 133 │ (1523,44) │ │ 134 │ (1523,45) │ │ 135 │ (1523,46) │ │ 136 │ (1523,47) │ │ 137 │ (1523,48) │ │ 138 │ (1523,49) │ │ 139 │ (1523,50) │ │ 140 │ (1523,51) │ │ 141 │ (1523,52) │ │ 142 │ (1523,53) │ │ 143 │ (1523,54) │ │ 144 │ (1523,55) │ │ 145 │ (1523,56) │ │ 146 │ (1523,57) │ │ 147 │ (1523,58) │ │ 148 │ (1523,59) │ │ 149 │ (1523,60) │ │ 150 │ (1523,61) │ │ 151 │ (1523,62) │ │ 152 │ (1523,63) │ │ 153 │ (1523,64) │ │ 154 │ (1523,65) │ │ 155 │ (1523,66) │ │ 156 │ (1523,67) │ │ 157 │ (1523,68) │ │ 158 │ (1523,69) │ │ 159 │ (1523,70) │ │ 160 │ (1523,71) │ │ 161 │ (1523,72) │ │ 162 │ (1523,73) │ │ 163 │ (1523,74) │ │ 164 │ (1523,75) │ │ 165 │ (1523,76) │ │ 166 │ (1523,77) │ │ 167 │ (1523,78) │ │ 168 │ (1523,79) │ │ 169 │ (1523,80) │ │ 170 │ (1523,81) │ │ 171 │ (1531,1) │ │ 172 │ (1531,2) │ │ 173 │ (1531,3) │ │ 174 │ (1531,4) │ │ 175 │ (1531,5) │ │ 176 │ (1531,6) │ │ 177 │ (1531,7) │ │ 178 │ (1531,8) │ │ 179 │ (1531,9) │ │ 180 │ (1531,10) │ │ 181 │ (1531,11) │ │ 182 │ (1531,12) │ │ 183 │ (1531,13) │ │ 184 │ (1531,14) │ │ 185 │ (1531,15) │ │ 186 │ (1531,16) │ │ 187 │ (1531,17) │ │ 188 │ (1531,18) │ │ 189 │ (1531,19) │ │ 190 │ (1531,20) │ │ 191 │ (1531,21) │ │ 192 │ (1531,22) │ │ 193 │ (1531,23) │ │ 194 │ (1531,24) │ │ 195 │ (1531,25) │ │ 196 │ (1531,26) │ │ 197 │ (1531,27) │ │ 198 │ (1531,28) │ │ 199 │ (1531,29) │ │ 200 │ (1531,30) │ │ 201 │ (1531,31) │ │ 202 │ (1531,32) │ │ 203 │ (1531,33) │ │ 204 │ (1531,34) │ │ 205 │ (1531,35) │ │ 206 │ (1531,36) │ │ 207 │ (1531,37) │ │ 208 │ (1531,38) │ │ 209 │ (1531,39) │ │ 210 │ (1531,40) │ │ 211 │ (1531,41) │ │ 212 │ (1531,42) │ │ 213 │ (1531,43) │ │ 214 │ (1531,44) │ │ 215 │ (1531,45) │ │ 216 │ (1531,46) │ │ 217 │ (1531,47) │ │ 218 │ (1531,48) │ │ 219 │ (1531,49) │ │ 220 │ (1531,50) │ │ 221 │ (1531,51) │ │ 222 │ (1531,52) │ │ 223 │ (1531,53) │ │ 224 │ (1531,54) │ │ 225 │ (1531,55) │ │ 226 │ (1531,56) │ │ 227 │ (1531,57) │ │ 228 │ (1531,58) │ │ 229 │ (1531,59) │ │ 230 │ (1531,60) │ │ 231 │ (1531,61) │ │ 232 │ (1531,62) │ │ 233 │ (1531,63) │ │ 234 │ (1531,64) │ │ 235 │ (1531,65) │ │ 236 │ (1531,66) │ │ 237 │ (1531,67) │ │ 238 │ (1531,68) │ │ 239 │ (1531,69) │ │ 240 │ (1531,70) │ │ 241 │ (1531,71) │ │ 242 │ (1531,72) │ │ 243 │ (1531,73) │ │ 244 │ (1531,74) │ │ 245 │ (1531,75) │ │ 246 │ (1531,76) │ │ 247 │ (1531,77) │ │ 248 │ (1531,78) │ │ 249 │ (1531,79) │ │ 250 │ (1531,80) │ │ 251 │ (1531,81) │ │ 252 │ (1539,1) │ │ 253 │ (1539,2) │ │ 254 │ (1539,3) │ │ 255 │ (1539,4) │ │ 256 │ (1539,5) │ │ 257 │ (1539,6) │ │ 258 │ (1539,7) │ │ 259 │ (1539,8) │ │ 260 │ (1539,9) │ │ 261 │ (1539,10) │ └────────────┴───────────┘ (261 rows) Notice that there are contiguous groups of tuples that all point to the same heap block. These groups are really groups of items (on average 10 items) from a given order. Individual warehouses seem to have a tendency to insert multiple orders together, which further lowers nhtids. You can tell that tuples aren't inserted in strict ascending order because there are "heap TID discontinuities". For example, item 165 (which is the last item from a given order) points to (15377,81), while item 166 (which is the first item from the next order made to the same warehouse) points to (15385,1). There is a "heap block gap" between index tuple item 165 and 166 -- these "missing" heap blocks don't appear anywhere on the same leaf page. Note also that many of the other TPC-C indexes have this same quality to them. They also consist of groups of related tuples, that get inserted together in ascending order -- and yet the *overall* pattern for the index is pretty far from inserts happening in ascending key space order. * A low cardinality index. In one way, this works against the complex patch: if there are ~1350 TIDs on every leaf page (thanks to nbtree deduplication), we're presumably less likely to ever need to read very many leaf pages eagerly. But in another way it favors the complex patch: each individual distinct value will have its TIDs stored/read in TID order, which can be enough of a factor to get us a low nhtids value for each leaf page. I see a nhtids of 5 - 7 for leaf pages from the following index: create table low_cardinality(foo int4); CREATE TABLE create index on low_cardinality (foo); CREATE INDEX insert into low_cardinality select hashint4(j) from generate_series(1,10_000) i, generate_series(1,100) j; INSERT 0 1000000 This is actually kinda like the TPC-C index, in a way: "foo" column values all look random. But within a given value, the TIDs are in ascending order, which (at least here) is enough to get us a very low nhtids -- even in spite of each leaf page storing more than 4x as many TIDs than could be stored within each of the TPC-C index's pages. Note that the number of CPU cycles needed within nbtree to read a leaf page from a low cardinality index is probably *lower* than the typical case for a unique index. This is due to a variety of factors -- the main factor is that there aren't very many index tuples to evaluate on the page. So the scan isn't bottlenecked at that (certainly not to an extent that is commensurate with the overall number of TIDs). The terminology in this area is tricky. We say "correlation", when perhaps we should say something like "heap clustering factor" -- a concept that seems hard to define precisely. It doesn't help that the planner models all this using a correlation stat -- that encourages us to reduce everything to a single scalar correlation number, which can be quite misleading. I could give more examples, if you want. But they'd all just be variations of the same thing. -- Peter Geoghegan
pgsql-hackers by date: