Re: Avoiding superfluous buffer locking during nbtree backwards scans - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Avoiding superfluous buffer locking during nbtree backwards scans |
Date | |
Msg-id | CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com Whole thread Raw |
In response to | Avoiding superfluous buffer locking during nbtree backwards scans (Peter Geoghegan <pg@bowt.ie>) |
List | pgsql-hackers |
On Fri, Jul 5, 2024 at 12:47 PM Peter Geoghegan <pg@bowt.ie> wrote: > On HEAD, the following query requires 24 buffer hits (I'm getting a > standard/forward index scan for this): > > select > abalance > from > pgbench_accounts > where > aid in (1, 500, 1000, 1500, 2000, 3000) order by aid asc; > > However, we don't see that with the backwards scan variant: > > select > abalance > from > pgbench_accounts > where > aid in (1, 500, 1000, 1500, 2000, 3000) order by aid desc; > > We actually see 30 buffer hits for this on HEAD. Whereas with the > patch, both variants get only 24 buffer hits -- there's parity, at > least in cases like these two. I'll now present a similar example that shows the remaining issue that my patch (which became my commit 3f44959f) didn't address, and how the issue if fixed by Matthias' more recent follow-up patch: $ pgbench -i -s 1 Now run: explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts order by aid; With a warm cache, this query gets a full index scan requiring 1915 buffer hits. However, we still see worse performance for an equivalent backwards scan: explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts order by aid desc; On master, the latter query gets 2188 buffer hits -- so clearly my commit 3f44959f left money on the table. These extra buffer hits just don't make sense. Matthias' patch (I tested v3-0001-Avoid-unneeded-nbtree-backwards-scan-buffer-locks.patch) will result in both variants requiring 1915 buffer hits. (Plus, of course, query execution should be faster with Matthias' new approach.) Though we do need special extra steps for dealing with concurrent page splits during backwards scans (steps involving a reread of the original page when its sibling link turns out to be stale), there hasn't been a concurrent split here -- which is why I find the inconsistency to be unnatural. Matthias' patch fixes the problem by passing _bt_walk_left the left link that will now have been stashed in the local scan state back when the page was read by _bt_readpage, so that it (_bt_walk_left) can optimistically *start* there (and *not* start at the page that has already been read, as on the master branch). Of course, the left link might be stale, but it was always inherently necessary for _bt_walk_left to be able to deal with that. Technically we're being more optimistic about it now, but that optimism is extremely well justified (concurrent page splits are rare). Arguably, this latest patch from Matthias actually makes the code simpler. It makes the backwards scan case more similar to the forwards scan case. This is another missed opportunity for commit 2ed5b87f96, I'd say -- that 2015 commit left things in a slightly odd in-between state, which this patch fully corrects. Note: my example was deliberately constructed to use an index scan backwards, rather than an index-only scan backwards. While both types of backwards scan will benefit in the same way (less buffer lock traffic), you'll only see a reduction in buffers hit for an EXPLAIN(ANALYZE, BUFFERS) involving a plain index scan backwards. This is due to the fact that the mechanism added by commit 2ed5b87f96 will only drop a leaf page buffer pin early for plain index scans, never for index-only scans. My example also wouldn't make the difference apparent with an unlogged table, for the same reason -- this difference isn't really important, but seems worth noting to avoid any confusion. A nice side benefit of this approach is that it'll make it easier to add better coverage for _bt_walk_left. The case where _bt_walk_left notices a stale left link will still be very rare (we're really not being all that optimistic in doing things this way), but it'll now be easier to hit on purpose. The code in v3-0001-Avoid-unneeded-nbtree-backwards-scan-buffer-locks.patch looks reasonable to me. I don't think that there are any impediments to committing it sometime this week. -- Peter Geoghegan
pgsql-hackers by date: