Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-Wzko86NwiENCJGtakJ=fOhWpr-Yz-F+1oxgv2Ku1mvXwvA@mail.gmail.com |
In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
Responses | Re: index prefetching |
List | pgsql-hackers |
On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra <tomas@vondra.me> wrote:
> Actually, this might be a consequence of how backwards scans work (at
> least in btree). I logged the block in index_scan_stream_read_next, and
> this is what I see in the forward scan (at the beginning):

Just to be clear: you did disable deduplication and then reindex, right? You're accounting for the known issue with posting list TIDs returning TIDs in the wrong order, relative to the scan direction (when the scan direction is backwards)?

It won't be necessary to do this once I commit my patch that fixes the issue directly, on the nbtree side, but for now deduplication messes things up here. And so for now you have to work around it.

> But with the backwards scan we apparently scan the values backwards, but
> then the blocks for each value are accessed in forward direction. So we
> do a couple blocks "forward" and then jump to the preceding value - but
> that's a couple blocks *back*. And that breaks the lastBlock check.

I don't think that this should be happening. The read stream ought to be seeing blocks in exactly the same order as everything else.

> I believe this applies both to master and the prefetching, except that
> master doesn't have read stream - so it only does sync I/O.

In what sense is it an issue on master?

On master, we simply access the TIDs in whatever order amgettuple returns TIDs in. That should always be scan order/index key space order, where heap TID counts as a tie-breaker/affects the key space in the presence of duplicates (at least once that issue with posting lists is fixed, or once deduplication has been disabled in a way that leaves no posting list TIDs around via a reindex).

It is certainly not surprising that master does poorly on backwards scans. And it isn't all that surprising that master does worse on backwards scans when direct I/O is in use (per the explanation Andres offered just now).
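The ordering argument can be made concrete with a toy Python model (not nbtree code). It assumes an index entry is a `(key, heap_tid)` pair, a heap TID is a `(block, offset)` pair, and key space order sorts by key with the heap TID as tie-breaker. `backward_scan_posting_list_order` models the known posting-list issue described above, and `direction_breaks` is only a hypothetical stand-in for the patch's lastBlock check, whose exact logic isn't shown in this thread.

```python
# Toy model of the heap block order an index scan feeds to a read stream.
# Hypothetical: entries are (key, (block, offset)) pairs; key space order is
# key first, heap TID as tie-breaker.

def forward_scan(entries):
    # Forward scan: ascending key order, heap TID breaks ties.
    return sorted(entries)

def backward_scan(entries):
    # A correct backward scan is the exact reverse of the forward scan,
    # including the heap TID tie-breaker among duplicates.
    return list(reversed(forward_scan(entries)))

def backward_scan_posting_list_order(entries):
    # Models the known issue: keys are visited in descending order, but the
    # TIDs of each duplicate key (posting list) still come back ascending.
    result = []
    for key in sorted({k for k, _ in entries}, reverse=True):
        result.extend(sorted(e for e in entries if e[0] == key))
    return result

def heap_blocks(scan):
    # Heap block numbers in the order they would reach the read stream.
    return [tid[0] for _, tid in scan]

def direction_breaks(blocks, step):
    # Count block changes that move against the expected scan direction
    # (step=+1 for forward, step=-1 for backward). Hypothetical stand-in
    # for a "lastBlock" style sequential-access check.
    breaks = 0
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev and (cur - prev) * step < 0:
            breaks += 1
    return breaks

# Two duplicate keys, each with TIDs spread over consecutive heap blocks.
entries = [(1, (10, 1)), (1, (11, 1)), (1, (12, 1)),
           (2, (13, 1)), (2, (14, 1)), (2, (15, 1))]

print(heap_blocks(forward_scan(entries)))                      # -> [10, 11, 12, 13, 14, 15]
print(heap_blocks(backward_scan(entries)))                     # -> [15, 14, 13, 12, 11, 10]
print(heap_blocks(backward_scan_posting_list_order(entries)))  # -> [13, 14, 15, 10, 11, 12]
```

In this model a correct backward scan has zero direction breaks because its block sequence is the exact reverse of the forward one. The posting-list ordering produces the "a couple blocks forward, then jump back" pattern from the quoted report, which is why deduplication has to be disabled (and the index rebuilt) before comparing scan directions.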
But master should nevertheless always read the TIDs in whatever order it gets them from amgettuple.

It sounds like amgetbatch doesn't really behave analogously to master here, at least with backwards scans. It sounds like you're saying that we *won't* feed the TIDs' heap block numbers to the read stream in exactly scan order (when we happen to be scanning backwards) -- which seems wrong to me.

As you pointed out, a forwards scan of a DESC column index should feed heap blocks to the read stream in a way that is very similar to an equivalent backwards scan of a similar ASC column on the same table. There might be some very minor differences, due to differences in the precise leaf page boundaries between the two indexes. But that should hardly be noticeable at all.

> Could that hide the extra buffer accesses, somehow?

I think that you meant to ask about *missing* buffer hits with the patch, for the forwards scan. That doesn't agree with the backwards scan with the patch, nor does it agree with master (with either the forwards or backwards scan). Note that the heap accesses themselves appear to have sane/consistent numbers, since we always see "read=49933" as expected for those, for all 4 query executions that I showed.

The "missing buffer hits" issue seems like an issue with the instrumentation itself. Possibly one that is totally unrelated to everything else we're discussing.

--
Peter Geoghegan
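The reasoning about consistent read counts can be illustrated with a second toy model. It is hypothetical and is not PostgreSQL's buffer accounting: the cache never evicts, the first access to a block counts as a read, a repeat access to the block still pinned from the previous TID is assumed to reuse the pin without being counted, and every other access counts as a hit. Under those assumptions reversing the access order cannot change either total. Reads equal the number of distinct blocks, and hits equal the number of runs of consecutive same-block accesses minus reads, and reversal preserves both.

```python
# Toy buffer accounting for a heap fetch sequence (hypothetical model:
# no eviction, pin reuse for consecutive same-block TIDs is not counted).

def buffer_counts(blocks):
    seen = set()      # blocks already brought into the (unbounded) cache
    pinned = None     # block pinned by the previous TID, if any
    hits = reads = 0
    for blk in blocks:
        if blk == pinned:
            continue  # same block as the previous TID: pin reused, no count
        if blk in seen:
            hits += 1
        else:
            reads += 1
            seen.add(blk)
        pinned = blk
    return hits, reads

forward = [10, 11, 10, 12, 12, 11, 13]
backward = list(reversed(forward))

print(buffer_counts(forward))   # -> (2, 4)
print(buffer_counts(backward))  # -> (2, 4)
```

In this model, a forward and a correctly ordered backward scan over the same TIDs always report identical hit and read totals. So when reads match across runs but hits do not, the discrepancy points at how hits are being counted rather than at the I/O pattern, which is consistent with treating it as an instrumentation question.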