Re: index prefetching - Mailing list pgsql-hackers
| From | Andres Freund |
|---|---|
| Subject | Re: index prefetching |
| Date | |
| Msg-id | 6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev |
| In response to | Re: index prefetching (Andres Freund <andres@anarazel.de>) |
| Responses | Re: index prefetching |
| List | pgsql-hackers |
Hi,
On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > We can optimize that by deferring the StartBufferIO() when we encounter a
> > > > buffer that is undergoing IO, at the cost of some complexity. I'm not sure
> > > > real-world queries will encounter the pattern of the same block being
> > > > read in by a read stream multiple times in close proximity sufficiently
> > > > often to make that worth it.
> > >
> > > We definitely need to be prepared for duplicate prefetch requests in
> > > the context of index scans.
> >
> > Can you (or anybody else) think of a quick and dirty way of working
> > around the problem on the read stream side? I would like to prioritize
> > getting the patch into a state where its overall performance profile
> > "feels right". From there we can iterate on fixing the underlying
> > issues in more principled ways.
>
> I think I can see a way to fix the issue, below the read stream layer. Basically,
> whenever AsyncReadBuffers() finds a buffer that has ongoing IO, instead of
> waiting, as we do today, copy the wref into the ReadBuffersOperation and set a
> new flag indicating that we are waiting for an IO that we did not start
> ourselves. Then, in WaitReadBuffers(), we wait for such foreign-started IOs.
> That has to be somewhat different code from today, because we have to deal
> with the possibility that the "foreign" IO has failed.
>
> I'll try writing a prototype for that tomorrow. I think to actually get that
> into a committable shape we need a test harness (probably a read stream
> controlled by an SQL function that gets an array of buffers).
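For reference, such a harness could look roughly like the sketch below. This is only an illustration under assumptions: the function name test_read_stream, the regclass/int4[] signature, and the absence of error checking are made up here; it just drives the existing read_stream_begin_relation()/read_stream_next_buffer() API over caller-supplied block numbers, so the same block can be requested repeatedly in close proximity.

/*
 * Hypothetical test harness sketch (names and signature are illustrative),
 * usable as e.g.
 *   SELECT test_read_stream('t'::regclass, ARRAY[10, 10, 11, 10]);
 */
#include "postgres.h"

#include "access/relation.h"
#include "catalog/pg_type.h"
#include "fmgr.h"
#include "storage/bufmgr.h"
#include "storage/read_stream.h"
#include "utils/array.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(test_read_stream);

typedef struct TestReadStreamState
{
	Datum	   *blocks;		/* caller-supplied block numbers */
	int			nblocks;
	int			next;		/* next array element to hand out */
} TestReadStreamState;

/* read stream callback: return the next caller-supplied block number */
static BlockNumber
test_read_stream_next_block(ReadStream *stream, void *callback_private_data,
							void *per_buffer_data)
{
	TestReadStreamState *state = callback_private_data;

	if (state->next >= state->nblocks)
		return InvalidBlockNumber;
	return (BlockNumber) DatumGetInt32(state->blocks[state->next++]);
}

Datum
test_read_stream(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(1);
	Relation	rel = relation_open(relid, AccessShareLock);
	TestReadStreamState state = {0};
	ReadStream *stream;
	Buffer		buf;
	int64		nread = 0;

	deconstruct_array(arr, INT4OID, sizeof(int32), true, TYPALIGN_INT,
					  &state.blocks, NULL, &state.nblocks);

	stream = read_stream_begin_relation(READ_STREAM_DEFAULT, NULL,
										rel, MAIN_FORKNUM,
										test_read_stream_next_block,
										&state, 0);

	/* consume the whole stream, releasing each buffer as it arrives */
	while ((buf = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
	{
		ReleaseBuffer(buf);
		nread++;
	}

	read_stream_end(stream);
	relation_close(rel, AccessShareLock);

	PG_RETURN_INT64(nread);
}

Hooked up with a matching CREATE FUNCTION ... LANGUAGE C declaration, something along these lines would make it easy to reproduce the duplicate-block pattern from SQL.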
Attached is a prototype of the approach described above. It does seem to fix the issue.
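To illustrate the shape of that mechanism (this is not the attached patch; READ_BUFFERS_FOREIGN_IO, the validity re-check and the retry helper are hypothetical names, and the real ReadBuffersOperation layout may differ), the control flow is roughly:

/*
 * Sketch only -- not the attached prototype.
 *
 * In AsyncReadBuffers(): the target buffer already has IO in flight that
 * somebody else started.
 */
if (pgaio_wref_valid(&buf_io_wref))
{
	/* Remember the foreign IO instead of waiting for it synchronously. */
	operation->io_wref = buf_io_wref;
	operation->flags |= READ_BUFFERS_FOREIGN_IO;	/* hypothetical flag */
	/* ... and do not StartBufferIO()/issue our own read for this buffer. */
}

/*
 * In WaitReadBuffers(): waiting on a foreign IO needs its own path, because
 * that IO may have failed, leaving the buffer still invalid.
 */
if (operation->flags & READ_BUFFERS_FOREIGN_IO)
{
	pgaio_wref_wait(&operation->io_wref);

	if (!buffer_is_valid_after_foreign_io(operation))	/* hypothetical check */
	{
		/* The foreign IO failed; fall back to reading the buffer ourselves. */
		retry_read_ourselves(operation);	/* hypothetical helper */
	}
}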
New code disabled:
#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=10291 read=49933 │
│ I/O Timings: shared read=213.277 │
│ Planning: │
│ Buffers: shared hit=91 read=19 │
│ I/O Timings: shared read=2.124 │
│ Planning Time: 3.269 ms │
│ Execution Time: 1023.279 ms │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)
New code enabled:
#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=10291 read=49933 │
│ I/O Timings: shared read=217.225 │
│ Planning: │
│ Buffers: shared hit=91 read=19 │
│ I/O Timings: shared read=2.009 │
│ Planning Time: 2.685 ms │
│ Execution Time: 602.987 ms │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)
With the change enabled, the sequential query is faster than the random query:
#### backwards random table ####
┌────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_randomized_pk on t_randomized (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103)) │
│ Index Searches: 1 │
│ Buffers: shared hit=6085 read=77813 │
│ I/O Timings: shared read=347.285 │
│ Planning: │
│ Buffers: shared hit=127 read=5 │
│ I/O Timings: shared read=1.001 │
│ Planning Time: 1.751 ms │
│ Execution Time: 820.544 ms │
└────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)
Greetings,
Andres Freund