Re: index prefetching - Mailing list pgsql-hackers

From Andres Freund
Subject Re: index prefetching
Date
Msg-id 20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de
In response to Re: index prefetching  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: index prefetching
List pgsql-hackers
Hi,

On 2024-02-15 12:53:10 -0500, Peter Geoghegan wrote:
> On Thu, Feb 15, 2024 at 12:26 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> > I may be missing something, but it seems fairly self-evident to me an
> > entry at the beginning of an index page won't get prefetched (assuming
> > the page-at-a-time thing).
> 
> Sure, if the first item on the page is also the first item that we
> need the scan to return (having just descended the tree), then it
> won't get prefetched under a scheme that sticks with the current
> page-at-a-time behavior (at least in v1). Just like when the first
> item that we need the scan to return is from the middle of the page,
> or more towards the end of the page.
> 
> It is of course also true that we can't prefetch the next page's
> first item until we actually visit the next page -- clearly that's
> suboptimal. Just like we can't prefetch any other, later tuples from
> the next page (until such time as we have determined for sure that
> there really will be a next page, and have called _bt_readpage for
> that next page.)
>
> This is why I don't think that the tuples with lower page offset
> numbers are in any way significant here.  The significant part is
> whether or not you'll actually need to visit more than one leaf page
> in the first place (plus the penalty from not being able to reorder
> the work across page boundaries in your initial v1 of prefetching).

To me your phrasing just seems to reformulate the issue.

In practical terms you'll have to wait for the full IO latency when fetching
the table tuple corresponding to the first tid on a leaf page. Of course
that's also the moment you have to visit another leaf page. Whether the stall
is due to visiting another leaf page or due to processing the first entry on
such a leaf page is a distinction without a difference.


> > That's certainly true / helpful, and it makes the "first entry" issue
> > much less common. But the issue is still there. Of course, this says
> > nothing about the importance of the issue - the impact may easily be so
> > small it's not worth worrying about.
> 
> Right. And I want to be clear: I'm really *not* sure how much it
> matters. I just doubt that it's worth worrying about in v1 -- time
> grows short. Although I agree that we should commit a v1 that leaves
> the door open to improving matters in this area in v2.

I somewhat doubt that it's realistic to aim for 17 at this point. We seem to
still be doing fairly fundamental architectural work. I think it might be the
right thing even for 18 to go for the simpler only-a-single-leaf-page
approach though.

I wonder if there are prerequisites that can be tackled for 17. One idea is to
work on infrastructure to provide executor nodes with information about the
number of tuples likely to be fetched - I suspect we'll trigger regressions
without that in place.



One way to *sometimes* process more than a single leaf page, without having to
redesign kill_prior_tuple, would be to use the visibilitymap to check if the
target pages are all-visible. If all the table pages referenced by a leaf page
are all-visible, we know that we don't need to kill index entries, and thus can
move on to the next leaf page.
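
A minimal sketch of that check, in C, under loud assumptions: ToyVisibilityMap,
toy_vm_all_visible() and toy_leaf_needs_no_kills() are hypothetical names made
up for illustration, and the bitmap is a stand-in for the real visibility map.
In PostgreSQL itself the lookup would presumably go through the existing
visibility map code (e.g. VM_ALL_VISIBLE), and the sketch ignores locking and
bits being cleared concurrently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the visibility map: one bit per heap block. */
typedef struct ToyVisibilityMap
{
    const uint8_t *bits;        /* bit n set => heap block n is all-visible */
    uint32_t    nblocks;        /* number of heap blocks covered by bits */
} ToyVisibilityMap;

/* Is the given heap block marked all-visible in the toy map? */
static bool
toy_vm_all_visible(const ToyVisibilityMap *vm, uint32_t blkno)
{
    if (blkno >= vm->nblocks)
        return false;           /* unknown block: be conservative */
    return (vm->bits[blkno / 8] >> (blkno % 8)) & 1;
}

/*
 * Return true if every heap block referenced by the TIDs of one leaf page
 * is all-visible. In that case no index entry on the leaf page can turn
 * out to be dead, so there is nothing to kill and the scan could move on
 * to the next leaf page early.
 *
 * heap_blocks holds the heap block number of each TID on the leaf page
 * (duplicates are fine).
 */
bool
toy_leaf_needs_no_kills(const ToyVisibilityMap *vm,
                        const uint32_t *heap_blocks, size_t ntids)
{
    for (size_t i = 0; i < ntids; i++)
    {
        if (!toy_vm_all_visible(vm, heap_blocks[i]))
            return false;       /* fall back to single-leaf-page behavior */
    }
    return true;
}
```

The check is deliberately all-or-nothing per leaf page: as soon as one
referenced heap block is not all-visible, the scan would stick with the
simpler single-leaf-page behavior for that page.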

Greetings,

Andres Freund


