Re: index prefetching - Mailing list pgsql-hackers

From Robert Haas
Subject Re: index prefetching
Date
Msg-id CA+TgmoaDfvFF_krboNjjMYzOiPsGM_ioYqfngL-pNaj-nSs1DA@mail.gmail.com
Whole thread Raw
In response to Re: index prefetching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: index prefetching
List pgsql-hackers
On Wed, Feb 14, 2024 at 7:43 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I don't think it's just a bookkeeping problem. In a way, nbtree already
> does keep an array of tuples to kill (see btgettuple), but it's always
> for the current index page. So it's not that we immediately go and kill
> the prior tuple - nbtree already stashes it in an array, and kills all
> those tuples when moving to the next index page.
>
> The way I understand the problem is that with prefetching we're bound to
> determine the kill_prior_tuple flag with a delay, in which case we might
> have already moved to the next index page ...

Well... I'm not clear on all of the details of how this works, but
this sounds broken to me, for the reasons that Peter G. mentions in
his comments about desynchronization. If we currently have a rule that
you hold a pin on the index page while processing the heap tuples it
references, you can't just throw that out the window and expect things
to keep working. Saying that kill_prior_tuple doesn't work when you
throw that rule out the window is probably understating the extent of
the problem very considerably.

I would have thought that the way this prefetching would work is that
we would bring pages into shared_buffers sooner than we currently do,
but not actually pin them until we're ready to use them, so that it's
possible they might be evicted again before we get around to them, if
we prefetch too far and the system is too busy. Alternately, it also
seems OK to read those later pages and pin them right away, as long as
(1) we don't also give up pins that we would have held in the absence
of prefetching and (2) we have some mechanism for limiting the number
of extra pins that we're holding to a reasonable number given the size
of shared_buffers.

However, it doesn't seem OK at all to give up pins that the current
code holds sooner than the current code would do.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: 039_end_of_wal: error in "xl_tot_len zero" test
Next
From: David Rowley
Date:
Subject: Re: Properly pathify the union planner