Re: should vacuum's first heap pass be read-only? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: should vacuum's first heap pass be read-only?
Msg-id CAH2-WzmQN4bcs8XmFrd3d9Gsk24AKdL5EQa4Pg0hYEko40--LQ@mail.gmail.com
In response to Re: should vacuum's first heap pass be read-only?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: should vacuum's first heap pass be read-only?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Apr 5, 2022 at 2:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 5, 2022 at 4:30 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Tue, Apr 5, 2022 at 1:10 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > I had assumed that this would not be the case, because if the page is
> > > being accessed by the workload, it can be pruned - and probably frozen
> > > too, if we wanted to write code for that and spend the cycles on it -
> > > and if it isn't, pruning and freezing probably aren't needed.
> >
> > [ a lot of things ]
>
> I don't understand what any of this has to do with the point I was raising here.

Why do you assume that we'll ever have an accurate idea of how many
LP_DEAD items there are, before we've looked? And if we're wrong about
that, persistently, why should anything else we think about it really
matter? This is an inherently dynamic and cyclic process. Statistics
don't really work here. That was how my remarks were related to yours.
That should be in scope: getting better information about what work
we need to do by blurring the boundary between deciding what to do
and executing that plan.

On a long enough timeline the LP_DEAD items in heap pages are bound to
become the dominant concern in almost any interesting case for the
conveyor belt, for the obvious reason: you can't do anything about
LP_DEAD items without also doing every other piece of processing
involving those same heap pages. So in that sense, yes, they will be
the dominant problem at times, for sure.

On the other hand it seems very hard to imagine an interesting
scenario in which LP_DEAD items are the dominant problem from the
earliest stage of processing by VACUUM. But even if that were somehow
possible, would it matter? It would only mean occasional instances of
the conveyor belt being ineffective -- hardly the end of the world.
What would it cost us to keep the conveyor belt as an option that
sometimes goes unused? I don't think we'd have to do any extra work,
other than in-memory bookkeeping.

-- 
Peter Geoghegan


