Re: Why mark empty pages all visible? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Why mark empty pages all visible?
Date
Msg-id CAH2-Wz=OjN_jQuJaF7f_iF8YVHbAFP9cKx1eJRf4XAdTsj0raw@mail.gmail.com
Whole thread Raw
In response to Re: Why mark empty pages all visible?  (Andres Freund <andres@anarazel.de>)
Responses Re: Why mark empty pages all visible?
List pgsql-hackers
On Mon, Mar 27, 2023 at 9:32 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-03-27 20:12:11 -0700, Peter Geoghegan wrote:
> > On Mon, Mar 27, 2023 at 6:48 PM Andres Freund <andres@anarazel.de> wrote:
> > > It seems odd that we enter the page into the VM at this point. That means that
> > > use of that page will now require a bit more work (including
> > > RelationGetBufferForTuple() pinning it).
> >
> > I think that it's fairly obvious that it's *not* odd at all. If it
> > didn't do this then the pages would have to be scanned by VACUUM.
>
> Yes - just like in the case of new pages.

I'm not saying that the status quo is free of contradictions. Only
that there seem to be contradictions in what you're saying now.

> > You haven't said anything about the leading cause of marking empty
> > pages all-frozen, by far: lazy_vacuum_heap_page(). Should it also stop
> > marking empty pages all-frozen?
>
> It's not obvious that it should - but it's not as clear a case as the
> ConditionalLockBufferForCleanup() -> lazy_scan_new_or_empty() one. In the
> latter, we know
> a) that we don't have to do any work to be able to advance the horizon
> b) we know that somebody else has the page pinned
>
> What's the point in marking it all-visible at that point? In quite likely be
> from RelationGetBufferForTuple() having extended the relation and then briefly
> needed to release the lock (to acquire the lock on otherBuffer or in
> GetVisibilityMapPins()).

I think that there is significant value in avoiding special cases, on
general principle. If we stopped doing this in
lazy_scan_new_or_empty() we'd be inventing a new thing that cleanup
locks are supposed to protect against. Maybe something like that would
make sense, but if so then make that argument, and make it explicitly
represented in the code.

> I don't follow. In the ConditionalLockBufferForCleanup() ->
> lazy_scan_new_or_empty() case we are dealing with an new or empty
> page. Whereas lazy_vacuum_heap_page() deals with a page that definitely has
> dead tuples on it.  How are those two cases comparable?

It doesn't have dead tuples anymore, though.

ISTM that there is an issue here with the definition of an empty page.
You're concerned about PageIsEmpty() pages. Which actually aren't
quite the same thing as an empty page left behind by
lazy_vacuum_heap_page(). It's just that this distinction isn't quite
acknowledged anywhere, and probably didn't exist at all at some point.
Maybe that should be addressed.

> > > It seems pretty clear that we shouldn't enter a currently-in-use page into the
> > > VM or freespacemap. All that's going to do is to "disturb" the backend trying
> > > to use that page (by directing other backends to it) and to make its job more
> > > expensive.
> >
> > I don't think that it's clear. What about the case where there is only
> > one tuple, on a page that we cannot cleanup lock? Where do you draw
> > the line?
>
> I don't see how that's comparable? For one, we might need to clean up that
> tuple for vacuum to be able to advance the horizon. And as far as the
> non-cleanup lock path goes, it actually can perform work there. And it doesn't
> even need to acquire an exclusive lock.

So we should put space in the FSM if it has one tuple, but not if it
has zero tuples? Though not if it has zero tuples following processing
by lazy_vacuum_heap_page()?

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
Next
From: Peter Smith
Date:
Subject: Re: PGdoc: add missing ID attribute to create_subscription.sgml