Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
Date
Msg-id CAH2-Wz=Y9RwvkCxu9+0ypSpO88ctPC5_gdChdJc84nqOd1rfTw@mail.gmail.com
Whole thread Raw
In response to Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
List pgsql-hackers
On Thu, Feb 17, 2022 at 6:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Let's back up a minute and talk about the commit of $SUBJECT. The
> commit message contains a Discussion link to this thread. This thread,
> at the time you put that link in there, had exactly one post: from
> you. That's not much of a discussion, although I do acknowledge that
> sometimes we commit things that have bugs, and those bugs need to be
> fixed even if nobody has responded.

All true. I don't like to leave bugs unfixed for very long. I'm happy
to revise the fix in light of new information, or opinions.

> I would say it differently: I think the commit message does a poor job
> describing what the commit actually does. For example, it says nothing
> about changing VACUUM to always scan the last page of every heap
> relation. This whole thread is about fixing a problem that was caused
> by a significant behavior change that was *not even mentioned* in the
> original commit message.

It's not that simple. As I said in the fix-up commit message, and in
the opening email to this thread, it basically isn't a new behavior at
all. It would be much more accurate to describe it as a behavior that
originated with commit e8429082, from late 2015. There was a subtle
interaction with that existing behavior, and the refactoring work,
that caused the reltuples problem.

> If it had been mentioned, I likely would have
> complained, because it's very similar to behavior that Tom eliminated
> in b503da135ab0bdd97ac3d3f720c35854e084e525, which he did because it
> was distorting reltuples estimates.

I cited that commit myself. I think that it's a good idea to not be
completely unconcerned about how the visibility map skipping behavior
tends to affect the reltuples estimate -- especially if it's easy to
do it in a way that doesn't cause problems anyway. And so I agree that
the "looks ahead rather than behind" behavior added by that commit
makes sense. Even still, the whole idea that scanned_pages is a random
sample from the table is kinda bogus, and always has been. The
consequences of this are usually not too bad, but even still: why
wouldn't it be possible for there to be all kinds of distortion to
reltuples, since the tuple density logic is built on an assumption
that's self evidently wrong?

It's not reasonable to expect vacuumlazy.c to truly contort itself,
just to conceal that shaky assumption. At the same time, we need the
model used by vac_estimate_reltuples() to work as well as it can, with
the limited information that it has close at hand. That doesn't seem
to leave a lot of options, as far as addressing the bug goes. Happy to
discuss it further if you have any ideas, though.

> I think you *really* need to put more effort into
> making your patches, and the emails about your patches, and the commit
> messages for your patches understandable to other people. Otherwise,
> waiting 3 months between when you post the patch and when you commit
> it means nothing. You can wait 10 years to commit and still get
> objections, if other people don't understand what you're doing.

I don't think it's fair to just suppose that much of what I write is
incomprehensible, just like that. Justin expressed the exact opposite
view about the commit message of 44fa8488, for one. I'm not saying
that there is no room at all for improvement, or that my reputation
for being hard to follow is completely undeserved. Just that it's
frustrating to be accused of not having put enough effort into
explaining what's going on with commit 44fa8488 -- I actually put a
great deal of effort into it.

I think that you'd acknowledge that overall, on balance, I must be
doing something right. Have you considered that that might have
happened because of my conceptual style, not in spite of it? I have a
tendency to define things in abstract terms, because it really works
for me. I try to abstract away inessential details, and emphasize
invariance, so that the concepts are easier to manipulate with in many
different contexts -- something that is almost essential when working
on B-Tree stuff. It's harder at first, but much easier later on (at
least for me). If I ever seem to go overboard with it, I ask that you
consider this perspective -- though you should also push back when
that seems appropriate.

Separately, I think that you're missing how hard it would have been to
"just say what you did" while making much sense, given the specifics
here. The complexity in commit 44fa8488 is more or less all due to
historical factors -- it's quite a messy, winding story. For example,
commit 1914c5ea from 2017 added the tupcount_pages field (that I just
removed) to fix a reltuples estimation related bug, that could be
thought of as a bug in the aforementioned commit e8429082 (from 2015).
Now, commit 1914c5ea didn't actually argue that it was fixing a bug in
that earlier commit directly -- whether or not you'd call it that is a
matter of perspective.

This is one of those situations where it's far easier to start with
the ideal, and work forwards, rather than the other way around (the
other way around is often the better approach, but not always). To
some degree it's a return to an earlier era for vacuumlazy.c, when
scanned_pages had a simple and unambiguous definition.

> I would guess that's really the root of Andres's concern here. I
> believe that both Andres and I are in favor of the kinds of things you
> want to do here *in principle*.

I don't doubt that.

> But in practice I feel like it's not
> working well, and thereby putting the project at risk. What if some
> day one of us needs to fix a bug in your code?

Although I hope and expect to be able to fix any bugs that turn up in
my code, and have a good track record there, anything is possible. I
don't think that you'd have a problem fixing any bug in my code,
though. I genuinely think that you'd be able to figure it out pretty
quickly, if it really came down to it.

> It's not like VACUUM is
> some peripheral system where bugs aren't that critical -- and it's
> also not the case that the risk of introducing new bugs is low.
> Historically, it's anything but.

Very true. I'm totally dependent on you and Andres here, as fellow
experts on VACUUM. That's hard for everybody, myself included. I
greatly appreciate your working with me on this, and on the earlier
VACUUM projects. I'm doing my best.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: O(n) tasks cause lengthy startups and checkpoints
Next
From: Andres Freund
Date:
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints