Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id CAH2-WzmP0awh8d2+dPd7Umq+HyJ-28HxjELzcPu7gO0G7VAvcQ@mail.gmail.com
Whole thread Raw
In response to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Andres Freund <andres@anarazel.de>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Thu, Mar 31, 2022 at 10:50 AM Andres Freund <andres@anarazel.de> wrote:
> > So, to be clear: vac_update_relstats() never actually considered the
> > new relfrozenxid value from its vacuumlazy.c caller to be "in the
> > future"?
>
> No, I added separate debug messages for those, and also applied your patch,
> and it didn't trigger.

The assert is "Assert(diff > 0)", and not "Assert(diff >= 0)". Plus
the other related assert I mentioned did not trigger. So when this
"diff" assert did trigger, the value of "diff" must have been 0 (not a
negative value). While this state does technically indicate that the
"existing" relfrozenxid value (actually a stale version) appears to be
"in the future" (because the OldestXmin XID might still never have
been allocated), it won't ever be in the future according to
vac_update_relstats() (even if it used that version).

I suppose that I might be wrong about that, somehow -- anything is
possible. The important point is that there is currently no evidence
that this bug (or any very recent bug) could ever allow
vac_update_relstats() to actually believe that it needs to update
relfrozenxid/relminmxid, purely because the existing value is in the
future.

The fact that vac_update_relstats() doesn't log/warn when this happens
is very unfortunate, but there is nevertheless no evidence that that
would have informed us of any bug on HEAD, even including the actual
bug here, which is a bug in vacuumlazy.c (not in vac_update_relstats).

> I do think we should apply a version of the warnings you have (with a WARNING
> instead of PANIC obviously). I think it's bordering on insanity that we have
> so many paths to just silently fix stuff up around vacuum. It's like we want
> things to be undebuggable, and to give users no warnings about something being
> up.

Yeah, it's just totally self defeating to not at least log it. I mean
this is a code path that is only hit once per VACUUM, so there is
practically no risk of that causing any new problems.

> Can you repro the issue with my recipe? FWIW, adding log_min_messages=debug5
> and fsync=off made the crash trigger more quickly.

I'll try to do that today. I'm not feeling the most energetic right
now, to be honest.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: [PATCH] Accept IP addresses in server certificate SANs
Next
From: Daniel Gustafsson
Date:
Subject: Re: Unit tests for SLRU