On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I'm on the fence about adding a PANIC. We do PANIC in other places
> where we notice corruption (like PageAddItemExtended()). But, in most
> of the cases, it seems like we are PANICing because there isn't a
> reasonable way to accomplish the intended task. In this case, we
> probably can't trust the visibility map counts for that page, but the
> pg_class columns are just estimates, so just capping relallfrozen
> might be good enough.
+1 for not unnecessarily inflating the log level. I agree with the
principle you articulate here: a PANIC is reasonable when there's no
sane way to continue, but for other corruption events, a WARNING is
better. Note that one really bad consequence of using ERROR or any
higher level in VACUUM is that VACUUM doesn't ever complete. We just
keep retrying and dying. Even if it's only an ERROR, now we're no
longer controlling bloat or preventing wraparound. That's extremely
painful for users, so we don't want to end up there if we have a
reasonable alternative.
Also, I think that it's usually a bad idea to use an Assert to test
for data corruption scenarios. The problem is that programmers are
more or less entitled to assume that an assertion will always hold
true, barring bugs, and just ignore completely what the code might do
if the assertion turns out to be false. But data does get corrupted on
disk, so you SHOULD think about what the code is going to do in such
scenarios. Depending on the details, you might want to WARNING or
ERROR or PANIC or try to repair it or try to just tolerate it being
wrong or something else -- but you should be thinking about what the
code is actually going to do, not just going "eh, well, that can't
happen".
--
Robert Haas
EDB: http://www.enterprisedb.com