Re: Trigger more frequent autovacuums of heavy insert tables - Mailing list pgsql-hackers

From: Melanie Plageman
Subject: Re: Trigger more frequent autovacuums of heavy insert tables
Msg-id: CAAKRu_Y4BPM2it4NaTLx+8OdYL2v-jLNzgN8e3eqkCA6DLBRbQ@mail.gmail.com
In response to: Re: Trigger more frequent autovacuums of heavy insert tables (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Trigger more frequent autovacuums of heavy insert tables
List: pgsql-hackers
On Tue, Feb 25, 2025 at 10:39 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I'm on the fence about adding a PANIC. We do PANIC in other places
> > where we notice corruption (like PageAddItemExtended()). But, in most
> > of the cases, it seems like we are PANICing because there isn't a
> > reasonable way to accomplish the intended task. In this case, we
> > probably can't trust the visibility map counts for that page, but the
> > pg_class columns are just estimates, so just capping relallfrozen
> > might be good enough.
>
> +1 for not unnecessarily inflating the log level. I agree with the
> principle you articulate here: a PANIC is reasonable when there's no
> sane way to continue, but for other corruption events, a WARNING is
> better. Note that one really bad consequence of using ERROR or any
> higher level in VACUUM is that VACUUM doesn't ever complete. We just
> keep retrying and dying. Even if it's only an ERROR, now we're no
> longer controlling bloat or preventing wraparound. That's extremely
> painful for users, so we don't want to end up there if we have a
> reasonable alternative.

That makes sense. I think I'll add a WARNING if we don't cap the value.

> Also, I think that it's usually a bad idea to use an Assert to test
> for data corruption scenarios. The problem is that programmers are
> more or less entitled to assume that an assertion will always hold
> true, barring bugs, and just ignore completely what the code might do
> if the assertion turns out to be false. But data does get corrupted on
> disk, so you SHOULD think about what the code is going to do in such
> scenarios. Depending on the details, you might want to WARNING or
> ERROR or PANIC or try to repair it or try to just tolerate it being
> wrong or something else -- but you should be thinking about what the
> code is actually going to do, not just going "eh, well, that can't
> happen".
I agree with this. I don't see what the assertion would catch in this case. Even as I try to think of a scenario where it would help the developer: if you write code that corrupts the visibility map, I don't think waiting for an assert at the end of a vacuum will be your first or best signal.

This does, however, leave me with the original question of whether to cap the proposed relallfrozen to the value of relallvisible when updating stats at the end of vacuum. The current code in heap_vacuum_rel() caps relallvisible to relpages, so capping relallfrozen to relallvisible would follow that pattern. However, the other places relallvisible is updated do no such capping (do_analyze_rel(), index_update_stats()). It doesn't seem like there is a good reason to do it in one place and not the others. So, I suggest either removing all the caps and adding a WARNING, or capping the value in all places.

Because users can now manually update these values in pg_class, there wouldn't be a way to tell the difference between a bogus relallfrozen value due to VM corruption and a bogus value due to manual statistics intervention. This led me to think that a WARNING and no cap would be more effective for heap_vacuum_rel().

- Melanie