On Thu, Apr 6, 2023 at 11:52 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> > Gah, I think I misunderstood you. You are saying that only calling
> > AutoVacuumUpdateCostLimit() after napping while vacuuming a table may
> > not be enough. The frequency at which the number of workers changes will
> > likely be different. This is a good point.
> > It's kind of weird to call AutoVacuumUpdateCostLimit() only after napping...
>
> A not fully baked idea for a solution:
>
> Why not keep the balanced limit in the atomic instead of the number of
> workers for balance. If we expect all of the workers to have the same
> value for cost limit, then why would we just count the workers and not
> also do the division and store that in the atomic variable. We are
> worried about the division not being done often enough, not the number
> of workers being out of date. This solves that, right?

A bird in the hand is worth two in the bush, though. We don't really
have time to redesign the patch before feature freeze, and I can't
convince myself that there's a big enough problem with what you
already did that it would be worth putting off fixing this for another
year. Reading your newer emails, I think that the answer to my
original question is "we don't want to do it at every
vacuum_delay_point because it might be too costly," which is
reasonable.

I don't particularly like this new idea, either, I think. While it may
be true that we expect all the workers to come up with the same
answer, they need not, because rereading the configuration file isn't
synchronized. It would be pretty lame if a worker that had reread an
updated value from the configuration file recomputed the value, and
then another worker that still had an older value recalculated it
again just afterward. Keeping only the number of workers in memory
avoids the possibility of thrashing around in situations like that.
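
To make that concrete, here is a rough sketch of the "share only the
worker count" approach. It is not the patch's code: nworkers_for_balance
is a hypothetical stand-in for av_nworkersForBalance, and
balanced_cost_limit() is an invented helper. The point is that each
worker divides its own locally read base limit, so a worker with a
stale configuration only affects its own result and never overwrites a
shared one.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the shared av_nworkersForBalance counter. */
static atomic_uint nworkers_for_balance = 1;

/*
 * Derive this worker's limit from its own copy of the base limit and
 * the shared worker count. Nothing is written back to shared memory.
 */
static int
balanced_cost_limit(int local_base_limit)
{
    unsigned int nworkers = atomic_load(&nworkers_for_balance);
    int          limit;

    assert(local_base_limit > 0);

    /* Guard against a zero count so we never divide by zero. */
    if (nworkers == 0)
        nworkers = 1;

    limit = local_base_limit / (int) nworkers;

    /* Never let the limit drop below 1. */
    return limit > 0 ? limit : 1;
}
```

With four workers, one that still has a base limit of 200 computes 50
and one that has already reread a value of 400 computes 100. They
disagree briefly, but neither clobbers the other's answer.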

I do kind of wonder if it would be possible to rejigger things so that
we didn't have to keep recalculating av_nworkersForBalance, though.
Perhaps now is not the time due to the impending freeze, but maybe we
should explore maintaining that value in such a way that it is correct
at every instant, instead of recalculating it at intervals.
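
Purely as an illustration of what I mean, and assuming we could
identify the exact points where a worker starts and stops taking part
in balancing, the counter could be adjusted at those points instead of
being recounted. Every name below is hypothetical; none of this exists
in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared counter of workers taking part in balancing. */
static atomic_int balance_workers = 0;

/* Called when a worker starts vacuuming a table. */
static void
worker_begin_table(bool uses_balancing)
{
    if (uses_balancing)
        atomic_fetch_add(&balance_workers, 1);
}

/* Called when a worker finishes, or abandons, a table. */
static void
worker_end_table(bool uses_balancing)
{
    if (uses_balancing)
    {
        /* atomic_fetch_sub returns the value before the decrement. */
        int prev = atomic_fetch_sub(&balance_workers, 1);

        assert(prev > 0);
        (void) prev;    /* silence unused warning when NDEBUG is set */
    }
}

/* Readers see the current count without any recalculation pass. */
static int
current_balance_workers(void)
{
    return atomic_load(&balance_workers);
}
```

The hard part, which this sketch ignores, is making sure the decrement
also happens on error exits and when a table's storage parameters
change mid-vacuum.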
--
Robert Haas
EDB: http://www.enterprisedb.com