Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum
Date
Msg-id CAD21AoA6r_U0cXpmkFwFZmxkde+t06oT4c7DN=S1bVaeGq2zrg@mail.gmail.com
In response to Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] Too many autovacuum workers spawned during forced auto-vacuum  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
On Mon, Jan 16, 2017 at 1:50 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 13 January 2017 at 19:15, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> I think this is the same problem as reported in
>> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=YPyFPQ@mail.gmail.com
>
> Ah yes, this is the same problem. Not sure why I didn't land on that
> thread when I tried to search pghackers using relevant keywords.
>>
>>> === Fix ===
>> [...]
>>> Instead, the attached patch (prevent_useless_vacuums.patch) prevents
>>> the repeated cycle by noting that there's no point in doing whatever
>>> vac_update_datfrozenxid() does, if we didn't find anything to vacuum
>>> and there's already another worker vacuuming the same database. Note
>>> that it uses wi_tableoid field to check concurrency. It does not use
>>> wi_dboid field to check for already-processing worker, because using
>>> this field might cause each of the workers to think that there is some
>>> other worker vacuuming, and eventually no one vacuums. We have to be
>>> certain that the other worker has already taken a table to vacuum.
>>
>> Hmm, it seems reasonable to skip the end action if we didn't do any
>> cleanup after all. This would normally give enough time between vacuum
>> attempts for the first worker to make further progress and avoid causing
>> a storm.  I'm not really sure that it fixes the problem completely, but
>> perhaps it's enough.
>
> I had thought about this : if we didn't clean up anything, skip the
> end action unconditionally without checking if there was any
> concurrent worker. But then thought it is better to skip only if we
> know there is another worker doing the same job, because :
> a) there might be some reason we are just calling
> vac_update_datfrozenxid() without any condition. But I am not sure
> whether it was intentionally kept like that. Didn't get any leads from
> the history.
> b) there's no harm in calling vac_update_datfrozenxid() if there was
> no other worker. In this case, we *know* that there was indeed nothing
> to be cleaned up, so this database won't be chosen again next time,
> and calling this function does no harm.
>

Since an autovacuum worker wakes up the autovacuum launcher after it is
launched, the launcher could try to spawn worker processes at a high
frequency if you have a database containing a very large table that has
just passed autovacuum_freeze_max_age.

autovacuum.c:1605
    /* wake up the launcher */
    if (AutoVacuumShmem->av_launcherpid != 0)
        kill(AutoVacuumShmem->av_launcherpid, SIGUSR2);

I think we should deal with this case as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


