Re: (auto)vacuum truncate exclusive lock - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: (auto)vacuum truncate exclusive lock |
Date | |
Msg-id | 1365783133.1460.YahooMailNeo@web162903.mail.bf1.yahoo.com Whole thread Raw |
In response to | Re: (auto)vacuum truncate exclusive lock (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: (auto)vacuum truncate exclusive lock
|
List | pgsql-hackers |
Jeff Janes <jeff.janes@gmail.com> wrote: >> If we're going to have the message, we should make it useful. >> My biggest question here is not whether we should add this info, >> but whether it should be DEBUG instead of LOG > I like it being LOG. If it were DEBUG, I don't think anyone > would be likely to see it when they needed to, as it happens > sporadically on busy servers and I don't think people would run > those with DEBUG on. I figure it is analogous to an autovacuum > cancel message it partially replaces, and those are LOG. OK, sold. >>> Also, I think that permanently boycotting doing autoanalyze >>> because someone is camping out on an access share lock (or >>> because there are a never-ending stream of overlapping locks) >>> and so the truncation cannot be done is a bit drastic, >>> especially for inclusion in a point release. >> That much is not a change in the event that the truncation does >> not complete. > OK, I see that now. In the old behavior, of the lock was > acquired, but then we were shoved off from it, the analyze was > not done. But, in the old behavior if the lock was never > acquired at all, then it would go ahead to do the autoanalyze, > and that has changed. That is they way I was testing it > (camping out on an access shared lock so the access exclusive > could never be granted in the first place; because intercepting > it during the truncate phase was hard to do) and I just assumed > the behavior I saw would apply to both cases, but it does not. Ah, I see now. So the actual worst case for the old code, in terms of both head-banging and statistics, was if autovacuum was able to acquire the lock but then many tasks all piled up behind its lock. If the system was even *more* busy it would not acquire the lock at all, and would behave better. >> I have seen cases where the old logic head-banged for >> hours or days without succeeding at the truncation attempt in >> autovacuum, absolutely killing performance until the user ran an >> explicit VACUUM. And in the meantime, since the deadlock >> detection logic was killing autovacuum before it got to the >> analyze phase, the autoanalyze was never done. > OK, so there three problems. It would take a second to yield, in > doing so it would abandon all the progress it had made in that > second rather than saving it, and it would tight loop (restricted > by naptime) on this because of the lack of analyze. So it fixed > the first two in a way that seems an absolute improvement for the > auto case, but it made the third one worse in a common case, > where it never acquires the lock in the first place, and so > doesn't analyze when before it did in that one case. Yeah, I see that now. >> Perhaps the new logic should go ahead and get its lock even on a >> busy system (like the old logic), > As far as I can tell, the old logic was always conditional on the > AccessExlusive lock acquisition, whether it was manual or auto. OK, will review that to confirm;but assuming that's right, and nobody else is already working on a fix, I propose to do the following: (1) Restore the prior behavior of the VACUUM command. This was only ever intended to be a fix for a serious autovacuum problem which caused many users serious performance problems -- in some cases including unscheduled down time. I also saw sites where, having been bitten by this, they disabled autovacuum and later ran into problems with bloat and/or xid wraparound. (2) If autovacuum decides to try to truncate but the lock cannot be initially acquired, and analyze is requested, skip the truncation and do the autoanalyze. If the table is so hot that we cannot get the lock, the space may get re-used soon, and if not there is a good chance another autovacuum will trigger soon. If the user really wants the space released to the OS immediately, they can run a manual vacuum to force the issue. If I don't hear anything within the next day or two, I'll write that up and post it here before applying (and back-patching to affected branches). -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: