Re: (auto)vacuum truncate exclusive lock - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: (auto)vacuum truncate exclusive lock |
Date | |
Msg-id | 1365792171.53572.YahooMailNeo@web162902.mail.bf1.yahoo.com |
In response to | Re: (auto)vacuum truncate exclusive lock (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: (auto)vacuum truncate exclusive lock |
List | pgsql-hackers |
[some relevant dropped bits of the thread restored]

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Kevin Grittner <kgrittn@ymail.com> writes:
>>>> Jeff Janes <jeff.janes@gmail.com> wrote:

>>>> I propose to do the following:
>>>>
>>>> (1) Restore the prior behavior of the VACUUM command. This
>>>> was only ever intended to be a fix for a serious autovacuum
>>>> problem which caused many users serious performance problems
>>>>
>>>> (2) If autovacuum decides to try to truncate but the lock
>>>> cannot be initially acquired, and analyze is requested, skip
>>>> the truncation and do the autoanalyze.

>>> I think that the minimum appropriate fix here is to [...] take
>>> out the suppression of stats reporting and analysis.
>>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were
getting at.

>>>>> OK, I see that now. In the old behavior, if the lock was
>>>>> acquired, but then we were shoved off from it, the analyze
>>>>> was not done. But, in the old behavior if the lock was never
>>>>> acquired at all, then it would go ahead to do the
>>>>> autoanalyze,

>>>> Ah, I see now. So the actual worst case for the old code, in
>>>> terms of both head-banging and statistics, was if autovacuum
>>>> was able to acquire the lock but then many tasks all piled up
>>>> behind its lock. If the system was even *more* busy it would
>>>> not acquire the lock at all, and would behave better.

> and I suppose the rationale for suppressing the stats report was
> this same idea of lying to the stats collector in order to
> encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion.
I was not persuaded by that; but he pointed out that if the
deadlock killer killed an autovacuum process which was doing a
truncate, it did not get to the statistics phase; so I agreed that
any change in that behavior should be a separate patch. I missed
the fact that if it failed to initially get the lock it did proceed
to the statistics phase. I explained this earlier in this thread.
No need to cast about for hypothetical explanations.

> Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

> But if it is reasonable, we need a redesign of the reporting
> messages, not just a hack to not tell the stats collector what we
> did.

The idea was to try to make as small a change in previous behavior
as possible. Jan pointed out that when the deadlock detection code
killed an autovacuum worker which was trying to truncate, the
statistics were not updated, leading to retries. This was an
attempt to *not change* existing behavior. It was wrong, because
we both missed the fact that if it didn't get the lock in the first
place it went ahead with statistics generation.

That being the case, I was proposing we always generate statistics
if we were supposed to. That would be a change toward *more*
up-to-date statistics and *fewer* truncation retries than we've
had. I'm OK with that because a table hot enough to hit the issue
will likely need the space again or need another vacuum soon.

> Are you saying you intend to revert that whole concept?

No. I was merely asking what you were suggesting. As I said
earlier:

>>>> I have seen cases where the old logic head-banged for hours or
>>>> days without succeeding at the truncation attempt in
>>>> autovacuum, absolutely killing performance until the user ran
>>>> an explicit VACUUM. And in the meantime, since the deadlock
>>>> detection logic was killing autovacuum before it got to the
>>>> analyze phase, the autoanalyze was never done.
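
The cases discussed above can be summarized in a small sketch. This is Python for illustration only, not PostgreSQL source: `LockOutcome`, `old_runs_analyze`, and `proposed_runs_analyze` are hypothetical names, and the model covers nothing except whether the autoanalyze step is reached. It also assumes that a truncation which completes normally was always followed by the analyze.

```python
from enum import Enum, auto


class LockOutcome(Enum):
    """What happened to the truncation attempt (hypothetical model)."""
    NEVER_ACQUIRED = auto()             # initial lock attempt failed
    ACQUIRED_THEN_INTERRUPTED = auto()  # lock taken, truncation did not finish
    TRUNCATED = auto()                  # truncation ran to completion


def old_runs_analyze(outcome: LockOutcome, analyze_requested: bool) -> bool:
    """Old behavior as described in the thread.

    A worker that held the lock and was then killed by deadlock
    detection never reached the analyze phase.
    """
    if not analyze_requested:
        return False
    return outcome is not LockOutcome.ACQUIRED_THEN_INTERRUPTED


def proposed_runs_analyze(outcome: LockOutcome, analyze_requested: bool) -> bool:
    """Proposal: analyze whenever it was requested, whatever the outcome."""
    return analyze_requested


if __name__ == "__main__":
    # Show where the two policies differ when an analyze was requested.
    for outcome in LockOutcome:
        print(outcome.name,
              old_runs_analyze(outcome, True),
              proposed_runs_analyze(outcome, True))
```

In this model the two policies differ in exactly one case, the interrupted truncation, which is the case where the autoanalyze was being lost.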

> Otherwise we need some thought about how to inform the stats
> collector what's really happening.

I think we can probably improve that on some future release. I
don't think a new scheme for that makes sense for back-patching or
9.3.

For now what I'm suggesting is generating statistics in all the
cases it did before, plus the case where it starts truncation but
does not complete it. The fact that before this patch there were
cases where the autovacuum worker was killed, resulting in not
generating needed statistics, seems like a bug, not a behavior we
need to preserve.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company