Thread: (auto)vacuum truncate exclusive lock

(auto)vacuum truncate exclusive lock

From
Jeff Janes
Date:
I guess I'm a couple releases late to review the "autovacuum truncate exclusive lock" patch (a79ae0bc0d454b9f2c95a), but this patch did not only affect autovac, it affects manual vacuum as well (as did the original behavior it is a modification of).  So the compiler constants are misnamed, and the elog message when it triggers is misleading.  (Is it also misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?) 

I've attached a patch that changes that. I also log the number of pages truncated at the time it gave up, as it would be nice to know if it is completely starving or making some progress.

Also, I think that permanently boycotting doing autoanalyze because someone is camping out on an access share lock (or because there are a never-ending stream of overlapping locks) and so the truncation cannot be done is a bit drastic, especially for inclusion in a point release.  Is there a way to have the autoanalyze happen, but then still arrange for the autovacuum to be triggered again next naptime?  

Cheers,

Jeff
Attachment

Re: (auto)vacuum truncate exclusive lock

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> I guess I'm a couple releases late to review the "autovacuum truncate
> exclusive lock" patch (a79ae0bc0d454b9f2c95a), but this patch did not only
> affect autovac, it affects manual vacuum as well (as did the original
> behavior it is a modification of).  So the compiler constants are misnamed,
> and the elog message when it triggers is misleading.  (Is it also
> misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?)

I just came to look at this via a complaint in pgsql-admin.  I'm not
convinced that we should consider the new behavior to be sane.
Automatic exclusive-lock abandonment makes sense for autovacuum, but
when the user has told us to vacuum, ISTM we should do it.  I can see
there might be differing opinions on that though.

> Also, I think that permanently boycotting doing autoanalyze because someone
> is camping out on an access share lock (or because there are a never-ending
> stream of overlapping locks) and so the truncation cannot be done is a bit
> drastic, especially for inclusion in a point release.

It's worse than that, it breaks manual VACUUM ANALYZE too (as per -admin
complaint).  I think this aspect is completely wrong, whether or not
you consider that dropping the exclusive lock early is sane for manual
vacuum.  If we were told to do an analyze, we should press on and do it.

Thoughts?
        regards, tom lane



Re: (auto)vacuum truncate exclusive lock

From
Kevin Grittner
Date:
Jeff Janes <jeff.janes@gmail.com> wrote:

> I guess I'm a couple releases late to review the "autovacuum
> truncate exclusive lock" patch (a79ae0bc0d454b9f2c95a), but this
> patch did not only affect autovac, it affects manual vacuum as
> well (as did the original behavior it is a modification of).  So
> the compiler constants are misnamed, and the elog message when it
> triggers is misleading.  (Is it also misleading to just say
> "vacuum"?  Does it need to say "(auto)vacuum"?)

I think it should because I don't think this heuristic is
appropriate for an explicit VACUUM command.  It's one thing for
autovacuum to leave the file sizes alone if it can't truncate
without blocking foreground processes; it's another for a
foreground VACUUM request to not do all of the work it was expected
to do.

> I also log the number of pages truncated at the time it gave up,
> as it would be nice to know if it is completely starving or
> making some progress.

If we're going to have the message, we should make it useful.  My
biggest question here is not whether we should add this info, but
whether it should be DEBUG instead of LOG.

> Also, I think that permanently boycotting doing autoanalyze
> because someone is camping out on an access share lock (or
> because there are a never-ending stream of overlapping locks) and
> so the truncation cannot be done is a bit drastic, especially for
> inclusion in a point release.

That much is not a change in the event that the truncation does not
complete.  I have seen cases where the old logic head-banged for
hours or days without succeeding at the truncation attempt in
autovacuum, absolutely killing performance until the user ran an
explicit VACUUM.  And in the meantime, since the deadlock detection
logic was killing autovacuum before it got to the analyze phase,
the autoanalyze was never done.

Perhaps the new logic should go ahead and get its lock even on a
busy system (like the old logic), and bail out after some
incremental progress.  Based on the current reports, it seems
likely that at least some workloads are not allowing the current
code to get a toe-hold.

> Is there a way to have the autoanalyze happen, but then still
> arrange for the autovacuum to be triggered again next naptime?  

I suggested that myself, but that was a bigger change from the old
behavior (at least if we allow that initial lock so we guarantee
some progress, per the above), and I didn't think that should be
back-patched.  A separate patch for just that, applied to
master/HEAD, seems like a good idea to me.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



(auto)vacuum truncate exclusive lock

From
Jeff Janes
Date:
On Thursday, April 11, 2013, Tom Lane wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> I guess I'm a couple releases late to review the "autovacuum truncate
> exclusive lock" patch (a79ae0bc0d454b9f2c95a), but this patch did not only
> affect autovac, it affects manual vacuum as well (as did the original
> behavior it is a modification of).  So the compiler constants are misnamed,
> and the elog message when it triggers is misleading.  (Is it also
> misleading to just say "vacuum"?  Does it need to say "(auto)vacuum"?)

I just came to look at this via a complaint in pgsql-admin.  I'm not
convinced that we should consider the new behavior to be sane.
Automatic exclusive-lock abandonment makes sense for autovacuum, but
when the user has told us to vacuum, ISTM we should do it.  I can see
there might be differing opinions on that though.

It does do the vacuum (so that internal space can be re-used, etc.), it is the truncation of > 8MB of terminal free space that it may or may not do.

In this regard, the old behavior of manual vacuum was not all that consistent, either.   If it could not get the lock immediately, it would abandon the attempt to truncation (but still do the analyze, unlike now).  But if the lock was immediately available, it would take it and hold it for as long as it took.  It is an odd mixture of courtesy and aggressiveness to not wait for a lock if unavailable, but if obtained to cling to it (justified by the ease of implementation, I guess). 

I think the manual vacuum should try to obtain the lock for 5 seconds, as in the current behavior.  Whether once obtained it should cling to it, or surrender it, I don't really know.  



 

> Also, I think that permanently boycotting doing autoanalyze because someone
> is camping out on an access share lock (or because there are a never-ending
> stream of overlapping locks) and so the truncation cannot be done is a bit
> drastic, especially for inclusion in a point release.

It's worse than that, it breaks manual VACUUM ANALYZE too (as per -admin
complaint).  I think this aspect is completely wrong, whether or not
you consider that dropping the exclusive lock early is sane for manual
vacuum.  If we were told to do an analyze, we should press on and do it.

Thoughts?

I don't think auto-analyze should get cancelled either.
 
Cheers,

Jeff

Re: (auto)vacuum truncate exclusive lock

From
Jeff Janes
Date:
On Thursday, April 11, 2013, Kevin Grittner wrote:

> I also log the number of pages truncated at the time it gave up,
> as it would be nice to know if it is completely starving or
> making some progress.

If we're going to have the message, we should make it useful.  My
biggest question here is not whether we should add this info, but
whether it should be DEBUG instead of LOG

I like it being LOG.  If it were DEBUG, I don't think anyone would be likely to see it when they needed to, as it happens sporadically on busy servers and I don't think people would run those with DEBUG on.  I figure it is analogous to an autovacuum cancel message it partially replaces, and those are LOG.


 

> Also, I think that permanently boycotting doing autoanalyze
> because someone is camping out on an access share lock (or
> because there are a never-ending stream of overlapping locks) and
> so the truncation cannot be done is a bit drastic, especially for
> inclusion in a point release.

That much is not a change in the event that the truncation does not
complete. 

OK, I see that now.  In the old behavior, of the lock was acquired, but then we were shoved off from it, the analyze was not done.  But, in the old behavior if the lock was never acquired at all, then it would go ahead to do the autoanalyze, and that has changed.   That is they way I was testing it (camping out on an access shared lock so the access exclusive could never be granted in the first place; because intercepting it during the truncate phase was hard to do) and I just assumed the behavior I saw would apply to both cases, but it does not.
 
I have seen cases where the old logic head-banged for
hours or days without succeeding at the truncation attempt in
autovacuum, absolutely killing performance until the user ran an
explicit VACUUM.  And in the meantime, since the deadlock detection
logic was killing autovacuum before it got to the analyze phase,
the autoanalyze was never done.

OK, so there three problems.  It would take a second to yield, in doing so it would abandon all the progress it had made in that second rather than saving it, and it would tight loop (restricted by naptime) on this because of the lack of analyze.  So it fixed the first two in a way that seems an absolute improvement for the auto case, but it made the third one worse in a common case, where it never acquires the lock in the first place, and so doesn't analyze when before it did in that one case.
 

Perhaps the new logic should go ahead and get its lock even on a
busy system (like the old logic),

As far as I can tell, the old logic was always conditional on the AccessExlusive lock acquisition, whether it was manual or auto.

Cheers,

Jeff 

Re: (auto)vacuum truncate exclusive lock

From
Kevin Grittner
Date:
Jeff Janes <jeff.janes@gmail.com> wrote:

>> If we're going to have the message, we should make it useful.
>> My biggest question here is not whether we should add this info,
>> but whether it should be DEBUG instead of LOG

> I like it being LOG.  If it were DEBUG, I don't think anyone
> would be likely to see it when they needed to, as it happens
> sporadically on busy servers and I don't think people would run
> those with DEBUG on.  I figure it is analogous to an autovacuum
> cancel message it partially replaces, and those are LOG.

OK, sold.

>>> Also, I think that permanently boycotting doing autoanalyze
>>> because someone is camping out on an access share lock (or
>>> because there are a never-ending stream of overlapping locks)
>>> and so the truncation cannot be done is a bit drastic,
>>> especially for inclusion in a point release.

>> That much is not a change in the event that the truncation does
>> not complete.  

> OK, I see that now.  In the old behavior, of the lock was
> acquired, but then we were shoved off from it, the analyze was
> not done.  But, in the old behavior if the lock was never
> acquired at all, then it would go ahead to do the autoanalyze,
> and that has changed.   That is they way I was testing it
> (camping out on an access shared lock so the access exclusive
> could never be granted in the first place; because intercepting
> it during the truncate phase was hard to do) and I just assumed
> the behavior I saw would apply to both cases, but it does not.

Ah, I see now.  So the actual worst case for the old code, in terms
of both head-banging and statistics, was if autovacuum was able to
acquire the lock but then many tasks all piled up behind its lock.
If the system was even *more* busy it would not acquire the lock at
all, and would behave better.

>> I have seen cases where the old logic head-banged for
>> hours or days without succeeding at the truncation attempt in
>> autovacuum, absolutely killing performance until the user ran an
>> explicit VACUUM.  And in the meantime, since the deadlock
>> detection logic was killing autovacuum before it got to the
>> analyze phase, the autoanalyze was never done.

> OK, so there three problems.  It would take a second to yield, in
> doing so it would abandon all the progress it had made in that
> second rather than saving it, and it would tight loop (restricted
> by naptime) on this because of the lack of analyze.  So it fixed
> the first two in a way that seems an absolute improvement for the
> auto case, but it made the third one worse in a common case,
> where it never acquires the lock in the first place, and so
> doesn't analyze when before it did in that one case.

Yeah, I see that now.

>> Perhaps the new logic should go ahead and get its lock even on a
>> busy system (like the old logic),

> As far as I can tell, the old logic was always conditional on the
> AccessExlusive lock acquisition, whether it was manual or auto.

OK, will review that to confirm;but assuming that's right, and
nobody else is already working on a fix, I propose to do the
following:

(1)  Restore the prior behavior of the VACUUM command.  This was
only ever intended to be a fix for a serious autovacuum problem
which caused many users serious performance problems -- in some
cases including unscheduled down time.  I also saw sites where,
having been bitten by this, they disabled autovacuum and later ran
into problems with bloat and/or xid wraparound.

(2)  If autovacuum decides to try to truncate but the lock cannot
be initially acquired, and analyze is requested, skip the
truncation and do the autoanalyze.  If the table is so hot that we
cannot get the lock, the space may get re-used soon, and if not
there is a good chance another autovacuum will trigger soon.  If
the user really wants the space released to the OS immediately,
they can run a manual vacuum to force the issue.

If I don't hear anything within the next day or two, I'll write
that up and post it here before applying (and back-patching to
affected branches).

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: (auto)vacuum truncate exclusive lock

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> OK, will review that to confirm;but assuming that's right, and
> nobody else is already working on a fix, I propose to do the
> following:

> (1)� Restore the prior behavior of the VACUUM command.� This was
> only ever intended to be a fix for a serious autovacuum problem
> which caused many users serious performance problems -- in some
> cases including unscheduled down time.� I also saw sites where,
> having been bitten by this, they disabled autovacuum and later ran
> into problems with bloat and/or xid wraparound.

> (2)� If autovacuum decides to try to truncate but the lock cannot
> be initially acquired, and analyze is requested, skip the
> truncation and do the autoanalyze.� If the table is so hot that we
> cannot get the lock, the space may get re-used soon, and if not
> there is a good chance another autovacuum will trigger soon.� If
> the user really wants the space released to the OS immediately,
> they can run a manual vacuum to force the issue.

I think that the minimum appropriate fix here is to revert the hunk
I quoted, ie take out the suppression of stats reporting and analysis.

However, we're still thinking too small.  I've been wondering whether we
couldn't entirely remove the dirty, awful kluges that were installed in
the lock manager to kill autovacuum when somebody blocked behind it.
This mechanism should ensure that AV never takes an exclusive lock
for long enough to be a serious problem, so do we need that anymore?
        regards, tom lane



Re: (auto)vacuum truncate exclusive lock

From
Andres Freund
Date:
On 2013-04-12 13:09:02 -0400, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
> > OK, will review that to confirm;but assuming that's right, and
> > nobody else is already working on a fix, I propose to do the
> > following:
>
> > (1)� Restore the prior behavior of the VACUUM command.� This was
> > only ever intended to be a fix for a serious autovacuum problem
> > which caused many users serious performance problems -- in some
> > cases including unscheduled down time.� I also saw sites where,
> > having been bitten by this, they disabled autovacuum and later ran
> > into problems with bloat and/or xid wraparound.
>
> > (2)� If autovacuum decides to try to truncate but the lock cannot
> > be initially acquired, and analyze is requested, skip the
> > truncation and do the autoanalyze.� If the table is so hot that we
> > cannot get the lock, the space may get re-used soon, and if not
> > there is a good chance another autovacuum will trigger soon.� If
> > the user really wants the space released to the OS immediately,
> > they can run a manual vacuum to force the issue.
>
> I think that the minimum appropriate fix here is to revert the hunk
> I quoted, ie take out the suppression of stats reporting and analysis.
>
> However, we're still thinking too small.  I've been wondering whether we
> couldn't entirely remove the dirty, awful kluges that were installed in
> the lock manager to kill autovacuum when somebody blocked behind it.
> This mechanism should ensure that AV never takes an exclusive lock
> for long enough to be a serious problem, so do we need that anymore?

Wouldn't that make DROP TABLE stop working while autovac is processing
the table? Considering how long e.g. a full table vacuum on a huge table
can take that doesn't seem to be acceptable.
Sure, you can manually kill the autovac process, but that will soon be
restarted. So you have to know that you need to start the DROP TABLE
beforehand so it will get the lock quicker and such.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: (auto)vacuum truncate exclusive lock

From
Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> OK, will review that to confirm;but assuming that's right, and
>> nobody else is already working on a fix, I propose to do the
>> following:
>
>> (1)  Restore the prior behavior of the VACUUM command.  This was
>> only ever intended to be a fix for a serious autovacuum problem
>> which caused many users serious performance problems -- in some
>> cases including unscheduled down time.  I also saw sites where,
>> having been bitten by this, they disabled autovacuum and later ran
>> into problems with bloat and/or xid wraparound.
>
>> (2)  If autovacuum decides to try to truncate but the lock cannot
>> be initially acquired, and analyze is requested, skip the
>> truncation and do the autoanalyze.  If the table is so hot that we
>> cannot get the lock, the space may get re-used soon, and if not
>> there is a good chance another autovacuum will trigger soon.  If
>> the user really wants the space released to the OS immediately,
>> they can run a manual vacuum to force the issue.
>
> I think that the minimum appropriate fix here is to revert the hunk
> I quoted, ie take out the suppression of stats reporting and analysis.

I'm not sure I understand -- are you proposing that is all we do
for both the VACUUM command and autovacuum?  (i.e., we don't try to
full restore the old VACUUM command behavior; just the troublesome
failure to generate statistics?)

> However, we're still thinking too small.  I've been wondering whether we
> couldn't entirely remove the dirty, awful kluges that were installed in
> the lock manager to kill autovacuum when somebody blocked behind it.
> This mechanism should ensure that AV never takes an exclusive lock
> for long enough to be a serious problem, so do we need that anymore?

Are you suggesting that just in master/HEAD or back to 9.0?  If the
latter, what existing problem does it cure (besides ugly code)?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: (auto)vacuum truncate exclusive lock

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-04-12 13:09:02 -0400, Tom Lane wrote:
>> However, we're still thinking too small.  I've been wondering whether we
>> couldn't entirely remove the dirty, awful kluges that were installed in
>> the lock manager to kill autovacuum when somebody blocked behind it.
>> This mechanism should ensure that AV never takes an exclusive lock
>> for long enough to be a serious problem, so do we need that anymore?

> Wouldn't that make DROP TABLE stop working while autovac is processing
> the table?

Meh ... I guess you're right.  I wasn't thinking about exclusive locks
being taken elsewhere.
        regards, tom lane



Re: (auto)vacuum truncate exclusive lock

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that the minimum appropriate fix here is to revert the hunk
>> I quoted, ie take out the suppression of stats reporting and analysis.

> I'm not sure I understand -- are you proposing that is all we do
> for both the VACUUM command and autovacuum?

No, I said that was the minimum fix.

Looking again at the patch, I note this comment:
              /*
+                * We failed to establish the lock in the specified number of
+                * retries. This means we give up truncating. Suppress the
+                * ANALYZE step. Doing an ANALYZE at this point will reset the
+                * dead_tuple_count in the stats collector, so we will not get
+                * called by the autovacuum launcher again to do the truncate.
+                */

and I suppose the rationale for suppressing the stats report was this
same idea of lying to the stats collector in order to encourage a new
vacuum attempt to happen right away.  Now I'm not sure that that's a
good idea at all --- what's the reasoning for thinking the table will be
any less hot in thirty seconds?  But if it is reasonable, we need a
redesign of the reporting messages, not just a hack to not tell the
stats collector what we did.

Are you saying you intend to revert that whole concept?  That'd be
okay with me, I think.  Otherwise we need some thought about how to
inform the stats collector what's really happening.
        regards, tom lane



Re: (auto)vacuum truncate exclusive lock

From
Alvaro Herrera
Date:
Tom Lane escribió:

> Are you saying you intend to revert that whole concept?  That'd be
> okay with me, I think.  Otherwise we need some thought about how to
> inform the stats collector what's really happening.

Maybe what we need is to consider table truncation as a separate
activity from vacuuming.  Then autovacuum could invoke it without having
to do a full-blown vacuum.  For this to work, I guess we would like to
separately store the status of the back-scan in pgstat somehow (I think
a boolean flag suffices: "were we able to truncate all pages that
appeared to be empty?")

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: (auto)vacuum truncate exclusive lock

From
Kevin Grittner
Date:
[some relevant dropped bits of the thread restored]

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Kevin Grittner <kgrittn@ymail.com> writes:
>>>> Jeff Janes <jeff.janes@gmail.com> wrote:

>>>> I propose to do the following:

>>>> (1)  Restore the prior behavior of the VACUUM command.  This
>>>> was only ever intended to be a fix for a serious autovacuum
>>>> problem which caused many users serious performance problems

>>>> (2)  If autovacuum decides to try to truncate but the lock
>>>> cannot be initially acquired, and analyze is requested, skip
>>>> the truncation and do the autoanalyze.

>>> I think that the minimum appropriate fix here is to [...] take
>>> out the suppression of stats reporting and analysis.
>>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.

OK, I suggested that and more, so I wasn't sure what you were
getting at.

>>>>> OK, I see that now.  In the old behavior, of the lock was
>>>>> acquired, but then we were shoved off from it, the analyze
>>>>> was not done.  But, in the old behavior if the lock was never
>>>>> acquired at all, then it would go ahead to do the
>>>>> autoanalyze,

>>>> Ah, I see now.  So the actual worst case for the old code, in
>>>> terms of both head-banging and statistics, was if autovacuum
>>>> was able to acquire the lock but then many tasks all piled up
>>>> behind its lock.  If the system was even *more* busy it would
>>>> not acquire the lock at all, and would behave better.

> and I suppose the rationale for suppressing the stats report was
> this same idea of lying to the stats collector in order to
> encourage a new vacuum attempt to happen right away.

I think Jan expressed some such sentiment back during the original
discussion.  I was not persuaded by that; but he pointed out that
if the deadlock killer killed an autovacuum process which was doing
a truncate, the it did not get to the statistics phase; so I agreed
that any change in that behavior should be a separate patch.  I
missed the fact that if it failed to initially get the lock it did
proceed to the statistics phase.  I explained this earlier in this
thread.  No need to cast about for hypothetical explanations.

> Now I'm not sure that that's a good idea at all

I'm pretty sure it isn't; that's why I proposed changing it.

> But if it is reasonable, we need a redesign of the reporting
> messages, not just a hack to not tell the stats collector what we
> did.

The idea was to try to make as small a change in previous behavior
as possible.  Jan pointed out that when the deadlock detection code
killed an autovacuum worker which was trying to truncate, the
statistics were not updated, leading to retries.  This was an
attempt to *not change* existing behavior.  It was wrong, because
we both missed the fact that if it didn't get the lock in the first
place it went ahead with statistics generation.  That being the
case, I was proposing we always generate statistics if we were
supposed to.  That would be a change toward *more* up-to-date
statistics and *fewer* truncation retries than we've had.  I'm OK
with that because a table hot enough to hit the issue will likely
need the space again or need another vacuum soon.

> Are you saying you intend to revert that whole concept?

No.  I was merely asking what you were suggesting.  As I said
earlier:

>>>> I have seen cases where the old logic head-banged for hours or
>>>> days without succeeding at the truncation attempt in
>>>> autovacuum, absolutely killing performance until the user ran
>>>> an explicit VACUUM.  And in the meantime, since the deadlock
>>>> detection logic was killing autovacuum before it got to the
>>>> analyze phase, the autoanalyze was never done.

> Otherwise we need some thought about how to inform the stats
> collector what's really happening.

I think we can probably improve that on some future release.  I
don't think a new scheme for that makes sense for back-patching or
9.3.

For now what I'm suggesting is generating statistics in all the
cases it did before, plus the case where it starts truncation but
does not complete it.  The fact that before this patch there were
cases where the autovacuum worker was killed, resulting in not
generating needed statistics seems like a bug, not a behavior we
need to preserve.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: (auto)vacuum truncate exclusive lock

From
Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> For now what I'm suggesting is generating statistics in all the
> cases it did before, plus the case where it starts truncation but
> does not complete it.� The fact that before this patch there were
> cases where the autovacuum worker was killed, resulting in not
> generating needed statistics seems like a bug, not a behavior we
> need to preserve.

Well, in the case where it gets killed, it's still not gonna generate
statistics.  What we've really got here is a new case that did not exist
before, namely that it voluntarily stops truncating.  But I agree that
modeling that case's behavior on the kill case was poorly thought out.

In other words, yes, I think we're on the same page.
        regards, tom lane



Re: (auto)vacuum truncate exclusive lock

From
Jan Wieck
Date:
On 4/12/2013 1:57 PM, Tom Lane wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think that the minimum appropriate fix here is to revert the hunk
>>> I quoted, ie take out the suppression of stats reporting and analysis.
>
>> I'm not sure I understand -- are you proposing that is all we do
>> for both the VACUUM command and autovacuum?
>
> No, I said that was the minimum fix.
>
> Looking again at the patch, I note this comment:
>
>                 /*
> +                * We failed to establish the lock in the specified number of
> +                * retries. This means we give up truncating. Suppress the
> +                * ANALYZE step. Doing an ANALYZE at this point will reset the
> +                * dead_tuple_count in the stats collector, so we will not get
> +                * called by the autovacuum launcher again to do the truncate.
> +                */
>
> and I suppose the rationale for suppressing the stats report was this
> same idea of lying to the stats collector in order to encourage a new
> vacuum attempt to happen right away.  Now I'm not sure that that's a
> good idea at all --- what's the reasoning for thinking the table will be
> any less hot in thirty seconds?  But if it is reasonable, we need a
> redesign of the reporting messages, not just a hack to not tell the
> stats collector what we did.

Yes, that was the rationale behind it combined with "don't change 
function call sequences and more" all over the place.

The proper solution would eventually be to add a block number to the 
stats held by the stats collector, which preserves the information about 
the last filled block of the table. The decouple the truncate from 
vacuum/autovacuum. vacuum/autovacuum will set that block number when 
they detect the trailing free space. The analyze step can happen just as 
usual and reset stats, which doesn't reset that block number. The 
autovacuum launcher goes through its normal logic for launching autovac 
or autoanalyze. If it doesn't find any of those to do but the 
last-used-block is set, it launches the separate lazy truncate step.

This explicitly moves the truncate, which inherently requires the 
exclusive lock and therefore is undesirable even in a manual vacuum, 
into the background.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: (auto)vacuum truncate exclusive lock

From
Jan Wieck
Date:
On 4/12/2013 2:08 PM, Alvaro Herrera wrote:
> Tom Lane escribió:
>
>> Are you saying you intend to revert that whole concept?  That'd be
>> okay with me, I think.  Otherwise we need some thought about how to
>> inform the stats collector what's really happening.
>
> Maybe what we need is to consider table truncation as a separate
> activity from vacuuming.  Then autovacuum could invoke it without having
> to do a full-blown vacuum.  For this to work, I guess we would like to
> separately store the status of the back-scan in pgstat somehow (I think
> a boolean flag suffices: "were we able to truncate all pages that
> appeared to be empty?")

Should have read the entire thread before responding :)


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



Re: (auto)vacuum truncate exclusive lock

From
Jan Wieck
Date:
On 4/18/2013 11:44 AM, Jan Wieck wrote:

> Yes, that was the rationale behind it combined with "don't change
> function call sequences and more" all over the place.

function call signatures

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin