Re: canceling autovacuum task woes - Mailing list pgsql-hackers

From Robert Haas
Subject Re: canceling autovacuum task woes
Date
Msg-id 05DA660B-FF87-4382-B1AF-A5D8DC0D88F7@gmail.com
In response to Re: canceling autovacuum task woes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: canceling autovacuum task woes
List pgsql-hackers
On Jul 24, 2012, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 24, 2012 at 4:03 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> Looks great.  Are you considering backpatching this?
>
>> Well, that would certainly make MY life easier.  I am not sure whether
>> it would be in line with project policy, however.
>
> +1 for a backpatch.  Otherwise it'll be years before we gain any
> information about the unexpected cancels that you think exist

OK, great.

> However, after looking some more at deadlock.c, I wonder whether
> (a) this patch gives sufficient detail, and (b) whether there isn't a
> problem that's obvious by inspection.  It appears to me that as the
> blocking_autovacuum_proc stuff is coded, it will finger an AV proc as
> needing to be killed even though it may be several graph edges out from
> the current proc.  This means that with respect to (a), the connection
> from the process doing the kill to the AV proc may be inadequately
> documented by this patch, and with respect to (b), there might well be
> cases where we found an AV proc somewhere in the graph traversal but
> it's not actually guilty of blocking the current process ... especially
> not after the queue reorderings that we may have done.  I think I'd be
> happier with that code if it restricted its AV targets to procs that
> *directly* block the current process, which not incidentally would make
> this amount of log detail sufficient.

Uggh.  Well, that certainly sounds like something that could cause spurious cancels - or excessively fast ones, since
presumably if we limit it to things that directly block the current process, you'll always allow the full
deadlock_timeout before nuking the autovac worker.  So +1 for changing that.
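
To illustrate the distinction Tom is drawing, here is a toy sketch in Python.  It is not the deadlock.c logic; the
graph representation, the function names, and the process names are all invented for illustration.  It only shows how
"any autovacuum proc found during the traversal" and "autovacuum procs that directly block the current process" can
give different answers.

```python
# Toy wait-for graph: waits_for[p] lists the procs that p is waiting on.
# Every name here is hypothetical; this is not how deadlock.c stores edges.

def reachable_autovac(waits_for, start, autovac_procs):
    """Return every autovacuum proc reachable from start, at any distance."""
    seen = {start}
    stack = [start]
    found = []
    while stack:
        proc = stack.pop()
        for blocker in waits_for.get(proc, []):
            if blocker in seen:
                continue
            seen.add(blocker)
            if blocker in autovac_procs:
                found.append(blocker)
            stack.append(blocker)
    return sorted(found)


def direct_autovac(waits_for, start, autovac_procs):
    """Return only the autovacuum procs that start waits on directly."""
    return sorted(b for b in waits_for.get(start, []) if b in autovac_procs)


# backend_A waits on backend_B, which in turn waits on autovac_1.
waits_for = {"backend_A": ["backend_B"], "backend_B": ["autovac_1"]}
autovac_procs = {"autovac_1"}

via_traversal = reachable_autovac(waits_for, "backend_A", autovac_procs)
via_direct = direct_autovac(waits_for, "backend_A", autovac_procs)
```

In this made-up graph, the traversal from backend_A finds autovac_1 two edges out, while the direct-blocker check
finds nothing for backend_A and would only finger autovac_1 on behalf of backend_B.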

Does an edge in this context mean any lock, or just an ungranted one?  I assume the latter, which still leaves the
question of where the edges are coming from in the first place.

...Robert
