On Jul 24, 2012, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Jul 24, 2012 at 4:03 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>>> Looks great. Are you considering backpatching this?
>
>> Well, that would certainly make MY life easier. I am not sure whether
>> it would be in line with project policy, however.
>
> +1 for a backpatch. Otherwise it'll be years before we gain any
> information about the unexpected cancels that you think exist.
OK, great.
> However, after looking some more at deadlock.c, I wonder whether
> (a) this patch gives sufficient detail, and (b) whether there isn't a
> problem that's obvious by inspection. It appears to me that as the
> blocking_autovacuum_proc stuff is coded, it will finger an AV proc as
> needing to be killed even though it may be several graph edges out from
> the current proc. This means that with respect to (a), the connection
> from the process doing the kill to the AV proc may be inadequately
> documented by this patch, and with respect to (b), there might well be
> cases where we found an AV proc somewhere in the graph traversal but
> it's not actually guilty of blocking the current process ... especially
> not after the queue reorderings that we may have done. I think I'd be
> happier with that code if it restricted its AV targets to procs that
> *directly* block the current process, which not incidentally would make
> this amount of log detail sufficient.
Uggh. Well, that certainly sounds like something that could cause spurious cancels (or excessively fast ones), since
presumably if we limit it to things that directly block the current process, we'll always wait the full
deadlock_timeout before nuking the autovac worker. So +1 for changing that.
Does an edge in this context mean any lock, or just an ungranted one? I assume the latter, which still leaves the
question of where the edges are coming from in the first place.
...Robert