On Mon, 2010-05-03 at 15:39 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, May 3, 2010 at 11:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm inclined to think that we should throw away all this logic and just
> >> have the slave cancel competing queries if the replay process waits
> >> more than max_standby_delay seconds to acquire a lock.
>
> > What if we somehow get into a situation where the replay process is
> > waiting for a lock over and over and over again, because it keeps
> > killing conflicting processes but something restarts them and they
> > take locks over again?
>
> They won't be able to take locks "over again", because the lock manager
> won't allow requests to pass a pending previous request, except in
> very limited circumstances that shouldn't hold here. They'll queue
> up behind the replay process's lock request, not in front of it.
> (If that isn't the case, it needs to be fixed, quite independently
> of this concern.)
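For heavyweight locks, agreed, that ordering holds. As a rough
standalone sketch of the behaviour being described (plain C, not
PostgreSQL source; the lock modes and process names are invented for
illustration): a restarted query's request queues behind the startup
process's pending exclusive request, and cancelling the original
blocker once max_standby_delay expires lets replay go first.

/*
 * Minimal standalone sketch (plain C, not PostgreSQL source) of the
 * queueing rule described above: lock requests are granted strictly in
 * arrival order, so queries that restart after being cancelled queue up
 * behind the startup process's pending exclusive request rather than
 * jumping in front of it.  All names here are invented for illustration.
 */
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

typedef enum { MODE_SHARED, MODE_EXCLUSIVE } LockMode;

typedef struct
{
    const char *who;
    LockMode    mode;
    bool        granted;
} Request;

#define MAXREQ 8
static Request queue[MAXREQ];
static int nreq = 0;

static bool
compatible(LockMode a, LockMode b)
{
    return a == MODE_SHARED && b == MODE_SHARED;
}

/* Grant in strict FIFO order: stop at the first request we cannot grant. */
static void
grant_in_order(void)
{
    for (int i = 0; i < nreq; i++)
    {
        if (queue[i].granted)
            continue;
        for (int j = 0; j < nreq; j++)
            if (queue[j].granted && !compatible(queue[i].mode, queue[j].mode))
                return;         /* no later request may pass this waiter */
        queue[i].granted = true;
    }
}

static void
request(const char *who, LockMode mode)
{
    queue[nreq++] = (Request) { who, mode, false };
    grant_in_order();
}

static void
release(const char *who)
{
    int n = 0;
    for (int i = 0; i < nreq; i++)
        if (strcmp(queue[i].who, who) != 0)
            queue[n++] = queue[i];
    nreq = n;
    grant_in_order();
}

static void
show(const char *when)
{
    printf("%s\n", when);
    for (int i = 0; i < nreq; i++)
        printf("  %-8s %-9s %s\n", queue[i].who,
               queue[i].mode == MODE_SHARED ? "SHARED" : "EXCLUSIVE",
               queue[i].granted ? "granted" : "waiting");
}

int
main(void)
{
    request("query1", MODE_SHARED);      /* existing standby query */
    request("replay", MODE_EXCLUSIVE);   /* startup process blocks behind it */
    request("query2", MODE_SHARED);      /* restarted query queues behind replay */
    show("before cancel:");

    /* stand-in for cancelling query1 once max_standby_delay has expired */
    release("query1");
    show("after cancelling query1:");
    return 0;
}

Running it shows query2 still waiting behind replay's now-granted
exclusive request after query1 is cancelled.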
Most conflicts aren't lock-manager locks, though; they are snapshot
conflicts, although clearly different workloads will have different
characteristics. Some conflicts are buffer conflicts, and the semantics
of buffer cleanup locks and many other internal locks are such that
shared lock requests queue-jump past exclusive lock requests. That's
not something we should touch, at least not now.
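To give a feel for the snapshot case, here is another rough standalone
sketch (plain C, not the real recovery-conflict code; transaction IDs
are simplified to plain integers with no wraparound and the names are
invented): a query conflicts with a cleanup record when its snapshot
might still need the removed tuples, regardless of any lock it holds.

/*
 * Minimal standalone sketch (plain C, not PostgreSQL source) of a snapshot
 * conflict: replay of a cleanup record that removed tuples up to some
 * transaction ID conflicts with any standby query whose snapshot could
 * still see those tuples, even though no heavyweight lock is involved.
 * Transaction IDs are treated as plain integers with no wraparound, and
 * the field names are invented for illustration.
 */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct
{
    const char   *query;
    TransactionId snapshot_xmin;   /* oldest xid the snapshot still treats as running */
} StandbyQuery;

/* The query conflicts if its snapshot might still need the removed tuples. */
static bool
conflicts_with_cleanup(const StandbyQuery *q, TransactionId latest_removed_xid)
{
    return q->snapshot_xmin <= latest_removed_xid;
}

int
main(void)
{
    TransactionId latest_removed_xid = 1200;    /* taken from the cleanup record */
    StandbyQuery  queries[] = {
        { "long-running report", 1100 },    /* old snapshot: conflicts */
        { "recently started",    1350 },    /* newer snapshot: no conflict */
    };

    for (int i = 0; i < 2; i++)
        printf("%-20s xmin=%u -> %s\n",
               queries[i].query, (unsigned) queries[i].snapshot_xmin,
               conflicts_with_cleanup(&queries[i], latest_removed_xid)
               ? "conflicts (cancel, or wait up to max_standby_delay)"
               : "no conflict");
    return 0;
}

Nothing in that test touches the lock manager, which is why the
queueing guarantee above doesn't cover most of these conflicts.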
I understand that you aren't impressed by everything about the current
patch, but rushed changes may not help either.
--
Simon Riggs
www.2ndQuadrant.com