On Fri, 13 Mar 2009, Kevin Grittner wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think that changing the locking behavior is attacking the problem
>>> at the wrong level anyway.
>>
>> Right. By the time a patch here could have any effect, you've
>> already lost the game --- having to deschedule and reschedule a
>> process is a large cost compared to the typical lock hold time for
>> most LWLocks. So it would be better to look at how to avoid
>> blocking in the first place.
>
> That's what motivated my request for a profile of the "80 clients with
> zero wait" case. If all data access is in RAM, why can't 80 processes
> keep 64 threads (on 8 processors) busy? Does anybody else think
> that's an interesting question, or am I off in left field here?
I don't think anyone is arguing that it's not interesting, but I also
think that complete dismissal of the existing test case is wrong.
Last night Tom documented some reasons why the prior test may have some
issues, but even with those I think the test shows that there is room for
improvement in the locking.
Making sure that the locking change doesn't cause problems for other
workloads is a _very_ valid concern, but it's grounds for more testing, not
dismissal.
I think that the suggestion to wake up the first N waiters instead of all
of them is a good optimization (and waking N minus the number of active
backends would be even better if there is an easy way to know that number),
but it's worth making the result testable by more people so that we can see
what workloads (if any) are pathological for this change.
David Lang