Re: Seeing context switch storm with 10/13 snapshot of - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Seeing context switch storm with 10/13 snapshot of |
Date | |
Msg-id | 1129928242.8300.960.camel@localhost.localdomain |
In response to | Re: Seeing context switch storm with 10/13 snapshot of (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Seeing context switch storm with 10/13 snapshot of |
List | pgsql-hackers |
On Fri, 2005-10-21 at 09:52 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > It would be good right now to have a multi-process test harness that
> > would allow us to test out different spin lock code without the rest of
> > PostgreSQL getting in the way of testing. If we can isolate the issue
> > outside of PostgreSQL it will be much easier to fix.
>
> Actually I disagree with that, on three grounds:
>
> 1. Optimizing for an artificial environment may result in the wrong
> optimization.
>
> 2. I don't think you'll be able to prove anything except that all SMP
> designs ultimately suck. There is no hardware on the planet that can
> trade cache lines back and forth at instruction-dispatch rates.
>
> 3. The problem right now is not lack of ability to reproduce the
> problem, it is lack of ideas how to fix it. Building an artificial
> testbed is just directing effort into make-work rather than towards
> solving the problem.

If we think spinlocks are the problem, building a spinlock test harness will prove that and also simplify the testing of a solution. Isolating the spinlocks in that way is not artificial, but actually a very pure test, so although I agree with (1) as a general statement, this does not apply for the test harness proposal.

I was seeing the problem as likely to rear its head again over time. We are entering a stage of increased importance of SMP code, since within a few short years all CPUs will be dual core/HT or something similar. We may fix it for one platform, but other similar problems may re/emerge.

The easiest way to test spinlock code on any platform is to get an isolated test case that is runnable outside of the context of PostgreSQL, yet using the pg spinlock code. That would allow us to bring in other people and their many eyeballs to look at the issue; we on this list are not experts at everything.

Certainly there is a lack of ideas as to how to fix it, as you mention in (3). This shows to me that the solution lies in one of two areas: a) the solution has not yet been considered or b) the solution has already been thought of and for whatever reason disregarded. You may be certain that the solution lies in a), though I am not. Rejecting ideas quickly may simply increase the chances of finding the solution in a b) case.

Forgive me but (2) seems spurious. Nobody said anything about trading cache lines at instruction-dispatch rates. The objective of the test harness would be to check whether negative effects such as CS storms exist on that platform. Actual optimization of the spinlock mechanisms in the context of the PostgreSQL server should certainly be done within the code.

In a more general sense: what is the best next action to make progress on the CS issue that exists for certain CPUs?

Best Regards, Simon Riggs
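For illustration only, the kind of standalone multi-process harness proposed above might look roughly like the sketch below. It is not code from this thread. It assumes a POSIX system, and it uses a C11 atomic_flag test-and-set lock as a stand-in for the pg spinlock code, which a real harness would link in instead. The names shared_state, spin_acquire, spin_release and run_harness are hypothetical. Workers contend on one lock in shared memory, and the parent reports how many context switches its children accumulated, which is the symptom a CS storm would inflate.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* State shared by all worker processes (hypothetical layout). */
typedef struct {
    atomic_flag lock;            /* stand-in for a PostgreSQL spinlock */
    long counter;                /* protected by lock */
} shared_state;

static void spin_acquire(atomic_flag *l)
{
    /* Plain test-and-set loop: no backoff, no sleep. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void spin_release(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Sum of voluntary and involuntary context switches of reaped children. */
static long children_ctx_switches(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return 0;
    return ru.ru_nvcsw + ru.ru_nivcsw;
}

/* Fork nprocs workers that each take the lock iters times.
 * Returns the final counter (nprocs * iters if the lock works), or -1
 * on failure. Stores the children's context switch count for this run
 * in *ctx_switches. */
long run_harness(int nprocs, long iters, long *ctx_switches)
{
    assert(nprocs > 0 && iters >= 0 && ctx_switches != NULL);

    shared_state *st = mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;
    atomic_flag_clear(&st->lock);
    st->counter = 0;

    long before = children_ctx_switches();
    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;               /* stop forking; reap what we started */
        if (pid == 0) {
            for (long n = 0; n < iters; n++) {
                spin_acquire(&st->lock);
                st->counter++;
                spin_release(&st->lock);
            }
            _exit(0);
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        wait(NULL);

    *ctx_switches = children_ctx_switches() - before;
    long total = st->counter;
    munmap(st, sizeof *st);
    return started == nprocs ? total : -1;
}
```

Calling run_harness(8, 1000000, &cs) from a small main() and comparing cs across platforms, or across alternative spin_acquire variants, would be one way to use it. The absolute numbers mean little on their own; the comparison between variants on the same machine is the point.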