Re: Spinlocks, yet again: analysis and proposed patches - Mailing list pgsql-hackers
From: Michael Paesold
Subject: Re: Spinlocks, yet again: analysis and proposed patches
Date:
Msg-id: 017b01c5b8ff$c3280bc0$0f01a8c0@zaphod
In response to: Spinlocks, yet again: analysis and proposed patches (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane wrote:
> "Michael Paesold" <mpaesold@gmx.at> writes:
>> To have other data, I have retested the patches on a single-cpu Intel P4
>> 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Comparing to the 2.4 GHz
>> dual-Xeon results it's clear that this is in reality only one cpu. While
>> the runtime for N=1 is better than on the other system, for N=4 it's
>> already worse. The situation with the patches is quite different, though.
>> Unfortunately.
>
>> CVS tip from 2005-09-12:
>> 1: 36s  2: 77s (cpu ~85%)  4: 159s (cpu ~98%)
>
>> only slock-no-cmpb:
>> 1: 36s  2: 81s (cpu ~79%)  4: 177s (cpu ~94%)
>> (doesn't help this time)
>
> Hm. This is the first configuration we've seen in which slock-no-cmpb
> was a loss. Could you double-check that result?

The first tests were compiled with CFLAGS='-O2 -mcpu=pentium4 -march=pentium4'. I redid the tests with just CFLAGS='-O2' yesterday. The difference was only about a second, and the result with the patch was the same. The results for N=4 and N=8 show the negative effect even more clearly.

configure: CFLAGS='-O2' --enable-casserts
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s  2: 78s  4: 159s  8: 324s

only slock-no-cmpb:
1: 37s  2: 82s (5%)  4: 178s (12%)  8: 362s (12%)

(Btw., I have always done "make clean ; make ; make install" between tests.)

Best Regards,
Michael Paesold

> I can't see any reasonable way to do runtime switching of the cmpb test
> --- whatever logic we put in to control it would cost as much or more
> than the cmpb anyway :-(. I think that has to be a compile-time choice.
> From my perspective it'd be acceptable to remove the cmpb only for
> x86_64, since only there does it seem to be a really significant win.
> On the other hand it seems that removing the cmpb is a net win on most
> x86 setups too, so maybe we should just do it and accept that there are
> some cases where it's not perfect.

How many test cases do we have so far? The summary of the effects of removing the cmpb instruction seems to be:

8-way Opteron: better
Dual/HT Xeon w/o EM64T: better
Dual/HT EM64T: better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T: worse (stronger for N>cpus)

Have I missed other reports that tested the slock-no-cmpb.patch alone?

Two of the systems with positive effects are x86_64, one is an older high-end Intel x86 chip. The negative effect is on a low-cost Pentium 4 with only hyper-threading. According to the mentioned thread's title, this was an optimization for hyperthreading, not regular multi-CPUs.

We could use more data, especially from newer and high-end systems. Could some of you test the slock-no-cmpb.patch? You'll need an otherwise idle system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the archives it is no longer clear what should go into which file. See the instructions in the first message above. The patch applies to CVS tip with:

patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold
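For readers who don't have the attached files handy: what the slock-no-cmpb change amounts to is dropping the non-locking "cmpb" pre-test that runs before the bus-locked xchgb in the x86 spinlock test-and-set. The sketch below is only an illustration of that idea in the style of PostgreSQL's src/include/storage/s_lock.h; the function names tas_with_cmpb and tas_no_cmpb are made up for the comparison (the real code uses tas()), and the exact assembly in the attached patch may differ.

/*
 * Illustrative sketch only -- not the attached patch.  Requires gcc on
 * x86/x86_64.  Returns 0 if the lock was acquired, nonzero if it was
 * already held.
 */
typedef unsigned char slock_t;

/* Variant WITH the non-locking pre-test (roughly what CVS tip does) */
static __inline__ int
tas_with_cmpb(volatile slock_t *lock)
{
	register slock_t _res = 1;

	__asm__ __volatile__(
		"	cmpb	$0,%1	\n"		/* non-locking peek: is it already held? */
		"	jne	1f	\n"			/* if so, skip the locked xchg */
		"	lock			\n"
		"	xchgb	%0,%1	\n"		/* atomically swap 1 into the lock byte */
		"1: \n"
		: "+q" (_res), "+m" (*lock)
		:
		: "memory", "cc");
	return (int) _res;
}

/* Variant WITHOUT the pre-test (the slock-no-cmpb idea) */
static __inline__ int
tas_no_cmpb(volatile slock_t *lock)
{
	register slock_t _res = 1;

	__asm__ __volatile__(
		"	lock			\n"
		"	xchgb	%0,%1	\n"		/* always do the locked exchange */
		: "+q" (_res), "+m" (*lock)
		:
		: "memory", "cc");
	return (int) _res;
}

The benchmark numbers in this thread are about exactly this trade-off: the pre-test avoids a bus-locked operation when the lock is contended, but costs an extra memory read and branch on the (common) uncontended path, and how that balances out clearly depends on the CPU.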