Re: Spinlocks, yet again: analysis and proposed patches - Mailing list pgsql-hackers

From Michael Paesold
Subject Re: Spinlocks, yet again: analysis and proposed patches
Date
Msg-id 017b01c5b8ff$c3280bc0$0f01a8c0@zaphod
In response to Spinlocks, yet again: analysis and proposed patches  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> "Michael Paesold" <mpaesold@gmx.at> writes:
>> To have other data, I have retested the patches on a single-CPU Intel P4
>> 3GHz w/ HT (i.e. 2 virtual CPUs), no EM64T. Compared to the 2.4 GHz
>> dual-Xeon results, it is clear that this is in reality only one CPU.
>> While the runtime for N=1 is better than on the other system, for N=4
>> it is already worse. The situation with the patches is quite different,
>> though. Unfortunately.
>
>> CVS tip from 2005-09-12:
>> 1: 36s   2: 77s (cpu ~85%)    4: 159s (cpu ~98%)
>
>> only slock-no-cmpb:
>> 1: 36s   2: 81s (cpu ~79%)    4: 177s (cpu ~94%)
>> (doesn't help this time)
>
> Hm.  This is the first configuration we've seen in which slock-no-cmpb
> was a loss.  Could you double-check that result?

The first tests were compiled with
CFLAGS='-O2 -mcpu=pentium4 -march=pentium4'. I redid the tests with just
CFLAGS='-O2' yesterday. The difference was only about a second, and the
result with the patch was the same. The results for N=4 and N=8 show the
effect more clearly.

configure: CFLAGS='-O2' --enable-cassert
On RHEL 4.1, gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1)

CVS tip from 2005-09-12:
1: 37s   2: 78s      4: 159s       8: 324s

only slock-no-cmpb:
1: 37s   2: 82s (5%) 4: 178s (12%) 8: 362s (12%)

(Btw. I have always done "make clean ; make ; make install" between tests)

Best Regards,
Michael Paesold

> I can't see any reasonable way to do runtime switching of the cmpb test
> --- whatever logic we put in to control it would cost as much or more
> than the cmpb anyway :-(.  I think that has to be a compile-time choice.
> From my perspective it'd be acceptable to remove the cmpb only for
> x86_64, since only there does it seem to be a really significant win.
> On the other hand it seems that removing the cmpb is a net win on most
> x86 setups too, so maybe we should just do it and accept that there are
> some cases where it's not perfect.

How many test cases do we have so far?
A summary of the effects without the cmpb instruction:

8-way Opteron:           better
Dual/HT Xeon w/o EM64T:  better
Dual/HT EM64T:           better for N<=cpus, worse for N>cpus (Stephen's)
HT P4 w/o EM64T:         worse (stronger for N>cpus)

Have I missed other reports that did test the slock-no-cmpb.patch alone?
Two of the systems with positive effects are x86_64; one is an older
high-end Intel x86 chip. The negative effect is on a low-cost Pentium 4
with only hyper-threading. According to the mentioned thread's title, this
was an optimization for hyper-threading, not regular multi-CPU systems.

We could use more data, especially from newer and high-end systems. Could
some of you test the slock-no-cmpb.patch? You'll need an otherwise idle
system to get repeatable results.

http://archives.postgresql.org/pgsql-hackers/2005-09/msg00565.php
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00566.php

I have re-attached the relevant files from Tom's posts because in the
archive it is no longer clear what should go into which file. See the
instructions in the first of the messages above.

The patch applies to CVS tip with
patch -p1 < slock-no-cmpb.patch

Best Regards,
Michael Paesold

