Re: Spinlock performance improvement proposal - Mailing list pgsql-hackers

From: Neil Padgett
Subject: Re: Spinlock performance improvement proposal
Msg-id: 3BB37322.A9B97FB4@redhat.com
In response to: Spinlock performance improvement proposal (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane wrote:
> 
> Neil Padgett <npadgett@redhat.com> writes:
> > Well. Currently the runs are the typical pg_bench runs.
> 
> With what parameters?  If you don't initialize the pg_bench database
> with "scale" proportional to the number of clients you intend to use,
> then you'll naturally get huge lock contention.  For example, if you
> use scale=1, there's only one "branch" in the database.  Since every
> transaction wants to update the branch's balance, every transaction
> has to write-lock that single row, and so everybody serializes on that
> one lock.  Under these conditions it's not surprising to see lots of
> lock waits and lots of useless runs of the deadlock detector ...
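
For anyone following along at home, here is a rough libpq sketch of the
TPC-B-style transaction pg_bench issues -- hypothetical code, not
contrib/pgbench itself. The branches table gets one row per unit of scale,
so at scale=1 the second UPDATE write-locks the only row in the table and
every client serializes there:

/*
 * Hypothetical sketch, not contrib/pgbench: the shape of one pg_bench
 * transaction.  At scale=1 the branches UPDATE hits the only row in the
 * table, so every client queues on that one row lock.
 */
#include <stdio.h>
#include <libpq-fe.h>

static void
run_one_transaction(PGconn *conn)
{
    static const char *const steps[] = {
        "BEGIN",
        "UPDATE accounts SET abalance = abalance + 1 WHERE aid = 1",
        "UPDATE branches SET bbalance = bbalance + 1 WHERE bid = 1",
        "COMMIT",
    };
    size_t      i;

    for (i = 0; i < sizeof(steps) / sizeof(steps[0]); i++)
    {
        PGresult   *res = PQexec(conn, steps[i]);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "%s failed: %s", steps[i], PQerrorMessage(conn));
        PQclear(res);
    }
}

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=pgbench");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }
    run_one_transaction(conn);
    PQfinish(conn);
    return 0;
}

Initializing the database with the scale option sized to the client count
adds more branch rows, which spreads those row locks out.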

The results you saw with the large number of useless runs of the
deadlock detector were taken at a scale factor of 2. With a scale factor
of 2, the performance fall-off began at about 100 clients. So, I reran
the 512-client profiling run with a scale factor of 12 (2:100 as 10:500,
so 12 should be an appropriate scale factor, with some cushion). This
does, of course, reduce the contention. However, the throughput only
about doubled relative to the scale-2 runs -- which sounds good, but is
still a small fraction of the throughput realized on the same machine
with a small number of clients. (This is the uniprocessor machine.)

The new profile looks like this (uniprocessor machine):
Flat profile:

Each sample counts as 1 samples.
  %   cumulative      self               self     total
 time     samples   samples    calls  T1/call  T1/call  name
 9.44    10753.00  10753.00                             pg_fsync
 6.63    18303.01   7550.00                             s_lock_sleep
 6.56    25773.01   7470.00                             s_lock
 5.88    32473.01   6700.00                             heapgettup
 5.28    38487.02   6014.00                             HeapTupleSatisfiesSnapshot
 4.83    43995.02   5508.00                             hash_destroy
 2.77    47156.02   3161.00                             load_file
 1.90    49322.02   2166.00                             XLogInsert
 1.86    51436.02   2114.00                             _bt_compare
 1.82    53514.02   2078.00                             AllocSetAlloc
 1.72    55473.02   1959.00                             LockBuffer
 1.50    57180.02   1707.00                             init_ps_display
 1.40    58775.03   1595.00                             DirectFunctionCall9
 1.26    60211.03   1436.00                             hash_search
 1.14    61511.03   1300.00                             GetSnapshotData
 1.11    62780.03   1269.00                             SpinAcquire
 1.10    64028.03   1248.00                             LockAcquire
 1.04    70148.03   1190.00                             heap_fetch
 0.91    71182.03   1034.00                             _bt_orderkeys
 0.89    72201.03   1019.00                             LockRelease
 0.75    73058.03    857.00                             InitBufferPoolAccess
 .
 .
 .

(I'd attribute the pg_fsync time to the slow disk in the machine --
scale 12 yields a lot of tuples.)
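
Since s_lock and s_lock_sleep together account for roughly 13% of the
samples, and the spinlock implementation is what this thread is about, a
minimal sketch of the spin-then-sleep pattern those two functions embody
may help frame the discussion. This is not PostgreSQL's actual
src/backend/storage/lmgr/s_lock.c; the TAS primitive, spin count, and
delay below are illustrative assumptions only:

/*
 * Illustrative spin-then-sleep loop, NOT PostgreSQL's s_lock.c.
 * Spin on a test-and-set a bounded number of times, then back off
 * with a timed select() so the lock holder can run.
 */
#include <sys/select.h>

typedef volatile int slock_t;

#define SPINS_PER_DELAY 100     /* assumption; real value is platform-tuned */
#define DELAY_USEC      10000   /* assumption: 10 ms back-off */

/* Test-and-set: returns nonzero if the lock was already held. */
static int
tas(slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

static void
spin_acquire(slock_t *lock)
{
    int         spins = 0;

    while (tas(lock))
    {
        if (++spins >= SPINS_PER_DELAY)
        {
            /* the s_lock_sleep step: sleep briefly, then spin again */
            struct timeval delay;

            delay.tv_sec = 0;
            delay.tv_usec = DELAY_USEC;
            (void) select(0, NULL, NULL, NULL, &delay);
            spins = 0;
        }
    }
}

static void
spin_release(slock_t *lock)
{
    __sync_lock_release(lock);
}

Under heavy contention most waiters exhaust the spin budget and land in
the timed sleep, so wall-clock time piles up in both functions -- much as
the profile above shows.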

I reran the benchmarks on the SMP machine with a scale factor of 12
instead of 2. The numbers still show a clear performance drop-off at
approximately 100 clients, albeit not as sharp a one -- but still quite
pronounced. In terms of raw performance, the numbers are comparable. The
scale factor certainly helped, but it still seems that we might have a
problem here.

Thoughts?

Neil

-- 
Neil Padgett
Red Hat Canada Ltd.                       E-Mail:  npadgett@redhat.com
2323 Yonge Street, Suite #300, 
Toronto, ON  M4P 2C9

