Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile - Mailing list pgsql-hackers

From Sergey Koposov
Subject Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date
Msg-id alpine.LRH.2.02.1205242008390.14366@calx046.ast.cam.ac.uk
In response to Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, 24 May 2012, Robert Haas wrote:

> As you can see, raw performance isn't much worse with the larger data
> sets, but scalability at high connection counts is severely degraded
> once the working set no longer fits in shared_buffers.

Actually, the problem persists even when I trim the dataset to fit within
shared_buffers.

Here is the dump (0.5 GB in size, tested with shared_buffers=10GB,
work_mem=500MB):
http://www.ast.cam.ac.uk/~koposov/files/dump.gz
The script is attached.

For my toy dataset, the run time of a single thread goes up from ~6.4 to
18 seconds (~3 times worse) when the queries are run in parallel.

While running the script repeatedly on my main machine, I also saw some
variation in how much slower the threaded execution is than a single thread.

Now I see 25 seconds for the multi-threaded run vs. the same ~6 seconds for a
single thread.

The oprofile shows:

782355   21.5269  s_lock
  782355   100.000  s_lock [self]
-------------------------------------------------------------------------------
709801   19.5305  PinBuffer
  709801   100.000  PinBuffer [self]
-------------------------------------------------------------------------------
326457    8.9826  LWLockAcquire
  326457   100.000  LWLockAcquire [self]
-------------------------------------------------------------------------------
309437    8.5143  UnpinBuffer
  309437   100.000  UnpinBuffer [self]
-------------------------------------------------------------------------------
252972    6.9606  ReadBuffer_common
  252972   100.000  ReadBuffer_common [self]
-------------------------------------------------------------------------------
201558    5.5460  LockBuffer
  201558   100.000  LockBuffer [self]
-------------------------------------------------------------------------------

Interestingly, on another machine with much less shared memory (3G), less
RAM (12G), fewer CPUs, and PG 9.1 running, I was consistently getting
~7.2 vs 4.5 sec (multi vs single thread).
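
To put the timings quoted in this message side by side, here is a minimal
sketch that computes the slowdown ratio for each pair. The labels and the
helper name slowdown() are only illustrative and are not part of the attached
script; the numbers are the approximate ones given above.

```python
# Timings quoted in this message, in seconds:
# (multi-threaded run, single-thread run). Labels are illustrative only.
timings = {
    "main machine, first measurement": (18.0, 6.4),
    "main machine, later measurement": (25.0, 6.0),
    "second machine (PG 9.1)": (7.2, 4.5),
}


def slowdown(multi_s, single_s):
    """Return how many times slower the multi-threaded run is."""
    return multi_s / single_s


for label, (multi_s, single_s) in timings.items():
    print(f"{label}: {slowdown(multi_s, single_s):.1f}x slower")
```

The ratios come out at roughly 2.8, 4.2 and 1.6, so the degradation on the
main machine is noticeably larger than on the smaller one.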

PS: Just in case, the CPU on the main machine I'm testing on is a
Xeon(R) CPU E7-4807 (the total number of real cores is 24).





*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
