Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date
Msg-id 246CDA37-6A93-4F60-9F48-F0B43DC06AC4@phlo.org
In response to Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile  (Sergey Koposov <koposov@ast.cam.ac.uk>)
List pgsql-hackers
On May 31, 2012, at 01:16, Sergey Koposov wrote:
> On Wed, 30 May 2012, Florian Pflug wrote:
>>
>> I wonder if the huge variance could be caused by non-uniform synchronization costs across different cores. That's
>> not all that unlikely, because at least some cache levels (L2 and/or L3, I think) are usually shared between all
>> cores on a single die. Thus, a cache line bouncing between cores on the same die might very well be faster than it
>> bouncing between cores on different dies.
>>
>> On Linux, you can use the taskset command to explicitly assign processes to cores. The easiest way to check if that
>> makes a difference is to assign one core for each connection to the postmaster before launching your test. Assuming
>> that CPU assignments are inherited by child processes, that should then spread your backends out over exactly the
>> cores you specify.
>
> Wow, thanks! This seems to be working to some extent. I've found that distributing each thread x (0 < x < 7) to
> CPU 1 + 3*x (reminder: I have HT disabled, and in total I have 4 CPUs with 6 proper cores each) gives quite good
> results. And after a few runs, I seem to be getting more or less stable results for the multiple threads, with the
> performance of multithreaded runs going from 6 to 11 seconds for various threads. (Another reminder: 5-6 seconds is
> roughly the timing of my queries running in a single thread.)
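For reference, the assignment described above could be scripted roughly like this. This is only a sketch: "query.sql" is a hypothetical placeholder for the test query, and I'm assuming the thread range includes thread 0 (the quoted "0<x<7" leaves that ambiguous). It prints the taskset invocations rather than running them; drop the echo to actually execute.

```shell
# Pin thread x to CPU 1 + 3*x, as described above.
# "query.sql" is a placeholder; echo shows the commands instead of running them.
for x in $(seq 0 6); do
    cpu=$((1 + 3 * x))
    echo "taskset -c $cpu psql -f query.sql"
done
```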

Wait, so performance *increased* by spreading the backends out over as many dies as possible, not by using as few as
possible? That'd be exactly the opposite of what I'd have expected. (I'm assuming that cores on one die have ascending
ids on Linux. If you could post the contents of /proc/cpuinfo, we could verify that.)
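For what it's worth, the core-to-die mapping can be read out with something like the following. A sketch, assuming the usual Linux /proc/cpuinfo layout with "processor" and "physical id" fields; some virtualized kernels omit "physical id".

```shell
# Print which physical package (die) each logical CPU belongs to,
# using the "processor" and "physical id" fields of /proc/cpuinfo.
awk -F': *' '
    $1 ~ /^processor/   { cpu = $2 }
    $1 ~ /^physical id/ { print "cpu " cpu " is on die " $2 }
' /proc/cpuinfo
```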

> So to some extent one can say that the problem is partially solved (i.e. it is probably understood)

Not quite, I think. We still don't really know why there's that much spinlock contention, AFAICS. But what we've
learned is that the actual spinning on a contested lock is only part of the problem. The cache-line bouncing caused by
all those lock acquisitions is the other part, and it's pretty expensive too - otherwise, moving the backends around
wouldn't have helped.

> But the question now is whether there is a *PG* problem here or not, or is it Intel's or Linux's problem?

Neither Intel nor Linux can do much about this, I fear. Synchronization will always be expensive, and the more so the
larger the number of cores. Linux could maybe pick a better process-to-core assignment, but it probably won't be able
to pick the optimal one for every workload. So unfortunately, this is a postgres problem, I'd say.

> Because still the slowdown was caused by locking. If there were no locking, there wouldn't be any problems (as
> demonstrated a while ago by just cat'ting the files in multiple threads).

Yup, we'll have to figure out a way to reduce the locking overhead. 9.2 already scales much better to a large number
of cores than previous versions did, but your test case shows that there's still room for improvement.

best regards,
Florian Pflug


