Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile - Mailing list pgsql-hackers
From: Florian Pflug
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Msg-id: 246CDA37-6A93-4F60-9F48-F0B43DC06AC4@phlo.org
In response to: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile (Sergey Koposov <koposov@ast.cam.ac.uk>)
List: pgsql-hackers
On May 31, 2012, at 01:16, Sergey Koposov wrote:
> On Wed, 30 May 2012, Florian Pflug wrote:
>>
>> I wonder if the huge variance could be caused by non-uniform synchronization costs across different cores. That's not all that unlikely, because at least some cache levels (L2 and/or L3, I think) are usually shared between all cores on a single die. Thus, a cache line bouncing between cores on the same die might very well be faster than it bouncing between cores on different dies.
>>
>> On linux, you can use the taskset command to explicitly assign processes to cores. The easiest way to check if that makes a difference is to assign one core for each connection to the postmaster before launching your test. Assuming that cpu assignments are inherited by child processes, that should then spread your backends out over exactly the cores you specify.
>
> Wow, thanks! This seems to be working to some extent. I've found that distributing each thread x (0 < x < 7) to the cpu 1+3*x (reminder: I have HT disabled, and in total I have 4 cpus with 6 proper cores each) gives quite good results. And after a few runs, I seem to be getting more or less stable results for the multiple threads, with the performance of multithreaded runs going from 6 to 11 seconds for various threads. (Another reminder: 5-6 seconds is roughly the timing of my queries running in a single thread.)

Wait, so performance *increased* by spreading the backends out over as many dies as possible, not by using as few as possible? That'd be exactly the opposite of what I'd have expected. (I'm assuming that cores on one die have ascending ids on linux. If you could post the contents of /proc/cpuinfo, we could verify that.)

> So to some extent one can say that the problem is partially solved (i.e. it is probably understood)

Not quite, I think. We still don't really know why there's that much spinlock contention, AFAICS. But what we've learned is that the actual spinning on a contested lock is only part of the problem.
The cache-line bouncing caused by all those lock acquisitions is the other part, and it's pretty expensive too - otherwise, moving the backends around wouldn't have helped.

> But the question now is whether there is a *PG* problem here or not, or is it Intel's or Linux's problem?

Neither Intel nor Linux can do much about this, I fear. Synchronization will always be expensive, and the more so the larger the number of cores. Linux could maybe pick a better process-to-core assignment, but it probably won't be able to pick the optimal one for every workload. So unfortunately, this is a postgres problem, I'd say.

> Because still the slowdown was caused by locking. If there wouldn't be locking there wouldn't be any problems (as demonstrated a while ago by just cat'ting the files in multiple threads).

Yup, we'll have to figure out a way to reduce the locking overhead. 9.2 already scales much better to a large number of cores than previous versions did, but your test case shows that there's still room for improvement.

best regards,
Florian Pflug