Thread: Hmmm... why does pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?

Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug:
https://github.com/gbb/par_psql]

Using it, I'm seeing a problem that I've also seen in other postgres projects involving high degrees of parallelisation
in the last 12 months.

Basically:

- I have machines here with up to 16 CPU cores and 128GB memory, very fast SSDs and controller etc., and a carefully
configured kernel/postgresql.conf for high performance.

- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), with almost up to 16x performance
improvement.

- Non-DB stuff like GDAL, python etc. parallelise nearly perfectly.

- HOWEVER, calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not
parallelise well, even when they are independently defined functions, or accessing tables in a read-only way. They hit a
limit of 2.5x performance improvement relative to single-CPU performance (pg9.4) and merely 2x performance (pg9.3)
regardless of how many CPU cores I throw at them. This is about 6 times slower than I'm expecting.


I can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or
incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, or
independent output tables makes no difference whatsoever, so it doesn't feel 'lock-related'. It also doesn't seem to be
WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged
tables for output.
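For concreteness, the shape of workload I mean is something like this (a hypothetical sketch, not my actual benchmark code; myfunction, input_table, some_stuff and the modulo-based row split are all stand-in names):

```sql
-- Hypothetical stand-in for a CPU-heavy pl/pgsql function.
CREATE OR REPLACE FUNCTION myfunction(x integer) RETURNS integer AS $$
DECLARE
  acc integer := 0;
BEGIN
  -- busy-work loop to keep the pl/pgsql interpreter CPU-bound
  FOR i IN 1..10000 LOOP
    acc := (acc + x * i) % 65536;
  END LOOP;
  RETURN acc;
END;
$$ LANGUAGE plpgsql;

-- One of N parallel psql sessions; session k takes rows where id % N = k
-- and writes to its own unlogged output table.
CREATE UNLOGGED TABLE out_0 AS
  SELECT myfunction(some_stuff) FROM input_table WHERE id % 16 = 0;
```

Running 16 sessions of that shape at once is where the ~2.5x ceiling appears.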

Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

I'm wondering what I'm missing here. Any ideas?

Graeme.


auto explain might help giving some insight in what's going on:
http://www.postgresql.org/docs/9.4/static/auto-explain.html
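For what it's worth, a minimal per-session setup looks something like this (parameter names per the module docs; log_nested_statements is the one that matters here, since it logs the statements fired inside pl/pgsql functions):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;         -- log a plan for every statement
SET auto_explain.log_analyze = true;           -- include actual run times (adds overhead)
SET auto_explain.log_nested_statements = true; -- include statements inside functions
```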

Regards,
Marc Mamin

On Fri, Jul 3, 2015 at 9:48 AM, Graeme B. Bell <graeme.bell@nibio.no> wrote:

I'm not necessarily seeing your results. Via pgbench:

mmoncure@mernix2 11:34 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -f b.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 658833
latency average: 0.091 ms
tps = 10980.538470 (including connections establishing)
tps = 10980.994547 (excluding connections establishing)
mmoncure@mernix2 11:35 AM ~$ ~/pgdev/bin/pgbench -n -T 60 -c4 -j4 -f b.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 60 s
number of transactions actually processed: 2847631
latency average: 0.084 ms
tps = 47460.430447 (including connections establishing)
tps = 47463.702074 (excluding connections establishing)

b.sql:
select f();

f():
create or replace function f() returns int as $$ begin return 1; end;
$$ language plpgsql;

the results are pretty volatile even with a 60s run, but I'm clearly
not capped at 2.5x parallelization (my box is 4 core).  It would help
if you disclosed the function body you're benchmarking.  If the
problem is indeed on the server, the next step I think is to profile
the code and look for locking issues.

merlin


Hi Merlin,

Long story short - thanks for the reply, but you're not measuring anything about the parallelism of code running in a
pl/pgsql environment here. You're just measuring whether postgres can parallelise entering that environment and get back
out. Don't get me wrong - it's great that this scales well, because it affects situations where you have lots of calls to
trivial functions.
However, it's not the problem I'm talking about. I mean 'real' pl/pgsql functions, e.g. things that you might find in
postgis or similar.
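To illustrate the distinction (a hypothetical sketch, not one of my actual test functions): swap the one-line f() for something that actually spends time inside the interpreter, e.g.

```sql
-- Unlike a 'return 1' function, this does real work per call inside
-- pl/pgsql, so running it under pgbench -c N measures parallel
-- *execution* of pl/pgsql rather than just parallel entry/exit of
-- the function environment.
CREATE OR REPLACE FUNCTION g() RETURNS double precision AS $$
DECLARE
  acc double precision := 0;
BEGIN
  FOR i IN 1..100000 LOOP
    acc := acc + sqrt(i::double precision);
  END LOOP;
  RETURN acc;
END;
$$ LANGUAGE plpgsql;
```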

If you re-read my previous email, or look at par_psql (http://parpsql.com) and the benchmarks there, you'll
maybe see more about what I'm talking about.

To clear up the issue I built a little test harness around your comment below.
If anyone was wondering whether it's par_psql itself that causes bad scaling in postgres:
the answer is clearly no. :-)

What I found this evening is that there are several problems here. I did some testing using a machine with 16
physical cores and lots of memory/IO.

- Using a table as a source of input rather than a fixed parameter, e.g. 'select col1 ...' vs. 'select 3'. Please note I
am not talking about poor performance, I am talking about poor scaling of performance to multicore. There should be no
reason for this when read-locks are being taken on the table, and no reason for this when it is combined with e.g. a
bunch of pl/pgsql work in a function. However, the impact of this problem is only seen above 8 cores, where performance
crashes.

- Using pl/pgsql itself intensively (e.g. anything non-trivial) causes horrifically bad scaling above 2 cores on the
systems I've tested, and performance crashes very hard soon after. This matches what I've seen elsewhere in big projects
and in par_psql's tests.
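Paraphrasing the two cases (simplified sketches with made-up names; the actual test functions are in the ppppt repo):

```sql
-- Case 1: fixed parameter as input.
CREATE OR REPLACE FUNCTION f_const() RETURNS integer AS $$
BEGIN
  RETURN (SELECT 3);
END;
$$ LANGUAGE plpgsql;

-- Case 2: the same shape, but with a table as the input source.
-- Read-only access, yet multicore scaling collapses above 8 cores.
CREATE OR REPLACE FUNCTION f_table() RETURNS integer AS $$
BEGIN
  RETURN (SELECT col1 FROM some_table LIMIT 1);
END;
$$ LANGUAGE plpgsql;
```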

Of course, it could be some wacky postgresql.conf setting (I doubt it here), so I'd be glad if others could give it a
try. If you're bored, set the time to 5s and run; from testing I can tell you it shouldn't alter the results.

The repo will be up in around 30 minutes' time on http://github.com/gbb/ppppt, and I'm going to submit it as a bug to
the pg bugs list.

Graeme.


On 06 Jul 2015, at 18:40, Merlin Moncure <mmoncure@gmail.com> wrote:

> the results are pretty volatile even with a 60s run, but I'm clearly
> not capped at 2.5x parallelization (my box is 4 core).  It would help
> if you disclosed the function body you're benchmarking.  If the
> problem is indeed on the server, the next step I think is to profile
> the code and look for locking issues.
>
> merlin



On Tue, Jul 7, 2015 at 3:33 PM, Graeme B. Bell <graeme.bell@nibio.no> wrote:
>
> Hi Merlin,
>
> Long story short - thanks for the reply, but you're not measuring anything about the parallelism of code running in a
> pl/pgsql environment here. You're just measuring whether postgres can parallelise entering that environment and get back
> out. Don't get me wrong - it's great that this scales well because it affects situations where you have lots of calls to
> trivial functions.
> However it's not the problem I'm talking about. I mean 'real' pl/pgsql functions, e.g. things that you might find in
> postgis or similar.

Maybe so.  But it will be a lot easier for me (and others on this
list) if you submit a self-contained test case that runs via pgbench.
From there it's a simple matter of a perf top and other standard
locking diagnostic tests, and it also rules out any suspicion of 3rd party
issues.  This will also get better feedback on -bugs.

merlin


On 07 Jul 2015, at 22:52, Merlin Moncure <mmoncure@gmail.com> wrote:

> Maybe so.  But it will be a lot easier for me (and others on this)
> list if you submit a self contained test case that runs via pgbench.


Hi Merlin,

I'm guessing you are maybe pressed for time at the moment, because I already clearly included this in the last email, as
well as the links to the alternative benchmarks with the same problem I referred to in both of my last emails, which are
also trivial to drop into pgbench (cut/paste).

e.g. did you see these parts of my previous email?

"To clear up the issue I built a little test harness around your comment below."
"http://github.com/gbb/ppppt"

Just pick any function you like; there are 6 there, and 3 of them demonstrate 2 different problems, all of it
clearly documented.

I haven't used perf with pgbench before, and I can't run any code today.
If you're interested in this but short on time, maybe you can glance at the repo above and just add 'perf' at the
appropriate point in the rbuild wrapper.

Graeme.



On 2015-07-08 11:13:04 +0000, Graeme B. Bell wrote:
> I'm guessing you are maybe pressed for time at the moment because I
> already clearly included this on the last email, as well as the links
> to the alternative benchmarks with the same problem I referred to on
> both of my last emails which are also trivial to drop into pgbench
> (cut/paste).

You realize that you want something here, not Merlin, right?

> e.g. did you see these parts of my previous email
>
> "To clear up the issue I built a little test harness around your comment below."
> "http://github.com/gbb/ppppt"

Well, that requires reviewing the source code of the run script and
such.



I think we shouldn't discuss this on two threads (-performance, -bugs),
that makes it hard to follow. Given Tom's more detailed answer I think
the -bugs thread already contains more pertinent information.


On 08 Jul 2015, at 13:20, Andres Freund <andres@anarazel.de> wrote:

> You realize that you want something here, not Merlin, right?

Hi Andres,

My email was saying it's not helpful for anyone on the list for him to keep asking me to give him X and me to keep
sending it. Do you disagree with that idea?

I tried to phrase my request politely, but perhaps I failed. If you have suggestions for better ways to say "I already
sent it, twice" more politely in this situation, I'd welcome them off list.

He asked me to disclose the function body I was testing. I did that, *and* also disclosed the entire approach to the
benchmark too, in a way that made it trivial for him or others to replicate the situation I'd found. I'm pretty sure you
should not be discouraging this kind of thing in bug/performance reports.

I get your point that when you're asking for other people to look at something with you, you shouldn't antagonise them.

I didn't intend it as antagonising, and Merlin hasn't mailed me anything to say he was antagonised. I'm quite sure he's
capable of defending himself or communicating with me himself if he does feel antagonised by something. I hope we can
end the discussion of that here?

Merlin, if you were antagonised, sorry, I did not mean to antagonise you. I just wanted to make it clear
that I'd sent you what you asked for, + more, and that I was surprised you hadn't noticed it.

>> "To clear up the issue I build a little test harness around your comment below."
>> "http://github.com/gbb/ppppt"
>
> Well, that requires reviewing the source code of the run script and
> such.

No, of course it doesn't.  It appears that you didn't look at the repo or read my previous mail before you wrote this.

I do not wish to antagonise you either, so please go and look at the repo before you write the next reply.

"http://github.com/gbb/ppppt
Just pick any function you like, there are 6 there, and 3 of them demonstrate 2 different problems, all of it is
clearly documented."

When you open up the repo, there are the tests
https://github.com/gbb/ppppt/tree/master/tests

You don't need to review any code from the run script. The functions are there as isolated files, and what they are
intended to demonstrate is clearly described with text and graphics. I could see your point if I had mailed out some
giant script with a bunch of SQL calls embedded in its guts, but that's the opposite of what I did here.

Did you find it difficult to navigate the repo structure (2 folders, a few files)? If so, please let me know off-list
what was difficult and I will see if I can improve it.

> I think we shouldn't discuss this on two threads (-performance, -bugs),
> that makes it hard to follow. Given Tom's more detailed answer I think
> the -bugs thread already contains more pertinent information.

I don't necessarily disagree with this idea, but...

How many people concerned with performance are following the -bugs list? How much space is there for discussion of this
on -bugs? Since the only working solutions for this performance problem so far are all user-side rather than
committer-side, why would you want to restrict that information to a committer-side list?

It has developed this way because I noticed it as a performance issue first, then decided to report it as a potential
bug.

Perhaps it would be useful to keep the discussions separate: the committer-side aspects (how to fix this at the server
level) and the user-side aspects (what you can do to improve performance right now). I will defer to general opinion on
this in my follow-up posts.

Graeme.

On 2015-07-09 10:30:35 +0000, Graeme B. Bell wrote:
> > Well, that requires reviewing the source code of the run script and
> > such.
>
> No, of course it doesn't.  It appears that you didn't look at the repo or read my previous mail before you wrote
> this.

FFS, I *ran* some of the tests and reported on results. With you in CC.

What I mean is that I don't just run random code from some random github
repository.

> I do not wish to antagonise you either, so please go and look at the
> repo before you write the next reply.

Over and out.


>> No, of course it doesn't.  It appears that you didn't look at the repo or read my previous mail before you wrote
>> this.
>
> FFS, I *ran* some of the tests and reported on results. With you in CC.

Just checked back. So you did. I'm sorry, I made the mistake I accused you of.

But... why then did you say I hadn't provided him with individual functions, when you've seen the repo yourself? I
don't understand. You knew they're there.

> What I mean is that I don't just run random code from some random github
> repository.

Sure, but surely that's not an issue when the SQL functions are also separately provided and clearly labelled in the
repo?

Do you feel there is a difference in the trustworthiness of isolated files containing an SQL function presented in a
github repo, and SQL functions presented in an email?

I am not sure I can agree with that idea; I think they are both just SQL functions. The difference is that one also
offers you a bit more if you want to check/try it.

>> I do not wish to antagonise you either, so please go and look at the
>> repo before you write the next reply.
>
> Over and out.

Seems there has been a misunderstanding here, and I feel I'm still missing something in what you're saying. Sorry
Andres. Let's just forget this. I don't think we disagree especially on this, and I am not looking to make an enemy here.

Also, thanks for running the benchmarks to get some numbers.

Graeme.