Thread: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
Hi everyone,

I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug: https://github.com/gbb/par_psql ]

Using it, I'm seeing a problem I've seen in other postgres projects involving parallelisation in the last 12 months.

Basically:

- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance.
- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement.
- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independent or accessing tables in a read-only way. They hit a limit at 2.5x performance improvement relative to single-CPU performance (pg9.4) and 2x performance (pg9.3). This is about 6 times slower than I'm expecting.
- Can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables.

Curious? Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md

Wondering what I'm missing here. Any ideas?

Graeme.
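[Editor's note: for readers skimming the archive, the failing pattern is roughly this kind of function, called concurrently from many connections. This is a hypothetical sketch with made-up names (burn_cpu), not Graeme's actual benchmark code; the real code is in the par_psql repo linked above.]

```sql
-- A CPU-bound pl/pgsql function of the kind that fails to scale.
-- Note it is not marked STABLE/IMMUTABLE, so it defaults to VOLATILE.
CREATE OR REPLACE FUNCTION burn_cpu(iters int) RETURNS double precision AS $$
DECLARE
    acc double precision := 0;
    i int;
BEGIN
    FOR i IN 1..iters LOOP
        acc := acc + sqrt(i::double precision);  -- pure arithmetic, no table access
    END LOOP;
    RETURN acc;
END;
$$ LANGUAGE plpgsql;

-- Each of N parallel psql sessions then runs an independent call like:
SELECT burn_cpu(10000000);
```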
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Craig James
Date:
On Thu, Jul 2, 2015 at 9:15 AM, Graeme B. Bell <graeme.bell@nibio.no> wrote:
Hi everyone,
I've written a new open source tool for easily parallelising SQL scripts in postgres. [obligatory plug: https://github.com/gbb/par_psql ]
Using it, I'm seeing a problem I've seen in other postgres projects involving parallelisation in the last 12 months.
Basically:
- I have machines here with up to 16 CPUs and 128GB memory, very fast SSDs and controller etc, carefully configured kernel/postgresql.conf for high performance.
- Ordinary queries parallelise nearly perfectly (e.g. SELECT some_stuff ...), e.g. almost up to 16x performance improvement.
- Calls to CPU-intensive user-defined pl/pgsql functions (e.g. SELECT myfunction(some_stuff)) do not parallelise well, even when they are independent or accessing tables in a read-only way. They hit a limit at 2.5x performance improvement relative to single-CPU performance (pg9.4) and 2x performance (pg9.3). This is about 6 times slower than I'm expecting.
- Can't see what would be locking. It seems like it's the pl/pgsql environment itself that is somehow locking or incurring some huge frictional costs. Whether I use independently defined functions, independent source tables, independent output tables, makes no difference whatsoever, so it doesn't feel 'locky'. It also doesn't seem to be WAL/synchronisation related, as the machines I'm using can hit absurdly high pgbench rates, and I'm using unlogged tables.
Curious? Take a quick peek here: https://github.com/gbb/par_psql/blob/master/BENCHMARKS.md
Wondering what I'm missing here. Any ideas?
No ideas, but I ran into the same thing. I have a set of C/C++ functions that put some chemistry calculations into Postgres as extensions (things like, "calculate the molecular weight of this molecule"). As SQL functions, the whole thing bogged down, and we never got the scalability we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at the same time, even with dozens of clients.
When I moved these same functions into an Apache fast-CGI HTTP service (exact same code, same network overhead), I could easily scale up and use the full 100% of all eight CPUs.
I have no idea why, and never investigated further. The convenience of having the functions in SQL wasn't that important.
Craig
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Joshua D. Drake"
Date:
On 07/07/2015 08:05 PM, Craig James wrote:
> No ideas, but I ran into the same thing. I have a set of C/C++ functions
> that put some chemistry calculations into Postgres as extensions (things
> like, "calculate the molecular weight of this molecule"). As SQL
> functions, the whole thing bogged down, and we never got the scalability
> we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
> the same time, even with dozens of clients.
>
> When I moved these same functions into an Apache fast-CGI HTTP service
> (exact same code, same network overhead), I could easily scale up and
> use the full 100% of all eight CPUs.
>
> I have no idea why, and never investigated further. The convenience of
> having the functions in SQL wasn't that important.

I admit that I haven't read this whole thread but:

Using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed and that in turn will use all CPUs up to the max available resource.

With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
> On 07/07/2015 08:05 PM, Craig James wrote:
>> No ideas, but I ran into the same thing. I have a set of C/C++ functions
>> that put some chemistry calculations into Postgres as extensions (things
>> like, "calculate the molecular weight of this molecule"). As SQL
>> functions, the whole thing bogged down, and we never got the scalability
>> we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
>> the same time, even with dozens of clients.

Hi all,

The sample code / results were put up last night at http://github.com/gbb/ppppt

Craig's problem sounds similar to my own, assuming he means running C indirectly via SQL vs running C more directly. Lots of parallel connections to postgres but a maximum of 2 CPUs of scaling (and it gets worse as you try to run more things).

Tom Lane has posted an interesting comment over on the bugs list which identifies a likely source of at least one of the problems, maybe both. It seems to be linked to internal locking inside postgres (which makes sense, given the results - both problems feel 'lock-y'). Also, he mentions a workaround for some functions that scales to 8-way apparently.

http://www.postgresql.org/message-id/31265.1436317984@sss.pgh.pa.us

I think it's potentially a big problem for CPU-intensive postgres libraries like pgrouting, or perhaps the postgis & postgis raster functions, things like that. I don't know how well their functions are marked for e.g. immutability. Are there any postgis devs on this list?

Graeme Bell
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Craig James
Date:
On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
On 07/07/2015 08:05 PM, Craig James wrote:
No ideas, but I ran into the same thing. I have a set of C/C++ functions
that put some chemistry calculations into Postgres as extensions (things
like, "calculate the molecular weight of this molecule"). As SQL
functions, the whole thing bogged down, and we never got the scalability
we needed. On our 8-CPU setup, we couldn't get more than 2 CPUs busy at
the same time, even with dozens of clients.
When I moved these same functions into an Apache fast-CGI HTTP service
(exact same code, same network overhead), I could easily scale up and
use the full 100% of all eight CPUs.
I have no idea why, and never investigated further. The convenience of
having the functions in SQL wasn't that important.
I admit that I haven't read this whole thread but:
Using Apache Fast-CGI, you are going to fork a process for each instance of the function being executed and that in turn will use all CPUs up to the max available resource.
With PostgreSQL, that isn't going to happen unless you are running (at least) 8 functions across 8 connections.
Well, right, which is why I mentioned "even with dozens of clients." Shouldn't that scale to at least all of the CPUs in use if the function is CPU intensive (which it is)?
Craig
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Joshua D. Drake"
Date:
On 07/08/2015 10:48 AM, Craig James wrote:
>> I admit that I haven't read this whole thread but:
>>
>> Using Apache Fast-CGI, you are going to fork a process for each
>> instance of the function being executed and that in turn will use
>> all CPUs up to the max available resource.
>>
>> With PostgreSQL, that isn't going to happen unless you are running
>> (at least) 8 functions across 8 connections.
>
> Well, right, which is why I mentioned "even with dozens of clients."
> Shouldn't that scale to at least all of the CPUs in use if the function
> is CPU intensive (which it is)?

In theory but that isn't PostgreSQL that does that, it will be the kernel scheduler. Although (and I am grasping at straws):

I wonder if the execution is taking place outside of the backend proper or... are you using a pooler?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Craig James
Date:
On Wed, Jul 8, 2015 at 10:52 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On 07/08/2015 10:48 AM, Craig James wrote:
I admit that I haven't read this whole thread but:
Using Apache Fast-CGI, you are going to fork a process for each
instance of the function being executed and that in turn will use
all CPUs up to the max available resource.
With PostgreSQL, that isn't going to happen unless you are running
(at least) 8 functions across 8 connections.
Well, right, which is why I mentioned "even with dozens of clients."
Shouldn't that scale to at least all of the CPUs in use if the function
is CPU intensive (which it is)?
In theory but that isn't PostgreSQL that does that, it will be the kernel scheduler. Although (and I am grasping at straws):
I wonder if the execution is taking place outside of the backend proper or... are you using a pooler?
No pooler, and the functions were in an ordinary SQL extension .so library and loaded as
CREATE OR REPLACE FUNCTION funcname( ... ) returns ...
AS 'libxxx.so', 'funcname' LANGUAGE c STRICT IMMUTABLE COST 10000;
Craig
--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Merlin Moncure
Date:
On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cjames@emolecules.com> wrote:
> On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> I admit that I haven't read this whole thread but:
>>
>> Using Apache Fast-CGI, you are going to fork a process for each instance
>> of the function being executed and that in turn will use all CPUs up to the
>> max available resource.
>>
>> With PostgreSQL, that isn't going to happen unless you are running (at
>> least) 8 functions across 8 connections.
>
> Well, right, which is why I mentioned "even with dozens of clients."
> Shouldn't that scale to at least all of the CPUs in use if the function is
> CPU intensive (which it is)?

only in the absence of inter-process locking and cache line bouncing.

merlin
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Andres Freund
Date:
On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
> On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cjames@emolecules.com> wrote:
>> Well, right, which is why I mentioned "even with dozens of clients."
>> Shouldn't that scale to at least all of the CPUs in use if the function is
>> CPU intensive (which it is)?
>
> only in the absence of inter-process locking and cache line bouncing.

And additionally memory bandwidth (shared between everything, even in the NUMA case), cross socket/bus bandwidth (absolutely performance critical in multi-socket configurations), cache capacity (shared between cores, and sometimes even sockets!).
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Craig James
Date:
On Wed, Jul 8, 2015 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
> On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cjames@emolecules.com> wrote:
> > On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake <jd@commandprompt.com>
> >> Using Apache Fast-CGI, you are going to fork a process for each instance
> >> of the function being executed and that in turn will use all CPUs up to the
> >> max available resource.
> >>
> >> With PostgreSQL, that isn't going to happen unless you are running (at
> >> least) 8 functions across 8 connections.
> >
> >
> > Well, right, which is why I mentioned "even with dozens of clients."
> > Shouldn't that scale to at least all of the CPUs in use if the function is
> > CPU intensive (which it is)?
>
> only in the absence of inter-process locking and cache line bouncing.
And additionally memory bandwidth (shared between everything, even in
the numa case), cross socket/bus bandwidth (absolutely performance
critical in multi-socket configurations), cache capacity (shared between
cores, and sometimes even sockets!).
From my admittedly naive point of view, it's hard to see why any of this matters. I have functions that do purely CPU-intensive mathematical calculations ... you could imagine something like is_prime(N) that determines if N is a prime number. I have eight clients that connect to eight backends. Each client issues an SQL command like, "select is_prime(N)" where N is a simple number.
Are you saying that in order to calculate is_prime(N), all of that stuff (inter-process locking, memory bandwidth, bus bandwidth, cache capacity, etc.) is even relevant? And if so, how is it that Postgres is so different from an Apache fast-CGI program that runs the exact same is_prime(N) calculation?
Just curious ... as I said, I've already implemented a different solution.
Craig
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
andres@anarazel.de (Andres Freund)
Date:
On 2015-07-08 15:38:24 -0700, Craig James wrote:
> From my admittedly naive point of view, it's hard to see why any of this
> matters. I have functions that do purely CPU-intensive mathematical
> calculations ... you could imagine something like is_prime(N) that
> determines if N is a prime number. I have eight clients that connect to
> eight backends. Each client issues an SQL command like, "select
> is_prime(N)" where N is a simple number.

I mostly replied to Merlin's general point (additionally in the context of plpgsql).

But I have a hard time seeing that postgres would be the bottleneck for an is_prime() function (or something with similar characteristics) that's written in C where the average runtime is more than, say, a couple thousand cycles. I'd like to see a profile of that.

Andres
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Tom Lane
Date:
andres@anarazel.de (Andres Freund) writes:
> On 2015-07-08 15:38:24 -0700, Craig James wrote:
>> From my admittedly naive point of view, it's hard to see why any of this
>> matters. I have functions that do purely CPU-intensive mathematical
>> calculations ... you could imagine something like is_prime(N) that
>> determines if N is a prime number. I have eight clients that connect to
>> eight backends. Each client issues an SQL command like, "select
>> is_prime(N)" where N is a simple number.
>
> I mostly replied to Merlin's general point (additionally in the context of
> plpgsql).
>
> But I have a hard time seeing that postgres would be the bottleneck for an
> is_prime() function (or something with similar characteristics) that's
> written in C where the average runtime is more than, say, a couple
> thousand cycles. I'd like to see a profile of that.

But that was not the case that Graeme was complaining about. He's talking about simple-arithmetic-and-looping written in plpgsql, in a volatile function that is going to take a new snapshot for every statement, even if that's only "n := n+1". So it's going to spend a substantial fraction of its runtime banging on the ProcArray, and that doesn't scale. If you write your is_prime function purely in plpgsql, and don't bother to mark it nonvolatile, *it will not scale*. It'll be slow even in single-thread terms, but it'll be particularly bad if you're saturating a multicore machine with it.

One of my Salesforce colleagues has been looking into ways that we could decide to skip the per-statement snapshot acquisition even in volatile functions, if we could be sure that a particular statement isn't going to do anything that would need a snapshot. Now, IMO that doesn't really do much for properly written plpgsql; but there's an awful lot of bad plpgsql code out there, and it can make a huge difference for that.

regards, tom lane
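[Editor's note: Tom's point can be made concrete with a sketch. This is a hypothetical is_prime written for illustration, not Graeme's or Craig's actual code; the only difference between the non-scaling and scaling versions is the volatility label at the end.]

```sql
-- CPU-bound plpgsql with no table access. Left unmarked it defaults to
-- VOLATILE, so each statement in the loop takes a fresh snapshot
-- (ProcArray contention under concurrency). Marking it IMMUTABLE tells
-- PostgreSQL the result depends only on the argument and has no side
-- effects, which avoids that per-statement overhead.
CREATE OR REPLACE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    i bigint := 3;
BEGIN
    IF n < 2 THEN RETURN false; END IF;
    IF n = 2 THEN RETURN true; END IF;
    IF n % 2 = 0 THEN RETURN false; END IF;
    WHILE i * i <= n LOOP
        IF n % i = 0 THEN RETURN false; END IF;
        i := i + 2;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql IMMUTABLE;  -- the one-word change Tom is pointing at
```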
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
On 08 Jul 2015, at 22:27, Andres Freund <andres@anarazel.de> wrote:

> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote:
>> only in the absence of inter-process locking and cache line bouncing.
>
> And additionally memory bandwidth (shared between everything, even in
> the numa case), cross socket/bus bandwidth (absolutely performance
> critical in multi-socket configurations), cache capacity (shared between
> cores, and sometimes even sockets!).

1. Note for future readers - it's also worth noting that depending on the operation, and on your hardware, you may have fewer "CPU cores" than you think to parallelise upon.

1a. For example, AMD CPUs list the number of integer cores (e.g. 16), but there are actually only half as many cores available for floating point work (8). So if your functions need to use floating point, your scaling will suffer badly on FP functions.

https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)

"In terms of hardware complexity and functionality, this "module" is equal to a dual-core processor in its integer power, and to a single-core processor in its floating-point power: for each two integer cores, there is one floating-point core."

1b. Or, if you have hyper-threading enabled on an Intel CPU, you may think you have e.g. 8 cores, but if all the threads are running the same type of operation repeatedly, it won't be possible for the hyper-threading to work well and you'll only get 4 in practice. Maybe less due to overheads. Or, if your work is continually going to main memory for data (e.g. limited by the memory bus), it will run at 4-core speed, because the cores have to share the same memory bus.

Hyper-threading depends on the 2 logical cores being asked to perform two different types of tasks at once (each having relatively lower demands on memory).

"When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task."

https://en.wikipedia.org/wiki/Hyper-threading
https://en.wikipedia.org/wiki/Superscalar

2. Keep in mind also when benchmarking that it's normal to see a small drop-off when you hit the maximum number of cores for your system. After all, the O/S and the benchmark program and anything else you have running will need a core or two.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Andres Freund
Date:
On 2015-07-08 23:38:38 -0400, Tom Lane wrote:
> andres@anarazel.de (Andres Freund) writes:
>> But I have a hard time seeing that postgres would be the bottleneck for an
>> is_prime() function (or something with similar characteristics) that's
>> written in C where the average runtime is more than, say, a couple
>> thousand cycles. I'd like to see a profile of that.
>
> But that was not the case that Graeme was complaining about.

No, Craig was complaining about that case...

> One of my Salesforce colleagues has been looking into ways that we could
> decide to skip the per-statement snapshot acquisition even in volatile
> functions, if we could be sure that a particular statement isn't going to
> do anything that would need a snapshot.

Yea, I actually commented about that on IRC as well. I was thinking about actually continuing to get a snapshot, but mark it as 'complete on usage'. I.e. only call GetSnapshotData() when the snapshot is used to decide about visibility. We probably can't do that in the toplevel visibility case because it'll probably have noticeable semantic effects, but ISTM it should be doable for the volatile function using spi case.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
On 09 Jul 2015, at 05:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If you write your is_prime function purely in plpgsql, and don't bother to
> mark it nonvolatile, *it will not scale*.
>
> much for properly written plpgsql; but there's an awful lot of bad plpgsql
> code out there, and it can make a huge difference for that.

Hi Tom,

I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That is putting the blame on programmers. Clearly, if there is no end of code out there that isn't right in this regard, there's something very wrong in the project documentation.

1. I have been writing pl/pgsql on and off for a couple of years now and I've read quite a bit of the postgres documentation, but I don't recall seeing a clear statement telling me I should mark pl/pgsql functions nonvolatile wherever possible or throw all performance and scalability out the window. I'm sure there may be a line hidden somewhere in the docs, but judging from the impact it has in practice, this seems like a very fundamental concept that should be repeatedly and clearly marked in the docs.

2. Furthermore, I have never come across anything in the documentation that made it clear to me that any pl/pgsql function I write will, by default, be taking out locks for every single statement in the code. I've written code in I dunno, maybe 15-20 different languages in my life, and I can't think of another language offhand that does that by default. From the reactions on this thread to this benchmark and the par_psql benchmarks, it doesn't seem that it was even immediately obvious to many postgres enthusiasts and developers.

3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation. But I really don't think it helps to frame this as laziness or "bad" in any other sense of the word e.g. 'clumsy'.

Let's look at the postgresql documentation for some examples of 'bad' and lazy code:

http://www.postgresql.org/docs/9.3/static/plpgsql-structure.html
http://www.postgresql.org/docs/9.3/static/plpgsql-declarations.html

There are about 13 functions on those pages. How many of them make use of nonvolatile or immutable markings wherever it would be appropriate? Zero.

Or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html

Zero.

Or this one: http://www.postgresql.org/docs/9.3/static/plpgsql-cursors.html#PLPGSQL-CURSOR-USING

Zero.

The reason 90% of people out there are 'not bothering' and 'writing bad code' is because **99% of the postgresql documentation teaches them to do it that way**.

So when you talk about other people 'not bothering' to do things - who is really at fault here for what you see as endemic 'bad' or 'lazy' code? Is it the new postgres programmers, or the people that taught them with "bad" examples consistently throughout the *entire project documentation*, starting from the very first example?

I think I'm going to raise this as a documentation bug.

Graeme.
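[Editor's note: for concreteness, the sales_tax example from the plpgsql-structure page linked above illustrates the kind of one-word documentation fix being argued for. The IMMUTABLE label is the editor's addition, not the actual documentation text.]

```sql
-- The documentation's own structure example, with the volatility label
-- added. sales_tax depends only on its input and has no side effects,
-- so it can be declared IMMUTABLE; left unmarked, it defaults to
-- VOLATILE and pays per-statement snapshot overhead.
CREATE FUNCTION sales_tax(subtotal real) RETURNS real AS $$
BEGIN
    RETURN subtotal * 0.06;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
```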
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Merlin Moncure
Date:
On Wed, Jul 8, 2015 at 5:38 PM, Craig James <cjames@emolecules.com> wrote: > On Wed, Jul 8, 2015 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote: >> >> On 2015-07-08 13:46:53 -0500, Merlin Moncure wrote: >> > On Wed, Jul 8, 2015 at 12:48 PM, Craig James <cjames@emolecules.com> >> > wrote: >> > > On Tue, Jul 7, 2015 at 10:31 PM, Joshua D. Drake >> > > <jd@commandprompt.com> >> > >> Using Apache Fast-CGI, you are going to fork a process for each >> > >> instance >> > >> of the function being executed and that in turn will use all CPUs up >> > >> to the >> > >> max available resource. >> > >> >> > >> With PostgreSQL, that isn't going to happen unless you are running >> > >> (at >> > >> least) 8 functions across 8 connections. >> > > >> > > >> > > Well, right, which is why I mentioned "even with dozens of clients." >> > > Shouldn't that scale to at least all of the CPUs in use if the >> > > function is >> > > CPU intensive (which it is)? >> > >> > only in the absence of inter-process locking and cache line bouncing. >> >> And addititionally memory bandwidth (shared between everything, even in >> the numa case), cross socket/bus bandwidth (absolutely performance >> critical in multi-socket configurations), cache capacity (shared between >> cores, and sometimes even sockets!). > > > From my admittedly naive point of view, it's hard to see why any of this > matters. I have functions that do purely CPU-intensive mathematical > calculations ... you could imagine something like is_prime(N) that > determines if N is a prime number. I have eight clients that connect to > eight backends. Each client issues an SQL command like, "select is_prime(N)" > where N is a simple number. > > Are you saying that in order to calculate is_prime(N), all of that stuff > (inter-process locking, memory bandwith, bus bandwidth, cache capacity, > etc.) is even relevant? 
And if so, how is it that Postgres is so different > from an Apache fast-CGI program that runs the exact same is_prime(N) > calculation? > > Just curious ... as I said, I've already implemented a different solution. If your is_prime() was written in C and was written so that it did not utilize the database API, it should scale up quite nicely. This can be easily proved. On my quad core workstation, postgres=# select 12345! * 0; ?column? ────────── 0 (1 row) Time: 10435.554 ms ...which is heavily cpu bound, takes about 10 seconds. scaling out to 4 threads via: time ~/pgdev/bin/pgbench -n -t1 -c4 -f <(echo "select 12345! * 0;") yields: real 0m11.317s user 0m0.001s sys 0m0.005s ...I'll call that pretty good scaling. The reason why this scales so good is that the numeric code is all operating on local data structures and is not involving backend componentry with it's various attached complexity such as having to be checked for being visible to the current transaction. I submit that toy benchmarks like factoring or pi digits are not really good indicators of language scaling and performance because just about all real world code involves data structures, i/o, memory allocation, amateur coders, etc. Java tends to approach C in benchmark shootouts but woefully underperforms my expectations relative to C in code that does things that's actually useful (aside: if you think I'm knocking java, the situation is even worse with most other languages I come across). pl/pgsql is simply not optimized for that style of coding although if you know postgres you can start to tickle the limits of what's expected from the language. If that isn't working for you, pl/v8 strikes me as the best alternative due to it's performance and good integration with postgres data structures (in fact, I'd be arguing for it to be moved to core if the v8 dependency wasn't so capricious). 
Either way, I'll advocate any solution that allows you to code inside the database environment as opposed to the client side.

merlin
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Merlin Moncure
Date:
On Thu, Jul 9, 2015 at 4:44 AM, Graeme B. Bell <graeme.bell@nibio.no> wrote:
> On 09 Jul 2015, at 05:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> If you write your is_prime function purely in plpgsql, and don't bother to mark
>> it nonvolatile, *it will not scale*.
>
>> much for properly written plpgsql; but there's an awful lot of bad plpgsql
>> code out there, and it can make a huge difference for that.
>
> Hi Tom,
>
> I object to phrases like 'don't bother to mark it' and 'bad plpgsql' here. That is putting the blame on programmers. Clearly, if there is no end of code out there that isn't right in this regard, there's something very wrong in the project documentation.
>
> 1. I have been writing pl/pgsql on and off for a couple of years now and I've read quite a bit of the postgres documentation, but I don't recall seeing a clear statement telling me I should mark pl/pgsql functions nonvolatile wherever possible or throw all performance and scalability out the window. I'm sure there may be a line hidden somewhere in the docs, but judging from the impact it has in practice, this seems like a very fundamental concept that should be repeatedly and clearly marked in the docs.
>
> 2. Furthermore, I have never come across anything in the documentation that made it clear to me that any pl/pgsql function I write will, by default, be taking out locks for every single statement in the code. I've written code in, I dunno, maybe 15-20 different languages in my life, and I can't think of another language offhand that does that by default. From the reactions on this thread to this benchmark and the par_psql benchmarks, it doesn't seem that it was even immediately obvious to many postgres enthusiasts and developers.
>
> 3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation.

Particularly with regards to documentation, a patch improving things is much more likely to improve the situation than griping.
Also, conversation on this list gets recorded for posterity, and google is remarkably good at matching people looking for problems with solutions. So, even in the absence of a patch, perhaps we've made the lives of future head-scratchers a little bit easier with this discussion.

merlin
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Thomas Kellerer
Date:
Graeme B. Bell wrote on 09.07.2015 at 11:44:
> I don't recall seeing a clear statement telling me I should mark pl/pgsql
> functions nonvolatile wherever possible or throw all performance and
> scalability out the window.

From: http://www.postgresql.org/docs/current/static/xfunc-volatility.html

"For best optimization results, you should label your functions with the strictest volatility category that is valid for them."
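[Editor's note: to make the labelling being discussed concrete, here is a minimal sketch of the hypothetical is_prime() mentioned earlier in the thread, written in pl/pgsql. The function body is illustrative only, not code from the thread; the point is the volatility label on the last line, since functions are VOLATILE by default unless declared otherwise.]

```sql
-- Without a label this function would default to VOLATILE, meaning the
-- planner must assume it may have side effects and cannot cache,
-- reorder, or pre-evaluate calls to it.
CREATE FUNCTION is_prime(n bigint) RETURNS boolean AS $$
DECLARE
    i bigint := 2;
BEGIN
    IF n < 2 THEN
        RETURN false;
    END IF;
    WHILE i * i <= n LOOP
        IF n % i = 0 THEN
            RETURN false;
        END IF;
        i := i + 1;
    END LOOP;
    RETURN true;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;  -- valid here: the result depends only on the argument,
            -- so the strictest volatility category applies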
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
On 09 Jul 2015, at 15:22, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Graeme B. Bell wrote on 09.07.2015 at 11:44:
>> I don't recall seeing a clear statement telling me I should mark pl/pgsql
>> functions nonvolatile wherever possible or throw all performance and
>> scalability out the window.
>
> From: http://www.postgresql.org/docs/current/static/xfunc-volatility.html
>
> "For best optimization results, you should label your functions
> with the strictest volatility category that is valid for them."

Hi Thomas,

Thank you very much for the link.

However, the point I was making wasn't that no sentence exists anywhere. My point was that I've read the docs more than anyone else in my institute and I was completely unaware of this.

It's also quite vague - if you hand that to a younger programmer in particular, how do they implement it in practice? When is it important to do it? If this one factor silently breaks multiprocessor scaling of pl/pgsql, and multiprocessing is the biggest trend in CPU processing of the last decade (comparing server CPUs of 2005 with 2015), then why is this information not up front and clear?

A second point to keep in mind is that optimization and parallelisation/scalability are not always the same thing.

For example, in one project I took a bunch of looped parallel UPDATEs on a set of 50 tables, and rewrote them so as to run the loop all at once inside a pl/pgsql function. Crudely, I took out the table-level for loop and put it at row level instead. I expected they'd execute much faster if UPDATEs were using data still in cache. Also, I would be updating without writing out WAL entries to disk repeatedly.

It turns out the update per row ran much faster - as expected - when I used one table, but when I ran it in parallel on many tables, the performance was even worse than when I started. If you look at the benchmarks, you'll see that performance drops through the floor at 8-16 cores. I think that was when I first noticed this bug/feature.
[If anyone is curious, the way I solved that one in the end was to pre-calculate every possible way the tables might be updated after N loops of updates using Python, and import that as a lookup table into PG. It turns out that although we had tens of GB of data per table, there were only about 100,000 different types of situation, and only e.g. 80 iterations to consider. Then I ran a single set of UPDATEs with no pl/pgsql. It was something like a 10000x performance improvement.]

Graeme.
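[Editor's note: the lookup-table trick described above can be sketched roughly as follows. All table and column names here are hypothetical, invented for illustration — the original project's schema is not shown in the thread. The idea is simply to replace a pl/pgsql loop issuing one UPDATE per row per iteration with a precomputed mapping applied in a single set-based UPDATE.]

```sql
-- Precomputed offline (e.g. in Python): for each distinct starting
-- state and iteration count, the final value after all update loops.
CREATE TABLE precomputed (
    state_key   integer,
    iterations  integer,
    final_value integer,
    PRIMARY KEY (state_key, iterations)
);

-- One set-based statement per table replaces the looped per-row
-- updates: join each row to its precomputed end state.
UPDATE big_table b
SET    value = p.final_value
FROM   precomputed p
WHERE  p.state_key  = b.state_key
AND    p.iterations = 80;   -- e.g. the 80 iterations mentioned above
```

Because the work is a single relational join rather than procedural per-row logic, it avoids the per-statement overhead inside the pl/pgsql interpreter entirely.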
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
>>> 3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation.
>>
>> Particularly with regards documentation, a patch improving things is
>> much more likely to improve the situation than griping. Also,
>> conversation on this list gets recorded for posterity and google is
>> remarkably good at matching people looking for problems with
>> solutions. So, even in absence of a patch perhaps we've made the
>> lives of future head-scratchers a little bit easier with this
>> discussion.

I agree that patch > gripe, and about the google aspect. But nonetheless, a well-intentioned gripe is > ignorance of a problem.

As mentioned earlier, I'm sick just now and will be back in hospital again tomorrow & Monday, so a patch may be a little bit much to ask from me here :-) It's a bit much even keeping up with the posts on the thread so far.

I might try to fix the documentation a bit later, though as someone with no experience in marking up volatility on pl/pgsql functions I doubt my efforts would be that great. I also have other OSS project contributions that need some attention first.

Re: the google effect. Are these mailing list archives mirrored anywhere, incidentally? For example, I notice we just lost http://reddit.com/r/amd at the weekend; all the discussion of the last few years on that forum is out of reach.

Graeme Bell
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
Merlin Moncure
Date:
On Thu, Jul 9, 2015 at 10:12 AM, Graeme B. Bell <graeme.bell@nibio.no> wrote:
>>>> 3. I don't disagree that the benchmark code is objectively 'bad' in the sense that it is missing an important optimisation.
>>>
>>> Particularly with regards documentation, a patch improving things is
>>> much more likely to improve the situation than griping. Also,
>>> conversation on this list gets recorded for posterity and google is
>>> remarkably good at matching people looking for problems with
>>> solutions. So, even in absence of a patch perhaps we've made the
>>> lives of future head-scratchers a little bit easier with this
>>> discussion.
>
> I agree that patch > gripe, and about the google aspect. But nonetheless, a well-intentioned gripe is > ignorance of a problem.
>
> As mentioned earlier, I'm sick just now and will be back in hospital again tomorrow & Monday, so a patch may be a little bit much to ask from me here :-) It's a bit much even keeping up with the posts on the thread so far.
>
> I might try to fix the documentation a bit later, though as someone with no experience in marking up volatility on pl/pgsql functions I doubt my efforts would be that great. I also have other OSS project contributions that need some attention first.
>
> Re: the google effect. Are these mailing list archives mirrored anywhere, incidentally? For example, I notice we just lost http://reddit.com/r/amd at the weekend; all the discussion of the last few years on that forum is out of reach.

The community maintains its own mailing list archives on postgresql.org. Short of an array of tactical nuclear strikes, this is going to be preserved, because it represents the history of the project and in many ways is as important as the source code itself. The archives are also mirrored by a number of high-quality providers such as Nabble (which tend to beat our archives in google rankings -- likely due to the improved interface).

merlin
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
"Graeme B. Bell"
Date:
On 09 Jul 2015, at 17:42, Merlin Moncure <mmoncure@gmail.com> wrote:
> The community maintains its own mailing list archives on
> postgresql.org. Short of an array of tactical nuclear strikes this is
> going to be preserved

Good to know. I've seen a lot of dead software projects throughout my life.

But still - we will have to pray that Kim Jong Un never decides to become a MySQL contributor... :)

Graeme.
Re: Hmmm... why does CPU-intensive pl/pgsql code parallelise so badly when queries parallelise fine? Anyone else seen this?
From
James Cloos
Date:
>>>>> "GBB" == Graeme B Bell <graeme.bell@nibio.no> writes:

GBB> 1a. For example, AMD CPUs list the number of integer cores (e.g. 16),
GBB> but there are actually only half as many cores available for floating
GBB> point work (8). So if your functions need to use floating point, your
GBB> scaling will suffer badly on FP functions.

That is half as many 256-bit float units; for scalar math and for 128-bit vector math, each core gets half of the float unit. Only for 256-bit vector math do the schedulers have to compete for float unit access.

-JimC
--
James Cloos <cloos@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6