Thread: SQL functions vs. PL/PgSQL functions
Hi, everyone. I'm working with a client to try to optimize their use of PostgreSQL. They're running 8.3 on a Windows platform, packaged as part of a physical product that is delivered to customers. We're planning to upgrade to 9.0 at some point in the coming months, but this question is relevant for 8.3 (and perhaps beyond). All of the database-related logic for this application is in server-side functions, written in PL/PgSQL. That is, the application never issues a SELECT or INSERT; rather, it invokes a function with parameters, and the function handles the query. It's not unusual for a function to invoke one or more other PL/PgSQL functions as part of its execution. Since many of these PL/PgSQL functions are just acting as wrappers around queries, I thought that it would be a cheap speedup for us to change some of them to SQL functions, rather than PL/PgSQL. After all, PL/PgSQL is (I thought) interpreted, whereas SQL functions can be inlined and handled directly by the optimizer and such. We made the change to one or two functions, and were rather surprised to see the performance drop by quite a bit. My question is whether this is somehow to be expected. Under what conditions will SQL functions be slower than PL/PgSQL functions? Is there a heuristic that I can/should use to know this in advance? Does it matter if the SELECT being executed operates against a table, or a PL/PgSQL function? Thanks in advance for any insights everyone can offer. Reuven -- Reuven M. Lerner -- Web development, consulting, and training Mobile: +972-54-496-8405 * US phone: 847-230-9795 Skype/AIM: reuvenlerner
On 13/10/2010 3:30 PM, Reuven M. Lerner wrote: > My question is whether this is somehow to be expected. Under what > conditions will SQL functions be slower than PL/PgSQL functions? The main cases I can think of: - Where the SQL function is inlined (PL/PgSQL functions can't be inlined, some SQL functions can) and the inlining turns out to be a performance loss rather than a gain. - Where the PL/PgSQL function was constructing queries dynamically for EXECUTE ... USING, so each query contained its parameters directly. If converted to an SQL function (or a PL/PgSQL function using SELECT / PERFORM instead of EXECUTE ... USING) the planner will make more generic choices because it doesn't have stats on specific parameter values. These choices are sometimes not all that great. Beyond that, I'd have to wait to hear from someone who has more real knowledge than my hand-waving can provide. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
On Wed, Oct 13, 2010 at 3:30 AM, Reuven M. Lerner <reuven@lerner.co.il> wrote: > Hi, everyone. I'm working with a client to try to optimize their use of > PostgreSQL. They're running 8.3 on a Windows platform, packaged as part > of a physical product that is delivered to customers. > > We're planning to upgrade to 9.0 at some point in the coming months, but > this question is relevant for 8.3 (and perhaps beyond). > > All of the database-related logic for this application is in server-side > functions, written in PL/PgSQL. That is, the application never issues a > SELECT or INSERT; rather, it invokes a function with parameters, and the > function handles the query. It's not unusual for a function to invoke > one or more other PL/PgSQL functions as part of its execution. > > Since many of these PL/PgSQL functions are just acting as wrappers around > queries, I thought that it would be a cheap speedup for us to change some > of them to SQL functions, rather than PL/PgSQL. After all, PL/PgSQL is (I > thought) interpreted, whereas SQL functions can be inlined and handled > directly by the optimizer and such. > > We made the change to one or two functions, and were rather surprised to > see the performance drop by quite a bit. > > My question is whether this is somehow to be expected. Under what > conditions will SQL functions be slower than PL/PgSQL functions? Is there > a heuristic that I can/should use to know this in advance? Does it matter > if the SELECT being executed operates against a table, or a PL/PgSQL > function? > > Thanks in advance for any insights everyone can offer. *) SQL functions require you to use $n notation for input arguments vs the argument name. *) SQL functions are fairly transparent to planning/execution. They are re-planned every time they are run (as are views) *) simple SQL functions can be inlined, allowing for much smarter plans where they are called (especially if they are immutable/stable) *) SQL functions are much more forcefully validated when created. This is of course very nice, but can occasionally be a pain, if you want the function to apply to a search path other than the default search path. This forces me to disable body checking in particular cases. *) In the not so old days, SQL functions could be called in more conexts (select func() vs select * from func()). This is now changed though. *) SQL returning setof functions, can send RETURNING from insert/update to the output of the function. This is the ONLY way to do this at present (until we get wCTE) w/o involving the client. *) plpgsql functions are completely planned and that plan is held for the duration of the session, or until a invalidation event occurs (statistics driven, table dropping, etc). This adds overhead to first call but reduces overhead in subsequent calls since you don't have to re-plan. This also means you can't float the function over multiple search paths on the same connection (EVER, even if you DISCARD). This also means you have to be aware of temp table interactions w/plans if you are concerned about performance. *) plpgsql allows dynamic execution (can use to get around above), specific variable names, sane error handling, and all kinds of other wonderful things too numerous to mention. *) plpgsql simple expressions (like n:=n+1) can bypass SPI, and therefore run pretty quickly. both sql and plpgsql functions create a mvcc snapshot as soon as the function is entered. This can and will cause headaches if you are writing highly concurrent systems utilizing serializable transactions. (this is one of the biggest annoyances with a 100% pl interface to your db). when you make the jump to 9.0, you might want to check out libpqtypes if you are writing your client in C. it will greatly easy sending complex data to/from the database to receiving functions. certain other db interfaces can also do this, for example python has a very good database driver for postgres. merlin
"Reuven M. Lerner" <reuven@lerner.co.il> writes: > All of the database-related logic for this application is in server-side > functions, written in PL/PgSQL. That is, the application never issues a > SELECT or INSERT; rather, it invokes a function with parameters, and the > function handles the query. It's not unusual for a function to invoke > one or more other PL/PgSQL functions as part of its execution. > Since many of these PL/PgSQL functions are just acting as wrappers around > queries, I thought that it would be a cheap speedup for us to change some > of them to SQL functions, rather than PL/PgSQL. After all, PL/PgSQL is (I > thought) interpreted, whereas SQL functions can be inlined and handled > directly by the optimizer and such. > We made the change to one or two functions, and were rather surprised to > see the performance drop by quite a bit. > My question is whether this is somehow to be expected. It's not particularly surprising, especially not if your past development has tended to tune the app so that plpgsql works well. In the first place, SQL operations issued in plpgsql aren't somehow "interpreted" when everything else is "compiled". It's the same execution engine. It would be fair to speak of control logic in plpgsql as being interpreted; but since SQL functions don't have any ability to include control logic at all, you're not going to be moving anything of that description over. Besides, the control logic usually takes next to no time compared to the SQL operations. The reason that plpgsql-issued queries are sometimes slower than queries executed directly is that plpgsql parameterizes the queries according to whatever plpgsql variables/parameters they use, and sometimes you get a worse plan if the planner can't see the exact values of particular variables used in a query. The reason plpgsql does that is that it saves the plans for individual SQL queries within a function for the life of the session. SQL functions involve no such state --- either they get inlined into the calling query, in which case they have to be planned when that query is, or else they are planned on-the-fly at beginning of execution. So your change has definitely de-optimized things in the sense of introducing more planning work. Now you could have seen a win anyway, if plpgsql's parameterized query plans were sufficiently inefficient that planning on-the-fly with actual variable values would beat them out. But that's evidently not the case for (most of?) your usage patterns. In places where it is the case, the usual advice is to fix it by using EXECUTE, not by giving up plpgsql's ability to cache plans everywhere else. It's possible that at some point we'll try to introduce plan caching for non-inlined SQL functions. But at best this would put them on a par with plpgsql speed-wise. Really the only place where a SQL function will be a win for performance is if it can be inlined into the calling query, and that's pretty much never the case in the usage pattern you're talking about. (The sort of inlining we're talking about is more or less textual substitution, and you can't insert an INSERT/UPDATE/DELETE in a SELECT.) regards, tom lane
Wow. Thanks so much to all of you for the thoughtful and helpful responses! Reuven
On Wed, Oct 13, 2010 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's possible that at some point we'll try to introduce plan caching > for non-inlined SQL functions. hm, I think the search_path/function plan issue would have to be dealt with before doing this -- a while back IIRC you suggested function plans might be organized around search_path setting at plan time, or this would break a fair amount of code (for example, mine) :-). merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Wed, Oct 13, 2010 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's possible that at some point we'll try to introduce plan caching >> for non-inlined SQL functions. > hm, I think the search_path/function plan issue would have to be dealt > with before doing this -- Yeah, perhaps. There doesn't seem to be any groundswell of demand for doing anything about that anyway. Particularly since plpgsql is now installed by default, a reasonable answer to "I'd like the system to cache plans for this" is now "so write it in plpgsql instead". regards, tom lane