Thread: pl/pgsql functions outperforming sql ones?
Assuming there was some sort of cost to pl/pgsql, I rewrote a bunch of stored functions s in straight SQL. Each stored proc was calling the next, so to get the full effect I had to track down all the pl/pgsql stored functions and convert them to sql. However, I was surprised to find after all of the rewrites, the LANGUAGE sql procs caused the queries to run slower than the LANGUAGE plpgsql.
None of the stored functions selected from tables, the operated on and returned scalar values - it was all assign variables, if/then/else - not even any looping.
For those who need the dirty details, here they are. If you happen to think this behavior is expected, I needn’t bore you – just let me know!
Thanks,
Carlo
This was all triggered during the optimization of a query like this:
SELECT myVar
FROM myTable
WHERE myFunc(myVar);
Looking at EXPLAIN ANALYSE I saw something like this:
Filter: myFunc(myVar)
I rewrote the body of myFunc(myVar) something like this:
SELECT CASE WHEN myVar IS NULL THEN false ELSE myOtherFunc(myVar) END
When I reran EXPLAIN ANALYZE I got this:
Filter: SELECT CASE WHEN myVar IS NULL THEN false ELSE myOtherFunc(myVar) END
Nice. So, I did the same treatment to myOtherFunc() (converted to straight sql) but the EXPLAIN ANALYZE didn’t change (reasonable, I guess – how deep would I expect it to go?)
All of the procs were IMMUTABLE.
I was very surprised to find that the query now ran much slower by a factor of 4.
On Thu, Jan 26, 2012 at 6:09 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Assuming there was some sort of cost to pl/pgsql, I rewrote a bunch of > stored functions s in straight SQL. Each stored proc was calling the next, > so to get the full effect I had to track down all the pl/pgsql stored > functions and convert them to sql. However, I was surprised to find after > all of the rewrites, the LANGUAGE sql procs caused the queries to run slower > than the LANGUAGE plpgsql. One reason that plpgsql can outperform sql functions is that plpgsql caches plans. That said, I don't think that's what's happening here. Did you confirm the performance difference outside of EXPLAIN ANALYZE? In particular cases EXPLAIN ANALYZE can skew times, either by injecting time calls or in how it discards results. merlin
Yes, I did test it - i.e. I ran the functions on their own as I had always noticed a minor difference between EXPLAIN ANALYZE results and direct query calls. Interesting, so sql functions DON'T cache plans? Will plan-caching be of any benefit to SQL that makes no reference to any tables? The SQL is emulating the straight non-set-oriented procedural logic of the original plpgsql. -----Original Message----- From: Merlin Moncure [mailto:mmoncure@gmail.com] Sent: January 27, 2012 10:47 AM To: Carlo Stonebanks Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? On Thu, Jan 26, 2012 at 6:09 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Assuming there was some sort of cost to pl/pgsql, I rewrote a bunch of > stored functions s in straight SQL. Each stored proc was calling the next, > so to get the full effect I had to track down all the pl/pgsql stored > functions and convert them to sql. However, I was surprised to find after > all of the rewrites, the LANGUAGE sql procs caused the queries to run slower > than the LANGUAGE plpgsql. One reason that plpgsql can outperform sql functions is that plpgsql caches plans. That said, I don't think that's what's happening here. Did you confirm the performance difference outside of EXPLAIN ANALYZE? In particular cases EXPLAIN ANALYZE can skew times, either by injecting time calls or in how it discards results. merlin
You can use PREPARE... EXECUTE to "cache" the plan (as well as parsing). However, I find it unlikely this will would explain the loss in performance you experienced. Deron On Fri, Jan 27, 2012 at 11:36 AM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Yes, I did test it - i.e. I ran the functions on their own as I had always > noticed a minor difference between EXPLAIN ANALYZE results and direct query > calls. > > Interesting, so sql functions DON'T cache plans? Will plan-caching be of any > benefit to SQL that makes no reference to any tables? The SQL is emulating > the straight non-set-oriented procedural logic of the original plpgsql. > > -----Original Message----- > From: Merlin Moncure [mailto:mmoncure@gmail.com] > Sent: January 27, 2012 10:47 AM > To: Carlo Stonebanks > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? > > On Thu, Jan 26, 2012 at 6:09 PM, Carlo Stonebanks > <stonec.register@sympatico.ca> wrote: >> Assuming there was some sort of cost to pl/pgsql, I rewrote a bunch of >> stored functions s in straight SQL. Each stored proc was calling the next, >> so to get the full effect I had to track down all the pl/pgsql stored >> functions and convert them to sql. However, I was surprised to find after >> all of the rewrites, the LANGUAGE sql procs caused the queries to run > slower >> than the LANGUAGE plpgsql. > > One reason that plpgsql can outperform sql functions is that plpgsql > caches plans. That said, I don't think that's what's happening here. > Did you confirm the performance difference outside of EXPLAIN ANALYZE? > In particular cases EXPLAIN ANALYZE can skew times, either by > injecting time calls or in how it discards results. > > merlin > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
Was I even right in thinking I would gain any performance by converting to SQL? -----Original Message----- From: Deron [mailto:fecastle@gmail.com] Sent: January 27, 2012 2:29 PM To: Carlo Stonebanks Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? You can use PREPARE... EXECUTE to "cache" the plan (as well as parsing). However, I find it unlikely this will would explain the loss in performance you experienced. Deron On Fri, Jan 27, 2012 at 11:36 AM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Yes, I did test it - i.e. I ran the functions on their own as I had always > noticed a minor difference between EXPLAIN ANALYZE results and direct query > calls. > > Interesting, so sql functions DON'T cache plans? Will plan-caching be of any > benefit to SQL that makes no reference to any tables? The SQL is emulating > the straight non-set-oriented procedural logic of the original plpgsql. > > -----Original Message----- > From: Merlin Moncure [mailto:mmoncure@gmail.com] > Sent: January 27, 2012 10:47 AM > To: Carlo Stonebanks > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? > > On Thu, Jan 26, 2012 at 6:09 PM, Carlo Stonebanks > <stonec.register@sympatico.ca> wrote: >> Assuming there was some sort of cost to pl/pgsql, I rewrote a bunch of >> stored functions s in straight SQL. Each stored proc was calling the next, >> so to get the full effect I had to track down all the pl/pgsql stored >> functions and convert them to sql. However, I was surprised to find after >> all of the rewrites, the LANGUAGE sql procs caused the queries to run > slower >> than the LANGUAGE plpgsql. > > One reason that plpgsql can outperform sql functions is that plpgsql > caches plans. That said, I don't think that's what's happening here. > Did you confirm the performance difference outside of EXPLAIN ANALYZE? > In particular cases EXPLAIN ANALYZE can skew times, either by > injecting time calls or in how it discards results. > > merlin > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
On Jan 27, 2012, at 2:59 PM, Carlo Stonebanks wrote: > Was I even right in thinking I would gain any performance by converting to > SQL? As always, it depends. I converted an immutable pl/pgsql function to an SQL function and the body of the function barelychanged. However, I experienced an order-of-magnitude speed-up because the SQL function could be folded into the plan(like a view) while a pl/pgsql function will never be folded (and the planner punts and assumes the function will return100 rows for set-returning functions). However, not all SQL functions can be folded into the plan. On the other hand, a pl/pgsql function can make use of memoization for number-crunching routines and make business-logicalshort-circuiting decisions. Cheers, M
2012/1/27 Carlo Stonebanks <stonec.register@sympatico.ca>: > Yes, I did test it - i.e. I ran the functions on their own as I had always > noticed a minor difference between EXPLAIN ANALYZE results and direct query > calls. > > Interesting, so sql functions DON'T cache plans? Will plan-caching be of any > benefit to SQL that makes no reference to any tables? The SQL is emulating > the straight non-set-oriented procedural logic of the original plpgsql. > It is not necessary usually - simple SQL functions are merged to outer query - there are e few cases where this optimization cannot be processed and then there are performance lost. For example this optimization is not possible (sometimes) when some parameter is volatile Regards Pavel Stehule
Pavel, are you saying that the code of the stored function is actually being added to the SQL query, instead of a call to it? For example, I have seen this: SELECT myVar FROM myTable WHERE myVar > 0 AND myFunc(myVar) And seen the SQL body of myVar appended to the outer query: ... Filter: SELECT CASE WHERE myVar < 10 THEN true ELSE false END Is this what we are talking about? Two questions: 1) Is this also done when the function is called as a SELECT column; e.g. would: SELECT myFunc(myVar) AS result - become: SELECT ( SELECT CASE WHERE myVar < 10 THEN true ELSE false END ) AS result? 2) Does that not bypass the benefits of IMMUTABLE? -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Pavel Stehule Sent: January 28, 2012 1:38 AM To: Carlo Stonebanks Cc: Merlin Moncure; pgsql-performance@postgresql.org Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? 2012/1/27 Carlo Stonebanks <stonec.register@sympatico.ca>: > Yes, I did test it - i.e. I ran the functions on their own as I had always > noticed a minor difference between EXPLAIN ANALYZE results and direct query > calls. > > Interesting, so sql functions DON'T cache plans? Will plan-caching be of any > benefit to SQL that makes no reference to any tables? The SQL is emulating > the straight non-set-oriented procedural logic of the original plpgsql. > It is not necessary usually - simple SQL functions are merged to outer query - there are e few cases where this optimization cannot be processed and then there are performance lost. For example this optimization is not possible (sometimes) when some parameter is volatile Regards Pavel Stehule -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Hello 2012/1/30 Carlo Stonebanks <stonec.register@sympatico.ca>: > Pavel, are you saying that the code of the stored function is actually being > added to the SQL query, instead of a call to it? For example, I have seen > this: > > SELECT myVar > FROM myTable > WHERE myVar > 0 AND myFunc(myVar) > > And seen the SQL body of myVar appended to the outer query: > > ... Filter: SELECT CASE WHERE myVar < 10 THEN true ELSE false END > > Is this what we are talking about? Two questions: yes - it is SQL function "inlining" > > 1) Is this also done when the function is called as a SELECT column; > e.g. would: > SELECT myFunc(myVar) AS result > - become: > SELECT ( > SELECT CASE WHERE myVar < 10 THEN true ELSE false END > ) AS result? > yes CREATE OR REPLACE FUNCTION public.fx(integer, integer) RETURNS integer LANGUAGE sql AS $function$ select coalesce($1, $2) $function$ postgres=# explain verbose select fx(random()::int, random()::int); QUERY PLAN -------------------------------------------------------------- Result (cost=0.00..0.02 rows=1 width=0) Output: COALESCE((random())::integer, (random())::integer) (2 rows) > 2) Does that not bypass the benefits of IMMUTABLE? > no - optimizator works with expanded query - usually is preferred style a writing SQL functions without flags, because optimizer can work with definition of SQL function and can set well flags. SQL function is not black box for optimizer like plpgsql does. And SQL optimizer chooses a inlining or some other optimizations. Sometimes explicit flags are necessary, but usually not for scalar SQL functions. postgres=# create or replace function public.fxs(int) postgres-# returns setof int as $$ postgres$# select * from generate_series(1,$1) postgres$# $$ language sql; CREATE FUNCTION postgres=# explain verbose select * from fxs(10); QUERY PLAN ------------------------------------------------------------------- Function Scan on public.fxs (cost=0.25..10.25 rows=1000 width=4) Output: fxs Function Call: fxs(10) (3 rows) postgres=# create or replace function public.fxs(int) returns setof int as $$ select * from generate_series(1,$1) $$ language sql IMMUTABLE; CREATE FUNCTION postgres=# explain verbose select * from fxs(10); QUERY PLAN ----------------------------------------------------------------------------------- Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000 width=4) Output: generate_series.generate_series Function Call: generate_series(1, 10) --<<<< inlined query (3 rows) Regards Pavel Stehule > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Pavel Stehule > Sent: January 28, 2012 1:38 AM > To: Carlo Stonebanks > Cc: Merlin Moncure; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? > > 2012/1/27 Carlo Stonebanks <stonec.register@sympatico.ca>: >> Yes, I did test it - i.e. I ran the functions on their own as I had > always >> noticed a minor difference between EXPLAIN ANALYZE results and direct > query >> calls. >> >> Interesting, so sql functions DON'T cache plans? Will plan-caching be of > any >> benefit to SQL that makes no reference to any tables? The SQL is emulating >> the straight non-set-oriented procedural logic of the original plpgsql. >> > > It is not necessary usually - simple SQL functions are merged to outer > query - there are e few cases where this optimization cannot be > processed and then there are performance lost. > > For example this optimization is not possible (sometimes) when some > parameter is volatile > > Regards > > Pavel Stehule > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Update: The main stored function in question and all of its sub sub-functions were recoded to new pure sql functions. I then stub tested the sub functions sql vs. plpgsql. Here were the results for new sql vs old plpgsql: Individual sub functions tested 20-30% faster But the main function calling new sql sub functions ran 100% slower So I tried this: I modified the old plpgsql function to call the new sql sub functions. THAT ran 20-30% faster then the unmodified version. That modified function is listed below. All the functions ending in 2 are the new SQL versions. Any thoughts or insight would be much appreciated. Carlo CREATE OR REPLACE FUNCTION mdx_lib.lex_compare_candidate3(character varying, character varying) RETURNS numeric AS $BODY$ /* Rate two strings candidacy for lex_compare. param 1: first string to compare param 2: 2nd string to compare returns: numeric result like mdx_lib.lex_distance 0 is a failure, 1 a perfect match */ declare str1 varchar = $1; str2 varchar = $2; acro1 varchar; acro2 varchar; str_dist numeric; acro_dist numeric; result numeric; begin if str1 = str2 then result = 0; else str1 = lower(regexp_replace(str1, '[^[:alnum:]]', '', 'g')); str2 = lower(regexp_replace(str2, '[^[:alnum:]]', '', 'g')); if str1 = str2 then result = 0.1; else str_dist = mdx_lib.lex_distance2(str1, str2); acro1 = mdx_lib.lex_acronym2(str1); acro2 = mdx_lib.lex_acronym2(str2); acro_dist = mdx_lib.lex_distance2(acro1, acro2); result = (acro_dist + (str_dist * 2)) / 2; end if; end if; result = 1 - result; if result < 0 then result = 0; end if; return result; end; $BODY$ LANGUAGE plpgsql IMMUTABLE COST 100; -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Pavel Stehule Sent: January 28, 2012 1:38 AM To: Carlo Stonebanks Cc: Merlin Moncure; pgsql-performance@postgresql.org Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? 2012/1/27 Carlo Stonebanks <stonec.register@sympatico.ca>: > Yes, I did test it - i.e. I ran the functions on their own as I had always > noticed a minor difference between EXPLAIN ANALYZE results and direct query > calls. > > Interesting, so sql functions DON'T cache plans? Will plan-caching be of any > benefit to SQL that makes no reference to any tables? The SQL is emulating > the straight non-set-oriented procedural logic of the original plpgsql. > It is not necessary usually - simple SQL functions are merged to outer query - there are e few cases where this optimization cannot be processed and then there are performance lost. For example this optimization is not possible (sometimes) when some parameter is volatile Regards Pavel Stehule -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Pavel, thank you very much for your explanation. Is it possible to define under what conditions that sql procs will outperform plpgsql ones, and vice-versa? -----Original Message----- From: Pavel Stehule [mailto:pavel.stehule@gmail.com] Sent: January 30, 2012 2:57 AM To: Carlo Stonebanks Cc: Merlin Moncure; pgsql-performance@postgresql.org Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? Hello 2012/1/30 Carlo Stonebanks <stonec.register@sympatico.ca>: > Pavel, are you saying that the code of the stored function is actually being > added to the SQL query, instead of a call to it? For example, I have seen > this: > > SELECT myVar > FROM myTable > WHERE myVar > 0 AND myFunc(myVar) > > And seen the SQL body of myVar appended to the outer query: > > ... Filter: SELECT CASE WHERE myVar < 10 THEN true ELSE false END > > Is this what we are talking about? Two questions: yes - it is SQL function "inlining" > > 1) Is this also done when the function is called as a SELECT column; > e.g. would: > SELECT myFunc(myVar) AS result > - become: > SELECT ( > SELECT CASE WHERE myVar < 10 THEN true ELSE false END > ) AS result? > yes CREATE OR REPLACE FUNCTION public.fx(integer, integer) RETURNS integer LANGUAGE sql AS $function$ select coalesce($1, $2) $function$ postgres=# explain verbose select fx(random()::int, random()::int); QUERY PLAN -------------------------------------------------------------- Result (cost=0.00..0.02 rows=1 width=0) Output: COALESCE((random())::integer, (random())::integer) (2 rows) > 2) Does that not bypass the benefits of IMMUTABLE? > no - optimizator works with expanded query - usually is preferred style a writing SQL functions without flags, because optimizer can work with definition of SQL function and can set well flags. SQL function is not black box for optimizer like plpgsql does. And SQL optimizer chooses a inlining or some other optimizations. Sometimes explicit flags are necessary, but usually not for scalar SQL functions. postgres=# create or replace function public.fxs(int) postgres-# returns setof int as $$ postgres$# select * from generate_series(1,$1) postgres$# $$ language sql; CREATE FUNCTION postgres=# explain verbose select * from fxs(10); QUERY PLAN ------------------------------------------------------------------- Function Scan on public.fxs (cost=0.25..10.25 rows=1000 width=4) Output: fxs Function Call: fxs(10) (3 rows) postgres=# create or replace function public.fxs(int) returns setof int as $$ select * from generate_series(1,$1) $$ language sql IMMUTABLE; CREATE FUNCTION postgres=# explain verbose select * from fxs(10); QUERY PLAN ---------------------------------------------------------------------------- ------- Function Scan on pg_catalog.generate_series (cost=0.00..10.00 rows=1000 width=4) Output: generate_series.generate_series Function Call: generate_series(1, 10) --<<<< inlined query (3 rows) Regards Pavel Stehule > > > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Pavel Stehule > Sent: January 28, 2012 1:38 AM > To: Carlo Stonebanks > Cc: Merlin Moncure; pgsql-performance@postgresql.org > Subject: Re: [PERFORM] pl/pgsql functions outperforming sql ones? > > 2012/1/27 Carlo Stonebanks <stonec.register@sympatico.ca>: >> Yes, I did test it - i.e. I ran the functions on their own as I had > always >> noticed a minor difference between EXPLAIN ANALYZE results and direct > query >> calls. >> >> Interesting, so sql functions DON'T cache plans? Will plan-caching be of > any >> benefit to SQL that makes no reference to any tables? The SQL is emulating >> the straight non-set-oriented procedural logic of the original plpgsql. >> > > It is not necessary usually - simple SQL functions are merged to outer > query - there are e few cases where this optimization cannot be > processed and then there are performance lost. > > For example this optimization is not possible (sometimes) when some > parameter is volatile > > Regards > > Pavel Stehule > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
2012/1/31 Carlo Stonebanks <stonec.register@sympatico.ca>: > Pavel, thank you very much for your explanation. > > Is it possible to define under what conditions that sql procs will > outperform plpgsql ones, and vice-versa? yes, little bit :) when inlining is possible, then SQL function will be faster - typical use case is simple scalar functions (with nonvolatile real parameters). Regards Pavel > > -----Original Message-----
On Sat, Jan 28, 2012 at 11:20 PM, Carlo Stonebanks <stonec.register@sympatico.ca> wrote: > Update: The main stored function in question and all of its sub > sub-functions were recoded to new pure sql functions. > > I then stub tested the sub functions sql vs. plpgsql. > > Here were the results for new sql vs old plpgsql: > > Individual sub functions tested 20-30% faster > > But the main function calling new sql sub functions ran 100% slower > > So I tried this: > > I modified the old plpgsql function to call the new sql sub functions. > > THAT ran 20-30% faster then the unmodified version. > > That modified function is listed below. All the functions ending in 2 are > the new SQL versions. One advantage of PL/pgsql for code like this is that you can compute values once and save them in variables. SQL doesn't have variables, so you can end up repeating the same SQL in multiple places (causing multiple evaluation), or even if you manage to avoid that, the system can inline things in multiple places and produce the same effect. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company