Thread: plpgsql vs. SQL performance

plpgsql vs. SQL performance

From
google@newtopia.com (Michael Pohl)
Date:
I am occasionally seeing plpgsql functions significantly underperform
their straight SQL equivalents.  A simple example I ran into this
evening:

create or replace function pl_get_user_item_count(int, int)
returns int as '
declare
 input_user_id   alias for $1;
 input_status_id alias for $2;
 item_count      int;
begin
 select
  into item_count count(id)
 from
  "item"
 where
  user_id   = input_user_id and
  status_id = input_status_id;
 return item_count;
end;
' language 'plpgsql';

This function is taking around 2.11 seconds to execute, vs. 0.09 for
the identical SQL:

[michael@server1 ~]$ time psql -c "select pl_get_user_item_count(1,2)"
swap_dev  pl_get_user_item_count
------------------------
                      9
(1 row)

0.000u 0.010s 0:02.11 0.4%      0+0k 0+0io 229pf+0w

[michael@server1 ~]$ time psql -c "select count(id) from item where
user_id = 1 and status_id = 2" swap_dev
 count
-------
     9
(1 row)

0.000u 0.000s 0:00.09 0.0%      0+0k 0+0io 229pf+0w

I can provide table schema and 'explain' output if that would help.
My general question is:  Should I expect certain SQL to execute
significantly more slowly when wrapped in a plpgsql function?  My db
experience is mainly with Sybase, and I'm used to performance boosts
with Transact-SQL stored procedures.  It seems strange to see any
penalty at all for using a stored procedure, much less a harsh one as
in the example above.

Input appreciated.

thanks,

michael

Re: plpgsql vs. SQL performance

From
Tom Lane
Date:
google@newtopia.com (Michael Pohl) writes:
> I am occasionally seeing plpgsql functions significantly underperform
> their straight SQL equivalents.

Almost certainly, a different query plan is getting chosen in the
plpgsql case.

One common cause of this problem is sloppiness about datatypes.  You
have declared $1 and $2 of the plpgsql function to be integer; are
the columns they're being compared to also integer?  If not, that's
likely preventing indexscans from being used.

Another common cause of this sort of thing is that the planner makes
conservative choices because it doesn't have exact runtime values for
the constants in the query.  What you are really comparing here is

plpgsql:
    select ... where user_id = $1 and status_id = $2

SQL:
    select ... where user_id = 1 and status_id = 2

In the latter case the planner can consult pg_statistic to get a pretty
good idea about how many rows will be selected, whereas in the former
case its guess is much more approximate.  (I'd still expect an indexscan
to get picked though, unless you have *very* skewed data statistics for
these columns.  Usually it's inequalities that push the planner to use
a seqscan in these cases.)

            regards, tom lane