Thread: Cursor fetch performance issue

Cursor fetch performance issue

From

Tony Capobianco

Date:

24 January 2012, 16:41:57

We are migrating our Oracle warehouse to Postgres 9.

This function responds well:

pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
 getmemberadminprevious_sp2
----------------------------
 <unnamed portal 1>
(1 row)

Time: 7.549 ms

However, when testing, this fetch takes upwards of 38 minutes:

BEGIN;
select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
FETCH ALL IN "<unnamed portal 2>";

How can I diagnose any performance issues with the fetch in the cursor?

Thanks.
Tony

Re: Cursor fetch performance issue

From

Pavel Stehule

Date:

24 January 2012, 16:48:32

Hello

2012/1/24 Tony Capobianco <tcapobianco@prospectiv.com>:
> We are migrating our Oracle warehouse to Postgres 9.
>
> This function responds well:
>
> pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
>  getmemberadminprevious_sp2
> ----------------------------
>  <unnamed portal 1>
> (1 row)
>
> Time: 7.549 ms
>
> However, when testing, this fetch takes upwards of 38 minutes:
>
> BEGIN;
> select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
> FETCH ALL IN "<unnamed portal 2>";
>
> How can I diagnose any performance issues with the fetch in the cursor?
>

Cursors are optimized to returns small subset of result - if you plan
to read complete result, then set

set cursor_tuple_fraction to 1.0;

this is session config value, you can set it before selected cursors queries

Regards

Pavel Stehule

> Thanks.
> Tony
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: Cursor fetch performance issue

From

Tony Capobianco

Date:

24 January 2012, 16:57:57

Running just the sql of the function returns only 10 rows:

pg=# SELECT m.memberid, m.websiteid, m.emailaddress,
pg-#                m.firstname, m.lastname, m.regcomplete, m.emailok
pg-#         FROM   members m
pg-#         WHERE  m.emailaddress LIKE 'test.email@hotmail.com'
pg-#         AND    m.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid;
 memberid  | websiteid |    emailaddress        | firstname | lastname | regcomplete | emailok
-----------+-----------+------------------------+-----------+----------+-------------+---------
 247815829 |         1 | test.email@hotmail.com | email     | test     |           1 |       1
 300960335 |        62 | test.email@hotmail.com |           |          |           1 |       1
 300959937 |       625 | test.email@hotmail.com |           |          |           1 |       1
 260152830 |      1453 | test.email@hotmail.com |           |          |           1 |       1
 300960163 |      1737 | test.email@hotmail.com | email     | test     |           1 |       1
 300960259 |      1824 | test.email@hotmail.com | email     | test     |           1 |       1
 300959742 |      1928 | test.email@hotmail.com | email     | test     |           1 |       1
 368122699 |      2457 | test.email@hotmail.com | email     | test     |           1 |       1
 403218613 |      2464 | test.email@hotmail.com | email     | test     |           1 |       0
 378951994 |      2656 | test.email@hotmail.com |           |          |           1 |       1
(10 rows)

Time: 132.626 ms

So, it would seem that's a small enough number of rows.  Unfortunately, issuing:

set cursor_tuple_fraction to 1.0;

Did not have an effect on performance.  Is it common to modify this
cursor_tuple_fraction parameter each time we execute the function?


On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:
> Hello
>
> 2012/1/24 Tony Capobianco <tcapobianco@prospectiv.com>:
> > We are migrating our Oracle warehouse to Postgres 9.
> >
> > This function responds well:
> >
> > pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
> >  getmemberadminprevious_sp2
> > ----------------------------
> >  <unnamed portal 1>
> > (1 row)
> >
> > Time: 7.549 ms
> >
> > However, when testing, this fetch takes upwards of 38 minutes:
> >
> > BEGIN;
> > select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
> > FETCH ALL IN "<unnamed portal 2>";
> >
> > How can I diagnose any performance issues with the fetch in the cursor?
> >
>
> Cursors are optimized to returns small subset of result - if you plan
> to read complete result, then set
>
> set cursor_tuple_fraction to 1.0;
>
> this is session config value, you can set it before selected cursors queries
>
> Regards
>
> Pavel Stehule
>
> > Thanks.
> > Tony
> >
> >
> > --
> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
>

Re: Cursor fetch performance issue

From

Pavel Stehule

Date:

24 January 2012, 17:12:39

2012/1/24 Tony Capobianco <tcapobianco@prospectiv.com>:
> Running just the sql of the function returns only 10 rows:
>
> pg=# SELECT m.memberid, m.websiteid, m.emailaddress,
> pg-#                m.firstname, m.lastname, m.regcomplete, m.emailok
> pg-#         FROM   members m
> pg-#         WHERE  m.emailaddress LIKE 'test.email@hotmail.com'
> pg-#         AND    m.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid;
>  memberid  | websiteid |    emailaddress        | firstname | lastname | regcomplete | emailok
> -----------+-----------+------------------------+-----------+----------+-------------+---------
>  247815829 |         1 | test.email@hotmail.com | email     | test     |           1 |       1
>  300960335 |        62 | test.email@hotmail.com |           |          |           1 |       1
>  300959937 |       625 | test.email@hotmail.com |           |          |           1 |       1
>  260152830 |      1453 | test.email@hotmail.com |           |          |           1 |       1
>  300960163 |      1737 | test.email@hotmail.com | email     | test     |           1 |       1
>  300960259 |      1824 | test.email@hotmail.com | email     | test     |           1 |       1
>  300959742 |      1928 | test.email@hotmail.com | email     | test     |           1 |       1
>  368122699 |      2457 | test.email@hotmail.com | email     | test     |           1 |       1
>  403218613 |      2464 | test.email@hotmail.com | email     | test     |           1 |       0
>  378951994 |      2656 | test.email@hotmail.com |           |          |           1 |       1
> (10 rows)
>
> Time: 132.626 ms
>
> So, it would seem that's a small enough number of rows.  Unfortunately, issuing:
>
> set cursor_tuple_fraction to 1.0;
>
> Did not have an effect on performance.  Is it common to modify this
> cursor_tuple_fraction parameter each time we execute the function?
>

no, usually only before some strange query. Check execution plan,
please - but I don't think so your slow query depends on cursor usage.

postgres=# set cursor_tuple_fraction TO 1.0;
SET
postgres=# explain declare x cursor for select * from foo where a % 2
= 0 order by a;
                           QUERY PLAN
────────────────────────────────────────────────────────────────
 Sort  (cost=19229.19..19241.69 rows=5000 width=4)
   Sort Key: a
   ->  Seq Scan on foo  (cost=0.00..18922.00 rows=5000 width=4)
         Filter: ((a % 2) = 0)
(4 rows)

postgres=# set cursor_tuple_fraction TO 1.0;
SET
postgres=# explain declare x cursor for select * from foo where a % 2
= 0 order by a;
                           QUERY PLAN
────────────────────────────────────────────────────────────────
 Sort  (cost=19229.19..19241.69 rows=5000 width=4)
   Sort Key: a
   ->  Seq Scan on foo  (cost=0.00..18922.00 rows=5000 width=4)
         Filter: ((a % 2) = 0)
(4 rows)

postgres=# set cursor_tuple_fraction TO 0.1;
SET
postgres=# explain declare x cursor for select * from foo where a % 2
= 0 order by a;
                                QUERY PLAN
───────────────────────────────────────────────────────────────────────────
 Index Scan using foo_pkey on foo  (cost=0.00..32693.34 rows=5000 width=4)
   Filter: ((a % 2) = 0)
(2 rows)

Regards

Pavel Stehule
>
> On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:
>> Hello
>>
>> 2012/1/24 Tony Capobianco <tcapobianco@prospectiv.com>:
>> > We are migrating our Oracle warehouse to Postgres 9.
>> >
>> > This function responds well:
>> >
>> > pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
>> >  getmemberadminprevious_sp2
>> > ----------------------------
>> >  <unnamed portal 1>
>> > (1 row)
>> >
>> > Time: 7.549 ms
>> >
>> > However, when testing, this fetch takes upwards of 38 minutes:
>> >
>> > BEGIN;
>> > select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
>> > FETCH ALL IN "<unnamed portal 2>";
>> >
>> > How can I diagnose any performance issues with the fetch in the cursor?
>> >
>>
>> Cursors are optimized to returns small subset of result - if you plan
>> to read complete result, then set
>>
>> set cursor_tuple_fraction to 1.0;
>>
>> this is session config value, you can set it before selected cursors queries
>>
>> Regards
>>
>> Pavel Stehule
>>
>> > Thanks.
>> > Tony
>> >
>> >
>> > --
>> > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-performance
>>
>
>

Re: Cursor fetch performance issue

From

Andy Colson

Date:

24 January 2012, 17:12:40

> On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:
>> Hello
>>
>> 2012/1/24 Tony Capobianco<tcapobianco@prospectiv.com>:
>>> We are migrating our Oracle warehouse to Postgres 9.
>>>
>>> This function responds well:
>>>
>>> pg=# select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
>>>   getmemberadminprevious_sp2
>>> ----------------------------
>>>   <unnamed portal 1>
>>> (1 row)
>>>
>>> Time: 7.549 ms
>>>
>>> However, when testing, this fetch takes upwards of 38 minutes:
>>>
>>> BEGIN;
>>> select public.getMemberAdminPrevious_sp2(247815829, 1,'test.email@hotmail.com', 'email', 'test');
>>> FETCH ALL IN "<unnamed portal 2>";
>>>
>>> How can I diagnose any performance issues with the fetch in the cursor?
>>>
>>
>> Cursors are optimized to returns small subset of result - if you plan
>> to read complete result, then set
>>
>> set cursor_tuple_fraction to 1.0;
>>
>> this is session config value, you can set it before selected cursors queries
>>
>> Regards
>>
>> Pavel Stehule
>>
>>> Thanks.
>>> Tony
>>>
>>>
>>> --
>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>
>
>


On 1/24/2012 2:57 PM, Tony Capobianco wrote:
 > Running just the sql of the function returns only 10 rows:
 >
 > pg=# SELECT m.memberid, m.websiteid, m.emailaddress,
 > pg-#                m.firstname, m.lastname, m.regcomplete, m.emailok
 > pg-#         FROM   members m
 > pg-#         WHERE  m.emailaddress LIKE 'test.email@hotmail.com'
 > pg-#         AND    m.changedate_id<  5868 ORDER BY m.emailaddress,
m.websiteid;
 >   memberid  | websiteid |    emailaddress        | firstname |
lastname | regcomplete | emailok
 >
-----------+-----------+------------------------+-----------+----------+-------------+---------
 >   247815829 |         1 | test.email@hotmail.com | email     | test
    |           1 |       1
 >   300960335 |        62 | test.email@hotmail.com |           |
    |           1 |       1
 >   300959937 |       625 | test.email@hotmail.com |           |
    |           1 |       1
 >   260152830 |      1453 | test.email@hotmail.com |           |
    |           1 |       1
 >   300960163 |      1737 | test.email@hotmail.com | email     | test
    |           1 |       1
 >   300960259 |      1824 | test.email@hotmail.com | email     | test
    |           1 |       1
 >   300959742 |      1928 | test.email@hotmail.com | email     | test
    |           1 |       1
 >   368122699 |      2457 | test.email@hotmail.com | email     | test
    |           1 |       1
 >   403218613 |      2464 | test.email@hotmail.com | email     | test
    |           1 |       0
 >   378951994 |      2656 | test.email@hotmail.com |           |
    |           1 |       1
 > (10 rows)
 >
 > Time: 132.626 ms
 >
 > So, it would seem that's a small enough number of rows.
Unfortunately, issuing:
 >
 > set cursor_tuple_fraction to 1.0;
 >
 > Did not have an effect on performance.  Is it common to modify this
 > cursor_tuple_fraction parameter each time we execute the function?
 >
 >


So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?

SELECT m.memberid, m.websiteid, m.emailaddress,
        m.firstname, m.lastname, m.regcomplete, m.emailok
        FROM   members m
        WHERE  m.emailaddress LIKE $1
        AND    m.changedate_id < $2
       ORDER BY m.emailaddress, m.websiteid;

Or is it creating the string and executing it:

sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
     ||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
     ||  ' FROM   members m
     ||  ' WHERE  m.emailaddress LIKE ' || arg1
     ||  ' AND    m.changedate_id < ' || arg2
     ||  ' ORDER BY m.emailaddress, m.websiteid ';
execute(sql);

Maybe its the planner doesnt plan so well with $1 arguments vs actual
arguments thing.

-Andy

Re: Cursor fetch performance issue

From

Pavel Stehule

Date:

24 January 2012, 17:18:10

Hello

>
> So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?
>
> SELECT m.memberid, m.websiteid, m.emailaddress,
>       m.firstname, m.lastname, m.regcomplete, m.emailok
>       FROM   members m
>       WHERE  m.emailaddress LIKE $1
>       AND    m.changedate_id < $2
>      ORDER BY m.emailaddress, m.websiteid;
>
> Or is it creating the string and executing it:
>
> sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
>    ||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
>    ||  ' FROM   members m
>    ||  ' WHERE  m.emailaddress LIKE ' || arg1
>    ||  ' AND    m.changedate_id < ' || arg2
>    ||  ' ORDER BY m.emailaddress, m.websiteid ';
> execute(sql);
>
> Maybe its the planner doesnt plan so well with $1 arguments vs actual
> arguments thing.
>

sure, it could be blind optimization problem in plpgsql. Maybe you
have to use a dynamic SQL - OPEN FOR EXECUTE stmt probably

http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html

Regards

Pavel Stehule

> -Andy
>
>

Re: Cursor fetch performance issue

From

Tom Lane

Date:

24 January 2012, 17:28:35

Tony Capobianco <tcapobianco@prospectiv.com> writes:
> Running just the sql of the function returns only 10 rows:
> pg=# SELECT m.memberid, m.websiteid, m.emailaddress,
> pg-#                m.firstname, m.lastname, m.regcomplete, m.emailok
> pg-#         FROM   members m
> pg-#         WHERE  m.emailaddress LIKE 'test.email@hotmail.com'
> pg-#         AND    m.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid;

Based on that, I'd bet your problem is that the function is executing
    WHERE  m.emailaddress LIKE $1
(for some spelling of $1) and you are therefore not getting the benefit
of the index optimizations that can happen when LIKE's pattern is
constant.  Do you actually need LIKE rather than just "=" here?

            regards, tom lane

Re: Cursor fetch performance issue

From

Tony Capobianco

Date:

24 January 2012, 17:34:24

Here's the explain:

pg=# explain select getMemberAdminPrevious_sp(247815829, 1,'test.email@hotmail.com', 'Email', 'Test');
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0)
(1 row)

Time: 1.167 ms

There was discussion of 'LIKE' v. '=' and wildcard characters are not
being entered into the $1 parameter.

This is not generating a sql string.  I feel it's something to do with
the fetch of the refcursor.  The cursor is a larger part of a function:

CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 (
  p_memberid       IN numeric,
  p_websiteid      IN numeric,
  p_emailaddress   IN varchar,
  p_firstname      IN varchar,
  p_lastname       IN varchar)
RETURNS refcursor AS $$
DECLARE
  ref            refcursor;
  l_sysdateid    numeric;
BEGIN
  l_sysdateid := sysdateid();
  if (p_memberid != 0) then
    if (p_emailaddress IS NOT NULL) then
      OPEN ref FOR
        SELECT m.memberid, m.websiteid, m.emailaddress,
               m.firstname, m.lastname, m.regcomplete, m.emailok
        FROM   members m
        WHERE  m.emailaddress LIKE p_emailaddress
        AND    m.changedate_id < l_sysdateid ORDER BY m.emailaddress,
m.websiteid;
    end if;
  end if;
  Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
   Return null;
END;
$$ LANGUAGE 'plpgsql';


On Tue, 2012-01-24 at 22:17 +0100, Pavel Stehule wrote:
> Hello
>
> >
> > So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?
> >
> > SELECT m.memberid, m.websiteid, m.emailaddress,
> >       m.firstname, m.lastname, m.regcomplete, m.emailok
> >       FROM   members m
> >       WHERE  m.emailaddress LIKE $1
> >       AND    m.changedate_id < $2
> >      ORDER BY m.emailaddress, m.websiteid;
> >
> > Or is it creating the string and executing it:
> >
> > sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
> >    ||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
> >    ||  ' FROM   members m
> >    ||  ' WHERE  m.emailaddress LIKE ' || arg1
> >    ||  ' AND    m.changedate_id < ' || arg2
> >    ||  ' ORDER BY m.emailaddress, m.websiteid ';
> > execute(sql);
> >
> > Maybe its the planner doesnt plan so well with $1 arguments vs actual
> > arguments thing.
> >
>
> sure, it could be blind optimization problem in plpgsql. Maybe you
> have to use a dynamic SQL - OPEN FOR EXECUTE stmt probably
>
> http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html
>
> Regards
>
> Pavel Stehule
>
> > -Andy
> >
> >
>

Re: Cursor fetch performance issue

From

Heikki Linnakangas

Date:

25 January 2012, 04:58:37

On 24.01.2012 23:34, Tony Capobianco wrote:
> Here's the explain:
>
> pg=# explain select getMemberAdminPrevious_sp(247815829, 1,'test.email@hotmail.com', 'Email', 'Test');
>                  QUERY PLAN
> ------------------------------------------
>   Result  (cost=0.00..0.26 rows=1 width=0)
> (1 row)
>
> Time: 1.167 ms

That's not very helpful. We'd need to see the plan of the query within
the function, not the plan on invoking the function. The auto_explain
contrib module with auto_explain_log_nested_statements=on might be
useful to get that.

> There was discussion of 'LIKE' v. '=' and wildcard characters are not
> being entered into the $1 parameter.
>
> This is not generating a sql string.  I feel it's something to do with
> the fetch of the refcursor.  The cursor is a larger part of a function:
>
> CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 (
>    p_memberid       IN numeric,
>    p_websiteid      IN numeric,
>    p_emailaddress   IN varchar,
>    p_firstname      IN varchar,
>    p_lastname       IN varchar)
> RETURNS refcursor AS $$
> DECLARE
>    ref            refcursor;
>    l_sysdateid    numeric;
> BEGIN
>    l_sysdateid := sysdateid();
>    if (p_memberid != 0) then
>      if (p_emailaddress IS NOT NULL) then
>        OPEN ref FOR
>          SELECT m.memberid, m.websiteid, m.emailaddress,
>                 m.firstname, m.lastname, m.regcomplete, m.emailok
>          FROM   members m
>          WHERE  m.emailaddress LIKE p_emailaddress
>          AND    m.changedate_id<  l_sysdateid ORDER BY m.emailaddress,
> m.websiteid;
>      end if;
>    end if;
>    Return ref;
> EXCEPTION
> WHEN NO_DATA_FOUND THEN
>     Return null;
> END;
> $$ LANGUAGE 'plpgsql';

The theory that the query takes a long time because "LIKE
p_emailaddress" is not optimizeable by the planner seems the most likely
to me.

If you don't actually use any wildcards in the email, try replacing LIKE
with =. If you do, then you can try the "OPEN ref FOR EXECUTE" syntax.
That way the query is re-planned every time, and the planner can take
advantage of the parameter value. That enables it to use an index on the
email address column, when there isn't in fact any wildcards in the
value, and also estimate the selectivities better which can lead to a
better plan. Like this:

CREATE OR REPLACE FUNCTION public.getmemberadminprevious_sp2(p_memberid
numeric, p_websiteid numeric, p_emailaddress character varying,
p_firstname character varying, p_lastname character varying)
  RETURNS refcursor
  LANGUAGE plpgsql
AS $function$
DECLARE
   ref            refcursor;
   l_sysdateid    numeric;
BEGIN
   l_sysdateid := sysdateid();
   if (p_memberid != 0) then
     if (p_emailaddress IS NOT NULL) then
       OPEN ref FOR EXECUTE $query$
         SELECT m.memberid, m.websiteid, m.emailaddress,
                m.firstname, m.lastname, m.regcomplete, m.emailok
         FROM   members m
         WHERE  m.emailaddress LIKE $1
         AND    m.changedate_id < $2 ORDER BY m.emailaddress,
m.websiteid;
       $query$ USING p_emailaddress, l_sysdateid;
     end if;
   end if;
   Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
    Return null;
END;
$function$

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com