Thread: Cost of initiating cursors

Cost of initiating cursors

From
Matthew Churcher
Date:
Hi All,

I have a question in regards to the cost of initiating a cursor (for loop) over a large number of rows (100,000+) and
actuallyretrieving little or none of them.  

For example:
  FOR curr_foo
  IN
  SELECT foo FROM bar
    WHERE wibble
    ORDER BY wobble
  LOOP
     EXIT; -- always break out of loop
  END LOOP;


For some reason this is hugely expensive and slow regardless of the selected execution plan and available indexes. The
WHEREand particularly the ORDER BY clause appear to be highly significant despite having appropriate indexes in place.  

It's the combination of the following behaviours I find particular perplexing:-
  1.) Removing the WHERE and ORDER BY clauses results in a very fast query
  2.) Adding a LIMIT clause also results in a very fast query.

This is perplexing because I don't see why ORDER BY should affect the cost of opening the cursor when indexes are in
placebut since it does why would LIMIT reduce the cost of ORDER BY as PostgreSQL would still need to order all of
candidaterecords. This is all assuming the cursor isn't actually retrieving all the rows which is my understanding of
howit should work. The configuration parameter 'cursor_tuple_fraction' is having no observable effect. 

This is being seen on Postgres 9.1 (Ubuntu x64), on a server with fast disks and large amount of memory. Basic memory
tuninghas also been performed. 

Thanks in advanced, I appreciate any insights.
Kind regards,
Matthew Churcher


Re: Cost of initiating cursors

From
Tom Lane
Date:
Matthew Churcher <matthew.churcher@realvnc.com> writes:
> For example:
>   FOR curr_foo
>   IN
>   SELECT foo FROM bar
>     WHERE wibble
>     ORDER BY wobble
>   LOOP
>      EXIT; -- always break out of loop
>   END LOOP;

> For some reason this is hugely expensive and slow regardless of the selected execution plan and available indexes.
TheWHERE and particularly the ORDER BY clause appear to be highly significant despite having appropriate indexes in
place. 

For a full-table query, the planner will frequently decide that a
seqscan-and-sort is cheaper than an indexscan that happens to produce
the right order (but involves a lot of random page access).  That's
problematic for this case because the whole scan and sort has to happen
before the first result row can be delivered.  Grabbing the remaining
rows would be quite cheap, but since you don't want them, the work
is wasted.

> It's the combination of the following behaviours I find particular perplexing:-
>   1.) Removing the WHERE and ORDER BY clauses results in a very fast query

No surprise.  The query devolves to a seqscan, and you stop after
getting the first row.

>   2.) Adding a LIMIT clause also results in a very fast query.

That would encourage the planner to use a fast-start plan, ie an
indexscan in the right order, which wins here since you stop after
fetching the first row.

The basic problem with the above FOR loop is that you're not giving the
planner any visibility into the fact that you just want the first (few)
rows, so it's going for a plan that will be more efficient for actually
performing the whole query it's been given.  Adding a LIMIT is a good
thing because it fixes that lack of communication.

> The configuration parameter 'cursor_tuple_fraction' is having no observable effect.

You did not say what values you'd tried, but a sufficiently small value
ought to force a similar result.  Whether such a value would be a wise
thing to set globally is a different question.  On the whole I think
adding a LIMIT isn't a bad idea here.

            regards, tom lane


Re: Cost of initiating cursors

From
Matthew Churcher
Date:
Thanks Tom,

That's very enlightening and I really appreciate you taking time to respond.

I've tried cursor_tuple_fraction values as low as 0.0000000001 which by my reckoning should be low enough and also 0
butthe planner is still selecting seqscan-and-sort.  

Kind regards,
Matthew Churcher


----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Matthew Churcher" <matthew.churcher@realvnc.com>
Cc: pgsql-general@postgresql.org
Sent: Thursday, 11 April, 2013 4:12:25 PM
Subject: Re: [GENERAL] Cost of initiating cursors

Matthew Churcher <matthew.churcher@realvnc.com> writes:
> For example:
>   FOR curr_foo
>   IN
>   SELECT foo FROM bar
>     WHERE wibble
>     ORDER BY wobble
>   LOOP
>      EXIT; -- always break out of loop
>   END LOOP;

> For some reason this is hugely expensive and slow regardless of the selected execution plan and available indexes.
TheWHERE and particularly the ORDER BY clause appear to be highly significant despite having appropriate indexes in
place. 

For a full-table query, the planner will frequently decide that a
seqscan-and-sort is cheaper than an indexscan that happens to produce
the right order (but involves a lot of random page access).  That's
problematic for this case because the whole scan and sort has to happen
before the first result row can be delivered.  Grabbing the remaining
rows would be quite cheap, but since you don't want them, the work
is wasted.

> It's the combination of the following behaviours I find particular perplexing:-
>   1.) Removing the WHERE and ORDER BY clauses results in a very fast query

No surprise.  The query devolves to a seqscan, and you stop after
getting the first row.

>   2.) Adding a LIMIT clause also results in a very fast query.

That would encourage the planner to use a fast-start plan, ie an
indexscan in the right order, which wins here since you stop after
fetching the first row.

The basic problem with the above FOR loop is that you're not giving the
planner any visibility into the fact that you just want the first (few)
rows, so it's going for a plan that will be more efficient for actually
performing the whole query it's been given.  Adding a LIMIT is a good
thing because it fixes that lack of communication.

> The configuration parameter 'cursor_tuple_fraction' is having no observable effect.

You did not say what values you'd tried, but a sufficiently small value
ought to force a similar result.  Whether such a value would be a wise
thing to set globally is a different question.  On the whole I think
adding a LIMIT isn't a bad idea here.

            regards, tom lane


Re: Cost of initiating cursors

From
Tom Lane
Date:
Matthew Churcher <matthew.churcher@realvnc.com> writes:
> I've tried cursor_tuple_fraction values as low as 0.0000000001 which by my reckoning should be low enough and also 0
butthe planner is still selecting seqscan-and-sort.  

Hm.  Keep in mind that plpgsql caches query plans, so if you're
examining this by checking the runtime of the function you showed,
you'd need to start a fresh session, set cursor_tuple_fraction, then
try the function to be sure you were seeing the effects of the
parameter change.

            regards, tom lane