Thread: Scrollable cursors and Sort performance

Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 09:32:57

I'm interested in the behaviour of ExecSort, which *for all queries*
prepares the sort result for randomAccess, even when part of a plan that
will *never* go backwards/rewind etc.

A recent performance test shows this output from mid-way through a heap
sort with trace_sort=on  (the query itself is not relevant here)

> LOG:  00000: finished writing final run 65 to tape 64: CPU
57.50s/484.51u sec elapsed 597.46 sec
> LOG:  00000: finished merge step: CPU 107.90s/653.53u sec elapsed
941.83 sec

Which shows that the *unnecessary* final merge takes 344 secs, adding
approximately 60% to the elapsed time of the query and nearly doubling
the CPU requirement.

[Aside: you'll notice the above test was performed with my recent sort
improvement patch applied, but the behaviour of ExecSort is identical in
both cases. However in the current cvstip case, you simply don't notice
the extra expense of the request for randomAccess because of the
additional time taken by the sort]

So, why does the planner think random access is required? Well, only for
use in queries; CREATE INDEX for example does not force this.

To allow support for CURSORs, of course.

>From the code, we never call ExecSort with a direction other than
Forward unless we are issuing a FETCH with a direction other than one
identified internally as FETCH_FORWARD. According to the SQL standard,
that can only happen when a scrollable cursor has been declared using
DECLARE ... SCROLL. The current PostgreSQL manual says the following:
FETCH: "The cursor should be declared with the SCROLL option if one
intends to use any variants of FETCH other than FETCH NEXT or FETCH
FORWARD with a positive count. For simple queries PostgreSQL will allow
backwards fetch from cursors not declared with SCROLL, but this behavior
is best not relied on. If the cursor is declared with NO SCROLL, no
backward fetches are allowed."
DECLARE: "The SCROLL option should be specified when defining a cursor
that will be used to fetch backwards. This is required by the SQL
standard. However, for compatibility with earlier versions, PostgreSQL
will allow backward fetches without SCROLL, if the cursor's query plan
is simple enough that no extra overhead is needed to support it.
However, application developers are advised not to rely on using
backward fetches from a cursor that has not been created with SCROLL. If
NO SCROLL is specified, then backward fetches are disallowed in any
case."

The current behaviour is to plan every query as if it would allow
backwards scans, then immediately disallow backwards scans *if* it fails
the "no extra overhead" test later on in the Declare Cursor processing.
[portalcmds.c:PerformCursorOpen()]. (i.e. you pay, but get no benefit)

My suggestion is that the backwards-compatible behaviour of allowing
backwards/absolute FETCHes *without* a specific SCROLL command be
deprecated in the next release, so that the default is *disallow*. We've
warned people and now its time to turn it off by default. (This would be
re-enabled using default_cursor_scroll = on). If that is not acceptable,
then we should re-evaluate the idea that sorts *always* allow backward
scans [execAmi.c:ExecSupportsBackwardScan()], replacing this either with
*never* or some kind of query cost test (but that seems much less
preferable). Materialize already provides the infrastructure required to
do this.

This will then allow us to use the firm knowledge that a plan will only
ever be scanned in a Forwards direction at plan time. ...and that will
allow us to optimize out the rather large step taken during Sort to
freeze the result unnecessarily for randomAccess. This will then give a
good perfomance gain for larger joins and aggregations, neither of which
would ever allow backwards scans using them current method anyway.

I intend to add a short patch to pass down the cursor state during
planning, so that when it is explicitly specified the sort node is able
to recognise this and avoid work. Also, to add a GUC to force the
not-explicitly-specified case to be the same as the NO SCROLL case, as
the standard requires.

Comments?

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

10 February 2006, 11:13:27

Simon Riggs <simon@2ndquadrant.com> writes:
> Which shows that the *unnecessary* final merge takes 344 secs, adding
> approximately 60% to the elapsed time of the query and nearly doubling
> the CPU requirement.

The merge step would certainly have to happen anyway, so this claim is
not justified.  You have not presented any evidence about how much of
the reported time is actually I/O related.
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 12:12:57

On Fri, 2006-02-10 at 10:13 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Which shows that the *unnecessary* final merge takes 344 secs, adding
> > approximately 60% to the elapsed time of the query and nearly doubling
> > the CPU requirement.
> 
> The merge step would certainly have to happen anyway, so this claim is
> not justified.  You have not presented any evidence about how much of
> the reported time is actually I/O related.

You are right that the last read off disk and merge steps would still be
required, if the full result set were to be read. However, I am pointing
out a task in addition to that. Reading a large file from disk and then
writing it back down *when there is no benefit* seems like a task we
would want to avoid, whatever the size and however (sequential/random)
the I/Os are spent. We need not debate the CPU time differences.

The cited test sorted 1561238 data blocks, or 12.7 GB. Which at 120 MB/s
would take 100 seconds one-way on a stand-alone system. I think a very
large chunk of the elapsed time could reasonably be accounted for from
I/O costs, since it would need to read then write all of that data.

In the test, the post-sort retrieval of rows took 150 secs, indicating a
sequential retrieval rate of 85 MB/sec, so there is no reason to believe
that a slow disk over-emphasised performance loss for this case.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

10 February 2006, 12:22:48

On Fri, Feb 10, 2006 at 01:32:44PM +0000, Simon Riggs wrote:
> I intend to add a short patch to pass down the cursor state during
> planning, so that when it is explicitly specified the sort node is able
> to recognise this and avoid work. Also, to add a GUC to force the
> not-explicitly-specified case to be the same as the NO SCROLL case, as
> the standard requires.

So is this only an issue if you're using a cursor, or does this affect
plain SELECT ... ORDER BY as well?

Reason I'm asking is that users should be able to explicitly be able to
turn the extra step off somehow. I'm not clear if NO SCROLL is
sufficient to do that or not.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 12:49:00

On Fri, 2006-02-10 at 10:22 -0600, Jim C. Nasby wrote:
> On Fri, Feb 10, 2006 at 01:32:44PM +0000, Simon Riggs wrote:
> > I intend to add a short patch to pass down the cursor state during
> > planning, so that when it is explicitly specified the sort node is able
> > to recognise this and avoid work. Also, to add a GUC to force the
> > not-explicitly-specified case to be the same as the NO SCROLL case, as
> > the standard requires.
> 
> So is this only an issue if you're using a cursor, or does this affect
> plain SELECT ... ORDER BY as well?
> 
> Reason I'm asking is that users should be able to explicitly be able to
> turn the extra step off somehow. I'm not clear if NO SCROLL is
> sufficient to do that or not.

It effects all sorts, whether or not they are even cursors.

If a cursor is defined NO SCROLL, which is the SQL Standard implicit
default, then we are safe to assume there will be no rewinds or backward
scans. The PostgreSQL current implicit default is SCROLL, which means
that no part of the executor can currently make useful assumptions about
scan direction, so this is a wider issue than just sorts.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

10 February 2006, 12:58:38

Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2006-02-10 at 10:13 -0500, Tom Lane wrote:
>> The merge step would certainly have to happen anyway, so this claim is
>> not justified.  You have not presented any evidence about how much of
>> the reported time is actually I/O related.

> You are right that the last read off disk and merge steps would still be
> required, if the full result set were to be read. However, I am pointing
> out a task in addition to that. Reading a large file from disk and then
> writing it back down *when there is no benefit* seems like a task we
> would want to avoid, whatever the size and however (sequential/random)
> the I/Os are spent. We need not debate the CPU time differences.

If the cost of avoiding it were zero, then sure.  But propagating the
needed information down to the sort step is not a zero-effort thing,
and therefore I'd like to see an argument for it that's not got obvious
logical holes.

Your analysis of when randomAccess is required needs work, in any case,
since you forgot about ExecReScan.  Not to mention mark/restore.

I also don't care for the proposal to replace Sort with Sort/Materialize
in cases where random access is needed: that will certainly be *slower*
than what we do now.  When you are talking about penalizing some cases
to make other ones faster, you definitely need an argument without holes
in it.

I suspect that the right thing is not to see this as a planner issue at
all, but to try to drive the choice off the context in which the plan
gets invoked.  Possibly we could pass a "need random access" boolean
down through the ExecInitNode calls (I seem to recall some prior
discussion of doing something like that, in the context of telling
Materialize that it could be a no-op in some cases).

Lastly, there isn't any obvious reason that I can see for having to
change the default assumption about cursors.  Most queries don't go
through cursors.  For those that do, we already document that specifying
NO SCROLL can be a performance win, so any app that's not saying that
when it could has only itself to blame.
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 13:46:39

On Fri, 2006-02-10 at 11:58 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2006-02-10 at 10:13 -0500, Tom Lane wrote:
> >> The merge step would certainly have to happen anyway, so this claim is
> >> not justified.  You have not presented any evidence about how much of
> >> the reported time is actually I/O related.
> 
> > You are right that the last read off disk and merge steps would still be
> > required, if the full result set were to be read. However, I am pointing
> > out a task in addition to that. Reading a large file from disk and then
> > writing it back down *when there is no benefit* seems like a task we
> > would want to avoid, whatever the size and however (sequential/random)
> > the I/Os are spent. We need not debate the CPU time differences.
> 
> If the cost of avoiding it were zero, then sure.  But propagating the
> needed information down to the sort step is not a zero-effort thing,
> and therefore I'd like to see an argument for it that's not got obvious
> logical holes.
> 
> I also don't care for the proposal to replace Sort with Sort/Materialize
> in cases where random access is needed: that will certainly be *slower*
> than what we do now.  When you are talking about penalizing some cases
> to make other ones faster, you definitely need an argument without holes
> in it.

That wasn't the initial proposal...

> Your analysis of when randomAccess is required needs work, in any case,
> since you forgot about ExecReScan.  Not to mention mark/restore.

OK - other thoughts there for later also.

> I suspect that the right thing is not to see this as a planner issue at
> all, but to try to drive the choice off the context in which the plan
> gets invoked.  Possibly we could pass a "need random access" boolean
> down through the ExecInitNode calls (I seem to recall some prior
> discussion of doing something like that, in the context of telling
> Materialize that it could be a no-op in some cases).

Yeh, that was me just being a little vague on implementation, but
handing off from planner to executor via the Plan node is what I was
hacking at now. I'll follow your recommendation and do it for the
general case. Propagating it down should allow a few similar
optimizations. 

Any others please shout 'em in everybody.

> Lastly, there isn't any obvious reason that I can see for having to
> change the default assumption about cursors.  Most queries don't go
> through cursors.  For those that do, we already document that specifying
> NO SCROLL can be a performance win, so any app that's not saying that
> when it could has only itself to blame.

The obvious reason is why force people to go out of their way for a
performance win? This is the same as OIDs, AFAICS. Some people used them
in their programs - well fine, they can keep 'em. Most people didn't and
don't and will appreciate having their programs speed up.

Not everybody gets the chance to change the SQL in an application
program, however much they might want to and know they should. Third
party software is most software.

The only way to please both is to have a GUC, whatever it is set to by
default.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Martijn van Oosterhout

Date:

10 February 2006, 14:14:24

On Fri, Feb 10, 2006 at 04:48:42PM +0000, Simon Riggs wrote:
> If a cursor is defined NO SCROLL, which is the SQL Standard implicit
> default, then we are safe to assume there will be no rewinds or backward
> scans. The PostgreSQL current implicit default is SCROLL, which means
> that no part of the executor can currently make useful assumptions about
> scan direction, so this is a wider issue than just sorts.

Umm, the documentation says: PostgreSQL will allow backward fetches
without SCROLL, if the cursor's query plan is simple enough that no
extra overhead is needed to support it.

So if the default is SCROLL someone needs to fix the docs because
that's not what it says. It says that *some plans* can be fetched
backwards even if you don't say scroll. The documentation clearly says
we don't need to support backwards searches without scroll if it
causes problems.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

10 February 2006, 15:04:33

Simon Riggs <simon@2ndquadrant.com> writes:
> Not everybody gets the chance to change the SQL in an application
> program, however much they might want to and know they should. Third
> party software is most software.

Right.  You are proposing to *break* some applications in order to
make other ones faster.  How popular do you think you will be?
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 15:16:54

On Fri, 2006-02-10 at 19:14 +0100, Martijn van Oosterhout wrote:
> On Fri, Feb 10, 2006 at 04:48:42PM +0000, Simon Riggs wrote:
> > If a cursor is defined NO SCROLL, which is the SQL Standard implicit
> > default, then we are safe to assume there will be no rewinds or backward
> > scans. The PostgreSQL current implicit default is SCROLL, which means
> > that no part of the executor can currently make useful assumptions about
> > scan direction, so this is a wider issue than just sorts.
> 
> Umm, the documentation says: PostgreSQL will allow backward fetches
> without SCROLL, if the cursor's query plan is simple enough that no
> extra overhead is needed to support it.
> 
> So if the default is SCROLL someone needs to fix the docs because
> that's not what it says. It says that *some plans* can be fetched
> backwards even if you don't say scroll. The documentation clearly says
> we don't need to support backwards searches without scroll if it
> causes problems.

Changing the docs is not the problem here. I don't understand the point
you are making and how it effects the issue.

The problem is knowing before the sort is executed whether the sort
result will ever be used in the future by a backward scan. We can only
do this by definition, restricting the future use of a FETCH. 

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

10 February 2006, 17:04:43

On Fri, 2006-02-10 at 14:04 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Not everybody gets the chance to change the SQL in an application
> > program, however much they might want to and know they should. Third
> > party software is most software.
> 
> Right.  You are proposing to *break* some applications in order to
> make other ones faster.  How popular do you think you will be?

How does this differ from OIDs?

I'm suggesting that people have a choice.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

11 February 2006, 13:32:32

On Fri, Feb 10, 2006 at 07:16:41PM +0000, Simon Riggs wrote:
> On Fri, 2006-02-10 at 19:14 +0100, Martijn van Oosterhout wrote:
> > On Fri, Feb 10, 2006 at 04:48:42PM +0000, Simon Riggs wrote:
> > > If a cursor is defined NO SCROLL, which is the SQL Standard implicit
> > > default, then we are safe to assume there will be no rewinds or backward
> > > scans. The PostgreSQL current implicit default is SCROLL, which means
> > > that no part of the executor can currently make useful assumptions about
> > > scan direction, so this is a wider issue than just sorts.
> > 
> > Umm, the documentation says: PostgreSQL will allow backward fetches
> > without SCROLL, if the cursor's query plan is simple enough that no
> > extra overhead is needed to support it.
> > 
> > So if the default is SCROLL someone needs to fix the docs because
> > that's not what it says. It says that *some plans* can be fetched
> > backwards even if you don't say scroll. The documentation clearly says
> > we don't need to support backwards searches without scroll if it
> > causes problems.
> 
> Changing the docs is not the problem here. I don't understand the point
> you are making and how it effects the issue.
> 
> The problem is knowing before the sort is executed whether the sort
> result will ever be used in the future by a backward scan. We can only
> do this by definition, restricting the future use of a FETCH. 

I think the point that Martijn was trying to make was that per our docs
it would be perfectly acceptable for us to make any cursor NO SCROLL
implicitly if it means less work for the optimizer.

Whether that's a good thing is a different story... ISTM doing so would
be very confusing for users.

Since this affects all queries with sorts I would definately be in favor
of exposing an option to change this ASAP, even if the default was to
maintain the current behavior for compatability. Disk sorts are hugely
expensive, and anything to reduce that expense would be very welcome.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

11 February 2006, 13:44:40

On Sat, Feb 11, 2006 at 11:32:02AM -0600, Jim C. Nasby wrote:
> I think the point that Martijn was trying to make was that per our docs
> it would be perfectly acceptable for us to make any cursor NO SCROLL
> implicitly if it means less work for the optimizer.

Ok, I take that back. The actual quote[1] is:

"Depending upon the complexity of the query's execution plan, specifying
SCROLL may impose a performance penalty on the query's execution time."

Clearly that says it can affect execution time, not that we're free to
alter the default behavior at will.

But speaking of documentation, it doesn't actually say what the default
is. Care update that, or should I formally submit a patch?

[1] http://www.postgresql.org/docs/8.1/interactive/sql-declare.html
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

Bruce Momjian

Date:

11 February 2006, 13:46:52

Jim C. Nasby wrote:
> On Sat, Feb 11, 2006 at 11:32:02AM -0600, Jim C. Nasby wrote:
> > I think the point that Martijn was trying to make was that per our docs
> > it would be perfectly acceptable for us to make any cursor NO SCROLL
> > implicitly if it means less work for the optimizer.
> 
> Ok, I take that back. The actual quote[1] is:
> 
> "Depending upon the complexity of the query's execution plan, specifying
> SCROLL may impose a performance penalty on the query's execution time."
> 
> Clearly that says it can affect execution time, not that we're free to
> alter the default behavior at will.
> 
> But speaking of documentation, it doesn't actually say what the default
> is. Care update that, or should I formally submit a patch?

Patch please.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

11 February 2006, 15:47:43

On Sat, 2006-02-11 at 11:44 -0600, Jim C. Nasby wrote:
> On Sat, Feb 11, 2006 at 11:32:02AM -0600, Jim C. Nasby wrote:
> > I think the point that Martijn was trying to make was that per our docs
> > it would be perfectly acceptable for us to make any cursor NO SCROLL
> > implicitly if it means less work for the optimizer.
> 
> Ok, I take that back. The actual quote[1] is:
> 
> "Depending upon the complexity of the query's execution plan, specifying
> SCROLL may impose a performance penalty on the query's execution time."
> 
> Clearly that says it can affect execution time, not that we're free to
> alter the default behavior at will.
> 
> But speaking of documentation, it doesn't actually say what the default
> is. Care update that, or should I formally submit a patch?
> 
> [1] http://www.postgresql.org/docs/8.1/interactive/sql-declare.html

Why do that ahead of me making the suggested changes?

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

11 February 2006, 15:58:11

On Sat, Feb 11, 2006 at 07:47:32PM +0000, Simon Riggs wrote:
> On Sat, 2006-02-11 at 11:44 -0600, Jim C. Nasby wrote:
> > On Sat, Feb 11, 2006 at 11:32:02AM -0600, Jim C. Nasby wrote:
> > > I think the point that Martijn was trying to make was that per our docs
> > > it would be perfectly acceptable for us to make any cursor NO SCROLL
> > > implicitly if it means less work for the optimizer.
> > 
> > Ok, I take that back. The actual quote[1] is:
> > 
> > "Depending upon the complexity of the query's execution plan, specifying
> > SCROLL may impose a performance penalty on the query's execution time."
> > 
> > Clearly that says it can affect execution time, not that we're free to
> > alter the default behavior at will.
> > 
> > But speaking of documentation, it doesn't actually say what the default
> > is. Care update that, or should I formally submit a patch?
> > 
> > [1] http://www.postgresql.org/docs/8.1/interactive/sql-declare.html
> 
> Why do that ahead of me making the suggested changes?

Because right now it doesn't say what the actual default is.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

11 February 2006, 20:50:25

> On Sat, 2006-02-11 at 11:44 -0600, Jim C. Nasby wrote:
>> But speaking of documentation, it doesn't actually say what the default
>> is. Care update that, or should I formally submit a patch?
>> 
>> [1] http://www.postgresql.org/docs/8.1/interactive/sql-declare.html

Actually, if you submit a patch that says either "SCROLL is the default"
or "NO SCROLL is the default", it will be rejected as incorrect.  The
reason is that the default behavior is different from either of these,
as is explained in the NOTES section.
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

12 February 2006, 01:22:38

On Sat, Feb 11, 2006 at 07:50:22PM -0500, Tom Lane wrote:
> > On Sat, 2006-02-11 at 11:44 -0600, Jim C. Nasby wrote:
> >> But speaking of documentation, it doesn't actually say what the default
> >> is. Care update that, or should I formally submit a patch?
> >>
> >> [1] http://www.postgresql.org/docs/8.1/interactive/sql-declare.html
>
> Actually, if you submit a patch that says either "SCROLL is the default"
> or "NO SCROLL is the default", it will be rejected as incorrect.  The
> reason is that the default behavior is different from either of these,
> as is explained in the NOTES section.

Ok, so *that's* where the bit about the query plan being simple enough.
Based on that, ISTM that it should be premissable for us to decide that
a cursor requiring a sort isn't "simple enough" to support SCROLL.

In any case, here's a patch that makes the non-standard behavior easier
for people to find.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Attachment

declare.sgml.patch

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

12 February 2006, 12:38:59

On Fri, 2006-02-10 at 17:46 +0000, Simon Riggs wrote:
> > Lastly, there isn't any obvious reason that I can see for having to
> > change the default assumption about cursors.  Most queries don't go
> > through cursors.  For those that do, we already document that specifying
> > NO SCROLL can be a performance win, so any app that's not saying that
> > when it could has only itself to blame.
> 
> The obvious reason is why force people to go out of their way for a
> performance win? This is the same as OIDs, AFAICS. Some people used them
> in their programs - well fine, they can keep 'em. Most people didn't and
> don't and will appreciate having their programs speed up.
> 
> Not everybody gets the chance to change the SQL in an application
> program, however much they might want to and know they should. Third
> party software is most software.
> 
> The only way to please both is to have a GUC, whatever it is set to by
> default.

Further consideration: Since not all queries are cursors, as you say,
the issue is less important, once we patch the executor as discussed.
Maybe we *can* dispense with the GUC?

Currently, ExecSupportsBackwardScan() makes the decision based upon the
node types present in the plan. For an in-memory sort, allowing backward
scans is effectively zero cost. It is only when the sort goes to disk
that it incurs extra overhead that you might wish to avoid.

It would be possible to update the state of the cursor AFTER the sort
has been performed. The cursor's query is not executed until the first
fetch, at which point all meaningful cursor requests can be accomplished
with a forward-only cursor. So it would be possible to make the cursor
NO SCROLL at that point. My thinking is that non-deterministic behaviour
like that is probably not helpful and could still result in some
existing programs breaking.

So, although having a GUC is not the only way, it does seem like the
right thing to do....but secondary to the main executor change.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

26 February 2006, 20:26:21

Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2006-02-10 at 11:58 -0500, Tom Lane wrote:
>> I suspect that the right thing is not to see this as a planner issue at
>> all, but to try to drive the choice off the context in which the plan
>> gets invoked.  Possibly we could pass a "need random access" boolean
>> down through the ExecInitNode calls (I seem to recall some prior
>> discussion of doing something like that, in the context of telling
>> Materialize that it could be a no-op in some cases).

> Yeh, that was me just being a little vague on implementation, but
> handing off from planner to executor via the Plan node is what I was
> hacking at now. I'll follow your recommendation and do it for the
> general case. Propagating it down should allow a few similar
> optimizations. 

Have you done anything with this idea yet?  I was just thinking of
attacking it myself.  After looking at my old notes about Materialize,
I am thinking that we should add a "int flags" parameter to the InitNode
calls along with ExecutorStart and probably PortalStart.  This would
contain a bitwise OR of at least the following flag bits:
need-ReScanneed-backwards-scanneed-mark-restoreno-execute (so flags can replace ExecutorStart's explainOnly param)

We'd have lots of room for expansion, but these are the ones that seem
to have immediate use.  And most callers of ExecutorStart/PortalStart
know they don't need these things, so could just pass zero always.

Interesting point: how should EXPLAIN ANALYZE set these bits?  For its
own purposes it need not request random access, but it might be
interesting to make it possible to examine both the random and nonrandom
behaviors, now that these will be significantly different performancewise.
Possibly we could make EXPLAIN ANALYZE EXECUTE set the random-access
bits.
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

27 February 2006, 10:20:46

On Sun, 2006-02-26 at 19:26 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2006-02-10 at 11:58 -0500, Tom Lane wrote:
> >> I suspect that the right thing is not to see this as a planner issue at
> >> all, but to try to drive the choice off the context in which the plan
> >> gets invoked.  Possibly we could pass a "need random access" boolean
> >> down through the ExecInitNode calls (I seem to recall some prior
> >> discussion of doing something like that, in the context of telling
> >> Materialize that it could be a no-op in some cases).
> 
> > Yeh, that was me just being a little vague on implementation, but
> > handing off from planner to executor via the Plan node is what I was
> > hacking at now. I'll follow your recommendation and do it for the
> > general case. Propagating it down should allow a few similar
> > optimizations. 
> 
> Have you done anything with this idea yet?  I was just thinking of
> attacking it myself.  

Started, but have been side-tracked by other urgent matters.

Happy to complete this today with you? This mini-project has made me
realise I don't understand the executor as well as I should, so I'm keen
to see it through, even if I'm a bit slower at it.

> After looking at my old notes about Materialize,
> I am thinking that we should add a "int flags" parameter to the InitNode
> calls along with ExecutorStart and probably PortalStart.  This would
> contain a bitwise OR of at least the following flag bits:
> 
>     need-ReScan
>     need-backwards-scan
>     need-mark-restore
>     no-execute (so flags can replace ExecutorStart's explainOnly param)
> 
> We'd have lots of room for expansion, but these are the ones that seem
> to have immediate use.  And most callers of ExecutorStart/PortalStart
> know they don't need these things, so could just pass zero always.

Design-wise I was looking at putting a named struc in there, so it would
be easily expandible in the future to carry anything else that needs to
be passed down through the nodes like this. I guess thats the same
line-of-thought you're on too.

> Interesting point: how should EXPLAIN ANALYZE set these bits?  For its
> own purposes it need not request random access, but it might be
> interesting to make it possible to examine both the random and nonrandom
> behaviors, now that these will be significantly different performancewise.
> Possibly we could make EXPLAIN ANALYZE EXECUTE set the random-access
> bits.

Good point. Whichever we do will be wrong in some cases.... I've no real
opinion on this other than a vague preference for it to be quick.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

27 February 2006, 11:07:25

Simon Riggs <simon@2ndquadrant.com> writes:
> On Sun, 2006-02-26 at 19:26 -0500, Tom Lane wrote:
>> After looking at my old notes about Materialize,
>> I am thinking that we should add a "int flags" parameter to the InitNode
>> calls along with ExecutorStart and probably PortalStart.

> Design-wise I was looking at putting a named struc in there, so it would
> be easily expandible in the future to carry anything else that needs to
> be passed down through the nodes like this.

That would be the hard way, primarily because it would require copying
and modifying the struct at each level of recursion --- which'd turn
what should be a nearly zero-cost patch into something with possibly
nontrivial cost.

Copy and modify is needed because as one descends through the plan tree
the requirements change.  For instance, MergeJoin requires mark/restore
capability of its right input, but this will never be a requirement
propagated from the top (or anyplace else).  Materialize on the other
hand should turn off some of the bits, since it won't pass backwards
scan or mark/restore calls down to its child.  These are trivial changes
to implement with a flag-word representation, not so with a struct.

If I saw a need for non-boolean parameters in this structure then maybe
I'd agree, but there's no evidence of a need for them.  What the child
plan nodes need to know is "will I get any mark/restore calls" and such
like, and those are certainly boolean conditions.

I'm envisioning coding like

ExecInitMergeJoin(MergeJoin *node, EState *estate, int flags)
.../* reject unsupported cases */Assert(!(flags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
...innerPlanState(mergestate) = ExecInitNode(innerPlan(node),                                          estate,
                               flags | EXEC_FLAG_MARK);

nodeSort.c would have a test like
node->random = (flags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)) != 0;

etc etc.
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

Simon Riggs

Date:

27 February 2006, 13:45:03

On Mon, 2006-02-27 at 10:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Sun, 2006-02-26 at 19:26 -0500, Tom Lane wrote:
> >> After looking at my old notes about Materialize,
> >> I am thinking that we should add a "int flags" parameter to the InitNode
> >> calls along with ExecutorStart and probably PortalStart.
> 
> > Design-wise I was looking at putting a named struc in there, so it would
> > be easily expandible in the future to carry anything else that needs to
> > be passed down through the nodes like this.
> 
> That would be the hard way, primarily because it would require copying
> and modifying the struct at each level of recursion --- which'd turn
> what should be a nearly zero-cost patch into something with possibly
> nontrivial cost.

Yeh, didn't take me long to see the costs; I just gave that idea up
prior to reading your post. I won't go into *why* I was trying that,
especially since I've stopped...

Following your recipe pretty close as of now.

Best Regards, Simon Riggs

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

27 February 2006, 18:55:48

On Mon, Feb 27, 2006 at 02:17:23PM +0000, Simon Riggs wrote:
> > Interesting point: how should EXPLAIN ANALYZE set these bits?  For its
> > own purposes it need not request random access, but it might be
> > interesting to make it possible to examine both the random and nonrandom
> > behaviors, now that these will be significantly different performancewise.
> > Possibly we could make EXPLAIN ANALYZE EXECUTE set the random-access
> > bits.
> 
> Good point. Whichever we do will be wrong in some cases.... I've no real
> opinion on this other than a vague preference for it to be quick.

Wouldn't an EXPLAIN ANALYZE DECLARE ... have the right information to
know if backward scan, etc was needed?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

27 February 2006, 19:01:25

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, Feb 27, 2006 at 02:17:23PM +0000, Simon Riggs wrote:
>>> Possibly we could make EXPLAIN ANALYZE EXECUTE set the random-access
>>> bits.
>> 
>> Good point. Whichever we do will be wrong in some cases.... I've no real
>> opinion on this other than a vague preference for it to be quick.

> Wouldn't an EXPLAIN ANALYZE DECLARE ... have the right information to
> know if backward scan, etc was needed?

There is no EXPLAIN ANALYZE DECLARE, and AFAICS it would be a
contradiction in terms to have one, since DECLARE doesn't run the query.
Perhaps the correct addition would be EXPLAIN ANALYZE FETCH.  (EXECUTE
is unrelated, now that I think harder about it.)
        regards, tom lane

Re: Scrollable cursors and Sort performance

From

"Jim C. Nasby"

Date:

27 February 2006, 19:12:06

On Mon, Feb 27, 2006 at 06:01:21PM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Mon, Feb 27, 2006 at 02:17:23PM +0000, Simon Riggs wrote:
> >>> Possibly we could make EXPLAIN ANALYZE EXECUTE set the random-access
> >>> bits.
> >> 
> >> Good point. Whichever we do will be wrong in some cases.... I've no real
> >> opinion on this other than a vague preference for it to be quick.
> 
> > Wouldn't an EXPLAIN ANALYZE DECLARE ... have the right information to
> > know if backward scan, etc was needed?
> 
> There is no EXPLAIN ANALYZE DECLARE, and AFAICS it would be a
> contradiction in terms to have one, since DECLARE doesn't run the query.
> Perhaps the correct addition would be EXPLAIN ANALYZE FETCH.  (EXECUTE
> is unrelated, now that I think harder about it.)

You have no idea how glad I am that I'm not the only one who doesn't know about
'new' features (this first appeared in the docs in 7.4)... :)

decibel=# explain analyze declare test cursor for select * from pg_users;                       QUERY PLAN
         
 
----------------------------------------------------------Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=79)
Filter:rolcanlogin
 
(2 rows)

So, since presumably that accepts a full cursor declaration, would that suffice
for controlling EXPLAIN ANALYZE?

BTW, ISTM that it would also be useful to have EXPLAIN FETCH, since you
could have already defined a cursor. But I suspect a more common case
would be cut & paste of the declare from application code into psql,
which would make EXPLAIN DECLARE easier to use. Though, I never really
use cursors, so...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scrollable cursors and Sort performance

From

Tom Lane

Date:

27 February 2006, 19:20:40

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Mon, Feb 27, 2006 at 06:01:21PM -0500, Tom Lane wrote:
>> There is no EXPLAIN ANALYZE DECLARE, and AFAICS it would be a
>> contradiction in terms to have one, since DECLARE doesn't run the query.

> You have no idea how glad I am that I'm not the only one who doesn't know about
> 'new' features (this first appeared in the docs in 7.4)... :)

> decibel=# explain analyze declare test cursor for select * from pg_users;
>                         QUERY PLAN                        
> ----------------------------------------------------------
>  Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=79)
>    Filter: rolcanlogin
> (2 rows)

Please notice that it didn't run the query (no actual-time data).

Perhaps it would be better if the code raised an error instead of
silently ignoring the ANALYZE keyword.  I think this behavior
was chosen on the grounds that since DECLARE doesn't run the query,
it's consistent for EXPLAIN ANALYZE DECLARE to be a no-op as well.
But it's confusing now that I look at it again.  In any case,
one should clearly need to say FETCH to get a cursor to execute.
        regards, tom lane