Re: Scrollable cursors and Sort performance - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Scrollable cursors and Sort performance
Date
Msg-id 1139593587.1258.586.camel@localhost.localdomain
Whole thread Raw
In response to Re: Scrollable cursors and Sort performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Scrollable cursors and Sort performance
Re: Scrollable cursors and Sort performance
Re: Scrollable cursors and Sort performance
List pgsql-hackers
On Fri, 2006-02-10 at 11:58 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Fri, 2006-02-10 at 10:13 -0500, Tom Lane wrote:
> >> The merge step would certainly have to happen anyway, so this claim is
> >> not justified.  You have not presented any evidence about how much of
> >> the reported time is actually I/O related.
> 
> > You are right that the last read off disk and merge steps would still be
> > required, if the full result set were to be read. However, I am pointing
> > out a task in addition to that. Reading a large file from disk and then
> > writing it back down *when there is no benefit* seems like a task we
> > would want to avoid, whatever the size and however (sequential/random)
> > the I/Os are spent. We need not debate the CPU time differences.
> 
> If the cost of avoiding it were zero, then sure.  But propagating the
> needed information down to the sort step is not a zero-effort thing,
> and therefore I'd like to see an argument for it that's not got obvious
> logical holes.
> 
> I also don't care for the proposal to replace Sort with Sort/Materialize
> in cases where random access is needed: that will certainly be *slower*
> than what we do now.  When you are talking about penalizing some cases
> to make other ones faster, you definitely need an argument without holes
> in it.

That wasn't the initial proposal...

> Your analysis of when randomAccess is required needs work, in any case,
> since you forgot about ExecReScan.  Not to mention mark/restore.

OK - other thoughts there for later also.

> I suspect that the right thing is not to see this as a planner issue at
> all, but to try to drive the choice off the context in which the plan
> gets invoked.  Possibly we could pass a "need random access" boolean
> down through the ExecInitNode calls (I seem to recall some prior
> discussion of doing something like that, in the context of telling
> Materialize that it could be a no-op in some cases).

Yeh, that was me just being a little vague on implementation, but
handing off from planner to executor via the Plan node is what I was
hacking at now. I'll follow your recommendation and do it for the
general case. Propagating it down should allow a few similar
optimizations. 

Any others please shout 'em in everybody.

> Lastly, there isn't any obvious reason that I can see for having to
> change the default assumption about cursors.  Most queries don't go
> through cursors.  For those that do, we already document that specifying
> NO SCROLL can be a performance win, so any app that's not saying that
> when it could has only itself to blame.

The obvious reason is why force people to go out of their way for a
performance win? This is the same as OIDs, AFAICS. Some people used them
in their programs - well fine, they can keep 'em. Most people didn't and
don't and will appreciate having their programs speed up.

Not everybody gets the chance to change the SQL in an application
program, however much they might want to and know they should. Third
party software is most software.

The only way to please both is to have a GUC, whatever it is set to by
default.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PostgreSQL 8.0.6 crash
Next
From: Greg Stark
Date:
Subject: Re: PostgreSQL 8.0.6 crash