Thread: Solution for LIMIT cost estimation

Solution for LIMIT cost estimation

From
Tom Lane
Date:
We have discussed in the past the need for the optimizer to take LIMIT
into account when choosing plans.  Currently, since planning is done
on the basis of total plan cost for retrieving all tuples, there's
no way to prefer a "fast start" plan over "slow start".  But that's
what you want if there's a small LIMIT.

Up to now I haven't seen a practical way to fix this, because the
optimizer does its planning bottom-up (start with scan plans, then
make joins, etc) and there's no easy way to know at the bottom of
the pyramid whether you can expect to take advantage of a LIMIT that
exists at the top.  For example, if there's a SORT or GROUP step
in between, you can't apply the LIMIT to the bottom level; but the
bottom guys don't know whether there will be such a step.

I have thought of a fairly clean way to attack this problem, which
is to represent the cost of a plan in two parts instead of only one.
Instead of just "cost", have "startup cost" and "cost per tuple".
(Actually, it'd probably be easier to work with "startup cost" and
"total cost if all tuples are retrieved", but that's a trivial
representational detail; the point is that our cost model will now be
of the form a*N+b when N tuples are retrieved.)  It'd be pretty easy
to produce plausible numbers for all the plan types we use.  Then,
the plan comparators would keep any plan that wins on either startup
or total cost, rather than only considering total cost.  Finally
at the top level of planning, when there is a LIMIT the preferred
plan would be selected by comparing a*LIMIT+b rather than total cost.
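To illustrate with made-up numbers: say an indexscan plan works out to
cost(N) = 0.2*N + 1 and a seqscan-plus-sort plan to cost(N) = 0.01*N + 1500,
over a 10000-tuple table.  On total cost the sort plan wins (1600 vs 2001),
but for LIMIT 10 the comparison is a*10+b: 3 for the indexscan against
1500.1 for the sort, so the indexscan is preferred.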

I think I can get this done before beta, but before I go into hack-
attack mode I wanted to run it up the flagpole and see if anyone
has any complaints or better ideas.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
"Ross J. Reedstrom"
Date:
Tom - 
This would fit in really well with some ideas I've been working on
with remote table access. Mariposa makes the design decision that
anything might be remote, and the optimizer doesn't get to know what
is and what isn't, so they run the single-site optimization, basically
assuming everything is local, then fragment the plan and ship parts off
for remote execution.

I've been thinking about ways to let the optimizer know about remote
tables, and the capabilities of the server serving the remote table.
I won't fill in the details here.  I'm writing up a proposal for how
this will all work: I've got a telecon today to gauge administrative
support for developing this during work hours, which will _dramatically_
speed up its implementation ;-)

It does seem to me, however, that the cost of going remote is best
described with the a*N+b model you're describing here: for remote tables,
b will be large, and dominate, unless you're pulling a _lot_ of tuples.

Suffice to say, sounds great to me.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

On Thu, Feb 10, 2000 at 11:48:15AM -0500, Tom Lane wrote:
> We have discussed in the past the need for the optimizer to take LIMIT
> into account when choosing plans.  Currently, since planning is done
> on the basis of total plan cost for retrieving all tuples, there's
> no way to prefer a "fast start" plan over "slow start".  But that's
> what you want if there's a small LIMIT.
> 
> Up to now I haven't seen a practical way to fix this, because the
> optimizer does its planning bottom-up (start with scan plans, then
> make joins, etc) and there's no easy way to know at the bottom of
> the pyramid whether you can expect to take advantage of a LIMIT that
> exists at the top.  For example, if there's a SORT or GROUP step
> in between, you can't apply the LIMIT to the bottom level; but the
> bottom guys don't know whether there will be such a step.
> 
> I have thought of a fairly clean way to attack this problem, which
> is to represent the cost of a plan in two parts instead of only one.
> Instead of just "cost", have "startup cost" and "cost per tuple".
> (Actually, it'd probably be easier to work with "startup cost" and
> "total cost if all tuples are retrieved", but that's a trivial
> representational detail; the point is that our cost model will now be
> of the form a*N+b when N tuples are retrieved.)  It'd be pretty easy
> to produce plausible numbers for all the plan types we use.  Then,
> the plan comparators would keep any plan that wins on either startup
> or total cost, rather than only considering total cost.  Finally
> at the top level of planning, when there is a LIMIT the preferred
> plan would be selected by comparing a*LIMIT+b rather than total cost.
> 
> I think I can get this done before beta, but before I go into hack-
> attack mode I wanted to run it up the flagpole and see if anyone
> has any complaints or better ideas.
> 
>             regards, tom lane
> 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
A couple of things occur to me. One is that it sounds like this
proposal could mean that successive SELECTS with LIMIT could
execute completely different plans and therefore return inconsistent
results. For example, let's say I have 26 customers a through z.
My first call to SELECT name from customer limit 3 might return...
a
b
c
and then my next SELECT name from customer limit 3, 3 might return
a
b 
c
again, when I might expect d e f. Of course in this case I could SORT,
but unless name is a unique field that won't work. I could sort on oid,
but that is expensive if I don't really want it. I could use a cursor,
but web pages don't like to do that because you don't know how long you 
may need to keep the cursor open.

In short, I think the fact that limit doesn't alter the plan may
be more of a feature than a bug.

The other thing is, I would like at some stage to change limit so
that it is attached to a SELECT rather than an entire query so
you could...
SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
and I'm not sure how this would interact with that.


Tom Lane wrote:
> 
> We have discussed in the past the need for the optimizer to take LIMIT
> into account when choosing plans.  Currently, since planning is done
> on the basis of total plan cost for retrieving all tuples, there's
> no way to prefer a "fast start" plan over "slow start".  But that's
> what you want if there's a small LIMIT.
> 
> Up to now I haven't seen a practical way to fix this, because the
> optimizer does its planning bottom-up (start with scan plans, then
> make joins, etc) and there's no easy way to know at the bottom of
> the pyramid whether you can expect to take advantage of a LIMIT that
> exists at the top.  For example, if there's a SORT or GROUP step
> in between, you can't apply the LIMIT to the bottom level; but the
> bottom guys don't know whether there will be such a step.
> 
> I have thought of a fairly clean way to attack this problem, which
> is to represent the cost of a plan in two parts instead of only one.
> Instead of just "cost", have "startup cost" and "cost per tuple".
> (Actually, it'd probably be easier to work with "startup cost" and
> "total cost if all tuples are retrieved", but that's a trivial
> representational detail; the point is that our cost model will now be
> of the form a*N+b when N tuples are retrieved.)  It'd be pretty easy
> to produce plausible numbers for all the plan types we use.  Then,
> the plan comparators would keep any plan that wins on either startup
> or total cost, rather than only considering total cost.  Finally
> at the top level of planning, when there is a LIMIT the preferred
> plan would be selected by comparing a*LIMIT+b rather than total cost.
> 
> I think I can get this done before beta, but before I go into hack-
> attack mode I wanted to run it up the flagpole and see if anyone
> has any complaints or better ideas.
> 
>                         regards, tom lane
> 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 10:01 AM 2/11/00 +1100, Chris Bitmead wrote:
>
>A couple of things occur to me. One is that it sounds like this
>proposal could mean that successive SELECTS with LIMIT could
>execute completely different plans and therefore return inconsistent
>results. For example, let's say I have 26 customers a through z.
>My first call to SELECT name from customer limit 3 might return...
>a
>b
>c
>and then my next SELECT name from customer limit 3, 3 might return
>a
>b 
>c
>again, when I might expect d e f. Of course in this case I could SORT,
>but unless name is a unique field that won't work.

Well...SQL *is* a set language, and the tuples returned aren't guaranteed
to be returned in the same order from query to query.  The order in
which they're returned is entirely undefined.

You MUST establish an order on the target tuples if you expect to
see them returned in a consistent order.  The RDBMS only has to
deliver the tuples that satisfy the query, nothing more.

You aren't guaranteed what you want even with the optimizer the
way it is:

donb=# select * from foo;
 i 
---
 1
 2
(2 rows)

donb=# delete from foo where i=1;
DELETE 1
donb=# insert into foo values(1);
INSERT 147724 1
donb=# select * from foo;
 i 
---
 2
 1
(2 rows)

This isn't the only way to impact the ordering of delivered 
rows, either.  VACUUM ANALYZE could do it, for instance...
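To get stable pages you have to pin down a total order yourself.
For instance (borrowing Chris's customer table, and using oid as a
tiebreaker since name needn't be unique):

    SELECT name FROM customer ORDER BY name, oid LIMIT 3 OFFSET 3;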

>The other thing is, I would like at some stage to change limit so
>that it is attached to a SELECT rather than an entire query so
>you could...
>SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
>and I'm not sure how this would interact with that.

Since ORDER BY applies to the target row, the rows returned from
the subselect would be in indeterminate order anyway...




- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:
> Well...SQL *is* a set language, and the tuples returned aren't guaranteed
> to be returned in the same order from query to query.  The order in
> which they're returned is entirely undefined.

Which would make LIMIT a pretty useless function unless you include
every field in your ORDER BY; otherwise LIMIT returns undefined
results.
To keep strict SET-based semantics, LIMIT should be disallowed unless
you ORDER BY a UNIQUE field, or you ORDER BY every single field in
the clause.

> You MUST establish an order on the target tuples if you expect to
> see them returned in a consistent order.  The RDMS only has to
> deliver the tuples that satisfy the query, nothing more.
> 
> You aren't guaranteed what you want even with the optimizer the
> way it is:

I know, I know, but the current behaviour is "close enough" for
a lot of applications.

> >The other thing is, I would like at some stage to change limit so
> >that it is attached to a SELECT rather than an entire query so
> >you could...
> >SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
> >and I'm not sure how this would interact with that.
> 
> Since ORDER BY applies to the target row, the rows returned from
> the subselect would be in indeterminate order anyway...

Oh. Well then I'd like ORDER BY in the subselect too :-).


RE: [HACKERS] Solution for LIMIT cost estimation

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> 
> I have thought of a fairly clean way to attack this problem, which
> is to represent the cost of a plan in two parts instead of only one.
> Instead of just "cost", have "startup cost" and "cost per tuple".
> (Actually, it'd probably be easier to work with "startup cost" and
> "total cost if all tuples are retrieved", but that's a trivial
> representational detail; the point is that our cost model will now be
> of the form a*N+b when N tuples are retrieved.)  It'd be pretty easy
> to produce plausible numbers for all the plan types we use.  Then,
> the plan comparators would keep any plan that wins on either startup
> or total cost, rather than only considering total cost.  Finally
> at the top level of planning, when there is a LIMIT the preferred
> plan would be selected by comparing a*LIMIT+b rather than total cost.
>

I have no objection but have a question.

It seems current cost estimation couldn't be converted into a*N+b
form exactly.  For example, the cost of seq scan is
    count of pages + count of tuples * cpu_weight + ...
count of pages couldn't be converted into a*N form.
The cost of index scan is more complicated.
I thought of no clear way to treat it when I thought about
this item once.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> ... it sounds like this
> proposal could mean that successive SELECTS with LIMIT could
> execute completely different plans and therefore return inconsistent
> results.
> In short, I think the fact that limit doesn't alter the plan may
> be more of a feature than a bug.

A good point (one I'm embarrassed not to have thought about!) but
I don't think it's a reason not to push forward in this direction.
We have had *way* too many complaints about Postgres not being able
to optimize LIMITed queries, so I'm not willing to ignore the issue;
something's got to be done about it.

As Don B. points out nearby, there's no guarantee anyway that
repeatedly issuing the same query with different LIMIT parameters
will get you consistent results.  The only upright and morally
correct approach is to use DECLARE CURSOR followed by FETCH commands
(all within a transaction of course) in order to get results that
are really all part of a single query.  Now DECLARE CURSOR is also
presently optimized on the basis of fetching the entire result set,
so that is still a problem --- I neglected to mention before that
I was intending to tweak the optimizer to optimize that case like a
moderate-sized LIMIT.

But having said that, I hear what you're saying and I think it's
worth thinking about.  Here are four possible alternative responses:

1. Optimize the query as I sketched previously, but using the "count"
part of the LIMIT spec while deliberately ignoring the "offset".
Then you get consistent results for fetching different chunks of the
query result as long as you use the same count each time.  (And as long
as no one else is changing the DB meanwhile, but we'll take that as
read for each of these choices.)

2. Ignore both the count and offset, and optimize any query containing
a LIMIT clause on the basis of some fixed assumption about what fraction
of the tuples will be fetched (I'm thinking something like 1% to 10%).
This allows different fetch sizes to be used without destroying
consistency, but it falls down on the goal of correctly optimizing
"LIMIT 1" hacks.

3. Use the count+offset for all it's worth, and document that you
can't expect to get consistent results from asking for different
LIMITed chunks of the "same" query unless you use ORDER BY to
ensure consistent ordering of the tuples.

4. Fascist variant of #3: make LIMIT without ORDER BY be an error.

SQL92 does not define LIMIT at all, so it's not much help in
deciding what to do.  Is there anything in SQL3?  What do other
DBMSes do about this issue?  Comments, other variants, better ideas
anyone?

> The other thing is, I would like at some stage to change limit so
> that it is attached to a SELECT rather than an entire query so
> you could...
> SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
> and I'm not sure how this would interact with that.

Since ORDER BY is only allowed at the top level by SQL92, there
would be no way for the user to ensure predictable results from
such a query.  I think that'd be a dangerous path to go down.
However, if you had an answer that ensured consistent results from
queries with sub-LIMITs, I don't see that there'd be any problem
with allowing the optimizer to optimize 'em.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> It seems current cost estimation couldn't be converted into a*N+b
> form exactly.  For example,the cost of seq scan is
>     count of pages + count of tuples * cpu_weight + ..
> count of pages couldn't be converted into a*N form.
> The cost of index scan is more complicated.

It would not be an exact conversion, because the cost of a query is
clearly *not* a perfectly linear function of the number of tuples
retrieved before stopping.  Another example, besides the ones you
mention, is that a nested-loop join will suffer a "hiccup" in
output rate each time it restarts the inner scan, if the inner scan
is of a kind that has nontrivial startup cost.  But I believe that
this effect is not very significant compared to all the other
approximations the optimizer already has to make.

Basically, my proposal is that the plan cost estimation routines
provide a "startup cost" (cost expended to retrieve the first
tuple) in addition to the "total cost" they already estimate.
Then, upper-level planning routines that want to estimate the cost
of fetching just part of the query result will estimate that cost
by linear interpolation between the startup cost and the total
cost.  Sure, it's just a rough approximation, but it'll be a long
time before that's the worst error made by the planner ;-)
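Concretely (a sketch; k is the number of tuples actually fetched out
of an estimated N total):

    cost(k) = startup_cost + (total_cost - startup_cost) * k/N

so a plan with startup_cost = 100, total_cost = 1100, and N = 1000
would be charged 100 + 1000*(10/1000) = 110 for fetching 10 tuples.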
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Philip Warner
Date:
At 22:52 10/02/00 -0500, Tom Lane wrote:
>
>But having said that, I hear what you're saying and I think it's
>worth thinking about.  Here are four possible alternative responses:
>

Another option is to do what Dec/Rdb does, and allow either optimizer hints
in a saved plan, or via modified SQL (less portable):
   select * from foo limit 1 row optimize for fast first;


I also have a question: does the optimizer know about relevant indexes when
it is trying to return an ordered result set? If not, then 'fast first'
retrieval may be substantially improved by including such knowledge.

ie.
   select * from foo order by f1,f2 limit 1 row;

should be trivial if there is an index on (f1,f2).


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Another option is to do what Dec/Rdb does, and allow either optimizer hints
> in a saved plan, or via modified SQL (less portable):

>     select * from foo limit 1 row optimize for fast first;

The former is not going to happen in time for 7.0, and the latter is
not entirely palatable --- we are trying to become more SQL-spec-
compliant, not less...

> I also have a question: does the optimizer know about relevant indexes when
> it is trying to return an ordered result set? If not, then 'fast first'
> retrieval may be substantially improved by including such knowledge.

It does know about indexes.  The problem is that it is making planning
choices on the basis of cost to retrieve the entire ordered result set,
which leads to pessimal planning when you don't really want any such
thing.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
For my own curiosity, how does the presence of limit affect a plan
anyway?


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 10:52 PM 2/10/00 -0500, Tom Lane wrote:

>4. Fascist variant of #3: make LIMIT without ORDER BY be an error.
>
>SQL92 does not define LIMIT at all, so it's not much help in
>deciding what to do.  Is there anything in SQL3?  What do other
>DBMSes do about this issue?  Comments, other variants, better ideas
>anyone?

Well ... for my money I never expected LIMIT to be meaningful in
the sense of being deterministic without an ORDER BY clause.

But ... that doesn't mean that some folks might not want to use
it differently.  What if LIMIT 2 were more efficient than COUNT(*)
in order to determine if more than one row satisfies a condition?

I don't know if that's even a remote possibility given the current
implementation, but it is an example where a non-deterministic
tuple ordering might not matter.

But I wouldn't feel badly at all if LIMIT limited to queries
with ORDER BY.  I think this could be done grammatically, i.e.

[query] ORDER BY 

is the SQL paradigm, and you'd just hang LIMIT on ORDER BY (or
more properly at the same grammar level allow them in any order).

[ORDER BY | LIMIT clause]*

in one form of pseudo-grammar, with appropriate semantic checking
so you can't say ORDER BY .. ORDER BY ...


>
>> The other thing is, I would like at some stage to change limit so
>> that it is attached to a SELECT rather than an entire query so
>> you could...
>> SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
>> and I'm not sure how this would interact with that.
>
>Since ORDER BY is only allowed at the top level by SQL92, there
>would be no way for the user to ensure predictable results from
>such a query.  I think that'd be a dangerous path to go down.

Yep.

>However, if you had an answer that ensured consistent results from
>queries with sub-LIMITs, I don't see that there'd be any problem
>with allowing the optimizer to optimize 'em.

No, it's not an optimizer problem.  



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> For my own curiosity, how does the presence of limit affect a plan
> anyway?

At the moment, it doesn't.  But it should.  To take an extreme example:
SELECT * FROM table WHERE x > 100 ORDER BY x LIMIT 1;

to get the tuple with lowest x > 100.  Assuming that there is an index
on x, the right way to implement this is with an indexscan, because a
single probe into the index will pull out the tuple you want.  But right
now the optimizer will choose a plan as if the LIMIT weren't there,
ie on the basis of estimated total cost to retrieve the whole ordered
result set.  On that basis it might well choose sequential scan + sort,
so you'd have to wait around for a sort to complete before you get your
answer.
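Spelled out as a sketch (the table is named t here, since "table"
itself is a reserved word):

    CREATE TABLE t (x integer);
    CREATE INDEX t_x_idx ON t (x);
    -- one index probe suffices once LIMIT is taken into account:
    SELECT * FROM t WHERE x > 100 ORDER BY x LIMIT 1;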
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
> Well ... for my money I never expected LIMIT to be meaningful in
> the sense of being deterministic without an ORDER BY clause.

> But ... that doesn't mean that some folks might not want to use
> it differently.  What if LIMIT 2 were more efficient than COUNT(*)
> in order to determine if more than one row satisfies a condition?

Hmm, that's an excellent example indeed.  A slight variant that is
even more plausible is LIMIT 1 when you just want to know if there
is any tuple satisfying the WHERE condition, and you don't really
care about which one you get.

> I don't know if that's even a remote possibility given the current
> implementation,

Absolutely --- COUNT(*) doesn't provide any way of stopping early,
so a LIMITed query could be far faster.  Given an appropriate plan
of course.  The problem right now is that the optimizer is quite
likely to pick the wrong plan.
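For instance (table and condition hypothetical), an existence test like

    SELECT 1 FROM orders WHERE status = 'open' LIMIT 1;

can stop at the first qualifying tuple, while

    SELECT count(*) FROM orders WHERE status = 'open';

has to visit every qualifying tuple before it can return.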
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
"Ross J. Reedstrom"
Date:
On Thu, Feb 10, 2000 at 10:52:12PM -0500, Tom Lane wrote:
> 
> SQL92 does not define LIMIT at all, so it's not much help in
> deciding what to do.  Is there anything in SQL3?  What do other
> DBMSes do about this issue?  Comments, other variants, better ideas
> anyone?
> 

I know I'm getting in on this late, but I thought I'd answer this.
The SQL92 draft only mentions LIMIT in the list of reserved words,
and once in the index, pointing to a page on lexical elements of SQL.

the SQL3 draft that Chris pointed me at (Aug94) only mentions LIMIT as a
limit clause of a RECURSIVE UNION, whatever that is. (No time to examine
it right now) This is from the file sql-foundation-aug94.txt.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005


Re: [HACKERS] Solution for LIMIT cost estimation

From
Hannu Krosing
Date:
"Ross J. Reedstrom" wrote:
> 
> 
> the SQL3 draft that Chris pointed me at (Aug94) only mentions LIMIT as a
> limit clause of a RECURSIVE UNION, whatever that is. (No time to examine
> it right now) This is from the file sql-foundation-aug94.txt.
> 

If I understood it right, RECURSIVE UNION is a way to query a
tree-structured table, with parent_id's in each child row.

AFAIK we even have it in the TODO list ;)

The LIMIT there probably limits how many levels down the tree are
queried.
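(For the curious: in the form that later standardized as WITH
RECURSIVE in SQL:1999, a walk down such a tree might look like the
sketch below; table and column names are hypothetical.)

    WITH RECURSIVE subtree AS (
        SELECT id, parent_id FROM nodes WHERE id = 1
        UNION ALL
        SELECT n.id, n.parent_id
        FROM nodes n, subtree s
        WHERE n.parent_id = s.id
    )
    SELECT * FROM subtree;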

---------
Hannu


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:

> But ... that doesn't mean that some folks might not want to use
> it differently.  What if LIMIT 2 were more efficient than COUNT(*)
> in order to determine if more than one row satisfies a condition?

select count(*) > 1 from a;

And if that's not efficient, why not optimise _that_, since it 
expresses directly what you want?

> But I wouldn't feel badly at all if LIMIT limited to queries
> with ORDER BY.  I think this could be done gramatically, i.e.
> 
> [query] ORDER BY

If you are going to limit it thus, it only makes sense if you
either order by a unique key or order by every single column.
Otherwise, why limit it at all? And that can't be determined
grammatically.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris
Date:
Tom Lane wrote:
> 
>         SELECT * FROM table WHERE x > 100 ORDER BY x LIMIT 1;

Could it _ever_ be faster to sort the tuples when there is already an
index that can provide them in sorted order?


> 
> to get the tuple with lowest x > 100.  Assuming that there is an index
> on x, the right way to implement this is with an indexscan, because a
> single probe into the index will pull out the tuple you want.  But right
> now the optimizer will choose a plan as if the LIMIT weren't there,
> ie on the basis of estimated total cost to retrieve the whole ordered
> result set.  On that basis it might well choose sequential scan + sort,
> so you'd have to wait around for a sort to complete before you get your
> answer.
> 
>                         regards, tom lane

-- 
Chris Bitmead
mailto:chris@bitmead.com


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris
Date:
Hannu Krosing wrote:

> If I understood it right, RECURSIVE UNION is a way to query a tree
> structured
> table, whith parent_id's in each child row.
> 
> AFAIK even have it in the TODO list ;)
> 
> The LIMIT there probably limits how many levels down the tree are
> queried.

Original postgres used to be able to do this. The syntax 
"retrieve* from xxx" would keep executing (eg traversing a tree) until
complete. Might be worth checking the original sources when you come to
do this.

-- 
Chris Bitmead
mailto:chris@bitmead.com


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 11:07 PM 2/13/00 +1100, Chris wrote:
>Tom Lane wrote:
>> 
>>         SELECT * FROM table WHERE x > 100 ORDER BY x LIMIT 1;
>
>Could it _ever_ be faster to sort the tuples when there is already an
>index that can provide them in sorted order?

That's yet another optimization.  Working on optimizing the execution
of language constructs, whether statement oriented like C or set 
oriented like SQL, is largely a matter of accretion.  Just because
you can make the case with an index run fast doesn't mean you don't
want to consider the case where an index isn't available.

I think you're on the losing end of this one, Chris.  In essence
you're asking that the optimizer not take advantage of the
set-oriented, non-ordered nature of SQL queries in order to make
your non-portable code easier to write.

Tom's example is only one instance where fully exploiting the 
fact that values returned by queries are unordered pays off.  I don't
think we can really live with the restriction that queries must always
return tuples in the same order.




- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Hannu Krosing
Date:
Chris wrote:
> 
> Tom Lane wrote:
> >
> >         SELECT * FROM table WHERE x > 100 ORDER BY x LIMIT 1;
> 
> Could it _ever_ be faster to sort the tuples when there is already an
> index that can provide them in sorted order?

This has been discussed on this list several times, and it appears that
select+sort is quite often faster than index scan, mainly due to the fact 
that tables live on disk and disk accesses are expensive, and when doing 
index scans:

1- you have to scan two files (index and data); when they are on the
   same disk it is much more than 2 times slower than scanning a single
   file, even when doing it sequentially

2- scans on both files are random access, so seek and latency times
   come into play and readahead is useless

3- you often read the same data page many times

-------------
Hannu


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Chris <chris@bitmead.com> writes:
> Could it _ever_ be faster to sort the tuples when there is already an
> index that can provide them in sorted order?

Yes --- in fact usually, for large tables.  Once your table gets too
big for the disk cache to be effective, indexscan performance approaches
one random-access page fetch per tuple.  Sort performance behaves more
or less as p*log(p) accesses for p pages; and a far larger proportion
of those accesses are sequential than in the indexscan case.  So the
sort will win if you have more than log(p) tuples per page.  Do the
arithmetic...
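Doing that arithmetic with made-up numbers: take p = 10000 pages and
d tuples per page.  The indexscan costs about p*d random page fetches
(one per tuple), while the sort costs about p*log(p), roughly
10000 * 13.3 = 133000 page accesses.  The sort is cheaper as soon as
d exceeds log(p), about 13 tuples per page here, which most tables
easily have.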

The optimizer's job would be far simpler if no-brainer rules like
"indexscan is always better" worked.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> Don Baccus wrote:
>> But ... that doesn't mean that some folks might not want to use
>> it differently.  What if LIMIT 2 were more efficient than COUNT(*)
>> in order to determine if more than one row satisfies a condition?

> select count(*) > 1 from a;

> And if that's not efficient, why not optimise _that_, since it 
> expresses directly what you want?

Practicality, mostly.  To do it that way, the optimizer would have
to have extremely specific hard-wired knowledge about the behavior
of count() (which flies in the face of Postgres' open-ended approach
to aggregate functions); furthermore it would have to examine every
query to see if there is a count() - inequality operator - constant
clause placed in such a way that no other result need be delivered
by the query.  That's a lot of mechanism and overhead to extract the
same information that is provided directly by LIMIT; and it doesn't
eliminate the need for LIMIT, since this is only one application
for LIMIT (not even the primary one IMHO).

I have currently got it working (I think; not too well tested yet)
using the proposal I offered before of "pay attention to the size
of LIMIT, but ignore OFFSET", so that the same query plan will be
derived from similar queries with different OFFSETs.  Does anyone
have a substantial gripe with that compromise?
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 12:13 PM 2/13/00 -0500, Tom Lane wrote:
>Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
>> Don Baccus wrote:
>>> But ... that doesn't mean that some folks might not want to use
>>> it differently.  What if LIMIT 2 were more efficient than COUNT(*)
>>> in order to determine if more than one row satisfies a condition?
>
>> select count(*) > 1 from a;
>
>> And if that's not efficient, why not optimise _that_, since it 
>> expresses directly what you want?
>
>Practicality, mostly.  To do it that way, the optimizer would have
>to have extremely specific hard-wired knowledge about the behavior
>of count() (which flies in the face of Postgres' open-ended approach
>to aggregate functions);

Actually, the aggregate interface could pass in a predicate test that
the aggregate function could use to say "stop" once it knows that
the result of the predicate will be true at the end of the query.

Of the standard aggregates, "count()" is probably the only one that
could make use of it.  And of course only rarely is count() used
in such a way.

As someone who has long made his living implementing optimizing
compilers, I don't think that optimizing expressions such as the
one Chris mentions is all that difficult a task.

But there are far more important things to think about implementing
in Postgres.

>I have currently got it working (I think; not too well tested yet)
>using the proposal I offered before of "pay attention to the size
>of LIMIT, but ignore OFFSET", so that the same query plan will be
>derived from similar queries with different OFFSETs.  Does anyone
>have a substantial gripe with that compromise?

Not me, that's for sure.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 06:24 PM 2/13/00 +0200, Hannu Krosing wrote:
>Chris wrote:
>> 
>> Tom Lane wrote:
>> >
>> >         SELECT * FROM table WHERE x > 100 ORDER BY x LIMIT 1;
>> 
>> Could it _ever_ be faster to sort the tuples when there is already an
>> index that can provide them in sorted order?

>This has been discussed on this list several times, and it appears that
>select+sort is quite often faster than index scan, mainly due to the fact 
>that tables live on disk and disk accesses are expensive, and when doing 
>index scans:
>
>1- you have to scan two files (index and data), when they are on the same 
>   disk it is much more than 2 times slower than scanning a single file even
>   when doing it sequentially
>
>2- scans on both files are random access, so seek and latency times 
>   come into play and readahead is useless
>
>3- you often read the same data page many times

Hmmm...yet any studly Oracle type knows that despite whatever veracity
this analysis has, in reality Oracle will utilize the index in the
manner suggested by Chris and the difference in execution time is,
well, astonishing.   Even without the limit.

We just had a discussion regarding this a few days ago over on
Philip Greenspun's web/db forum, where someone ran into a situation
where Oracle didn't recognize that the index could be used (involving
function calls, where presently Oracle doesn't dig into the parameter
list to see whether the referenced columns are indexed when
doing its optimization).  After tricking Oracle into using the index
by adding an additional column reference, he got a speedup of well
over an order of magnitude.

Again, with no limit clause (which Oracle doesn't implement anyway).



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 11:53 AM 2/13/00 -0500, Tom Lane wrote:
>Chris <chris@bitmead.com> writes:
>> Could it _ever_ be faster to sort the tuples when there is already an
>> index that can provide them in sorted order?
>
>Yes --- in fact usually, for large tables.  Once your table gets too
>big for the disk cache to be effective, indexscan performance approaches
>one random-access page fetch per tuple.  Sort performance behaves more
>or less as p*log(p) accesses for p pages; and a far larger proportion
>of those accesses are sequential than in the indexscan case.  So the
>sort will win if you have more than log(p) tuples per page.  Do the
>arithmetic...
>
>The optimizer's job would be far simpler if no-brainer rules like
>"indexscan is always better" worked.

Yet the optimizer currently takes the no-brainer point-of-view that
"indexscan is slow for tables much larger than the disk cache, therefore
treat all tables as though they're much larger than the disk cache".

The name of the game in the production database world is to do
everything possible to avoid a RAM bottleneck, while the point
of view PG is taking seems to be that RAM is always a bottleneck.

This presumption is far too pessimistic.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Peter Eisentraut
Date:
On 2000-02-10, Tom Lane mentioned:

> 4. Fascist variant of #3: make LIMIT without ORDER BY be an error.

Got my vote for that. At least make it a notice: "NOTICE:  LIMIT without
ORDER BY results in random data being returned" -- That'll teach 'em. ;)

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:

> >> select count(*) > 1 from a;
> >
> >> And if that's not efficient, why not optimise _that_, since it
> >> expresses directly what you want?
> >
> >Practicality, mostly.  To do it that way, the optimizer would have
> >to have extremely specific hard-wired knowledge about the behavior
> >of count() (which flies in the face of Postgres' open-ended approach
> >to aggregate functions);
> 
> Actually, the aggregate interface could pass in a predicate test that
> the aggregate function could use to say "stop" once it knows that
> the result of the predicate will be true at the end of the query.

That's the kind of thing I had in mind.

> Of the standard aggregates, "count()" is probably the only one that
> could make use of it.  And of course only rarely is count() used
> in such a way.

I think a lot of the aggregates could make use of it. For example, tell
me all the departments who have spent more than $1,000,000 this year...

select deptid, sum(amount) > 1000000 from purchases group by deptid;

> 
> As someone who has long made his living implementing optimizing
> compilers, I don't think that optimizing expressions such as the
> one Chris mentions is all that difficult a task.
> 
> But there are far more important things to think about implementing
> in Postgres.

Yep.

> 
> >I have currently got it working (I think; not too well tested yet)
> >using the proposal I offered before of "pay attention to the size
> >of LIMIT, but ignore OFFSET", so that the same query plan will be
> >derived from similar queries with different OFFSETs.  Does anyone
> >have a substantial gripe with that compromise?
> 
> Not me, that's for sure.
> 
> - Don Baccus, Portland OR <dhogaza@pacifier.com>
>   Nature photos, on-line guides, Pacific Northwest
>   Rare Bird Alert Service and other goodies at
>   http://donb.photo.net.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Tom Lane wrote:

> I have currently got it working (I think; not too well tested yet)
> using the proposal I offered before of "pay attention to the size
> of LIMIT, but ignore OFFSET", so that the same query plan will be
> derived from similar queries with different OFFSETs.  Does anyone
> have a substantial gripe with that compromise?

Would offset be any use if you did make use of it?


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 10:11 AM 2/14/00 +1100, Chris Bitmead wrote:

>> Of the standard aggregates, "count()" is probably the only one that
>> could make use of it.  And of course only rarely is count() used
>> in such a way.
>
>I think a lot of the aggregates could make use of it. For example, tell
>me all the departments who have spent more than $1,000,000 this year...
>
>select deptid, sum(amount) > 1000000 from purchases group by deptid;

This would be harder, because you could only guarantee that sum is
of all positive or negative numbers if the user provides a constraint.

>> As someone who has long made his living implementing optimizing
>> compilers, I don't think that optimizing expressions such as the
>> one Chris mentions is all that difficult a task.
>> 
>> But there are far more important things to think about implementing
>> in Postgres.
>
>Yep.

Good, because I was about to repeat myself :)



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Don Baccus <dhogaza@pacifier.com> writes:
>> The optimizer's job would be far simpler if no-brainer rules like
>> "indexscan is always better" worked.

> Yet the optimizer currently takes the no-brainer point-of-view that
> "indexscan is slow for tables much larger than the disk cache, therefore
> treat all tables as though they're much larger than the disk cache".

Ah, you haven't seen the (as-yet-uncommitted) optimizer changes I'm
working on ;-)

What I still lack is a believable approximation curve for cache hit
ratio vs. table-size-divided-by-cache-size.  Anybody seen any papers
about that?  I made up a plausible-shaped function but it'd be nice to
have something with some actual theory or measurement behind it...
(Of course the cache size is only a magic number in the absence of any
hard info about what the kernel is doing --- but at least it will
optimize big tables differently than small ones now.)
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 06:43 PM 2/13/00 -0500, Tom Lane wrote:

>Ah, you haven't seen the (as-yet-uncommitted) optimizer changes I'm
>working on ;-)

Very good!

>What I still lack is a believable approximation curve for cache hit
>ratio vs. table-size-divided-by-cache-size.  Anybody seen any papers
>about that?  I made up a plausible-shaped function but it'd be nice to
>have something with some actual theory or measurement behind it...
>
>(Of course the cache size is only a magic number in the absence of any
>hard info about what the kernel is doing --- but at least it will
>optimize big tables differently than small ones now.)

If you've got the memory and allocate sufficient space to shared
buffers and still have plenty of kernel cache space left over, the
optimizer can hardly be over optimistic - things will fly! :)



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Suggestion to split /data/base directory

From
Chairudin Sentosa
Date:
Hi postgresql hackers,

I have a suggestion that might improve the performance of
postgresql.
This is regarding the directory structure of /data/base.
The current situation is that every database has one
directory, ie. "mydb", so you will have /data/base/mydb directory.
All the data files, index files, etc are in the same
/data/base/mydb directory.

If I want to split data files and index files onto different hard
disks, it is not possible right now.
The only solution right now to improve the performance is to use the
RAID method.

My suggestion is to split files into 4 different directories:
/data/base/mydb/data
/data/base/mydb/index
/data/base/mydb/dictionary
/data/base/mydb/tmp

So I can put each directory on a different hard disk, so I can
have 4 hard disks for the 'mydb' database.

Is it doable and a good idea?

Regards,
Chai


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> Tom Lane wrote:
>> I have currently got it working (I think; not too well tested yet)
>> using the proposal I offered before of "pay attention to the size
>> of LIMIT, but ignore OFFSET", so that the same query plan will be
>> derived from similar queries with different OFFSETs.  Does anyone
>> have a substantial gripe with that compromise?

> Would offset be any use if you did make use of it?

Yes, because the number of tuples that will *actually* get fetched
is offset+limit.  If you had a large offset so that the tuples
getting returned were from somewhere near the end of the query,
then choosing a fast-start algorithm would be a Bad Idea; you'd
really want a plan that optimizes on the basis of total cost
rather than startup cost.

Hmm, I'm on the verge of talking myself out of the compromise ;-).
I'm not sure how many people will really use large offsets, but
anyone who does might be a pretty unhappy camper.  If you're asking
for OFFSET 1000000 LIMIT 1, the thing might pick a nested loop
which is exceedingly fast-start ... but also exceedingly expensive
when you go ahead and fetch many tuples anyway.

Perhaps we should stick to two alternatives:

1. If LIMIT is present, optimize on an assumption that X% of the
tuples are fetched, where X does *not* depend on the specific
values given for OFFSET or LIMIT.  (But we could make X a settable
parameter...)

2. Optimize using OFFSET+LIMIT as the expected number of tuples to
fetch.  Document that varying OFFSET or LIMIT will not necessarily
generate consistent results unless you specify ORDER BY to force a
consistent tuple order.

I don't really like #1, but I can see where #2 might cause some
unhappiness as well.  Comments, opinions?
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Tom Lane wrote:
> 
> Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> > Tom Lane wrote:
> >> I have currently got it working (I think; not too well tested yet)
> >> using the proposal I offered before of "pay attention to the size
> >> of LIMIT, but ignore OFFSET", so that the same query plan will be
> >> derived from similar queries with different OFFSETs.  Does anyone
> >> have a substantial gripe with that compromise?
> 
> > Would offset be any use if you did make use of it?
> 
> Yes, because the number of tuples that will *actually* get fetched
> is offset+limit.  If you had a large offset so that the tuples
> getting returned were from somewhere near the end of the query,
> then choosing a fast-start algorithm would be a Bad Idea; you'd
> really want a plan that optimizes on the basis of total cost
> rather than startup cost.
> Hmm, I'm on the verge of talking myself out of the compromise ;-).
> I'm not sure how many people will really use large offsets, but
> anyone who does might be a pretty unhappy camper.  If you're asking
> for OFFSET 1000000 LIMIT 1, the thing might pick a nested loop
> which is exceedingly fast-start ... but also exceedingly expensive
> when you go ahead and fetch many tuples anyway.
> 
> Perhaps we should stick to two alternatives:
> 
> 1. If LIMIT is present, optimize on an assumption that X% of the
> tuples are fetched, where X does *not* depend on the specific
> values given for OFFSET or LIMIT.  (But we could make X a settable
> parameter...)
> 
> 2. Optimize using OFFSET+LIMIT as the expected number of tuples to
> fetch.  Document that varying OFFSET or LIMIT will not necessarily
> generate consistent results unless you specify ORDER BY to force a
> consistent tuple order.
> 
> I don't really like #1, but I can see where #2 might cause some
> unhappiness as well.  Comments, opinions?

I agree you should probably go the whole hog one way or the other.
I think ignoring offset+limit is a useful option, but like I said at
the beginning, it doesn't bother me _that_ much.


Re: [HACKERS] Suggestion to split /data/base directory

From
Don Baccus
Date:
At 09:47 AM 2/14/00 +0700, Chairudin Sentosa wrote:

>My suggestion is to split files into 4 different directories:
>/data/base/mydb/data
>/data/base/mydb/index
>/data/base/mydb/dictionary
>/data/base/mydb/tmp

My preference would be for a simplistic "create tablespace" construct,
so location information could be captured within the database itself.
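Something along these lines, say (syntax purely hypothetical, sketched
after other systems; no such command exists in Postgres today):

    CREATE TABLESPACE fastdisk LOCATION '/disk2/pgsql';
    CREATE INDEX customer_name_idx ON customer (name) TABLESPACE fastdisk;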

We've had discussions about this in the past and there seems to be
some recognition that the ability to spread stuff around disk drives
might be useful.  I mean, all those commercial sites that do it after
measuring their bottlenecks can't ALL be wrong, right?


>
>So I can put each directory on different hardisk, so I can
>have 4 hardisks for 'mydb' database.

You can already do this in an ugly fashion, by moving individual
files and pointing symbolic links at them (ln -s).  "ls *idx*", that
kind of thing, to find your index files (if you suffix them with
"idx"), then move and ln to them.

>
>Is it doable and a good idea?

Doable, but IMO a bad idea because it lowers the motivation for doing
a relatively simple CREATE TABLESPACE hack that gives even more 
flexibility, and allows the db user to query where their tables
are stored within the db rather than depend on "ls".



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 03:32 PM 2/14/00 +1100, Chris Bitmead wrote:

>I agree you should probably go the whole hog one way or the other. I
>think
>ignoring offset+limit is a useful option, but like I said at the
>beginning, it doesn't bother me _that_ much.

It should bother you that folks who understand how SQL works might
be penalized in order to insulate those who don't know how SQL works
from an understanding of their own ignorance...

Shouldn't we be more concerned with folks who bother to read an
SQL primer?  Or Oracle or Informix docs on SQL?



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 11:24 PM 2/13/00 -0500, Tom Lane wrote:
>Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
>> Tom Lane wrote:
>>> I have currently got it working (I think; not too well tested yet)
>>> using the proposal I offered before of "pay attention to the size
>>> of LIMIT, but ignore OFFSET", so that the same query plan will be
>>> derived from similar queries with different OFFSETs.  Does anyone
>>> have a substantial gripe with that compromise?
>
>> Would offset be any use if you did make use of it?
>
>Yes, because the number of tuples that will *actually* get fetched
>is offset+limit.

Bravo, you're on it!  I resisted responding...so far this thread
is your's, baby.

>2. Optimize using OFFSET+LIMIT as the expected number of tuples to
>fetch.  Document that varying OFFSET or LIMIT will not necessarily
>generate consistent results unless you specify ORDER BY to force a
>consistent tuple order.
>
>I don't really like #1, but I can see where #2 might cause some
>unhappiness as well.  Comments, opinions?

Anyone unhappy about #2 doesn't really understand the SQL model.

My suggestion's pretty simple - the database world is full of folks
who are professionals and who understand the SQL model.  

We shouldn't penalize them for their professionalism.

Those who don't understand the SQL model should read the documentation
you mention, of course, but the very fact that SQL doesn't impose
an ordering on the returned tuples is so basic to the language that
if they don't understand it, the doc should also refer them to
"set theory for dummies" and "SQL for dummies" (unless the latter
was actually written by a dummy). 

In my narrow-minded compiler-writer space, it is not MY PROBLEM if
people don't bother learning the language they use.  I may or may
not choose to become one who teaches the language, but whether or not
I do has nothing to do with the implementation of the language.

It is perfectly fair to presume people understand the language.  It
is their job to learn it.

If they're surprised by how the language works, then they should've
considered buying an SQL book, all of which that I've seen EMPHASIZE
the set orientation, and non-orderedness of queries.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:
> 
> At 03:32 PM 2/14/00 +1100, Chris Bitmead wrote:
> 
> >I agree you should probably go the whole hog one way or the other. I
> >think
> >ignoring offset+limit is a useful option, but like I said at the
> >beginning, it doesn't bother me _that_ much.
> 
> It should bother you that folks who understand how SQL works might
> be penalized in order to insulate those who don't know how SQL works
> from an understanding of their own ignorance...
> 
> Shouldn't we be more concerned with folks who bother to read an
> SQL primer?  Or Oracle or Informix docs on SQL?

LIMIT is not SQL, both as a technical fact, and philosophically
because it reaches outside of set theory. What LIMIT does without
ORDER BY is non-deterministic, and therefore a subjective matter of
what is the most useful: a faster answer, or a more consistent answer.
My prejudices are caused by what I use PostgreSQL for, which is
more favourable to the latter.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Philip Warner
Date:
At 23:24 13/02/00 -0500, Tom Lane wrote:
>
>Perhaps we should stick to two alternatives:
>
>1. If LIMIT is present, optimize on an assumption that X% of the
>tuples are fetched, where X does *not* depend on the specific
>values given for OFFSET or LIMIT.  (But we could make X a settable
>parameter...)
>
>2. Optimize using OFFSET+LIMIT as the expected number of tuples to
>fetch.  Document that varying OFFSET or LIMIT will not necessarily
>generate consistent results unless you specify ORDER BY to force a
>consistent tuple order.
>
>I don't really like #1, but I can see where #2 might cause some
>unhappiness as well.  Comments, opinions?

#1 seems pretty nasty as a concept, unless of course this actually reflects
the way that PG retrieves rows. My guess is that it will have to retrieve
rows 1 to (offset + limit), not (offset) to (offset + limit), so the whole
approximation should be based on #2. 

[Aside: I suspect that trying to solve problems for people who want to use
context free (web) interfaces to retrieve blocks of rows is not a job for
the optimizer. It is far more suited to cursors and/or local temporary
tables, both of which require some context].

#2 seems more correct, in that it reflects a good estimation, but
pessimistic: with good indexes defined, the query may well only need to do
a scan of the index to get up to the 'offset-th' row. This, I am sure, must
be faster than retrieving all rows up to OFFSET.

This leaves two questions:

a. Does the optimizer know how to do 'index-only' queries (where all fields
are satisfied by the index)?

b. Just to clarify, OFFSET does affect the tuples actually returned,
doesn't it?



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
BTW, in the absence of an ORDER BY clause, doesn't offset totally
lose its meaning? If you're going to do this optimisation,
you may as well ignore offset entirely in this case.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> #1 seems pretty nasty as a concept, unless of course this actually reflects
> the way that PG retrieves rows. My guess is that it will have to retrieve
> rows 1 to (offset + limit), not (offset) to (offset + limit), so the whole
> approximation should be based on #2.

Right --- if we could start the query in the middle this would all be
a lot nicer, but we can't.  The implementation of OFFSET is just to
discard the first N tuples retrieved before beginning to hand any tuples
back to the client.  So the "right" approach for the optimizer is to
assume that OFFSET+LIMIT tuples will be retrieved.  The trouble is that
that can mean that the query plan changes depending on OFFSET, which
leads to consistency problems if you don't lock down the tuple ordering
with ORDER BY.

> a. Does the optimizer know how to do 'index-only' queries (where all fields
> are satisfied by the index)

Postgres doesn't have indexes that allow index-only queries --- you
still have to fetch the tuples, because the index doesn't carry
commit status.  I think that's irrelevant anyway, since we're not
only interested in the behavior for simple queries...

> b. Just to clarify, OFFSET does affect the tuples actually returned,
> doesn't it?

Of course.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Hannu Krosing
Date:
Chris Bitmead wrote:
> 
> Don Baccus wrote:
> >
> > At 03:32 PM 2/14/00 +1100, Chris Bitmead wrote:
> >
> > >I agree you should probably go the whole hog one way or the other. I
> > >think
> > >ignoring offset+limit is a useful option, but like I said at the
> > >beginning, it doesn't bother me _that_ much.
> >
> > It should bother you that folks who understand how SQL works might
> > be penalized in order to insulate the fact that those who don't know
> > how SQL works from an understanding of their own ignorance...
> >
> > Shouldn't we be more concerned with folks who bother to read an
> > SQL primer?  Or Oracle or Informix docs on SQL?
> 
> LIMIT is not SQL, both as a technical fact, and philosophically
> because it reaches outside of set theory.

I see limit as a shortcut (plus an optimizer hint) for the sequence
DECLARE CURSOR - MOVE offset - FETCH limit - CLOSE CURSOR
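
For instance, a minimal sketch of that equivalence (the table "t" and
its "id" column are assumed names):

    SELECT * FROM t ORDER BY id OFFSET 20 LIMIT 10;

is roughly

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM t ORDER BY id;
    MOVE 20 IN c;
    FETCH 10 IN c;
    CLOSE c;
    COMMIT;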

Its utility was much debated before it was included in Postgres,
the main argument for inclusion being "mySQL has it and it's useful
for fast-start queries", the main argument against being "it's not SQL,
people won't understand it and will start to misuse it".

Maybe we should still discourage the use of LIMIT, and rather introduce 
another "mode" for optimiser, activated by SET FastStart TO 'ON'.
Then queries with limit could be rewritten into
SET FastStart to 'ON';
DECLARE
MOVE
FETCH
CLOSE
SET FastStart to PREVIOUS_VALUE;

also maybe we will need PUSH/POP for SET commands?

> What LIMIT does without ORDER BY is non-deterministic, and therefore 
> a subjective matter of what is the most useful: a faster answer, 
> or a more consistent answer.

As SQL queries are all one-time things you can't be "consistent". 
It's like being able to grab the same set of socks from a bag and 
then trying to devise a strategy for getting them in the same order
without sorting them (i.e. possible but ridiculous).

If you need them in some order, you use ORDER BY, if you don't need 
any order you omit ORDER BY.

> My prejudices are caused by what I use PostgreSQL for, which is
> more favourable to the latter.

What's wrong with using ORDER BY?

I can't imagine a set of queries that need to be consistent _almost_
all the time, but without any order.

If you really need that kind of behaviour, the right decision is to 
select the rows into a work table that has an additional column for 
preserving order and then do the limit queries from that table.

But in that case it is often faster to have an index on said column
and to do
  WHERE ID BETWEEN OFFSET AND OFFSET+LIMIT
  ORDER BY ID
than to use LIMIT, more so for large offsets.
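
A sketch of that work-table approach (all names assumed; the fill order
is simply whatever the inner SELECT produces):

    CREATE SEQUENCE work_seq;
    CREATE TABLE work_t AS
        SELECT nextval('work_seq') AS id, t.* FROM t;
    CREATE INDEX work_t_id ON work_t (id);

    -- each "page" becomes an index range scan, cheap even at large offsets
    SELECT * FROM work_t WHERE id BETWEEN 21 AND 30 ORDER BY id;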


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris
Date:
Hannu Krosing wrote:
> As SQL queries are all one-time things you can't be "consistent".
> It's like being able to grab the same set of socks from a bag and
> then trying to devise a strategy for getting them in the same order
> without sorting them (i.e. possible but ridiculous).
> 
> If you need them in some order, you use ORDER BY, if you don't need
> any order you omit ORDER BY.
> 
> > My prejudices are caused by what I use PostgreSQL for, which is
> > more favourable to the latter.
> 
> Whats wrong with using ORDER BY ?

Only that it's non-intuitive that ORDER BY should change the actual
results of a series of LIMIT queries, not just the order. If there are
100 records, and I do 10x LIMIT 10,offset queries one might expect to
get all 100 records. And currently you do (barring something unusual
like a vacuum at an inopportune moment that drastically changes
statistics).
> I can't imagine a set of queries that need to be consistent 
> _almost_ all the time, but without any order.
> 
> If you really need that kind of behaviour, the right decision is 
>to select the rows into a work table that has an additional column 
>for preserving order and then do the limit queries from that 
>table.

Impractical for stateless web-based stuff where keeping state around is
painful if not impossible.

I'm just playing devil's advocate here. Changing this is probably not
going to hurt me, I just think it could confuse a lot of people.
> But in that case it is often faster to have an index on said column
> and to do
>  WHERE ID BETWEEN OFFSET AND OFFSET+LIMIT
>  ORDER BY ID
> than to use LIMIT, more so for large offsets.

-- 
Chris Bitmead
mailto:chris@bitmead.com


Solution for LIMIT cost estimation

From
Chris
Date:
How about this as a compromise:

If you give an offset without an ORDER BY the offset
is useless if this optimisation is in place. If you
allowed the offset with the optimisation and no
order by it would be encouraging broken behaviour.

So therefore it would be reasonable to optimise a 
limit,offset query with no order by as if there were
no offset. This would give consistent results, albeit
it may not choose the best plan. But at least it 
won't hurt anyone.

The only snag is that it's not technically correct to
have an offset unless the ORDER BY yields a unique
ordering. If it's not unique (that is, unless some
ordered-by field is declared UNIQUE, or every single
field is mentioned in the ORDER BY), then optimisation
should be turned off if there is an offset. If it is
allowed people will randomly get missing results. I 
mean the only purpose of OFFSET is to get something 
like consistency between calls.

The thing is, I'll bet a whole lot of people will use
LIMIT,OFFSET with an ORDER BY, just not a fully unique
ORDER BY. That's why I find this "optimisation" 
questionable. Unless you're _extremely_ careful with 
your ORDER BY clause your results would be crap. Or
if the above idea is implemented, the execution
plan would be crap. If offset were not available,
then none of this would matter.

If this optimisation is implemented, are we going to
carefully explain exactly when an ORDER BY clause will
and won't yield consistent results? Because not just
any ORDER BY is good enough. Anybody who read that
manual page is probably going to be very confused.

-- 
Chris Bitmead
mailto:chris@bitmead.com


Re: [HACKERS] Solution for LIMIT cost estimation

From
Philip Warner
Date:
At 23:12 14/02/00 +1100, Chris wrote:
>
>How about this as a compromise:
>
>If you give an offset without an ORDER BY the offset
>is useless if this optimisation is in place. If you
>allowed the offset with the optimisation and no
>order by it would be encouraging broken behaviour.

Not that I would do this necessarily, but 
   select * from t where <stuff> offset 1 limit 1

is a valid way of checking for duplicates. I would hate to see the
optimization turned off in this case.
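
For instance, a concrete sketch (table and column names assumed):

    -- returns a row only when at least two rows match,
    -- i.e. only when a duplicate exists
    SELECT * FROM t WHERE k = 42 OFFSET 1 LIMIT 1;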


>So therefore it would be reasonable to optimise a 
>limit,offset query with no order by as if there were
>no offset. This would give consistent results, albeit
>it may not choose the best plan. But at least it 
>won't hurt anyone.
...etc

The problem with using a stateless connection is that you have no
transaction control, so can not control table contents between calls. eg.
if it contains:

f
-
abel tasman
charles sturt
ferdinand marcos

(spot the odd one out)

and do a 'select * from t order by f offset 0 limit 2', then someone adds
'bruce springsteen' and you try to get the next two rows via 'select * from
t order by f offset 2 limit 2', you will get 'charles sturt' again, and
miss bruce.
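
Spelled out (same table as above; the INSERT happens in another session
between the two page fetches):

    SELECT * FROM t ORDER BY f OFFSET 0 LIMIT 2;
        -- abel tasman, charles sturt
    INSERT INTO t VALUES ('bruce springsteen');
    SELECT * FROM t ORDER BY f OFFSET 2 LIMIT 2;
        -- charles sturt (again), ferdinand marcos; bruce never appears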

Either you have to say that the database is almost never updated (ie. it's
just a copy of real data, used for web applications), in which case you can
add all sorts of fields for optimizing stateless calls (notably an ID
field), or you have to implement some kind of state preservation, and dump
ID's into a temporary table or use 'held' cursors, which is not really that
hard [Don't know if PG supports either, but you can 'fake' temporary tables
pretty easily].

I may have missed something in what you need, but someone else has already
mentioned using 'MOVE' within a cursor, and it still seems to me that
putting the optimizer through hoops to achieve the result is liable to be a
problem in the long term. 

eg. The Dec/Rdb optimizer actually assesses its strategy while it's
running. If the query is taking too long, or the estimates it used prove
too inaccurate, it may change strategy. If PG implemented such a thing,
then this whole approach to offset/limit would be blown away - a strategy
will change depending on the data retrieved. It would be a pity if this
sort of improvement in the optimizer were blocked because of problems
caused by breaking successive calls to offset/limit queries.

Maybe either 'held cursors' or 'local temporary tables' could be added to
the ToDo list for a future release.

As to documenting the behaviour, I suspect that any NOTICE should also say
'This behaviour may change in the future - don't rely on it unless you like
living dangerously'.

Just my 0.02c, but I don't like putting limits on an optimizer. 

As an aside, and because I like bringing this thing up, stored query
strategies would solve the problem for selected queries; you could specify
the strategy to be used in all executions of a particular query...maybe this
could go on the ToDo list? ;-}




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Solution for LIMIT cost estimation

From
Karl DeBisschop
Date:
>> 4. Fascist variant of #3: make LIMIT without ORDER BY be an error.
>
>Got my vote for that. At least make it a notice: "NOTICE:  LIMIT without
>ORDER BY results in random data being returned" -- That'll teach 'em. ;)

Given the nature of SQL, how could it be otherwise?  Select is defined
to be unordered.  This seems to advocate building a generic SQL
tutorial into PostgreSQL.

I for one would very much rather not have that notice.  My reasoning
is thus:

Say I need a quick shell script to verify that a table has been loaded
with reasonable values as part of a cron procedure.  One way to do that
might be to make a shell script:

#!/bin/sh
# alert when the table looks empty: egrep only matches when psql
# returns a row starting with a numeric column
if ! psql feature -tc "select * from elm limit 1" | egrep "^ +[0-9]+" >/dev/null;
then
    echo no data loaded;
fi

Thus, I cron this and get email if there is no record returned.
AFAICT, this is what should happen.  But if you start adding warnings
to this perfectly valid query, which will presumably come out on
STDERR, I will get email from this, even though the query and its
returns were valid and expected.  And I don't want to direct stderr to
/dev/null because I do want to be warned if there really is an error.

Just my $0.02 worth.

-- 
Karl DeBisschop <kdebisschop@alert.infoplease.com>
617.832.0332 (Fax: 617.956.2696)

Information Please - your source for FREE online reference
http://www.infoplease.com  - Your Ultimate Fact Finder
http://kids.infoplease.com - The Great Homework Helper

Netsaint Plugins Development
http://netsaintplug.sourceforge.net


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 04:28 PM 2/14/00 +1100, Chris Bitmead wrote:

>LIMIT is not SQL

No, of course not, but of course you're ignoring my point.

>My prejudices are caused by what I use PostgreSQL for, which is
>more favourable to the latter.

This, actually, IS my primary point.  Tailoring a product to your
personal prejudices when it is meant to be used by a very wide
range of folks is not wise.

If Postgres is to be tailored to any particular person's 
prejudices, why yours and not mine?  Or Tom's?  Or Bruce's?

The reality is that the developers apparently made the decision
to make Postgres into a real, albeit open source, product with
the intention that it receive wide use.

THAT - or so I believe - is the goal, not to tailor it to
any one person's (or any small set of persons') particular prejudices.

That, for instance, is why it was decided to turn PG into an SQL92
compliant RDBMS.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 10:47 PM 2/14/00 +1100, Chris wrote:

>Only that it's non-intuitive that ORDER BY should change the actual
>results of a series of LIMIT queries, not just the order. If there are
>100 records, and I do 10x LIMIT 10,offset queries one might expect to
>get all 100 records.

The only person who will expect that is the person who hasn't bothered
to learn the fundamental SQL property that rows returned by queries
come back in non-deterministic order.

This is a FUNDAMENTAL concept in SQL, one that is mentioned in every
SQL book I've seen.

The same person probably expects NULL = NULL to return true, too.

So what?

> And currently you do (barring something unusual
>like a vacuum at an inopportune moment that drastically changes
>statistics).

Or an insert by another back end, not at all uncommon in the
kind of web environment where this construct is frequently
used.

>I'm just playing devils advocate here. Changing this is probably not
>going to hurt me, I just think it could confuse a lot of people.

See above.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 11:41 AM 2/14/00 +0200, Hannu Krosing wrote:

>Its utility was much debated before it was included in Postgres,
>the main argument for inclusion being "mySQL has it and it's useful
>for fast-start queries", the main argument against being "it's not SQL,
>people won't understand it and will start to misuse it".

Well, it appears people have started to misuse it! :)

Oracle has recently (8i or 8.1.6 if you prefer) offered something similar,
but it gives weird results depending on whether or not you have an index
on the column.  There's a kludgey workaround, which I forget since I don't
use Oracle, only laugh maniacally when it fails to install on a Linux box
with less than 256MB combined RAM and swap space (i.e. virtual memory).

>Maybe we should still discourage the use of LIMIT, and rather introduce 
>another "mode" for optimiser, activated by SET FastStart TO 'ON'.
>Then queries with limit could be rewritten into
>SET FastStart to 'ON';
>DECLARE
>MOVE
>FETCH
>CLOSE
>SET FastStart to PREVIOUS_VALUE;
>
>also maybe we will need PUSH/POP for set commands ?

Well...personally I don't see LIMIT as being particularly harmful,
and it is a convenience.  Remember, for the web space you're speaking
of keeping overhead low is a real concern, and requiring a series
of queries where currently only one is needed will probably go over like
a lead balloon.

If the documentation actually pointed out that LIMIT in the absence
of an ORDER BY clause probably doesn't do what you want, fewer folks
might expect it to work any differently than it does.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 11:12 PM 2/14/00 +1100, Chris wrote:

>So therefore it would be reasonable to optimise a 
>limit,offset query with no order by as if there were
>no offset. This would give consistent results, albeit
>it may not choose the best plan. But at least it 
>won't hurt anyone.

Why bother?

It will only give consistent results if the table doesn't
change, and that is likely only during testing if the table
is one that is inserted into and updated during production,
as is true of bulletin boards and the like.

And you normally want to order such queries anyway, by date
or by some ranking criteria.

You are making a mountain out of a molehill here.  Or rather,
a mountain out of a playa: there's really no molehill at all,
because your code's broken to begin with.

>If this optimisation is implemented, are we going to
>carefully explain exactly when an ORDER BY clause will
>and won't yield consistent results? Because not just
>any ORDER BY is good enough.

This is already true in SQL as it is, EVEN WITHOUT 
LIMIT.  If your ORDER BY isn't good enough, each time
you query the db you might get rows back in a different
order.

Even if you grab all the rows and walk through them
yourself, discarding the first OFFSET rows and processing
the LIMIT rows, when you revisit and start over you have
exactly the SAME non-determinacy to worry about.

It has nothing to do with LIMIT, Chris.  It really doesn't.

It has to do with your desire to make broken code "work"
in a very limited set of circumstances that don't match
real world conditions often at all.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Just my 0.02c, but I don't like putting limits on an optimizer. 

That's my feeling too.  I'm leaning towards letting the optimizer do the
best it can with the given query (which means using OFFSET+LIMIT as the
estimated number of tuples to be fetched), and documenting the potential
gotcha as best we can.  Something like:

CAUTION: if you repeat a query several times with different OFFSET or
LIMIT values to fetch different portions of the whole result, you will
find that you get inconsistent results unless you specify an ORDER BY
condition that is strong enough to ensure that all selected tuples must
appear in a unique order.  Without ORDER BY, the system is free to
return the tuples in any order it finds convenient --- and it may well
make different implementation choices leading to different orderings
depending on the OFFSET and LIMIT values.  In general, you should be
very wary of using OFFSET or LIMIT with an unordered or partially
ordered query; you will get a difficult-to-predict, implementation-
dependent subset of the selected tuples.
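
For instance, a sketch (names invented; "category" is not unique but
"id" is):

    SELECT * FROM items ORDER BY category OFFSET 0  LIMIT 10;
    SELECT * FROM items ORDER BY category OFFSET 10 LIMIT 10;
        -- rows within one category may come back in different orders
        -- under the two plans, so pages can repeat or skip rows

    SELECT * FROM items ORDER BY category, id OFFSET 10 LIMIT 10;
        -- ordering strengthened to a unique key: pages are consistent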

Is that clear enough?  Can anyone improve on the wording?
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 02:27 PM 2/14/00 -0500, Tom Lane wrote:

>CAUTION: if you repeat a query several times with different OFFSET or
>LIMIT values to fetch different portions of the whole result, you will
>find that you get inconsistent results unless you specify an ORDER BY
>condition that is strong enough to ensure that all selected tuples must
>appear in a unique order.  Without ORDER BY, the system is free to
>return the tuples in any order it finds convenient

Personally, I would generalize this and leave out the reference to
LIMIT and OFFSET, except perhaps to point out that this is one
particular construct that confuses people.

As PG matures, so will the optimizer and query engine, and people
who've written code that depends on tuples being returned in a
single consistent order might find themselves in for a rude shock.

A well-deserved one (IMO), but still a shock.

The documentation won't stop most people who want to do this
from doing so, they'll test and try to "trick" the system by
taking advantage of behavior that might not be consistent in
future releases.

Still...if it stops even ONE person from doing this, the doc will
do some good.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:
> 
> At 10:47 PM 2/14/00 +1100, Chris wrote:
> 
> >Only that it's non-intuitive that ORDER BY should change the actual
> >results of a series of LIMIT queries, not just the order. If there are
> >100 records, and I do 10x LIMIT 10,offset queries one might expect to
> >get all 100 records.
> 
> The only person who will expect that is the person who hasn't bothered
> to learn the fundamental SQL property that rows returned by queries
> come back in non-deterministic order.
> 
> This is a FUNDAMENTAL concept in SQL, one that is mentioned in every
> SQL book I've seen.

It's a logical fact that the existence of "offset" automatically
implies ordering, no matter how many SQL textbooks you quote.

> It will only give consistent results if the table doesn't
> change, and that is likely only during testing if the table
> is one that is inserted into and updated during production,
> as is true of bulletin boards and the like.

It's actually very typical for web applications to want to search
through
historical stuff that doesn't change any more. And ordering by title
or something might not be good enough.

IMHO, that's better reasoning than wanting to misuse LIMIT to figure
out if there are duplicates or something, just because nobody can
be bothered optimising the correct SQL to do that.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Hannu Krosing
Date:
Don Baccus wrote:
> 
> At 11:41 AM 2/14/00 +0200, Hannu Krosing wrote:
>
> >Maybe we should still discourage the use of LIMIT, and rather introduce
> >another "mode" for optimiser, activated by SET FastStart TO 'ON'.
> >Then queries with limit could be rewritten into
> >SET FastStart to 'ON';
> >DECLARE
> >MOVE
> >FETCH
> >CLOSE
> >SET FastStart to PREVIOUS_VALUE;
> >
> >also maybe we will need PUSH/POP for set commands ?
> 
> Well...personally I don't see LIMIT as being particularly harmful,
> and it is a convenience.  Remember, for the web space you're speaking
> of keeping overhead low is a real concern, and requiring a series
> of queries where currently only one needed will probably go over like
> a lead ballon.

I meant that the _backend_ could (in some distant future, when the 
optimiser is perfect :) implement LIMIT as the above sequence.

---------------
Hannu


Re: [HACKERS] Solution for LIMIT cost estimation

From
Hannu Krosing
Date:
Don Baccus wrote:
> 
> This is a FUNDAMENTAL concept in SQL, one that is mentioned in every
> SQL book I've seen.
> 
> The same person probably expects NULL = NULL to return true, too.
> 

IIRC SQL3 defines different /classes/ of nulls where the above could be 
true if the NULLs belong to the same class. 

I.e. the absence of an orange is equal to the absence of the same orange,
but not equal to the absence of an apple (and possibly another orange) ;)

I may of course be completely wrong, as I did not read it too carefully,
being after completely different things at that time.

I also could not figure out the use for such a feature.

----------------
Hannu


RE: [HACKERS] Solution for LIMIT cost estimation

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgresql.org
> [mailto:owner-pgsql-hackers@postgresql.org]On Behalf Of Tom Lane
> 
> Philip Warner <pjw@rhyme.com.au> writes:
> > Just my 0.02c, but I don't like putting limits on an optimizer. 
> 
> That's my feeling too.  I'm leaning towards letting the optimizer do the
> best it can with the given query (which means using OFFSET+LIMIT as the
> estimated number of tuples to be fetched), 

What about cursors ?
I heard from Jan that we could specify 'LIMIT ALL' to tell the optimizer
that the response to get first rows is needed.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 10:30 AM 2/15/00 +1100, Chris Bitmead wrote:

>It's a logical fact that the existence of "offset" automatically
>implies ordering, no matter how many SQL textbooks you quote.

Chris, that is your opinion and judging from the responses of other
folks on this list, it appears to be very much a minority opinion.

Minority of one, as a matter of fact.  There has been a parade
of posts disagreeing with your opinion.

Why not give up and get on with your life before I get tired of
being polite?  I'm *much* more stubborn than you are, particularly
when I'm right.




- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 02:43 AM 2/15/00 +0200, Hannu Krosing wrote:
>Don Baccus wrote:

>> Well...personally I don't see LIMIT as being particularly harmful,
>> and it is a convenience.  Remember, for the web space you're speaking
>> of keeping overhead low is a real concern, and requiring a series
>> of queries where currently only one needed will probably go over like
>> a lead ballon.
>
>I meant that the _backend_ could (in some distant future, when the 
>optimiser is perfect :) implement LIMIT as above sequence.

Oops!  Sorry...at the moment I'm near to loathing the very existence
of LIMIT so misunderstood :)



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 02:52 AM 2/15/00 +0200, Hannu Krosing wrote:
>Don Baccus wrote:
>> 
>> This is a FUNDAMENTAL concept in SQL, one that is mentioned in every
>> SQL book I've seen.
>> 
>> The same person probably expects NULL = NULL to return true, too.
>> 
>
>IIRC SQL3 defines different /classes/ of nulls where the above could be 
>true if the NULLs belong to the same class. 

>I.e. the absence of an orange is equal to the absence of the same orange,
>but not equal to the absence of an apple (and possibly another orange) ;)

>I may of course be completely wrong, as I did not read it too carefully,
>being after completely different things at that time.

My recent foray into the SQL3 draft with Jan in order to figure out
MATCH <unspecified> semantics makes me suspicious of anyone's claim to
understand what the standard says :)

Particularly the authors!

I'm carrying a length of rope and am keeping mindful of the nearest
lamp post just in case I run across one in the street by accident.

>I also could not figure out the use for such a feature.

Well, I just looked at Date's summary of SQL3 and while he talks 
about the new user datatype and mild object-oriented innovations,
he doesn't talk about any change in the meaning of NULL.  Since
he makes no effort to hide his loathing for NULL or three-valued
logic as implemented in SQL, if it had changed I'm certain he
would've mentioned it.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don Baccus wrote:
> 
> At 10:30 AM 2/15/00 +1100, Chris Bitmead wrote:
> 
> >It's a logical fact that the existence of "offset" automatically
> >implies ordering, no matter how many SQL textbooks you quote.
> 
> Chris, that is your opinion and judging from the responses of other
> folks on this list, it appears to be very much a minority opinion.
> 
> Minority of one, as a matter of fact.  There has been a parade
> of posts disagreeing with your opinion.

I've heard no-one say that offset is meaningful or in any sense
useful in the absence of order. If it means something please
enlighten us. If not, try reading before posting.

> Why not give up and get on with your life before I get tired of
> being polite?  I'm *much* more stubborn than you are, particularly
> when I'm right.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Don Baccus
Date:
At 02:25 PM 2/15/00 +1100, Chris Bitmead wrote:

>I've heard no-one say that offset is meaningful or in any sense
>useful in the absence of order. If it means something please
>enlighten us. If not, try reading before posting.

Actually the "limit 1 offset 1" example for uniqueness DID actually
give a meaningful hack on the usefulness of lack of order.

The basic problem, Chris, is that you want to rape the optimizer
in order to stroke ...

well...I'll be nice for one more post.

But...I'm losing patience.

Hell...push me and I'll just start deleting your questions and answers
on photo.net.  After all, you don't understand that weight isn't the only
parameter that contributes to the stability of tripod support..."I'm
leery of Gitzo carbon fiber tripods because I seek WEIGHT!".  If you
seek weight, eat butter.  If you seek stable tripods, seek carbon 
fiber and give up this bullshit weight fanaticism.

You're pretty much a putz.  I could go on and on, based only on photo.net
postings.  Display ignorance in one forum, and why should one be
surprised to see ignorance in another?  Sign me...glad to be a moderator
of photo.net.  Wish I were here, too.  Why do you go to so much bother
to demonstrate the fact that you don't know what the hell you're talking
about?

Here's a photo.net example:

"Do I have the right to photograph in non-public places?"

The very subject line displays your ignorance.  OF COURSE YOU DON'T.
Not by default.  By definition, a private owner of an enclosed space
like the museum in question owns that space.  Your lack of respect for
that authority displays selfishness.  You're similarly selfish in regard
to PG.

As long as rules on photography, etc, are uniformly stated and enforced, in
English-derived legal systems you don't have a limp d... to stand on.

"The other day I went to a special museum exhibit of ancient artifacts. I paid
about $AUD 20 to get in. 

I pulled out my camera and started taking a few photos of stuff, whereupon
one of the attendants chastised me and said photography wasn't allowed. I
was not using flash"

Hmmm...not using flash.  So what?  The issue is whether or not you can
photograph.

"because I know sometimes items can be damaged by excess light."

Which, in the case of flash has been totally debunked, though some museums
still use it as an excuse to avoid arguing over whether or not a private
venue is subject to public property access laws.  So not only are you 
sadly misinformed about law, but you appear to be sadly misinformed about
the effect of electronic flash on art.

"On the way out, I enquired about why I couldn't photograph. They said it
was a condition of the owner of the artifacts and was probably because they
hold "copyright" on the items."

Oh my gosh, so the person buying these things who wants to let the public
view them therefore abrogates all right to any image taken by a visitor?

Just because Chris is a self-centered, selfish man?  Theft is your RIGHT?

Gag me.  

OK, an apology to the forum.  Chris is a pain in the butt in the photo
forum I moderate, shows little common sense nor most particularly a sense
of community, is selfish and resents law when it suggests he can't do each
and every thing he might want to do in life.

I shouldn't bring this up but I'm pretty much tired of this discussion, and
he's tired me in the past in the photo forum I help moderate.  I was nice
there, didn't spank him in public, and now feel like I'm suffering over
here for my lack of diligence.

(paraphrases follow)

"I should get to photograph these artifacts even if they're
owned by someone else and even if they're being shown in a private forum".

"You guys should make sure that the optimizer doesn't cause my BROKEN code
to not "work", even though it doesn't really work today"

"Let's change how inheritance etc. works in a way that fits my personal
prejudice, regardless of how the rest of the world might view the issue"

And, yes, I'm being petty and vindictive but since you're so insistent
on being a total *bleeping* idiot, why not?  Give it up!  NO ONE
agrees with you.  

(I'm still being polite, want to push me?)

If you don't want SQL to be SQL, write your own query language and
build it on PG.  Convince the world that you're right, and you'll
be a very rich man.

No one is stopping you.  Distribute it as a rival copy.  You can
even incorporate each and every enhancement and bug fix that comes
along.

Since you own the one and only better-mouse-trap-ideal, you'll kick
our ass and we'll fade into oblivion.

It's a given, right? 

Oh, and while you're at it, finance your own museum and let me in 
to shoot and sell images resulting from my visit to my heart's desire,
all for free...I'm holding my breath, man.

(for those of you who don't know it, I actually make part of my living
as a freelance photographer, with a wide range of national [US] credits.
Despite this, I would NEVER consider questioning a private museum's right
to control photographer access to its exhibits.  Nor my home, for that
matter).

Chris, you're an exceedingly selfish man.  



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
 


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Don, you are one fucking son of a bitch to bring up things I've said
on photo.net here on this forum. Sure I've said some pretty dumb things
in the past, who hasn't. But for you to bring this thing up from years
ago into a completely different forum... Well, you're a petulant child.
Don't bother commenting on anything I write, or communicating with me
again, because I won't read it.

Don Baccus wrote:
> [snip]


Re: [HACKERS] Solution for LIMIT cost estimation

From
Bruce Momjian
Date:
Folks, this type of behaviour is being taken care of in a private manner.
It is being addressed.

---------------------------------------------------------------------------

> Don, you are one fucking son of a bitch to bring up things I've said
> on photo.net here on this forum. Sure I've said some pretty dumb things
> in the past, who hasn't. But for you to bring this thing up from years
> ago into a completely different forum... Well, you're a petulant child.
> Don't bother commenting on anything I write, or communicating with me
> again, because I won't read it.
> 
> Don Baccus wrote:
> > [snip]

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Solution for LIMIT cost estimation

From
Chris Bitmead
Date:
Bruce Momjian wrote:
> 
> Folks, this type of behaviour is being taken care of in a private manner.
> It is being addressed.

Apologies guys. I'm afraid I lost my cool. Sorry.


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> That's my feeling too.  I'm leaning towards letting the optimizer do the
>> best it can with the given query (which means using OFFSET+LIMIT as the
>> estimated number of tuples to be fetched), 

> What about cursors ?
> I heard from Jan that we could specify 'LIMIT ALL' to tell the optimizer
> that the response to get first rows is needed.

Hmm.  Right now I have it coded to treat 'LIMIT ALL' the same as
no LIMIT clause, which is the way it ought to work AFAICS.

DECLARE CURSOR doesn't appear to support OFFSET/LIMIT at all (the
grammar will take the clause, but analyze.c throws it away...).

I have the LIMIT support in the planner coded to build plans for
DECLARE CURSOR queries on the assumption that 10% of the rows will
be fetched, which is the sort of compromise that will satisfy
nobody ;-).

A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
being simply a hint to the optimizer about how much of the query
result will actually get fetched.  I think we could do that by
tweaking analyze.c to pass through the clauses the same as it does
for regular select, and have the planner discard the clauses after
it's done using them.  (We don't want them to get to the executor
and interfere with the actual behavior of FETCH commands, but I
don't see a reason why they can't live to reach the planner...)

Comments anyone?
        regards, tom lane


RE: [HACKERS] Solution for LIMIT cost estimation

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> That's my feeling too.  I'm leaning towards letting the optimizer do the
> >> best it can with the given query (which means using OFFSET+LIMIT as the
> >> estimated number of tuples to be fetched),
>
> > What about cursors ?
> > I heard from Jan that we could specify 'LIMIT ALL' to tell the optimizer
> > that the response to get first rows is needed.
>
> Hmm.  Right now I have it coded to treat 'LIMIT ALL' the same as
> no LIMIT clause, which is the way it ought to work AFAICS.
>
> DECLARE CURSOR doesn't appear to support OFFSET/LIMIT at all (the
> grammar will take the clause, but analyze.c throws it away...).
>
> I have the LIMIT support in the planner coded to build plans for
> DECLARE CURSOR queries on the assumption that 10% of the rows will
> be fetched, which is the sort of compromise that will satisfy
> nobody ;-).
>

Probably your change would work well in most cases.
It's nice.
However, it seems preferable to be able to choose a first_rows/all_rows
hint explicitly.

> A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
> being simply a hint to the optimizer about how much of the query
> result will actually get fetched.  I think we could do that by
> tweaking analyze.c to pass through the clauses the same as it does
> for regular select, and have the planner discard the clauses after
> it's done using them.  (We don't want them to get to the executor
> and interfere with the actual behavior of FETCH commands, but I
> don't see a reason why they can't live to reach the planner...)
>
> Comments anyone?
>

The following was the reply from Jan 16 months ago.
Unfortunately the PostgreSQL optimizer wasn't able to choose an index scan
for queries with no qualification at that time.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

Re: [HACKERS] What about LIMIT in SELECT ? [1998/10/19]

Hiroshi Inoue wrote:

> When using cursors,in most cases the response to get first(next) rows
> is necessary for me,not the throughput.
> How can we tell PostgreSQL optimzer that the response is necessary ?
   With my LIMIT patch, the offset and the row count are part of
   the querytree. And if a LIMIT is given, the limitCount element
   of the querytree (a Node *) isn't NULL, as it otherwise is by default.

   When a LIMIT is given, the optimizer could assume that first
   rows is wanted (even if the limit is ALL maybe - but I have
   to think about this some more). And this assumption might let
   it decide to use an index to resolve an ORDER BY even if no
   qualification was given.

   Telling the optimizer that first rows wanted in a cursor
   operation would read

       DECLARE CURSOR c FOR SELECT * FROM mytab ORDER BY a LIMIT ALL;


Jan



Re: [HACKERS] Solution for LIMIT cost estimation

From
Philip Warner
Date:
At 01:30 15/02/00 -0500, Tom Lane wrote:
>
>> What about cursors ?
>> I heard from Jan that we could specify 'LIMIT ALL' to tell the optimizer
>> that the response to get first rows is needed.
>
>Hmm.  Right now I have it coded to treat 'LIMIT ALL' the same as
>no LIMIT clause, which is the way it ought to work AFAICS.
>
>DECLARE CURSOR doesn't appear to support OFFSET/LIMIT at all (the
>grammar will take the clause, but analyze.c throws it away...).
>
>I have the LIMIT support in the planner coded to build plans for
>DECLARE CURSOR queries on the assumption that 10% of the rows will
>be fetched, which is the sort of compromise that will satisfy
>nobody ;-).
>
>A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
>being simply a hint to the optimizer about how much of the query
>result will actually get fetched.  

This seems a good approach until cursors are fixed. But is there a plan to
make cursors support LIMIT properly? Do you know why they ignore the LIMIT
clause?

Or am I missing something?



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Solution for LIMIT cost estimation

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
>> being simply a hint to the optimizer about how much of the query
>> result will actually get fetched.  

> This seems a good approach until cursors are fixed. But is there a plan to
> make cursors support LIMIT properly? Do you know why they ignore the LIMIT
> clause?

Should they obey LIMIT?  MOVE/FETCH seems like a considerably more
flexible interface, so I'm not quite sure why anyone would want to
use LIMIT in a cursor.

Still, it seems kind of inconsistent that cursors ignore LIMIT.
I don't know for sure why it was done that way.
        regards, tom lane


Re: [HACKERS] Solution for LIMIT cost estimation

From
Philip Warner
Date:
At 10:43 16/02/00 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>>> A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
>>> being simply a hint to the optimizer about how much of the query
>>> result will actually get fetched.  
>
>> This seems a good approach until cursors are fixed. But is there a plan to
>> make cursors support LIMIT properly? Do you know why they ignore the LIMIT
>> clause?
>
>Should they obey LIMIT?  MOVE/FETCH seems like a considerably more
>flexible interface, so I'm not quite sure why anyone would want to
>use LIMIT in a cursor.

I agree; but see below.


>Still, it seems kind of inconsistent that cursors ignore LIMIT.
>I don't know for sure why it was done that way.

It's the inconsistency that bothers me: if I run a SELECT statement, then
put it in a cursor, I should get the same rows returned. The current
behaviour should probably be considered a bug.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


RE: [HACKERS] Solution for LIMIT cost estimation

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> Philip Warner <pjw@rhyme.com.au> writes:
> >> A possible answer is to define OFFSET/LIMIT in DECLARE CURSOR as
> >> being simply a hint to the optimizer about how much of the query
> >> result will actually get fetched.
>
> > This seems a good approach until cursors are fixed. But is there a plan to
> > make cursors support LIMIT properly? Do you know why they ignore the LIMIT
> > clause?
>
> Should they obey LIMIT?  MOVE/FETCH seems like a considerably more
> flexible interface, so I'm not quite sure why anyone would want to
> use LIMIT in a cursor.
>

You are right.
What I want is to give the optimizer a hint about whether all_rows (total
throughput) or first_rows (fast first response) is needed.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp