Thread: LIMIT in DECLARE CURSOR: request for comments
Hiroshi and I had a discussion last night that needs to reach a wider audience than just the bystanders on pgsql-committers. Let me see if I can reconstruct the main points. In 7.0, a LIMIT clause can appear in a DECLARE CURSOR, but it's ignored: play=> select * from vv1; f1 ------------- 0 123456 -123456 2147483647-2147483647 0 (6 rows) play=> begin; BEGIN play=> declare c cursor for select * from vv1 limit 2; SELECT play=> fetch 10 from c; f1 ------------- 0 123456 -123456 2147483647-2147483647 0 (6 rows) The reason for this behavior is that LIMIT and the FETCH count are implemented by the same mechanism (ExecutorRun's count parameter) and so FETCH has no choice but to override the LIMIT with its own argument. Yesterday I reimplemented LIMIT as a separate plan node type, in order to make it work in views. A side effect of this is that ExecutorRun's count parameter is now *only* used for FETCH, and therefore a LIMIT appearing in a DECLARE CURSOR does what IMHO it should do: you get that many rows and no more from the cursor. regression=# begin; BEGIN regression=# declare c cursor for select * from vv1 limit 2; SELECT regression=# fetch 10 from c; f1 -------- 0123456 (2 rows) Hiroshi was a little concerned about this change in behavior, and so the first order of business is whether anyone wants to defend the old way? IMHO it was incontrovertibly a bug, but ... The second question is how the presence of a LIMIT clause ought to affect the planner's behavior. In 7.0, we taught the planner to pay attention to LIMIT as an indicator whether it ought to prefer fast-start plans over lowest-total-cost plans. For example, consider SELECT * FROM tab ORDER BY col; and assume there's a b-tree index on col. Then the planner has two possible choices of plan: an indexscan on col, or a sequential scan followed by sort. The indexscan will begin delivering tuples right away, whereas the sort has to finish the sequential scan and perform the sort before it can deliver the first tuple. OTOH the total cost to deliver the entire result is likely to be less for the sort plan (let's assume for this discussion that it is). So for the above query the planner should and will choose the sort plan. But for SELECT * FROM tab ORDER BY col LIMIT 1; it will choose the indexscan plan because of the low startup cost. This is implemented by pricing a query that uses LIMIT on the basis of linear interpolation between the startup and total costs, with the interpolation point determined by the fraction of tuples we expect to retrieve. This is all pretty clear and seems to work OK for stand-alone SELECT. But what about a DECLARE CURSOR? The planner has no way to know how much of the cursor's result will actually be FETCHed by the user, so it's not clear how to use all this shiny new LIMIT planning mechanism for a DECLARE CURSOR. What happens in 7.0 and current code is that for a DECLARE CURSOR, the planner ignores any LIMIT clause and arbitrarily assumes that the user will FETCH about 10% of the available data. Hence, the planning is done on the basis of least "startup + 0.10*(total - startup)" cost. Ignoring the limit clause was correct in 7.0, given the fact that the limit wouldn't actually be used at runtime, but it's wrong now (unless I'm beaten down on the semantics change). Also, the 10% estimate is the sort of compromise that's likely to satisfy nobody --- if you intend to fetch all the data, quite likely you want the least total cost, whereas if you only want the first few rows, you probably want a plan biased even more heavily towards startup cost at the expense of total cost. After thinking some more about yesterday's discussions, I propose that we adopt the following planning behavior for cursors: 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be just as reasonable, if people want to argue about the exact number; perhaps a SET variable is in order?). 10% seems to be a reasonable compromise between delivering tuples promptly and not choosing a plan that will take forever if the user fetches the whole result. 2. If DECLARE CURSOR contains a specific "LIMIT n" clause, plan on the assumption that n tuples will be fetched. For small n this allows the user to heavily bias the plan towards fast start. Since the LIMIT will actually be enforced by the executor, the user cannot bias the plan more heavily than is justified by the number of tuples he's intending to fetch, however. 3. If DECLARE CURSOR contains "LIMIT ALL", plan on the assumption that all tuples will be fetched, ie, select lowest-total-cost plan. (Note: LIMIT ALL has been in the grammar right along, but up to now it has been entirely equivalent to leaving out the LIMIT clause. This proposal essentially suggests allowing it to act as a planner hint that the user really does intend to fetch all the tuples.) Comments? regards, tom lane
At 12:18 27/10/00 -0400, Tom Lane wrote: > >1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the >basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be >just as reasonable, if people want to argue about the exact number; >perhaps a SET variable is in order?). 10% seems to be a reasonable >compromise between delivering tuples promptly and not choosing a plan >that will take forever if the user fetches the whole result. SET sounds good; will this work on a per-connection basis? >2. If DECLARE CURSOR contains a specific "LIMIT n" clause, plan on >the assumption that n tuples will be fetched. For small n this allows >the user to heavily bias the plan towards fast start. Since the LIMIT >will actually be enforced by the executor, the user cannot bias the >plan more heavily than is justified by the number of tuples he's >intending to fetch, however. Fine. >3. If DECLARE CURSOR contains "LIMIT ALL", plan on the assumption that >all tuples will be fetched, ie, select lowest-total-cost plan. Good. > >Comments? > I don't suppose you'd consider 'OPTIMIZE FOR TOTAL COST' and 'OPTIMIZE FOR FAST START' optimizer hints? Also, does the change you have made to the executor etc mean that subselect-with-limit is now possible? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 12:18 27/10/00 -0400, Tom Lane wrote: >> 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the >> basis of 10%-or-so fetch (I'd consider anywhere from 5% to 25% to be >> just as reasonable, if people want to argue about the exact number; >> perhaps a SET variable is in order?). > SET sounds good; will this work on a per-connection basis? A SET variable would be connection-local, same as any other ... > I don't suppose you'd consider 'OPTIMIZE FOR TOTAL COST' and 'OPTIMIZE FOR > FAST START' optimizer hints? I don't much care for adding such syntax to DECLARE CURSOR, if that's what you're suggesting. LIMIT ALL would have the same effect as 'OPTIMIZE FOR TOTAL COST' anyway. LIMIT 1 (or a small number) would have the effect of 'OPTIMIZE FOR FAST START', but would constrain you to not fetch any more rows than that. If we had a SET variable then you could twiddle that value to favor fast-start or total-cost concerns over a continuous range, without constraining how many rows you actually fetch from a LIMIT-less cursor. > Also, does the change you have made to the executor etc mean that > subselect-with-limit is now possible? The executor will do it, but unless Kevin figures out how to fix the grammar, you'll have to put the LIMIT into a view definition, not inline in a subquery. View-with-LIMIT does work as of today. regards, tom lane
At 12:18 PM 10/27/00 -0400, Tom Lane wrote: >Hiroshi was a little concerned about this change in behavior, and >so the first order of business is whether anyone wants to defend the >old way? IMHO it was incontrovertibly a bug, but ... Sure feels like a bug to me. Having it ignored isn't what I'd expect. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
Tom Lane writes: > 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the > basis of 10%-or-so fetch I'd say that normally you're not using cursors because you intend to throw away 80% or 90% of the result set, but instead you're using it because it's convenient in your programming environment (e.g., ecpg). There are other ways of getting only some rows, this is not it. So I think if you want to make optimization decisions based on cursors being used versus a "normal" select, then the only thing you can safely take into account is the network roundtrip and client processing per fetch, but that might be as random as anything. -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
At 10:51 31/10/00 +0100, Peter Eisentraut wrote: >Tom Lane writes: > >> 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the >> basis of 10%-or-so fetch > >I'd say that normally you're not using cursors because you intend to throw >away 80% or 90% of the result set, but instead you're using it because >it's convenient in your programming environment (e.g., ecpg). There are >other ways of getting only some rows, this is not it. Yes! >So I think if you want to make optimization decisions based on cursors >being used versus a "normal" select, then the only thing you can safely >take into account is the network roundtrip and client processing per >fetch, but that might be as random as anything. Which is why I like the client being able to ask the optimizer for certain kinds of solutions *explicitly*. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> 1. If DECLARE CURSOR does not contain a LIMIT, continue to plan on the >> basis of 10%-or-so fetch > I'd say that normally you're not using cursors because you intend to throw > away 80% or 90% of the result set, but instead you're using it because > it's convenient in your programming environment (e.g., ecpg). There are > other ways of getting only some rows, this is not it. I didn't say I was assuming that the user would only fetch 10% of the rows. Since what we're really doing is a linear interpolation between startup and total cost, what this is essentially doing is favoring low startup cost, but not to the complete exclusion of total cost. I think that that describes the behavior we want for a cursor pretty well. It remains to argue about what the relative weighting ought to be ... which might be best answered by making it user-settable. regards, tom lane