Re: [HACKERS] Solution for LIMIT cost estimation - Mailing list pgsql-hackers
From | Tom Lane
---|---
Subject | Re: [HACKERS] Solution for LIMIT cost estimation
Date |
Msg-id | 18098.950241132@sss.pgh.pa.us
In response to | Re: [HACKERS] Solution for LIMIT cost estimation (Chris Bitmead <chrisb@nimrod.itg.telstra.com.au>)
Responses | Re: [HACKERS] Solution for LIMIT cost estimation
List | pgsql-hackers
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> ... it sounds like this proposal could mean that successive SELECTS
> with LIMIT could execute completely different plans and therefore
> return inconsistent results.
> In short, I think the fact that limit doesn't alter the plan may
> be more of a feature than a bug.

A good point (one I'm embarrassed not to have thought about!) but I
don't think it's a reason not to push forward in this direction.  We
have had *way* too many complaints about Postgres not being able to
optimize LIMITed queries, so I'm not willing to ignore the issue;
something's got to be done about it.

As Don B. points out nearby, there's no guarantee anyway that repeatedly
issuing the same query with different LIMIT parameters will get you
consistent results.  The only upright and morally correct approach is to
use DECLARE CURSOR followed by FETCH commands (all within a transaction
of course) in order to get results that are really all part of a single
query.  Now DECLARE CURSOR is also presently optimized on the basis of
fetching the entire result set, so that is still a problem --- I
neglected to mention before that I was intending to tweak the optimizer
to optimize that case like a moderate-sized LIMIT.

But having said that, I hear what you're saying and I think it's worth
thinking about.  Here are four possible alternative responses:

1. Optimize the query as I sketched previously, but using the "count"
part of the LIMIT spec while deliberately ignoring the "offset".  Then
you get consistent results for fetching different chunks of the query
result as long as you use the same count each time.  (And as long as no
one else is changing the DB meanwhile, but we'll take that as read for
each of these choices.)

2. Ignore both the count and offset, and optimize any query containing a
LIMIT clause on the basis of some fixed assumption about what fraction
of the tuples will be fetched (I'm thinking something like 1% to 10%).
This allows different fetch sizes to be used without destroying
consistency, but it falls down on the goal of correctly optimizing
"LIMIT 1" hacks.

3. Use the count+offset for all it's worth, and document that you can't
expect to get consistent results from asking for different LIMITed
chunks of the "same" query unless you use ORDER BY to ensure consistent
ordering of the tuples.

4. Fascist variant of #3: make LIMIT without ORDER BY be an error.

SQL92 does not define LIMIT at all, so it's not much help in deciding
what to do.  Is there anything in SQL3?  What do other DBMSes do about
this issue?

Comments, other variants, better ideas anyone?

> The other thing is, I would like at some stage to change limit so
> that it is attached to a SELECT rather than an entire query so
> you could...
> SELECT * from x where y in (SELECT y from z LIMIT 10) LIMIT 20;
> and I'm not sure how this would interact with that.

Since ORDER BY is only allowed at the top level by SQL92, there would be
no way for the user to ensure predictable results from such a query.  I
think that'd be a dangerous path to go down.  However, if you had an
answer that ensured consistent results from queries with sub-LIMITs, I
don't see that there'd be any problem with allowing the optimizer to
optimize 'em.

			regards, tom lane
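For concreteness, the "upright and morally correct" cursor approach described above might look like the following sketch. The table name, sort column, and cursor name are invented for illustration; the point is that every FETCH draws from the one result set of a single query inside one transaction, so the chunks are guaranteed mutually consistent:

```sql
-- All FETCHes belong to one query in one transaction, so the chunks
-- are consistent with each other regardless of concurrent activity.
BEGIN;
DECLARE c CURSOR FOR SELECT * FROM accounts ORDER BY id;
FETCH 10 FROM c;    -- first chunk of ten rows
FETCH 10 FROM c;    -- next ten, guaranteed to follow on from the first
CLOSE c;
COMMIT;
```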
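Alternative #3 above can likewise be illustrated with a hedged sketch (again, the table `t` and unique column `id` are placeholders, and the LIMIT/OFFSET spelling is PostgreSQL's, not SQL92's):

```sql
-- Without ORDER BY, these two statements are independent queries: the
-- planner may choose different plans for each, so the "chunks" can
-- overlap or miss rows entirely.
SELECT * FROM t LIMIT 20;
SELECT * FROM t LIMIT 20 OFFSET 20;

-- With ORDER BY on a unique key, each chunk is a well-defined slice of
-- one total ordering, so successive fetches partition the result
-- (assuming no concurrent changes to the table).
SELECT * FROM t ORDER BY id LIMIT 20;
SELECT * FROM t ORDER BY id LIMIT 20 OFFSET 20;
```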