Re: LIMIT causes SEQSCAN in subselect - Mailing list pgsql-performance
From | Mike Rylander |
---|---|
Subject | Re: LIMIT causes SEQSCAN in subselect |
Date | |
Msg-id | b918cf3d04121106375a955ce8@mail.gmail.com Whole thread Raw |
In response to | Re: LIMIT causes SEQSCAN in subselect (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-performance |
On Fri, 10 Dec 2004 21:40:18 -0800, Josh Berkus <josh@agliodbs.com> wrote: > Mike, > The fact that the estimator knows that the LIMIT is pointless because there > are less rows in the subselect than the LIMIT will return is not something we > want to count on; sometimes the estimator has innaccurate information. The > UNIQUE index makes this more certain, except that I'm not sure that the > planner distinguishes between actual UNIQUE indexes and columns which are > estimated unique (per the pg_stats). And I think you can see in your case > that there's quite a difference between a column we're CERTAIN is unique, > versus a column we THINK is unique. Absolutely. At first I was going to ask if perhaps using the stats to discard the LIMIT would be possible, but since the stats are only guidelines I dropped that. The stats are just so tempting! > > > I realize this is a rather specialized case and not really great form. > > Exactly. You've grasped the main issue: that this has not been optimized > because it's bizarre and not very sensible query writing. Someday we'll get > around to optimizing the really wierd queries, but there's still a lot of > work to be done on the common ones (like count(*) ...). Absolutely. And if I can help out with the common cases to gain some Karmic currency I will. ;) After thinking about it some more, I don't think those queries we really all that wacky though. The problem with the example is that the generated query is very simple, and real-world queries that would be used in the subselect would be much more complex, and row estimation would be untrustworthy without a UNIQUE index. > > Keep in mind that the only reason we support LIMIT inside subqueries in the > first place is a workaround to slow aggregates, and a way to do RANK. It's > certainly not SQL-standard. > No it's not, but then nobody ever accused the authors of the SQL spec of being omniscient... I' cant think of another way to get, say, a 'top 10' list from a subselect, or use a paging iterator (LIMIT .. OFFSET ..) as the seed for an outer query. Well, other than an SRF of course. > > Just a matter of > > defining result sets independently, and creating a simple wrapper to > > join them. > > Well, if you think so, you know where to submit patches ... > Well, I do, but I was talking about it being 'easy' in the middleware. Just let PG handle optimizing the subselects. For example, you have a pile of predefined SELECTS that don't know they are related and are used for simple lookups. You tell the SQL generator thingy that it should use two of those, queries A and B, that they are related on x, and that you want to see the 'id' from A and the 'value' from B. Instead of having to preplan every possible combination of JOINS the SQL generator will toss the preplanned ones into subselects and join them in the outer query instead of having to rip them apart and calculate the join syntax. And yes, I know that view will take care of most of that for me... :) Thanks for all your comments. Pretty much what I expected, but I thought I'd raise a use case. I'll just have to give the query builder more smarts. > -- > Josh Berkus > Aglio Database Solutions > San Francisco > -- Mike Rylander mrylander@gmail.com GPLS -- PINES Development Database Developer http://open-ils.org
pgsql-performance by date: