Re: Pull up aggregate subquery - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Pull up aggregate subquery |
Date | |
Msg-id | BANLkTi=iTUfvYgL5=WXfBK+xe0=Lh7MG9Q@mail.gmail.com Whole thread Raw |
In response to | Re: Pull up aggregate subquery (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Pull up aggregate subquery
|
List | pgsql-hackers |
On Tue, May 24, 2011 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I must be misunderstanding you, because index scans are the thing we >> already *do* parameterize; and what else would make any sense? > > The point I was trying to make is that the ultimate reason for having a > parameterized portion-of-a-plan will be that there's a parameterized > indexscan somewhere down at the bottom. I had originally imagined that > we might parameterize any old scan; for example consider replacing > > Nestloop > Join Filter: a.x = b.y > -> Seqscan on a > -> Seqscan on b > > with > > Nestloop > -> Seqscan on a > -> Seqscan on b > Filter: b.y = a.x Oh, I see. I have a general gripe with nested loop plans: we already consider too many of them. IIRC, when I last fooled around with this, the number of nested loop paths that we generate far exceeded the number of merge or hash join paths, and most of those paths suck and are a complete waste of time. It strikes me that we ought to be trying to find ways to get rid of some of the paths we're already considering, rather than adding any more. In this particular case, if the second plan is actually faster, it probably won't be by much; we could think about trying to make some kind of ex-post-facto transformation instead of throwing everything into the path machinery. > Although this isn't nearly as useful as if we could be using an index on > b.y, there would still be some marginal gain to be had, because we'd be > able to reject rows of b without first passing them up to the join node. > But I'm afraid that going all-out like that would slow the planner down > far too much (too many Paths to consider) to be justified by a marginal > runtime gain. Agreed. > So the idea I have at the moment is that we'll still only parameterize > indexscans, but then allow those to be joined to unrelated relations > while still remaining parameterized. That should reduce the number > of parameterized Paths hanging around, because only joinclauses that > match indexes will give rise to such Paths. That seems fine, yeah. If anything, we might want to limit it even more, but certainly that's a good place to start, and see how it shakes out. > But I think this is all fairly unrelated to the case that Hitoshi is on > about. As you said earlier, it seems like we'd have to derive both > parameterized and unparameterized plans for the subquery, which seems > mighty expensive. That was my first thought, too, but then I wondered if I was getting cheap. Most of the time, the subquery will be something simple, and replanning it twice won't really matter much. If it happens to be something complicated, then it will take longer, but on the other hand that's exactly the sort of byzantine query where you probably want the planner to pull out all the stops. Aggregates tend to "feel" slow almost invariably, because the amount of data being processed under the hood is much larger than what actually gets emitted, so I think we should at least consider the possibility that users really won't care about a bit of extra work. The case I'm concerned about is where you have several levels of nested aggregates, and the effect starts to multiply. But even if that turns out to be a problem, we could handle it by limiting consideration of the alternate path to the top 1 or 2 levels and handle the rest as we do now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: