Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: UNION ALL has higher cost than inheritance |
Date | |
Msg-id | AANLkTi=-jCsqj3v-pTZfOg+KbEr1S-AzSiOSdkU=mbvX@mail.gmail.com Whole thread Raw |
In response to | Re: UNION ALL has higher cost than inheritance (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: UNION ALL has higher cost than inheritance
|
List | pgsql-hackers |
On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> I did some hacking on this and came up with the attached patch, which >> could use a bit more work on the comments but passes regression tests. >> However, this just solves the issue of being smart about top-level >> UNION ALL cases. It might be worth looking into using MergeAppend >> for the sorting required for other types of set operations. That would >> involve quite a different patch, and I'm not sure if it'd remove the >> need for this one or not. > > I spent some more time thinking about this. It seems difficult to make > any real further progress without essentially throwing out the current > approach to planning set-operation trees and starting over. What we'd > want is to generate Paths representing the different ways a set-op could > be implemented and then pick the best one. I can see the general shape > of how to do that: a set-op should be thought of as generating a > constrained join relation in a fashion similar to an OUTER JOIN > construct, and then the different Paths are possible implementations for > the join rel. We'd need new Path types for hashed or sorted UNION, > INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo > struct (or extend that struct to cover these cases). It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a semi- or anti-join (on the entire set of output columns). But without ALL it's completely different. > It strikes me that maybe top-level DISTINCT could be married into this > too, since it's basically the same as a one-input UNION operator. But > not sure what to do with DISTINCT ON. > > Another thought here is that the current definition of the SetOp > execution node type could be improved. Instead of taking a single > pre-merged input tuple stream, it'd be better if it had two inputs > like a JOIN node. Then we could eliminate the "flag" column, which > would simplify matters for planning and reduce the amount of data > that has to be passed through sorting. SetOp itself would need to > borrow some of the capability of MergeAppend so that it could read > two sorted streams in parallel and keep them in sync. > > But this all looks like a pretty substantial amount of work, and > given the low level of user demand for improving the performance of > set operations, it seems to belong fairly far down the to-do list. > So I'm not going to tackle it now. Barring objection, I'll clean up > yesterday's patch a bit more and commit it. I agree. If we had infinite resources it would be nice to tackle this, but I think we have bigger fish to fry. In particular, I wonder if you've thought any more about the generalized inner-indexscan machinery, or taken a look at any of the issues around KNNGIST. I'd like to see our limited supply of planner-fu invested in those areas, or perhaps in making inner join removal work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: