Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers

From Robert Haas
Subject Re: UNION ALL has higher cost than inheritance
Date
Msg-id AANLkTi=-jCsqj3v-pTZfOg+KbEr1S-AzSiOSdkU=mbvX@mail.gmail.com
Whole thread Raw
In response to Re: UNION ALL has higher cost than inheritance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: UNION ALL has higher cost than inheritance
List pgsql-hackers
On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I did some hacking on this and came up with the attached patch, which
>> could use a bit more work on the comments but passes regression tests.
>> However, this just solves the issue of being smart about top-level
>> UNION ALL cases.  It might be worth looking into using MergeAppend
>> for the sorting required for other types of set operations.  That would
>> involve quite a different patch, and I'm not sure if it'd remove the
>> need for this one or not.
>
> I spent some more time thinking about this.  It seems difficult to make
> any real further progress without essentially throwing out the current
> approach to planning set-operation trees and starting over.  What we'd
> want is to generate Paths representing the different ways a set-op could
> be implemented and then pick the best one.  I can see the general shape
> of how to do that: a set-op should be thought of as generating a
> constrained join relation in a fashion similar to an OUTER JOIN
> construct, and then the different Paths are possible implementations for
> the join rel.  We'd need new Path types for hashed or sorted UNION,
> INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
> struct (or extend that struct to cover these cases).

It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a
semi- or anti-join (on the entire set of output columns).   But
without ALL it's completely different.

> It strikes me that maybe top-level DISTINCT could be married into this
> too, since it's basically the same as a one-input UNION operator.  But
> not sure what to do with DISTINCT ON.
>
> Another thought here is that the current definition of the SetOp
> execution node type could be improved.  Instead of taking a single
> pre-merged input tuple stream, it'd be better if it had two inputs
> like a JOIN node.  Then we could eliminate the "flag" column, which
> would simplify matters for planning and reduce the amount of data
> that has to be passed through sorting.  SetOp itself would need to
> borrow some of the capability of MergeAppend so that it could read
> two sorted streams in parallel and keep them in sync.
>
> But this all looks like a pretty substantial amount of work, and
> given the low level of user demand for improving the performance of
> set operations, it seems to belong fairly far down the to-do list.
> So I'm not going to tackle it now.  Barring objection, I'll clean up
> yesterday's patch a bit more and commit it.

I agree.  If we had infinite resources it would be nice to tackle
this, but I think we have bigger fish to fry.  In particular, I wonder
if you've thought any more about the generalized inner-indexscan
machinery, or taken a look at any of the issues around KNNGIST.  I'd
like to see our limited supply of planner-fu invested in those areas,
or perhaps in making inner join removal work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Should we use make -k on the buildfarm?
Next
From: Robert Haas
Date:
Subject: Re: How to share the result data of separated plan