Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers

From Tom Lane
Subject Re: UNION ALL has higher cost than inheritance
Date
Msg-id 27098.1289240394@sss.pgh.pa.us
Whole thread Raw
In response to Re: UNION ALL has higher cost than inheritance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: UNION ALL has higher cost than inheritance
Re: UNION ALL has higher cost than inheritance
List pgsql-hackers
I wrote:
> I did some hacking on this and came up with the attached patch, which
> could use a bit more work on the comments but passes regression tests.
> However, this just solves the issue of being smart about top-level
> UNION ALL cases.  It might be worth looking into using MergeAppend
> for the sorting required for other types of set operations.  That would
> involve quite a different patch, and I'm not sure if it'd remove the
> need for this one or not.

I spent some more time thinking about this.  It seems difficult to make
any real further progress without essentially throwing out the current
approach to planning set-operation trees and starting over.  What we'd
want is to generate Paths representing the different ways a set-op could
be implemented and then pick the best one.  I can see the general shape
of how to do that: a set-op should be thought of as generating a
constrained join relation in a fashion similar to an OUTER JOIN
construct, and then the different Paths are possible implementations for
the join rel.  We'd need new Path types for hashed or sorted UNION,
INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
struct (or extend that struct to cover these cases).

It strikes me that maybe top-level DISTINCT could be married into this
too, since it's basically the same as a one-input UNION operator.  But
not sure what to do with DISTINCT ON.

Another thought here is that the current definition of the SetOp
execution node type could be improved.  Instead of taking a single
pre-merged input tuple stream, it'd be better if it had two inputs
like a JOIN node.  Then we could eliminate the "flag" column, which
would simplify matters for planning and reduce the amount of data
that has to be passed through sorting.  SetOp itself would need to
borrow some of the capability of MergeAppend so that it could read
two sorted streams in parallel and keep them in sync.

But this all looks like a pretty substantial amount of work, and
given the low level of user demand for improving the performance of
set operations, it seems to belong fairly far down the to-do list.
So I'm not going to tackle it now.  Barring objection, I'll clean up
yesterday's patch a bit more and commit it.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Aidan Van Dyk
Date:
Subject: Re: Protecting against unexpected zero-pages: proposal
Next
From: Pavel Stehule
Date:
Subject: proposal: plpgsql - iteration over fields of rec or row variable