Home > mailing lists

Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: UNION ALL has higher cost than inheritance
Date	November 8, 2010 14:20:04
Msg-id	27098.1289240394@sss.pgh.pa.us Whole thread Raw
In response to	Re: UNION ALL has higher cost than inheritance (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: UNION ALL has higher cost than inheritance Re: UNION ALL has higher cost than inheritance
List	pgsql-hackers

Tree view

I wrote:
> I did some hacking on this and came up with the attached patch, which
> could use a bit more work on the comments but passes regression tests.
> However, this just solves the issue of being smart about top-level
> UNION ALL cases.  It might be worth looking into using MergeAppend
> for the sorting required for other types of set operations.  That would
> involve quite a different patch, and I'm not sure if it'd remove the
> need for this one or not.

I spent some more time thinking about this.  It seems difficult to make
any real further progress without essentially throwing out the current
approach to planning set-operation trees and starting over.  What we'd
want is to generate Paths representing the different ways a set-op could
be implemented and then pick the best one.  I can see the general shape
of how to do that: a set-op should be thought of as generating a
constrained join relation in a fashion similar to an OUTER JOIN
construct, and then the different Paths are possible implementations for
the join rel.  We'd need new Path types for hashed or sorted UNION,
INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
struct (or extend that struct to cover these cases).

It strikes me that maybe top-level DISTINCT could be married into this
too, since it's basically the same as a one-input UNION operator.  But
not sure what to do with DISTINCT ON.

Another thought here is that the current definition of the SetOp
execution node type could be improved.  Instead of taking a single
pre-merged input tuple stream, it'd be better if it had two inputs
like a JOIN node.  Then we could eliminate the "flag" column, which
would simplify matters for planning and reduce the amount of data
that has to be passed through sorting.  SetOp itself would need to
borrow some of the capability of MergeAppend so that it could read
two sorted streams in parallel and keep them in sync.

But this all looks like a pretty substantial amount of work, and
given the low level of user demand for improving the performance of
set operations, it seems to belong fairly far down the to-do list.
So I'm not going to tackle it now.  Barring objection, I'll clean up
yesterday's patch a bit more and commit it.
        regards, tom lane

pgsql-hackers by date:

From: Aidan Van Dyk
Date: 08 November 2010, 14:00:25
Subject: Re: Protecting against unexpected zero-pages: proposal

From: Pavel Stehule
Date: 08 November 2010, 14:25:05
Subject: proposal: plpgsql - iteration over fields of rec or row variable

Re: UNION ALL has higher cost than inheritance - Mailing list pgsql-hackers

Previous

Next