Re: Using indices for UNION. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Using indices for UNION.
Date
Msg-id 20131119.192415.119467380.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Using indices for UNION.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Using indices for UNION.
List pgsql-hackers
Thank you for looking in detail for this patch, and giving
thoughtful advices.

> I'm aware that you said you were going to refactor this, but I took a
> quick look through it anyway.  I don't think mere refactoring is going
> to make me happy with it :-(.

Ouch.. Anyway I've refactored this patch altogether with the
another 'unique index' patch and re-splitted into three
parts. One is the 'unique-index stuff' and the second is 'Adding
pathkeys and unique info into struct Plan' patch and the third is
'maily-in-prepunion stuff'. They will be sent in other messages
although for the scarce chance to be picked up :-)

> The basic problem here, as well as with some of the hackery that's
> been proposed recently in planner.c, is that we can't really get much
> further with improving this layer of the planner until we bite the
> bullet and convert this layer to work with Paths not finished Plans.

It is right but hard to go. I would be able to put my little
power on the issue but the planner hardly seems to be get
refactored gradually.

> Look at what you're proposing here: do a complete planning cycle on
> the subquery (underneath which both ordered and unordered Paths will
> be considered, and one or the other will get thrown away).  Then do
> *another* complete planning cycle with ordered output forced.  Then
> compare costs of the results.

Exactly.

> This is hugely inefficient, and it produces far-from-ideal
> results too, because you're forced to make a decision right
> there on which subquery plan to use; you can't make the
> globally optimal choice after considering costs of all the
> subqueries.

Umm. Yes, I know I've done it in somewhat brute way and it costs
a little too much if the subqueries of UNION is rather large, but
I suppose it comparably small to the whole execution time for
most cases.

Putting it aside, from the viewpoint of global-optimizations,
current planner also doesn't do such optimizations.  Moreover it
doesn't consider sorted plans possibly available and
effective. Ignoring the additional planning cost (:-), for the
UNION queries with LIMITS on large partitioned tables with
apropriate indexes (mmm. I suppose it is not so narrow gate..),
the query runtime should be extremely shortend.

>  What we need to do is fix things so we can get multiple Paths
> out of the subquery planning step, some ordered and some not.
> Then we could construct Paths for both the brute force and
> merge-append styles of computing the UNION result, and finally
> choose the cheapest Path at the top level.

Agreed with it at a glance. It sounds quite reasonable. I've been
a bit worried about that the paths and plans seem to be fused in
some extent in grouping_planner, and prepunion
(plan_set_operations) is jumping over path-generation phase to
make plans directly. (And about that I pushed it more on the
way..)

> The same goes for hacking around in grouping_planner.  That function
> is well past the point of collapsing of its own weight; we need
> to invest some effort in refactoring before we can afford to add
> much more complexity there.  And I think the logical way to refactor
> is to rewrite the code in a Path-generation-and-comparison style.
> (Actually, grouping_planner would need to be fixed first, since
> that's what would have to produce the Paths we're hoping to compare
> in prepunion.)

It seems necessary but also seems far far away and hard to
go.

> I had hoped to make some progress on that rewrite this past summer,
> but ended up with no time to work on it :-(.  There's going to be
> a lot of boring infrastructure work before we see much in the way
> of user-visible improvement, I'm afraid, so it's kind of hard to
> make time for it.

Anyway, because I've happened to pass too close by the issue, I
will consider on that for some time (although I would hardly come
up with spiffy solution in a short time.)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Rohit Goyal
Date:
Subject: Call flow of btinsert(PG_FUNCTION_ARGS)
Next
From: Peter Geoghegan
Date:
Subject: Re: Call flow of btinsert(PG_FUNCTION_ARGS)