Re: Using indices for UNION. - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Using indices for UNION. |
Date | |
Msg-id | 20131119.192415.119467380.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Using indices for UNION. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Using indices for UNION.
|
List | pgsql-hackers |
Thank you for looking in detail for this patch, and giving thoughtful advices. > I'm aware that you said you were going to refactor this, but I took a > quick look through it anyway. I don't think mere refactoring is going > to make me happy with it :-(. Ouch.. Anyway I've refactored this patch altogether with the another 'unique index' patch and re-splitted into three parts. One is the 'unique-index stuff' and the second is 'Adding pathkeys and unique info into struct Plan' patch and the third is 'maily-in-prepunion stuff'. They will be sent in other messages although for the scarce chance to be picked up :-) > The basic problem here, as well as with some of the hackery that's > been proposed recently in planner.c, is that we can't really get much > further with improving this layer of the planner until we bite the > bullet and convert this layer to work with Paths not finished Plans. It is right but hard to go. I would be able to put my little power on the issue but the planner hardly seems to be get refactored gradually. > Look at what you're proposing here: do a complete planning cycle on > the subquery (underneath which both ordered and unordered Paths will > be considered, and one or the other will get thrown away). Then do > *another* complete planning cycle with ordered output forced. Then > compare costs of the results. Exactly. > This is hugely inefficient, and it produces far-from-ideal > results too, because you're forced to make a decision right > there on which subquery plan to use; you can't make the > globally optimal choice after considering costs of all the > subqueries. Umm. Yes, I know I've done it in somewhat brute way and it costs a little too much if the subqueries of UNION is rather large, but I suppose it comparably small to the whole execution time for most cases. Putting it aside, from the viewpoint of global-optimizations, current planner also doesn't do such optimizations. Moreover it doesn't consider sorted plans possibly available and effective. Ignoring the additional planning cost (:-), for the UNION queries with LIMITS on large partitioned tables with apropriate indexes (mmm. I suppose it is not so narrow gate..), the query runtime should be extremely shortend. > What we need to do is fix things so we can get multiple Paths > out of the subquery planning step, some ordered and some not. > Then we could construct Paths for both the brute force and > merge-append styles of computing the UNION result, and finally > choose the cheapest Path at the top level. Agreed with it at a glance. It sounds quite reasonable. I've been a bit worried about that the paths and plans seem to be fused in some extent in grouping_planner, and prepunion (plan_set_operations) is jumping over path-generation phase to make plans directly. (And about that I pushed it more on the way..) > The same goes for hacking around in grouping_planner. That function > is well past the point of collapsing of its own weight; we need > to invest some effort in refactoring before we can afford to add > much more complexity there. And I think the logical way to refactor > is to rewrite the code in a Path-generation-and-comparison style. > (Actually, grouping_planner would need to be fixed first, since > that's what would have to produce the Paths we're hoping to compare > in prepunion.) It seems necessary but also seems far far away and hard to go. > I had hoped to make some progress on that rewrite this past summer, > but ended up with no time to work on it :-(. There's going to be > a lot of boring infrastructure work before we see much in the way > of user-visible improvement, I'm afraid, so it's kind of hard to > make time for it. Anyway, because I've happened to pass too close by the issue, I will consider on that for some time (although I would hardly come up with spiffy solution in a short time.) regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: