Re: Ordered Partitioned Table Scans - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Ordered Partitioned Table Scans |
Date | |
Msg-id | CAKJS1f-BS=3GmMD50riFQOsbWMkOgjW7=DrFRh-wmS3PQsXYOw@mail.gmail.com Whole thread Raw |
In response to | Re: Ordered Partitioned Table Scans (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
Re: Ordered Partitioned Table Scans
(David Rowley <david.rowley@2ndquadrant.com>)
Re: Ordered Partitioned Table Scans (Julien Rouhaud <rjuju123@gmail.com>) |
List | pgsql-hackers |
Thanks for looking at this. On 28 October 2018 at 03:49, Julien Rouhaud <rjuju123@gmail.com> wrote: > I just had a look at your patch. I see that you implemented only a > subset of the possible optimizations (only the case for range > partitionoing without subpartitions). This has been previously > discussed, but we should be able to do similar optimization for list > partitioning if there's no interleaved values, and also for some cases > of multi-level partitioning. I had thought about these cases but originally had thought they would be more complex to implement than I could justify. On review, I've found some pretty cheap ways to handle both sub-partitions and for LIST partitioned tables. Currently, with LIST partitioned tables I've coded it to only allow the optimisation if there's no DEFAULT partition and all partitions are defined with exactly 1 Datum. This guarantees that there are no interleaved values, but it'll just fail to optimise cases like FOR VALUES IN(1,2) + FOR VALUES In(3,4). The reason that I didn't go to the trouble of the additional checks was that I don't really want to add any per-partition overhead to this. If RelOptInfo had a Bitmapset of live partitions then we could just check the partitions that survived pruning. Amit Langote has a pending patch which does that and some other useful stuff, so maybe we can delay fixing that until the dust settles a bit in that area. Amit and I are both working hard to remove all these per-partition overheads. I imagine he'd also not be in favour of adding code that does something for all partitions when we've pruned down to just 1. I've personally no objection to doing the required additional processing for the non-pruned partitions only. We could also then fix the case where we disable the optimisation if there's a DEFAULT partition without any regards to if it's been pruned or not. > Concerning the implementation, there's at least one issue: it assumes > that each subpath of a range-partitioned table will be ordered, with > is not guaranteed. You need to to generate explicit Sort nodes nodes > (in the same order as the query_pathkey) for partitions that don't > have an ordered path and make sure that this path is used in the > Append. Here's a simplistic case showing the issue (sorry, the > partition names are poorly chosen): Thanks for noticing this. I had been thrown off due to the fact that Paths are never actually created for these sorts. On looking further I see that we do checks during createplan to see if the path is suitability sorted and just create a sort node if it's not. This seems to go against the whole point of paths, but I'm not going to fight for changing it, so I've just done the Append the same way as MergeAppend handles it. > Also, if a LIMIT is specified, it should be able to give better > estimates, at least if there's no qual. For instance: > > EXPLAIN SELECT * FROM simple ORDER BY id LIMIT 10; > QUERY PLAN > > > -------------------------------------------------------------------------------------------------------> > Limit (cost=0.00..0.35 rows=10 width=15) > -> Append (cost=0.00..7705.56 rows=219999 width=15) > -> Seq Scan on simple_0_1 (cost=0.00..309.00 rows=20000 width=15) > -> Index Scan using simple_1_2_id_idx on simple_1_2 > (cost=0.29..3148.28 rows=99999 width=14) > -> Index Scan using simple_2_3_id_idx on simple_2_3 > (cost=0.29..3148.29 rows=100000 width=16) > (5 rows) > > In this case, we should estimate that the SeqScan (or in a corrected > version the Sort) node should not return more than 10 rows, and each > following partition should be scanned at all, and cost each path > accordingly. I think that this is quite important, for instance to > make sure that natively sorted Append is chosen over a MergeAppend > when there are some subpath with explicit sorts, because with the > Append we probably won't have to execute all the sorts if the previous > partition scans returned enough rows. In my patch, I'm not adding any additional paths. I'm just adding an Append instead of a MergeAppend. For what you're talking about the limit only needs to be passed into any underlying Sort so that it can become a top-N sort. This is handled already in create_limit_path(). Notice in the plan you pasted above that the limit has a lower total cost than its Append subnode. That's because create_limit_path() weighted the Limit total cost based on the row count of the limit and its subpath. ... 7705.56 / 219999 * 10 = ~0.35. > FWIW, both those cases were handled (probably with some bugs though) > in the previous patches Ronan and I sent some time ago. Also, I did > not forget about this feature, I planned to work on it in hope to have > it included in pg12. However, I won't have a lot of time to work on > it before December. I apologise for not noticing your patch. I only went as far as checking the November commitfest to see if anything existed already and I found nothing there. I have time to work on this now, so likely it's better if I continue, just in case your time in December does not materialise. v2 of the patch is attached. I've not had time yet to give it a throughout post write review, but on first look it seems okay. The known limitations are: * Disables the optimisation even if the DEFAULT partition is pruned. * Disables the optimisation if LIST partitioned tables have any partitions allowing > 1 value. * Fails to optimise UNION ALLs with partitioned tables. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: