Re: Ordered Partitioned Table Scans - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Ordered Partitioned Table Scans |
Date | |
Msg-id | CAKJS1f8Oey0qPq80Xk7thhtOK2mPJ_pTFnnc6ARYX3T3cDo0GQ@mail.gmail.com Whole thread Raw |
In response to | Ordered Partitioned Table Scans (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Ordered Partitioned Table Scans
|
List | pgsql-hackers |
On Wed, 27 Mar 2019 at 21:24, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Noticed a typo. > > + * multiple subpaths then we can't make guarantees about the > + * order tuples in those subpaths, so we must leave the > > order of tuples? I'll fix that. Thanks. > Again, sorry if this was discussed, but I got curious about why > partitions_are_ordered() thinks it can say true even for an otherwise > ordered list-partitioned table, but containing a null-only partition, > which is *always* scanned last. If a query says ORDER BY partkey NULLS > FIRST, then it's not alright to proceed with assuming partitions are > ordered even if partitions_are_ordered() said so. If it's *always* scanned last then it's fine for ORDER BY partkey NULLS LAST. If they have ORDER BY partkey NULLS FIRST then we won't match on the pathkeys. Or if they do ORDER BY partkey DESC NULLS FIRST, then we're also fine, since we reverse the order of the subpaths list in that case. ORDER BY partkey DESC NULLS LAST is not okay, and we don't optimise that since it won't match the pathkeys we generate in build_partition_pathkeys(). > Related, why does build_partition_pathkeys() pass what it does for > nulls_first parameter of make_pathkey_from_sortinfo()? > > cpathkey = make_pathkey_from_sortinfo(root, > keyCol, > NULL, > partscheme->partopfamily[i], > partscheme->partopcintype[i], > partscheme->partcollation[i], > ScanDirectionIsBackward(scandir), > ==> ScanDirectionIsBackward(scandir), > 0, > partrel->relids, > false); > > I think null values are almost always placed in the last partition, unless > the null-accepting list partition also accepts some other non-null value. > I'm not sure exactly how we can determine the correct value to pass here, > but what's there in the patch now doesn't seem to be it. The code looks okay to me. It'll generate pathkeys like ORDER BY partkey NULLS LAST for forward scans and ORDER BY partkey DESC NULLS FIRST for backwards scans. Can you explain what cases you think the code gets wrong? Here's a preview of the actual and expected behaviour: # explain (costs off) select * from listp order by a asc nulls last; QUERY PLAN ------------------------------------------------------------ Append -> Index Only Scan using listp1_a_idx on listp1 -> Index Only Scan using listp2_a_idx on listp2 -> Index Only Scan using listp_null_a_idx on listp_null (4 rows) # explain (costs off) select * from listp order by a asc nulls first; -- not optimised QUERY PLAN ------------------------------------ Sort Sort Key: listp1.a NULLS FIRST -> Append -> Seq Scan on listp1 -> Seq Scan on listp2 -> Seq Scan on listp_null (6 rows) # explain (costs off) select * from listp order by a desc nulls first; -- subpath list is simply reversed in this case. QUERY PLAN --------------------------------------------------------------------- Append -> Index Only Scan Backward using listp_null_a_idx on listp_null -> Index Only Scan Backward using listp2_a_idx on listp2 -> Index Only Scan Backward using listp1_a_idx on listp1 (4 rows) # explain (costs off) select * from listp order by a desc nulls last; -- not optimised QUERY PLAN -------------------------------------- Sort Sort Key: listp1.a DESC NULLS LAST -> Append -> Seq Scan on listp1 -> Seq Scan on listp2 -> Seq Scan on listp_null (6 rows) We could likely improve the two cases that are not optimized by putting the NULL partition in the correct place in the append subpaths, but for now, we don't really have an efficient means to identify which subpath that is. I've not looked at your partition planning improvements patch for a while to see if you're storing a Bitmapset of the non-pruned partitions in RelOptInfo. Something like that would allow us to make this better. Julien and I have talked about other possible cases to optimise if we have that. e.g if the default partition is pruned then we can optimise a RANGE partitioned table with a default. So there's definitely more to be done on this. I think there's a general consensus that what we're doing in the patch already is enough to be useful. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: