Thread: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

[HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Ronan Dunklau

Date:

20 March 2017, 13:31:10

Hello,

With native partitioning landing in Postgres 10, we (Julien Rouhaud and 
myself) had the idea for the 
following simple optimisation. This simple optimisation comes from a real use 
case.

===== Proposal =====

With range partitioning, we guarantee that each partition contains non-
overlapping values. Since we know the range allowed for each partition, it is 
possible to sort them according to the partition key (as is done already for 
looking up partitions).

Thus, we ca generate sorted Append plans instead of MergeAppend when sorting 
occurs on the partition key.

For example, consider the following model and query:


CREATE TABLE parent (id int) PARTITION BY RANGE (id);
CREATE TABLE child2 PARTITION OF parent FOR VALUES FROM (10) TO (20);
CREATE TABLE child1 PARTITION OF parent FOR VALUES FROM (1) TO (10);
CREATE INDEX ON child1(id);
CREATE INDEX ON child2(id);



EXPLAIN (COSTS OFF) SELECT * FROM parent ORDER BY id desc;

                          QUERY PLAN                          
--------------------------------------------------------------
 Merge Append
   Sort Key: parent.id DESC
   ->  Sort
         Sort Key: parent.id DESC
         ->  Seq Scan on parent
   ->  Index Only Scan Backward using child2_id_idx on child2
   ->  Index Only Scan Backward using child1_id_idx on child

We can guarantee that every value found in child2 will have a greater id than 
any value found in child1. Thus, we could generate a plan like this:

                          QUERY PLAN                           
---------------------------------------------------------------
 Append
   Sort Key: child2.id DESC
   ->  Index Only Scan Backward using child2_id_idx1 on child2
   ->  Index Only Scan Backward using child1_id_idx1 on child1

Skipping the merge step altogether.

This has the added benefit of providing an "early stop": if we add a limit to 
the query, we will only scan the indexes that are necessary for fetching the 
required amount of tuples. This is especially useful if we add a WHERE clause 
not covered by the index with the limit. Consider the following example, with 
a table "webvisits" partitioned by a ts colum. If we want to retrieve the 
latest 100 hits from a specific ip:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM webvisits WHERE ipaddr = 
'93.184.216.34' ORDER BY ts DESC LIMIT 10;

                                                            

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..32.32 rows=10 width=104) (actual time=0.080..0.109 rows=10 
loops=1)
   Buffers: shared hit=7
   ->  Append  (cost=0.43..401009.65 rows=125740 width=104) (actual 
time=0.078..0.105 rows=10 loops=1)
         Sort Key: webvisits_201712.ts DESC
         Buffers: shared hit=7
         ->  Index Scan Backward using webvisits_201712_ipaddr_ts_idx on 
webvisits_201712  (cost=0.43..133617.70 rows=41901 width=104) (actual 
time=0.077..0.101 rows=10 loops=1)
               Index Cond: (ipaddr = '93.184.216.34'::inet)
               Buffers: shared hit=7
         ->  Index Scan Backward using webvisits_201711_ipaddr_ts_idx on 
webvisits_201711  (cost=0.43..131514.18 rows=41243 width=104) (never executed)
               Index Cond: (ipaddr = '93.184.216.34'::inet)
         ->  Index Scan Backward using webvisits_201710_ipaddr_ts_idx on 
webvisits_201710  (cost=0.43..135801.71 rows=42587 width=104) (never executed)
             ........

We have developed a proof-of-concept implementation for this optimisation.

For sub partitioning, intermediate Append nodes can be generated.

Consider the following examples:


CREATE TABLE parentparent (c1 int, c2 int) PARTITION BY range (c1);

-- Subpartition only on second column
CREATE TABLE parent1 PARTITION OF parentparent FOR VALUES FROM (1) TO (10)
PARTITION BY range (c2);

-- Subpartition on both columns
CREATE TABLE parent2 PARTITION OF parentparent FOR VALUES FROM (10) TO (20)
PARTITION BY range (c1, c2);

CREATE TABLE child11 PARTITION OF parent1 FOR VALUES FROM (1) TO (10);
CREATE TABLE child12 PARTITION OF parent1 FOR VALUES FROM (10) TO (20);
CREATE TABLE child21 PARTITION OF parent2 FOR VALUES FROM (10, 1) TO (15, 10);
CREATE TABLE child22 PARTITION OF parent2 FOR VALUES FROM (15, 10) TO (20, 
20);

EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1;
              QUERY PLAN               
---------------------------------------
 Append
   Sort Key: parent1.c1
   ->  Sort
         Sort Key: parent1.c1
         ->  Append
               ->  Seq Scan on parent1
               ->  Seq Scan on child11
               ->  Seq Scan on child12
   ->  Append
         Sort Key: child21.c1
         ->  Sort
               Sort Key: child21.c1
               ->  Seq Scan on child21
         ->  Sort
               Sort Key: child22.c1
               ->  Seq Scan on child22


EXPLAIN (COSTS OFF) SELECT * FROM parentparent ORDER BY c1, c2;
                   QUERY PLAN                   
------------------------------------------------
 Append
   Sort Key: parent1.c1, parent1.c2
   ->  Sort
         Sort Key: parent1.c1, parent1.c2
         ->  Append
               ->  Seq Scan on parent1
               ->  Seq Scan on child11
               ->  Seq Scan on child12
   ->  Append
         Sort Key: child21.c1, child21.c2
         ->  Sort
               Sort Key: child21.c1, child21.c2
               ->  Seq Scan on child21
         ->  Sort
               Sort Key: child22.c1, child22.c2
               ->  Seq Scan on child22



===== Patch design =====

First, we don't really know what we're doing :). But this is a proof of 
concept, and if there is interest in this patch, let us know how we should 
have done things and we're willing to rewrite it entirely if needed.

==== Overview ====

In the optimiser, we generate PathKeys representing the two sort orders by 
wich partition can be ordered (ASC and DESC), only for "native" range-based 
partitioned tables, and store them in RelOptInfo associated with every parent 
table, along with a list of OIDs corresponding to the partitions.

Then, when the time comes to generate append paths, we add another step which 
tries to match the query_pathkeys to those of the partition, and generate 
another append node with the sorted paths for each partitions, in the expected 
order, and a pathkey to the append path itself to indicate its already sorted.

==== Known Problems with the patch ====

Once again, keep in mind it's a proof-of-concept  

- It is in conflict with the existing patch designed to avoid adding the 
parent partitions
to the append node. As such, it will almost certainly need a full rewrite to 
adapt to that since that is done in this patch in a probably less than ideal 
fashion.
- As of now, the patch doesn't try to match the partition pathkey to 
mergejoinable pathkeys. 
- maybe a new node should be created altogether, instead of porting most of 
the MergeAppend struct fields to the basic Append ?
- the way we store the PathKeys and partitions OID directly in RelOptInfo is 
probably wrong
- the major impact on existing queries is the fact that to handle sub-
partitioning, RangeTblEntries representing intermediate append nodes are added 
(but just in the case of native partitioning, regular inheritance is not 
affected). See updated prepunion.c for what that means. Those RTE are then 
ignored 
when generating the real Append nodes.
- new regression tests have not been written yet, and existing ones haven't 
been "fixed" to take into account the different partition ordering: in case of 
sub partitioning, the order is now "depth-first" rather than "breadth-first" 
like it was earlier.
- no documentation has been added, although we don't really know where it 
should go
- the patch lacks a lot of comments
- the key displayed in EXPLAIN output comes from the first partition, not the 
parent.
- maybe a "real" SortPath should be generated, instead of generating Sort
nodes 
out of nowhere when creating the plan. This has been done to conform to what 
MergeAppend already did.

===== Questions =====

Is there interest for such an optimization ?
Should we work from the patch proposed to remove parent tables already ?
Should there be a new Node for such a "SortedAppend" operation or is it fine 
keeping it with the Append node already ? Should that instead be an 
optimization of the MergeAppend Node ?
What is or is not acceptable with regards to storing things in RelOptInfo 
(among others) ?

More generally, any kind of feedback that will help us get on the right track 
will be appreciated.


-- 
Ronan Dunklau & Julien Rouhaud
http://dalibo.com - http://dalibo.org
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

sorted_append_nodes_for_partitioning.patch

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Robert Haas

Date:

20 March 2017, 17:52:03

On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:
> With range partitioning, we guarantee that each partition contains non-
> overlapping values. Since we know the range allowed for each partition, it is
> possible to sort them according to the partition key (as is done already for
> looking up partitions).
>
> Thus, we ca generate sorted Append plans instead of MergeAppend when sorting
> occurs on the partition key.

Great idea.  This is too late for v10 at this point, but please add it
to the next CommitFest so we don't forget about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Ronan Dunklau

Date:

20 March 2017, 18:03:45

On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com> 
wrote:
> > With range partitioning, we guarantee that each partition contains non-
> > overlapping values. Since we know the range allowed for each partition, it
> > is possible to sort them according to the partition key (as is done
> > already for looking up partitions).
> > 
> > Thus, we ca generate sorted Append plans instead of MergeAppend when
> > sorting occurs on the partition key.
> 
> Great idea.  This is too late for v10 at this point, but please add it
> to the next CommitFest so we don't forget about it.

I know it is too late, and thought that it was too early to add it to the 
commitfest properly since so many design decisions should be discussed. Thanks 
for the feedback, I added it.


-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Ashutosh Bapat

Date:

24 March 2017, 10:14:03

On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:
> On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
>> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com>
> wrote:
>> > With range partitioning, we guarantee that each partition contains non-
>> > overlapping values. Since we know the range allowed for each partition, it
>> > is possible to sort them according to the partition key (as is done
>> > already for looking up partitions).
>> >
>> > Thus, we ca generate sorted Append plans instead of MergeAppend when
>> > sorting occurs on the partition key.
>>
>> Great idea.  This is too late for v10 at this point, but please add it
>> to the next CommitFest so we don't forget about it.
>
> I know it is too late, and thought that it was too early to add it to the
> commitfest properly since so many design decisions should be discussed. Thanks
> for the feedback, I added it.

Thanks for working on this. I was also thinking about the same.

I will try to get back to review/work with this for v11. Mean time, I
am working on partition-wise joins [1]. In those patches, I have added
a structure called PartitionScheme, which represents how a relation is
partitioned. For base relations it's mostly copy of PartitionDesc and
PartitionKey, but then it bubbles up across join, with each
partitioned join getting relevant partitioning scheme. If you could
base your design such that is uses PartitionScheme, it could be used
for joins and probably when Jeevan's patch for partition-wise
aggregate [2] comes along, it can be used with grouping.

[1] https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQvQTwzQOGczq_sw%40mail.gmail.com
[2] http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Proposal] Make the optimiser aware of partitions ordering

From

Ronan Dunklau

Date:

05 April 2017, 15:00:54

On vendredi 24 mars 2017 08:14:03 CEST Ashutosh Bapat wrote:
> On Mon, Mar 20, 2017 at 8:33 PM, Ronan Dunklau <ronan.dunklau@dalibo.com> 
wrote:
> > On lundi 20 mars 2017 15:52:03 CET Robert Haas wrote:
> >> On Mon, Mar 20, 2017 at 6:31 AM, Ronan Dunklau <ronan.dunklau@dalibo.com>
> > 
> > wrote:
> >> > With range partitioning, we guarantee that each partition contains non-
> >> > overlapping values. Since we know the range allowed for each partition,
> >> > it
> >> > is possible to sort them according to the partition key (as is done
> >> > already for looking up partitions).
> >> > 
> >> > Thus, we ca generate sorted Append plans instead of MergeAppend when
> >> > sorting occurs on the partition key.
> >> 
> >> Great idea.  This is too late for v10 at this point, but please add it
> >> to the next CommitFest so we don't forget about it.
> > 
> > I know it is too late, and thought that it was too early to add it to the
> > commitfest properly since so many design decisions should be discussed.
> > Thanks for the feedback, I added it.
> 
> Thanks for working on this. I was also thinking about the same.
> 
> I will try to get back to review/work with this for v11. Mean time, I
> am working on partition-wise joins [1]. In those patches, I have added
> a structure called PartitionScheme, which represents how a relation is
> partitioned. For base relations it's mostly copy of PartitionDesc and
> PartitionKey, but then it bubbles up across join, with each
> partitioned join getting relevant partitioning scheme. If you could
> base your design such that is uses PartitionScheme, it could be used
> for joins and probably when Jeevan's patch for partition-wise
> aggregate [2] comes along, it can be used with grouping.
> 
> [1]
> https://www.postgresql.org/message-id/CAFjFpRcMWwepj-Do1otxQ-GApGPSZ1FmH7YQ
> vQTwzQOGczq_sw%40mail.gmail.com [2]
> http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg308861.html

Thank you. I should probably wait until your patch is finalised before 
spending too much time on it, and I could probably also leverage the 
incremental sort patch if there is progress on it.




-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Re: [HACKERS] [Proposal] Make the optimiser aware of partitionsordering

From

Peter Eisentraut

Date:

31 August 2017, 06:30:58

On 3/20/17 11:03, Ronan Dunklau wrote:
>> Great idea.  This is too late for v10 at this point, but please add it
>> to the next CommitFest so we don't forget about it.
> I know it is too late, and thought that it was too early to add it to the 
> commitfest properly since so many design decisions should be discussed. Thanks 
> for the feedback, I added it.

This patch no longer applies.  Please send an update.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

21 September 2017, 01:00:01

On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 3/20/17 11:03, Ronan Dunklau wrote:
>>> Great idea.  This is too late for v10 at this point, but please add it
>>> to the next CommitFest so we don't forget about it.
>> I know it is too late, and thought that it was too early to add it to the
>> commitfest properly since so many design decisions should be discussed. Thanks
>> for the feedback, I added it.
>
> This patch no longer applies.  Please send an update.

Thanks for the reminder, and sorry for very very late answer.  PFA
rebased patch on current head.

I unfortunately didn't have time yet to read carefully the newly added
infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and
following), so this patch is still using its own copy of partitions
list.  I hope I can work on it shortly, but I prefer to send the
rebased version now, since it's still a POC with knowns problems and
questions, and I'll more than welcome any guidance with it.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

sorted_append_v2.diff

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Ashutosh Bapat

Date:

21 September 2017, 11:52:45

On Thu, Sep 21, 2017 at 3:30 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Thu, Aug 31, 2017 at 5:30 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 3/20/17 11:03, Ronan Dunklau wrote:
>>>> Great idea.  This is too late for v10 at this point, but please add it
>>>> to the next CommitFest so we don't forget about it.
>>> I know it is too late, and thought that it was too early to add it to the
>>> commitfest properly since so many design decisions should be discussed. Thanks
>>> for the feedback, I added it.
>>
>> This patch no longer applies.  Please send an update.
>
>
> Thanks for the reminder, and sorry for very very late answer.  PFA
> rebased patch on current head.
>
> I unfortunately didn't have time yet to read carefully the newly added
> infrastructure (30833ba154e0c1106d61e3270242dc5999a3e4f3 and
> following), so this patch is still using its own copy of partitions
> list.  I hope I can work on it shortly, but I prefer to send the
> rebased version now, since it's still a POC with knowns problems and
> questions, and I'll more than welcome any guidance with it.
>
>

With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
RelOptInfos of partitions in the RelOptInfo of partitioned table. For
range partitioned table without default partition, they are arranged
in the ascending order of partition bounds. This patch may avoid
MergeAppend if the sort keys match partition keys by creating an
Append path by picking sorted paths from partition RelOptInfos in
order. You may use slightly modified version of
add_paths_to_append_rel().

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

21 September 2017, 12:13:23

On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
> RelOptInfos of partitions in the RelOptInfo of partitioned table. For
> range partitioned table without default partition, they are arranged
> in the ascending order of partition bounds. This patch may avoid
> MergeAppend if the sort keys match partition keys by creating an
> Append path by picking sorted paths from partition RelOptInfos in
> order. You may use slightly modified version of
> add_paths_to_append_rel().


Yes, I just saw this commit this morning, and this is exactly what I
was missing, thanks for the pointer and the patch!


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

22 September 2017, 22:34:11

On Thu, Sep 21, 2017 at 11:13 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
>> RelOptInfos of partitions in the RelOptInfo of partitioned table. For
>> range partitioned table without default partition, they are arranged
>> in the ascending order of partition bounds. This patch may avoid
>> MergeAppend if the sort keys match partition keys by creating an
>> Append path by picking sorted paths from partition RelOptInfos in
>> order. You may use slightly modified version of
>> add_paths_to_append_rel().
>
>
> Yes, I just saw this commit this morning, and this is exactly what I
> was missing, thanks for the pointer and the patch!

PFA v3 of the patch, once again rebased and that now use all the newly
available infrastructure.

I also added a check that a sorted append can't be generated when a
default partition has been declared.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

sorted_append_v3.diff

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Robert Haas

Date:

22 September 2017, 22:58:42

On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> PFA v3 of the patch, once again rebased and that now use all the newly
> available infrastructure.
>
> I also added a check that a sorted append can't be generated when a
> default partition has been declared.

I just took a quick look at this and was kind of surprised to see that
it didn't look much like what I would expect.

What I would expect is:

1. The PartitionDescData grows a new flag indicating whether
partitions can be regarded as being in bound order.  This is true for
range partitions without a default partition, list partitions without
a default partition and without interleaved bounds, and maybe other
cases we want to worry about later.  We set this flag correctly when
we build the PartitionDesc and propagate it into the RelOptInfo for
the partitioned table when we set up its other partitioning details.

2. generate_mergeappend_paths() gets renamed to
generate_sorted_append_paths().  At the top of the function, it checks
whether the above flag is set; if not, give up on this optimization.
Otherwise, figure out what the set of pathkeys that correspond to the
partition key would look like.  Call these the
partition_order_pathkeys.

3. For each set of possible pathkeys, it checks whether the proposed
pathkeys are equal to (or an initial prefix of) the
partition_order_pathkeys.

4. If so, instead of calling create_merge_append_path(), it calls
create_append_path().

5. create_append_path() gets the proposed pathkeys via a new List
*pathkeys argument and sticks them on the path.

And that's it.  The extensive patch does a lot of other stuff, like
whacking around the structure of AppendPath vs. MergeAppendPath, and
I'm not sure why we need or want any of that.  I might be missing
something, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

22 September 2017, 23:23:53

On Fri, Sep 22, 2017 at 9:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> PFA v3 of the patch, once again rebased and that now use all the newly
>> available infrastructure.
>>
>> I also added a check that a sorted append can't be generated when a
>> default partition has been declared.
>
> I just took a quick look at this and was kind of surprised to see that
> it didn't look much like what I would expect.
>
> What I would expect is:
>[...]

Thanks a lot for the pointers, that's really helpful!

>  The extensive patch does a lot of other stuff, like
> whacking around the structure of AppendPath vs. MergeAppendPath, and
> I'm not sure why we need or want any of that.  I might be missing
> something, though.

That was one of the first question we had with the initial POC.  The
reason is that this "sorted append" is not using a merge algorithm but
just appending partitions in the right order, so it looked like we
could either create a new SortedAppend node, or use current Append
node and allow it to return sorted data.  We chose the 2nd solution,
and ended up with a lot of duplicated code (all the code for the
ordering), so we tried to have Append and MergeAppend share this code.
I'm not at all satisfied with current shape, but I'm still not sure on
what node to use for this.  Do you have any idea on this?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Robert Haas

Date:

23 September 2017, 00:09:09

On Fri, Sep 22, 2017 at 4:23 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> That was one of the first question we had with the initial POC.  The
> reason is that this "sorted append" is not using a merge algorithm but
> just appending partitions in the right order, so it looked like we
> could either create a new SortedAppend node, or use current Append
> node and allow it to return sorted data.  We chose the 2nd solution,
> and ended up with a lot of duplicated code (all the code for the
> ordering), so we tried to have Append and MergeAppend share this code.
> I'm not at all satisfied with current shape, but I'm still not sure on
> what node to use for this.  Do you have any idea on this?

During planning, *every* node has a list of pathkeys associated with
it which represent the presumed output ordering.  You can support
ordered append paths without changing any data structures at all; it's
just a matter of putting pathkeys into an AppendPath.

The reason why MergeAppend has extra stuff in the node (numCols,
sortColIdx, etc.) is so that it can actually perform the sort at
runtime.  But this feature requires no runtime support -- that's kinda
the point -- so there's no need for it to carry that information, or
to add any new nodes.

At least, IIUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

23 September 2017, 13:29:05

On Fri, Sep 22, 2017 at 11:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> During planning, *every* node has a list of pathkeys associated with
> it which represent the presumed output ordering.  You can support
> ordered append paths without changing any data structures at all; it's
> just a matter of putting pathkeys into an AppendPath.
>

I totally agree.

> The reason why MergeAppend has extra stuff in the node (numCols,
> sortColIdx, etc.) is so that it can actually perform the sort at
> runtime.  But this feature requires no runtime support -- that's kinda
> the point -- so there's no need for it to carry that information, or
> to add any new nodes.
>
> At least, IIUC.

That's true, but numCols, sortColdIdx etc are also used to display the
sort key in an explain.  If an append can return sorted data, it
should also display the sort information, so I think these fields are
still required in an Append node.

If it's ok to only add these fields in an Append node for the explain
part, I'll revert the Append/MergeAppend refactoring and do some other
cleanup as you advised earlier.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Robert Haas

Date:

26 September 2017, 15:56:08

On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> That's true, but numCols, sortColdIdx etc are also used to display the
> sort key in an explain.  If an append can return sorted data, it
> should also display the sort information, so I think these fields are
> still required in an Append node.

I don't think so.  An index scan doesn't output that information, nor
does a nested loop which inherits a sort order from its outer path.  I
think the rule is that a plan node which takes steps to get the data
into a certain order might want to output something about that, but a
plan node which somehow gets that ordering without taking any explicit
action doesn't print anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Julien Rouhaud

Date:

26 September 2017, 20:59:25

On Tue, Sep 26, 2017 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>> That's true, but numCols, sortColdIdx etc are also used to display the
>> sort key in an explain.  If an append can return sorted data, it
>> should also display the sort information, so I think these fields are
>> still required in an Append node.
>
> I don't think so.  An index scan doesn't output that information, nor
> does a nested loop which inherits a sort order from its outer path.  I
> think the rule is that a plan node which takes steps to get the data
> into a certain order might want to output something about that, but a
> plan node which somehow gets that ordering without taking any explicit
> action doesn't print anything.

Oh, ok that indeed makes sense.  As I said I'll remove all the useless
noise and send an updated patch.  Thanks again.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

From

Michael Paquier

Date:

29 November 2017, 05:14:05

On Wed, Sep 27, 2017 at 2:59 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Tue, Sep 26, 2017 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Sat, Sep 23, 2017 at 6:29 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>> That's true, but numCols, sortColdIdx etc are also used to display the
>>> sort key in an explain.  If an append can return sorted data, it
>>> should also display the sort information, so I think these fields are
>>> still required in an Append node.
>>
>> I don't think so.  An index scan doesn't output that information, nor
>> does a nested loop which inherits a sort order from its outer path.  I
>> think the rule is that a plan node which takes steps to get the data
>> into a certain order might want to output something about that, but a
>> plan node which somehow gets that ordering without taking any explicit
>> action doesn't print anything.
>
> Oh, ok that indeed makes sense.  As I said I'll remove all the useless
> noise and send an updated patch.  Thanks again.

Waiting for a patch update for two months now, so marked as returned
with feedback.
-- 
Michael