Re: Ordered Partitioned Table Scans - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Ordered Partitioned Table Scans
Date
Msg-id 29278.1540998105@localhost
Whole thread Raw
In response to Re: Ordered Partitioned Table Scans  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Ordered Partitioned Table Scans  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> wrote:

> On 31 October 2018 at 13:05, David Rowley <david.rowley@2ndquadrant.com> wrote:
> >>> On 28 October 2018 at 03:49, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >> I've registered as a reviewer.   I still didn't have a deep look at
> >> the patch yet, but thanks a lot for working on it!
> >
> > Thanks for signing up to review.  I need to send another revision of
> > the patch to add a missing call to truncate_useless_pathkeys(). Will
> > try to do that today.
>
> I've attached a patch that ...

I've picked this one when looking around what I could review.

* As for the logic, I found generate_mergeappend_paths() to be the most
  interesting part:

Imagine table partitioned by "i", so "partition_pathkeys" is {i}.

partition 1:

i | j
--+--
0 | 0
1 | 1
0 | 1
1 | 0

partition 2:

i | j
--+--
3 | 0
2 | 0
2 | 1
3 | 1

Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
ordering of the subpaths should not change the way tuples are split into
partitions.

Obviously a problem is if "partition_pathkeys" and "pathkeys" lists start with
different items. To propose more generic rule, I used this example of
range-partitioned table, where "i" and "j" are the partitioning keys:

partition 1:

 i | j | k
---+---+---
 0 | 0 | 1
 0 | 0 | 0

partition 2:

 i | j | k
---+---+---
 0 | 1 | 0
 0 | 1 | 1

If the output "pathkey" is {i, k}, then the Append path makes rows of both
partitions interleave:

 i | j | k
---+---+---
 0 | 0 | 0
 0 | 1 | 0
 0 | 0 | 1
 0 | 1 | 1

So in general I think the restriction is that no valid position of "pathkeys"
and "partition_pathkeys" may differ. Or in other words: the shorter of the 2
pathkey lists must be contained in the longer one. Does it make sense to you?


Another problem I see is that build_partition_pathkeys() continues even if it
fails to create a pathkey for some partitioning column. In the example above
it would mean that the table can have "partition_pathkeys" equal to {j}
instead of {i, j} on some EC-related conditions. However such a key does not
correspond to reality - this is easier to imagine if another partition is
considered.

partition 3:

 i | j | k
---+---+---
 1 | 0 | 1
 1 | 0 | 0

So I think no "partition_pathkeys" should be generated in that case. On the
other hand, if the function returned the part of the list it could construct
so far, it'd be wrong because such incomplete pathkeys could pass the checks I
proposed above for reasons unrelated to the partitioning scheme.


The following comments are mostly on coding:

* Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
  this sentence in the header comment:

Note: If changing this, see build_partition_pathkeys()

However I could not find other use besides that in
RelationBuildPartitionDesc().

* create_append_path():

    /*
     * Apply query-wide LIMIT if known and path is for sole base relation.
     * (Handling this at this low level is a bit klugy.)
     */
    if (root != NULL && pathkeys != NULL &&
        bms_equal(rel->relids, root->all_baserels))
        pathnode->limit_tuples = root->limit_tuples;
    else
        pathnode->limit_tuples = -1.0;

  I think this optimization is not specific to AppendPath / MergeAppendPath,
  so it could be moved elsewhere (as a separate patch of course). But
  specifically for AppendPath, why do we have to test pathkeys? The pathkeys
  of the AppendPath do not necessarily describe the order of the set to which
  LIMIT is applied, so their existence should not be important here.

* If pathkeys is passed, shouldn't create_append_path() include the
  cost_sort() of subpaths just like create_merge_append_path() does?  And if
  so, then create_append_path() and create_merge_append_path() might
  eventually have some common code (at least for the subpath processing) to be
  put into a separate function.

* Likewise, create_append_plan() / create_merge_append_plan() are going to be
  more similar then before, so some refactoring could also make sense.

Although it's not too much code, I admit the patch is not trivial, so I'm
curious about your opinion.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Typo in xlog.c comment?
Next
From: Andrew Dunstan
Date:
Subject: Re: WIP Patch: Add a function that returns binary JSONB as a bytea