Re: Ordered Partitioned Table Scans - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Ordered Partitioned Table Scans
Date
Msg-id 4591.1541063138@localhost
Whole thread Raw
In response to Re: Ordered Partitioned Table Scans  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Ordered Partitioned Table Scans
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> wrote:

> On 1 November 2018 at 04:01, Antonin Houska <ah@cybertec.at> wrote:
> > * As for the logic, I found generate_mergeappend_paths() to be the most
> >   interesting part:
> >
> > Imagine table partitioned by "i", so "partition_pathkeys" is {i}.
> >
> > partition 1:
> >
> > i | j
> > --+--
> > 0 | 0
> > 1 | 1
> > 0 | 1
> > 1 | 0
> >
> > partition 2:
> >
> > i | j
> > --+--
> > 3 | 0
> > 2 | 0
> > 2 | 1
> > 3 | 1
> >
> > Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
> > ordering of the subpaths should not change the way tuples are split into
> > partitions.
> >
> > ...
>
> I understand what you're saying. I just don't understand what you
> think is wrong with the patch in this area.

I think these conditions are too restrictive:

    /*
     * Determine if these pathkeys match the partition order, or reverse
     * partition order.  It can't match both, so only go to the trouble of
     * checking the reverse order when it's not in ascending partition
     * order.
     */
    partition_order = pathkeys_contained_in(pathkeys,
                        partition_pathkeys);
    partition_order_desc = !partition_order &&
                pathkeys_contained_in(pathkeys,
                            partition_pathkeys_desc);


In the example above I wanted to show that your new feature should work even
if "pathkeys" is not contained in "partition_pathkeys".

> > Another problem I see is that build_partition_pathkeys() continues even if it
> > fails to create a pathkey for some partitioning column. In the example above
> > it would mean that the table can have "partition_pathkeys" equal to {j}
> > instead of {i, j} on some EC-related conditions. However such a key does not
> > correspond to reality - this is easier to imagine if another partition is
> > considered.
> >
> > partition 3:
> >
> >  i | j | k
> > ---+---+---
> >  1 | 0 | 1
> >  1 | 0 | 0
> >
> > So I think no "partition_pathkeys" should be generated in that case. On the
> > other hand, if the function returned the part of the list it could construct
> > so far, it'd be wrong because such incomplete pathkeys could pass the checks I
> > proposed above for reasons unrelated to the partitioning scheme.
>
> Oops. That's a mistake. We should return what we have so far if we
> can't make one of the pathkeys. Will fix.

I think no "partition_pathkeys" should be created in this case, but before we
can discuss this in detail there needs to be an agreement on the evaluation of
the relationship between "pathkeys" and "partition_pathkeys", see above.

> > The following comments are mostly on coding:
> >
> > * Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
> >   this sentence in the header comment:
> >
> > Note: If changing this, see build_partition_pathkeys()
> >
> > However I could not find other use besides that in
> > RelationBuildPartitionDesc().
>
> While the new code does not call those directly, the new code does
> depend on the sort order of the partitions inside the PartitionDesc,
> which those functions are responsible for. Perhaps there's a better
> way to communicate that.

I pointed this out because I suspect that changes of these functions would
affect more features, not only the one you're trying to implement.

> > * If pathkeys is passed, shouldn't create_append_path() include the
> >   cost_sort() of subpaths just like create_merge_append_path() does?  And if
> >   so, then create_append_path() and create_merge_append_path() might
> >   eventually have some common code (at least for the subpath processing) to be
> >   put into a separate function.
>
> It does. It's just done via the call to cost_append().

ok, I missed that.

> > * Likewise, create_append_plan() / create_merge_append_plan() are going to be
> >   more similar then before, so some refactoring could also make sense.
> >
> > Although it's not too much code, I admit the patch is not trivial, so I'm
> > curious about your opinion.
>
> I think the costing code is sufficiently different to warant not
> sharing more. For example, the startup costing is completely
> different. Append can start on the startup cost of the first subpath,
> but MergeAppend takes the sum of the startup cost of all subpaths.

ok

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Hooks to Modify Execution Flow and Query Planner
Next
From: Hubert Zhang
Date:
Subject: Re: Is there way to detect uncommitted 'new table' in pg_class?