Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200328213050.byyeuozylnxv2bhf@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Sat, Mar 28, 2020 at 10:19:04AM -0400, James Coleman wrote:
>On Fri, Mar 27, 2020 at 10:58 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> ...
>>
>> The more I look at pathkeys_useful_for_ordering() the more I think the
>> get_useful_pathkeys_for_relation() function should look more like it
>> than the postgres_fdw one ...
>
>If we go down that path (and indeed this is actually implied by
>removing the volatile check too), doesn't that really just mean (at
>least for now) that get_useful_pathkeys_for_relation goes away
>entirely and we effectively use root->query_pathkeys directly? The
>only thing your lose them is the check that each eclass has a member
>in the rel. But that check probably wasn't quite right anyway: at
>least for incremental sort (maybe not regular sort), I think we
>probably don't necessarily care that the entire list has members in
>the rel, but rather that some prefix does, right? The idea there would
>be that we could create a gather merge here that supplies a partial
>ordering (that prefix of query_pathkeys) and then above it the planner
>might place another incremental sort (say above a join), or perhaps
>even a join above that would actually be sufficient (since many joins
>are capable of providing an ordering anyway).
>
>I've attached (added prefix .txt to avoid the cfbot assuming this is a
>full series) an idea in that direction to see if we're thinking along
>the same route. You'll want to apply and view with `git diff -w`
>probably.
>

Hmmm, I'm not sure the patch is quite correct.

Firstly, I suggest we don't remove get_useful_pathkeys_for_relation
entirely, because that allows us to add more useful pathkeys in the
future (even if we don't consider pathkeys useful for merging now).
We could also do the EC optimization in the function, return NIL and
not loop over partial_pathlist at all. That's cosmetic issue, though.

More importantly, I'm not sure this makes sense:

   /* consider incremental sort for interesting orderings */
   useful_pathkeys = truncate_useless_pathkeys(root, rel, subpath->pathkeys);

   ...

   is_sorted = pathkeys_common_contained_in(useful_pathkeys,
                                            subpath->pathkeys,
                                            &presorted_keys);

I mean, useful_pathkeys is bound to be a sublist of subpath->pathkeys,
right? So how could this ever return is_sorted=false?

The whole point is to end up with query_pathkeys (or whatever pathkeys
we deem useful), but this does not do that.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Next
From: Ranier Vilela
Date:
Subject: [PATCH] Redudant initilization