Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From James Coleman
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id CAAaqYe977wd4cyQno_pNBSzM4=2S8rzdxAE9kre2Vzh1JYMYpQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
List pgsql-hackers
On Sat, Jul 20, 2019 at 11:25 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
> >My current line of investigation is whether we need to do anything in
> >the parallel portion of create_ordered_paths(). I noticed that the
> >first-pass patch adding generate_useful_gather_paths() modified that
> >section but wasn't actually adding any new gather-merge paths (just
> >bare incremental sort paths). That seems pretty clearly just a
> >prototype miss, so I modified the prototype to build gather-merge
> >paths instead (as a side note that change seems to fix an oddity I was
> >seeing where plans would include a parallel index scan node even
> >though they weren't parallel plans). While the resulting plan for
> >something like:
> >
>
> Yes, that seems to be a bug. The block above it clealy has a gather
> merge nodes, so this one should too.
>
> >explain analyze select * from t where t.a in (1,2,3,4,5,6) order by
> >t.a, t.b limit 50;
> >
> >changes cost (to be cheaper) ever so slightly with the gather-merge
> >addition to create_ordered_paths(), the plan itself is otherwise
> >identical (including row estimates):
> >
> >Limit
> >  -> Gather Merge
> >       -> Incremental Sort
> >          -> Parallel Index Scan
> >
> >(Note: I'm forcing parallel plans here with: set
> >max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0;
> >set parallel_tuple_cost=0; set parallel_setup_cost=0; set
> >min_parallel_index_scan_size=0;)
> >
> >I can't seem to come up with a case where adding these gather-merge
> >paths in create_ordered_paths() isn't entirely duplicative of paths
> >already created by generate_useful_gather_paths() as called from
> >apply_scanjoin_target_to_paths() -- which I _think_ makes sense given
> >that both apply_scanjoin_target_to_paths() and create_ordered_paths()
> >are called by grouping_planner().
> >
> >Can you think of a case I'm missing here that would make it valuable
> >to generate new parallel plans in create_ordered_paths()?
> >
>
> Good question. Not sure. I think such path would need to do something on
> a relation that is neither a join nor a scan - in which case the path
> should not be created by apply_scanjoin_target_to_paths().
>
> So for example a query like this:
>
>   SELECT
>       a, b, sum(expensive_function(c))
>   FROM
>       t
>   GROUP BY a,b
>   ORDER BY a,sum(...) LIMIT 10;
>
> should be able to produce a plan like this:
>
>   -> Limit
>       -> Gather Merge
>         -> Incremental Sort (pathkeys: a, sum)
>           -> Group Aggregate
>              a, b, sum(expensive_function(c))
>             -> Index Scan (pathkeys: a, b)
>
> or something like that, maybe. I haven't tried this, though. The other
> question is whether such queries are useful in practice ...

Hmm, when I step through on that example input_rel->partial_pathlist
!= NIL is false, so we don't even attempt to consider any extra
parallel paths in create_ordered_paths(). Nonetheless we get a
parallel plan, but with a different shape:

 Limit
   ->  Incremental Sort
         Sort Key: a, (sum(expensive_function(c)))
         Presorted Key: a
         ->  Finalize GroupAggregate
               Group Key: a, b
               ->  Gather Merge
                     Workers Planned: 4
                     ->  Partial GroupAggregate
                           Group Key: a, b
                           ->  Sort
                                 Sort Key: a, b
                                 ->  Parallel Seq Scan on t

(or if I disable seqscan then the sort+seq scan above becomes inc sort
+ index scan)

To be honest, I don't think I understand how you would get a plan like
that anyway since the index here only has "a" and so can't provide
(pathkeys: a, b).

Perhaps there's something I'm still missing though.

James Coleman



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: James Coleman
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)