Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20190720165951.m6adwbqi26bsfbjb@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Sat, Jul 20, 2019 at 12:21:01PM -0400, James Coleman wrote: >On Sat, Jul 20, 2019 at 12:12 PM James Coleman <jtc331@gmail.com> wrote: >> >> On Sat, Jul 20, 2019 at 11:25 AM Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >> ... >> > >My current line of investigation is whether we need to do anything in >> > >the parallel portion of create_ordered_paths(). I noticed that the >> > >first-pass patch adding generate_useful_gather_paths() modified that >> > >section but wasn't actually adding any new gather-merge paths (just >> > >bare incremental sort paths). That seems pretty clearly just a >> > >prototype miss, so I modified the prototype to build gather-merge >> > >paths instead (as a side note that change seems to fix an oddity I was >> > >seeing where plans would include a parallel index scan node even >> > >though they weren't parallel plans). While the resulting plan for >> > >something like: >> > > >> > >> > Yes, that seems to be a bug. The block above it clealy has a gather >> > merge nodes, so this one should too. >> > >> > >explain analyze select * from t where t.a in (1,2,3,4,5,6) order by >> > >t.a, t.b limit 50; >> > > >> > >changes cost (to be cheaper) ever so slightly with the gather-merge >> > >addition to create_ordered_paths(), the plan itself is otherwise >> > >identical (including row estimates): >> > > >> > >Limit >> > > -> Gather Merge >> > > -> Incremental Sort >> > > -> Parallel Index Scan >> > > >> > >(Note: I'm forcing parallel plans here with: set >> > >max_parallel_workers_per_gather=4; set min_parallel_table_scan_size=0; >> > >set parallel_tuple_cost=0; set parallel_setup_cost=0; set >> > >min_parallel_index_scan_size=0;) >> > > >> > >I can't seem to come up with a case where adding these gather-merge >> > >paths in create_ordered_paths() isn't entirely duplicative of paths >> > >already created by generate_useful_gather_paths() as called from >> > >apply_scanjoin_target_to_paths() -- which I _think_ makes sense given >> > >that both apply_scanjoin_target_to_paths() and create_ordered_paths() >> > >are called by grouping_planner(). >> > > >> > >Can you think of a case I'm missing here that would make it valuable >> > >to generate new parallel plans in create_ordered_paths()? >> > > >> > >> > Good question. Not sure. I think such path would need to do something on >> > a relation that is neither a join nor a scan - in which case the path >> > should not be created by apply_scanjoin_target_to_paths(). >> > >> > So for example a query like this: >> > >> > SELECT >> > a, b, sum(expensive_function(c)) >> > FROM >> > t >> > GROUP BY a,b >> > ORDER BY a,sum(...) LIMIT 10; >> > >> > should be able to produce a plan like this: >> > >> > -> Limit >> > -> Gather Merge >> > -> Incremental Sort (pathkeys: a, sum) >> > -> Group Aggregate >> > a, b, sum(expensive_function(c)) >> > -> Index Scan (pathkeys: a, b) >> > >> > or something like that, maybe. I haven't tried this, though. The other >> > question is whether such queries are useful in practice ... >> >> Hmm, when I step through on that example input_rel->partial_pathlist >> != NIL is false, so we don't even attempt to consider any extra >> parallel paths in create_ordered_paths(). Nonetheless we get a >> parallel plan, but with a different shape: >> >> Limit >> -> Incremental Sort >> Sort Key: a, (sum(expensive_function(c))) >> Presorted Key: a >> -> Finalize GroupAggregate >> Group Key: a, b >> -> Gather Merge >> Workers Planned: 4 >> -> Partial GroupAggregate >> Group Key: a, b >> -> Sort >> Sort Key: a, b >> -> Parallel Seq Scan on t >> >> (or if I disable seqscan then the sort+seq scan above becomes inc sort >> + index scan) >> >> To be honest, I don't think I understand how you would get a plan like >> that anyway since the index here only has "a" and so can't provide >> (pathkeys: a, b). >> >> Perhaps there's something I'm still missing though. I wasn't stricly adhering to the example we used before, and I imagined there would be an index on (a,b). Sorry if that wasn't clear. > >Also just realized I don't think (?) we can order by the sum inside a >gather-merge -- at least not without having another sort above the >parallel portion? Or is the group aggregate able to also provide >ordering on the final sum after aggregating the partial sums? > Yes, you're right - an extra sort node would be necessary. But I don't think that changes the idea behind that example. The question is whether the extra sorts below the gather would be cheaper than doing a large sort on top of it - but I don't see why wouldn't that be the case, and if we only need a couple of rows from the beginning ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: