Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date:
Msg-id: 20190721113422.hjubqs45xxpnldn7@development
In response to: Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>)
Responses: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List: pgsql-hackers
On Sat, Jul 20, 2019 at 07:37:08PM -0400, James Coleman wrote:
>> ..
>>
>> Yes, you're right - an extra sort node would be necessary. But I don't
>> think that changes the idea behind that example. The question is whether
>> the extra sorts below the gather would be cheaper than doing a large sort
>> on top of it - but I don't see why wouldn't that be the case, and if we
>> only need a couple of rows from the beginning ...
>
>Ah, I see. Right now we get:
>
>  Limit
>    ->  Incremental Sort
>          Sort Key: a, (sum(expensive_function(c)))
>          Presorted Key: a
>          ->  Finalize GroupAggregate
>                Group Key: a, b
>                ->  Gather Merge
>                      Workers Planned: 2
>                      ->  Partial GroupAggregate
>                            Group Key: a, b
>                            ->  Parallel Index Scan using t_a_b on t
>
>even with the parallel additions to create_ordered_paths() -- that
>addition doesn't actually add any new parallel paths because
>input_rel->partial_pathlist != NIL is false (I'm not sure why yet), so
>if we want (if I understand correctly) something more like:
>

Well, in this particular case it's fairly simple, I think. We only call
create_ordered_paths() from the grouping planner, on the upper relation
that represents the result of the GROUP BY. That means it only sees the
finalized result (after the Finalize GroupAggregate), so it can't see
partial aggregation or any other partial path. In this case
partial_pathlist == NIL seems guaranteed.

So maybe we should not be looking at GROUP BY queries, which probably
can't hit this particular code path at all - we need a different type
of upper relation. For example UNION ALL should hit this code, I think.
So maybe try

    select * from t union all select * from t order by a, b limit 10;

and that should hit this condition with partial_pathlist != NIL. I
don't know if such queries can benefit from incremental sort, though.

There are other upper relations too:

    typedef enum UpperRelationKind
    {
        UPPERREL_SETOP,             /* result of UNION/INTERSECT/EXCEPT, if any */
        UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation,
                                     * if any */
        UPPERREL_GROUP_AGG,         /* result of grouping/aggregation, if any */
        UPPERREL_WINDOW,            /* result of window functions, if any */
        UPPERREL_DISTINCT,          /* result of "SELECT DISTINCT", if any */
        UPPERREL_ORDERED,           /* result of ORDER BY, if any */
        UPPERREL_FINAL              /* result of any remaining top-level actions */
        /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
    } UpperRelationKind;

>I'm still struggling to understand though how another incremental sort
>below the gather-merge would actually be able to help us. For one I'm
>not sure it would be less expensive,

That's a good question. I think in general we agree that if we can get
the gather merge to sort the data the way the operation above the
gather merge (i.e. the first operation that can't run in parallel)
needs it, that's probably beneficial. So this pattern seems reasonable:

    -> something
       -> non-parallel operation
          -> gather merge
             -> incremental sort
                -> something

And it's likely faster, especially when the parts above this can
leverage the lower startup cost - say, when there's an explicit LIMIT.

I think the question is where exactly we should add the incremental
sort. It's quite possible some of the places we modified are redundant,
at least for some queries. Not sure.

>but more importantly I'm not sure how we could do that and maintain
>correctness. Wouldn't a per-worker sum not actually be useful in
>sorting since it has no predictable impact on the ordering of the total
>sum?

Yes, you're right - that wouldn't be correct.
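To make that concrete, here is a small made-up illustration (the data
and the partial_sums CTE are hypothetical, not from the earlier
experiments) of why ordering by a per-worker partial sum can't serve as
a prefix of the final ordering:

    -- hypothetical per-worker partial sums for two groups, x and y
    with partial_sums(grp, worker, psum) as (
        values ('x', 1, 10), ('x', 2, 1),
               ('y', 1, 6),  ('y', 2, 6)
    )
    select grp, sum(psum) as total
    from partial_sums
    group by grp
    order by total;

    -- totals: x = 11, y = 12, so the final ordering is (x, y);
    -- but worker 1 sees partial sums x = 10, y = 6 and would order (y, x),
    -- so the per-worker ordering says nothing about the final ordering.
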
The reason I've been thinking about such examples is that distributed
databases do this kind of thing in some cases, and we might do it too
with partitioning. Consider a simple example:

    create table t (a int, b int, c int) partition by hash (a);
    create table t0 partition of t for values with (modulus 4, remainder 0);
    create table t1 partition of t for values with (modulus 4, remainder 1);
    create table t2 partition of t for values with (modulus 4, remainder 2);
    create table t3 partition of t for values with (modulus 4, remainder 3);

    insert into t select mod(i,1000), i, i from generate_series(1,1000000) s(i);
    analyze t;

    select a, count(b) from t group by a order by a, count(b) limit 10;

In this case we might build a plan similar to what I proposed, assuming
each worker gets to execute on a different partition (because then we
know each worker sees distinct groups, thanks to the partitioning). But
AFAIK we don't do such optimizations yet, so it's probably just a
distraction.

>Describing that got me thinking of similar cases where ordering of the
>partial aggregate would (in theory) be a correct partial sort for the
>total ordering, and it seems like min() and max() would be. So I ran
>the same experiment with that instead of sum(), but, you guessed it,
>input_rel->partial_pathlist != NIL is false again, so we don't add any
>parallel paths in create_ordered_paths().
>

Right. That's because with aggregation the grouping planner only sees
the total result, not the partial paths. We need a different upper rel
to exercise that code path.

>I'm leaning towards thinking considering parallel incremental sorts in
>create_ordered_paths() won't add value. But I also feel like this
>whole project has me jumping into the deep end of the pool again (this
>time the planner), so I'm still picking up a lot of pieces for how all
>this fits together, and as such I don't have a great intuitive grasp
>yet of how this particular part of the planning process maps to the
>kind of queries and plans we consider. All that to say: if you have
>further thoughts, I'm happy to look into it, but right now I'm not
>seeing anything.
>

Understood. FWIW I'm not particularly familiar with this code (or which
places are supposed to work together), so I definitely agree it may be
overwhelming - especially when it's only a part of a larger patch.

I wonder if we're approaching this wrong. Maybe we should not
reverse-engineer queries for the various places, but instead start with
a set of queries that we want to optimize, and then identify which
places in the planner need to be modified.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services