Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20190625232205.ssecrwplyb7faxbb@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Tue, Jun 25, 2019 at 04:53:40PM -0400, James Coleman wrote: > >Unrelated: if you or someone else you know that's more familiar with >the parallel code, I'd be interested in their looking at the patch at >some point, because I have a suspicion it might not be operating in >parallel ever (either that or I don't know how to trigger it), but I'm >not really familiar with that stuff at all currently. :) > That's an interesting question. I don't think plans like this would be very interesting: Limit -> Incremental Sort -> Gather Merge -> Index Scan because most of the extra cost would be paid in the leader anyway. So I'm not all that surprised those paths are not generated (I might be wrong and those plans would be interesting, though). But I think something like this would be quite beneficial: Limit -> Gather Merge -> Incremental Sort -> Index Scan So I've looked into that, and the reason seems fairly simple - when generating the Gather Merge paths, we only look at paths that are in partial_pathlist. See generate_gather_paths(). And we only have sequential + index paths in partial_pathlist, not incremental sort paths. IMHO we can do two things: 1) modify generate_gather_paths to also consider incremental sort for each sorted path, similarly to what create_ordered_paths does 2) modify build_index_paths to also generate an incremental sort path for each index path IMHO (1) is the right choice here, because it automatically does the trick for all other types of ordered paths, not just index scans. So, something like the attached patch, which gives me plans like this: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.86..2.85 rows=100000 width=12) (actual time=3.726..233.249 rows=100000 loops=1) -> Gather Merge (cost=0.86..120.00 rows=5999991 width=12) (actual time=3.724..156.802 rows=100000 loops=1) Workers Planned: 2 Workers Launched: 2 -> Incremental Sort (cost=0.84..100.00 rows=2499996 width=12) (actual time=0.563..164.438 rows=33910 loops=3) Sort Key: a, b Presorted Key: a Sort Method: quicksort Memory: 27kB Sort Groups: 389 Worker 0: Sort Method: quicksort Memory: 27kB Groups: 1295 Worker 1: Sort Method: quicksort Memory: 27kB Groups: 1241 -> Parallel Index Scan using t_a_idx on t (cost=0.43..250612.29 rows=2499996 width=12) (actual time=0.027..128.518rows=33926 loops=3) Planning Time: 68559.695 ms Execution Time: 285.245 ms (14 rows) This is not the whole story, though - there seems to be some costing issue, because even with the parallel costs set to 0, I only get such plans after I tweak the cost in the patch like this: subpath->total_cost = 100.0; path->path.total_cost = 120.0; When I don't do that, the gather merge gets with total cost 1037485, and it gets beaten by this plan: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=1.05..152.75 rows=100000 width=12) (actual time=0.234..374.492 rows=100000 loops=1) -> Incremental Sort (cost=1.05..9103.09 rows=5999991 width=12) (actual time=0.232..316.210 rows=100000 loops=1) Sort Key: a, b Presorted Key: a Sort Method: quicksort Memory: 27kB Sort Groups: 2863 -> Index Scan using t_a_idx on t (cost=0.43..285612.24 rows=5999991 width=12) (actual time=0.063..240.248rows=100003 loops=1) Planning Time: 53743.858 ms Execution Time: 403.379 ms (9 rows) I suspect it's related to the fact that for the Gather Merge plan we don't have the information about the number of rows, while for the incremental sort we have it. But clearly 9103.09 is not total cost for all 6M rows the incremental sort is expected to produce (because that has to be higher than 285612, which is the cost of the index scan). So it seems like the total cost of the incremental sort is ~546000, because (100000 / 6000000) * 546000 = 9133 so close to the total cost of the incremental sort. But then 100000.0 / 6000000 * 9133 = 152 so it seems we actually do the linear approximation twice. That seems pretty bogus, IMO. And indeed, if I remove this part from cost_incremental_sort: if (limit_tuples > 0 && limit_tuples < input_tuples) { output_tuples = limit_tuples; output_groups = floor(output_tuples / group_tuples) + 1; } then it behaves kinda reasonable: explain select * from t order by a, b limit 100000; QUERY PLAN ----------------------------------------------------------------------------------------- Limit (cost=1.05..9103.12 rows=100000 width=12) -> Incremental Sort (cost=1.05..546124.41 rows=5999991 width=12) Sort Key: a, b Presorted Key: a -> Index Scan using t_a_idx on t (cost=0.43..285612.24 rows=5999991 width=12) (5 rows) set parallel_tuple_cost = 0; set parallel_setup_cost = 0; explain select * from t order by a, b limit 100000; QUERY PLAN -------------------------------------------------------------------------------------------------------- Limit (cost=0.86..6775.63 rows=100000 width=12) -> Gather Merge (cost=0.86..406486.25 rows=5999991 width=12) Workers Planned: 2 -> Incremental Sort (cost=0.84..343937.44 rows=2499996 width=12) Sort Key: a, b Presorted Key: a -> Parallel Index Scan using t_a_idx on t (cost=0.43..250612.29 rows=2499996 width=12) (7 rows) But I'm not going to claim those are total fixes, it's the minimum I needed to do to make this particular type of plan work. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: