Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200329031437.yvhogo25wdijwpqd@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Sat, Mar 28, 2020 at 10:47:49PM -0400, James Coleman wrote: >On Sat, Mar 28, 2020 at 6:59 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> >> Hi, >> >> Attached is my take on simplification of the useful pathkeyes thing. It >> keeps the function, but it truncates query_pathkeys to only members with >> EC members in the relation. I think that's essentially the optimization >> you've proposed. > >Thanks. I've included that in the patch series in this email (as a >separate patch) with a few additional comments. > Thanks. >I've also noticed that the enabled_incrementalsort check in >generate_useful_gather_paths seemed broken, because it returned us out >of the function before creating either a plain gather merge (if >already sorted) or an explicit sort path. I've included a patch that >moves it to the if block that actually builds the incremental sort >path. > Hmmm, that's probably right. The reason why the GUC check was right after generate_gather_paths is that the intent to disable all the useful-pathkeys business, essentially reducing it back to plain generate_gather_paths. But I think you're right that's wrong, because it might lead to strange behavior when the GUC switches between plans without any incremental sort nodes - setting it to 'true' might end up picking GM on top of plain Sort, for example. >> >> ... >> >> 1) Missing worker identification (Worker #). > >Fixed. > >> 2) Missing method for workers (we have it for the leader, though). > >Fixed. Since we can't have pointers in the parallel shared memory >space, we can't store the sort methods used in a list. To accomplish >the same goal, I've assigned the TuplesortMethod enum entries uique >bit positions, and store the methods used in a bitmask. > OK, makes sense. >> 3) I'm not sure why the lable is "Methods" instead of "Sort Method", and >> why it's in parenthesis. > >I've removed the parentheses. It's labeled "Methods" since there can >be more than one (different batches could use different methods). I've >updated this to properly use singular/plural depending on the number >of methods used. > I'm a bit confused. How do I know which batch used which method? Is it actually worth printing in explain analyze? Maybe only print it in the verbose mode? >> 4) Not sure having two lines for each worker is a great idea. > >I've left these in for now because the lines are already very long >(much, much longer than the worker lines in a standard sort node). >This is largely because we're trying to summarize many sort batches, >while standard sort nodes only have to give the exact stats from a >single batch. > >See the example output later in the email. > OK >> 5) I'd probably prefer having multiple labels for avg/max memory values, >> instead of (avg) and (max) notes. Also, I think we use "peak" in this >> context instead of "max". > >Updated. > OK >Here's the current output: > > Limit (cost=1887419.20..1889547.68 rows=10000 width=8) (actual >time=13218.403..13222.519 rows=10000 loo >ps=1) > -> Gather Merge (cost=1887419.20..19624748.03 rows=83333360 >width=8) (actual time=13218.401..13229.7 >50 rows=10000 loops=1) > Workers Planned: 2 > Workers Launched: 2 > -> Incremental Sort (cost=1886419.17..10005010.55 >rows=41666680 width=8) (actual time=13208.00 >4..13208.586 rows=4425 loops=3) > Sort Key: a, b > Presorted Key: a > Full-sort Groups: 1 Sort Method: quicksort Memory: >avg=28kB peak=28kB > Presorted Groups: 1 Sort Method: top-N heapsort Memory: >avg=1681kB peak=1681kB > Worker 0: Full-sort Groups: 1 Sort Method: quicksort >Memory: avg=28kB peak=28kB > Presorted Groups: 1 Sort Method: top-N heapsort >Memory: avg=1680kB peak=1680kB > Worker 1: Full-sort Groups: 1 Sort Method: quicksort >Memory: avg=28kB peak=28kB > Presorted Groups: 1 Sort Method: top-N heapsort >Memory: avg=1682kB peak=1682kB > -> Parallel Index Scan using index_s_a on s >(cost=0.57..4967182.06 rows=41666680 width=8 >) (actual time=0.455..11730.878 rows=6666668 loops=3) > >James Looks reasonable. Did you try it in other output formats - json/yaml? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: