Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200328011913.btjuwwkhieav3aod@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Fri, Mar 27, 2020 at 12:51:34PM -0400, James Coleman wrote: >In a previous email I'd summarized remaining TODOs I'd found. Here's >an updated listed with several resolved. > >Resolved: > >2. Not marked in the patch, but in nodeIncrementalSort.c >ExecIncrementalSort() I wonder if perhaps we should move the algorithm >discussion comments up to the file header comment. On the other hand, >I suppose it could be valuable to leave the the file header comment >more high level about the mathematical properties of incremental sort >rather than discussing the details of the hybrid mode. > >I've decided to do this, and the attached patch series includes the change. > It's a bit tough to find the right balance what to put into the header comment and what should go to function comments, but this seems mostly reasonable. I wouldn't use the double-tab indentation and the copyright notices should stay at the top. >3. nodeIncrementalSort.c ExecIncrementalSort() in the main for loop: > * TODO: do we need to check for interrupts inside these loops or > * will the outer node handle that? > >It seems like what we have is sufficient, given that the nodes (and >sort) we rely on have their own calls. The one place where someone >might make an argument otherwise would be in the mode transition >function where we copy tuples from the full sort state to the >presorted sort state. If this is a problem, let me know, and I'll >change it, but I'm proceeding under the assumption for now that it's >not. > I think what we have now is sufficient. >4. nodeIncrementalSort.c ExecReScanIncrementalSort: This whole chunk >is suspect. I've mentioned previously I don't have a great mental >model of how rescan works and its invariants (IIRC someone said it was >about moving around a result set in a cursor). Regardless I'm pretty >sure this code just doesn't work correctly. Additionally the sort_Done >variable is poorly named; it probably would make more sense to call it >something like "scanHasBegun". I'm waiting to change it though until >cleaning up this code more holistically. > >Fixed, as described in previous email. > >6. regress/expected/incremental_sort.out: >-- TODO if an analyze happens here the plans might change; should we >-- solve by inserting extra rows or by adding a GUC that would somehow >-- forcing the time of plan we expect. > >I've decided this doesn't seem to be a real issue, so, comment removed. > OK >7. Not listed as a comment in the patch, but I need to modify the >testing for analyze output to parse out the memory/disk stats to the >tests are stable. > >Included in the attached patch series. I use plpgsql to munge out the >space kB numbers. I also discovered two bugs in the JSON output along >the way and fixed those (memory and disk need to be output separate; >disk was using the wrong "space type" enum). Finally I also use >plpgsql to check a few invariants (for now just that max space is >greater than or equal to the average. > OK >8. optimizer/path/allpaths.c get_useful_pathkeys_for_relation: >* XXX At the moment this can only ever return a list with a single element, >* because it looks at query_pathkeys only. So we might return the pathkeys >* directly, but it seems plausible we'll want to consider other orderings >* in the future. > >I think we just leave this in as a comment. > Fine with me. As a side note here, I'm wondering if this (determining useful pathkeys) can be made a bit smarter by looking both at query_pathkeys and pathkeys useful for merging, similarly to what truncate_useless_pathkeys() does. But that can be seen as an improvement of what we do now. >9. optimizer/path/allpaths.c get_useful_pathkeys_for_relation: >* Considering query_pathkeys is always worth it, because it might let us >* avoid a local sort. > >That originally was a copy from the fdw code, but since the two >functions have diverged (Is that concerning? I could be confusing, but >isn't a compilation problem) I didn't move the function. > I think it's OK the two functions diverged, it's simply because the FDW one needs to check other things too. But I might rework this once I look closer at truncate_useless_pathkeys. >I did notice though that find_em_expr_for_rel() is wholesale copied >(and unchanged) from the fdw code, so I moved it to equivclass.c so >both places can share it. > +1 > >Still remaining: > >1. src/backend/optimizer/util/pathnode.c add_partial_path() >* XXX Perhaps we could do this only when incremental sort is enabled, >* and use the simpler version (comparing just total cost) otherwise? > >I don't have a strong opinion here. It doesn't seem like a significant >difference in terms of cost? > >5. planner.c create_ordered_paths: >* XXX This only looks at sort_pathkeys. I wonder if it needs to look at the >* other pathkeys (grouping, ...) like generate_useful_gather_paths. > >10. optimizer/path/allpaths.c generate_useful_gather_paths: >* XXX I wonder if we need to consider adding a projection here, as >* create_ordered_paths does. > >11. In the same function as the above: >* XXX Can't we skip this (maybe only for the cheapest partial path) >* when the path is already sorted? Then it's likely duplicate with >* the path created by generate_gather_paths. > >12. In the same function as the above: >* XXX This is not redundant with the gather merge path created in >* generate_gather_paths, because that merely preserves ordering of >* the cheapest partial path, while here we add an explicit sort to >* get match the useful ordering. > >13. planner.c create_ordered_paths: >* XXX This is probably duplicate with the paths we already generate >* in generate_useful_gather_paths in apply_scanjoin_target_to_paths. > >Tomas, any chance you could take a look at the above XXX/questions? I >believe all of them that remain relate to the planner patches. > Yes, I'll take a look over the weekend. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: