Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200402024746.5wlwk6q7eiv5faux@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
Responses |
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
|
List | pgsql-hackers |
On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote: >On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra ><tomas.vondra@2ndquadrant.com> wrote: >> ... >> I've realized the way get_useful_pathkeys_for_relation() is coded kinda >> works against the fastpath we added when comparing pathkeys. That >> depends on comparing pointers to the list, but we've been building new >> lists (and then returned those) which defeats the optimization. Attached >> is a patch that returns the original list in most cases (and only >> creates a copy when really necessary). This might also save a few cycles >> on bulding the new list, of course. >> >> I've done a bunch of read-only pgbench tests with fairly small scales (1 >> and 10). First with the built-in read-only transaction, and also with a >> simple custom query doing an order-by. And I did this both on the >> default schema and with a bunch of extra indexes. The script I used to >> run this is attached, along with a summary of results. >> >> There are results for master and v40 and v50 patches (the v50 also >> includes the extra patch fixing get_useful_pathkeys_for_relation). >> >> Overall, I'm happy with those results - the v50 seems to be within 1% of >> master, in both directions. This very much seems like a noise. >> >> I still want to do a bit more review of the costing / tuplesort changes, >> which I plan to do tomorrow. If that goes well, I plan to start >> committing this. So please if you think this is not ready or wants more > >I think we need to either implement this or remove the comment: >* XXX I wonder if we need to consider adding a projection here, as >* create_ordered_paths does. >in generate_useful_gather_paths(). > Yeah. I think we don't need the projection here. My reasoning is that if we don't need it in generate_gather_paths(), we don't need it here. >In the same function we have the following code: >/* > * When the partial path is already sorted, we can just add a gather > * merge on top, and we're done - no point in adding explicit sort. > * > * XXX Can't we skip this (maybe only for the cheapest partial path) > * when the path is already sorted? Then it's likely duplicate with > * the path created by generate_gather_paths. > */ >if (is_sorted) >{ > path = create_gather_merge_path(root, rel, subpath, rel->reltarget, > > subpath->pathkeys, NULL, rowsp); > > add_path(rel, &path->path); > continue; >} > >looking at the relevant loop in generate_gather_paths: >/* > * For each useful ordering, we can consider an order-preserving Gather > * Merge. > */ >foreach(lc, rel->partial_pathlist) >{ > Path *subpath = (Path *) lfirst(lc); > GatherMergePath *path; > > if (subpath->pathkeys == NIL) > continue; > > rows = subpath->rows * subpath->parallel_workers; > path = create_gather_merge_path(root, rel, subpath, rel->reltarget, > > subpath->pathkeys, NULL, rowsp); > add_path(rel, &path->path); >} > >I believe we can eliminate the block entirely in >generate_useful_gather_paths(). Here's my reasoning: all paths for >which is_sorted is true must necessarily have pathkeys, and since we >already add a gather merge for every subpath with pathkeys, we've >already added gather merge paths for all of these. > >I've included a patch to change this, but let me know if the reasoning >isn't sound. > Good catch! I think you're correct - we don't need to generate this path, and we can just skip that partial path entirely. >We can also remove the XXX on this comment (in the same function): >* XXX This is not redundant with the gather merge path created in >* generate_gather_paths, because that merely preserves ordering of >* the cheapest partial path, while here we add an explicit sort to >* get match the useful ordering. > >because of this code in generate_gather_paths(): >cheapest_partial_path = linitial(rel->partial_pathlist); >rows = > cheapest_partial_path->rows * cheapest_partial_path->parallel_workers; >simple_gather_path = (Path *) > create_gather_path(root, rel, cheapest_partial_path, rel->reltarget, > NULL, rowsp); >add_path(rel, simple_gather_path); > >but we can cleanup the comment a bit: fix the grammar issue in the >last line and fix the reference to gather merge path (it's a gather >path). > >I've included that in the same patch. > OK, makes sense. >I also noticed that in create_incremental_sort_path we have this: >/* XXX comparison_cost shouldn't be 0? */ >but I guess that's part of what you're reviewing tomorrow. > Right, one of the bits. >> time for a review, let me know. I'm not yet sure if I'll commit this as >> a single change, or in three separate commits. > >I don't love the idea of committing it as a single patch, but at least >the first two I think probably go together. Otherwise we're >introducing a "fix" with no proven impact that will slow down planning >(even if only in a small way) only to intend to condition that on a >GUC in the next commit. > >But I think you could potentially make an argument for keeping the >additional paths separate...but it's not absolutely necessary IMO. > OK. I've been actually wondering whether to move the add_partial_path after the main patch, for exactly this reason. >> James, can you review the proposed extra fix and merge the fixes into >> the main patches? > >I've reviewed it, and it looks correct, so merged into the main series. > >Summary: >The attached series includes a couple of XXX fixes or comment cleanup >as noted above. I believe there are two more XXXs that needs to be >answered before we merge ("do we need to consider adding a projection" >and "what is the comparison cost for incremental sort"). > Thanks! regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: