Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200402024746.5wlwk6q7eiv5faux@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Wed, Apr 01, 2020 at 10:09:20PM -0400, James Coleman wrote:
>On Wed, Apr 1, 2020 at 5:42 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> ...
>> I've realized the way get_useful_pathkeys_for_relation() is coded kinda
>> works against the fastpath we added when comparing pathkeys. That
>> depends on comparing pointers to the list, but we've been building new
>> lists (and then returned those) which defeats the optimization. Attached
>> is a patch that returns the original list in most cases (and only
>> creates a copy when really necessary). This might also save a few cycles
>> on bulding the new list, of course.
>>
>> I've done a bunch of read-only pgbench tests with fairly small scales (1
>> and 10). First with the built-in read-only transaction, and also with a
>> simple custom query doing an order-by. And I did this both on the
>> default schema and with a bunch of extra indexes. The script I used to
>> run this is attached, along with a summary of results.
>>
>> There are results for master and v40 and v50 patches (the v50 also
>> includes the extra patch fixing get_useful_pathkeys_for_relation).
>>
>> Overall, I'm happy with those results - the v50 seems to be within 1% of
>> master, in both directions. This very much seems like a noise.
>>
>> I still want to do a bit more review of the costing / tuplesort changes,
>> which I plan to do tomorrow. If that goes well, I plan to start
>> committing this. So please if you think this is not ready or wants more
>
>I think we need to either implement this or remove the comment:
>* XXX I wonder if we need to consider adding a projection here, as
>* create_ordered_paths does.
>in generate_useful_gather_paths().
>

Yeah. I think we don't need the projection here. My reasoning is that if
we don't need it in generate_gather_paths(), we don't need it here.

>In the same function we have the following code:
>/*
> * When the partial path is already sorted, we can just add a gather
> * merge on top, and we're done - no point in adding explicit sort.
> *
> * XXX Can't we skip this (maybe only for the cheapest partial path)
> * when the path is already sorted? Then it's likely duplicate with
> * the path created by generate_gather_paths.
> */
>if (is_sorted)
>{
>        path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
>
> subpath->pathkeys, NULL, rowsp);
>
>        add_path(rel, &path->path);
>        continue;
>}
>
>looking at the relevant loop in generate_gather_paths:
>/*
> * For each useful ordering, we can consider an order-preserving Gather
> * Merge.
> */
>foreach(lc, rel->partial_pathlist)
>{
>        Path       *subpath = (Path *) lfirst(lc);
>        GatherMergePath *path;
>
>        if (subpath->pathkeys == NIL)
>                continue;
>
>        rows = subpath->rows * subpath->parallel_workers;
>        path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
>
> subpath->pathkeys, NULL, rowsp);
>        add_path(rel, &path->path);
>}
>
>I believe we can eliminate the block entirely in
>generate_useful_gather_paths(). Here's my reasoning: all paths for
>which is_sorted is true must necessarily have pathkeys, and since we
>already add a gather merge for every subpath with pathkeys, we've
>already added gather merge paths for all of these.
>
>I've included a patch to change this, but let me know if the reasoning
>isn't sound.
>

Good catch! I think you're correct - we don't need to generate this
path, and we can just skip that partial path entirely.

>We can also remove the XXX on this comment (in the same function):
>* XXX This is not redundant with the gather merge path created in
>* generate_gather_paths, because that merely preserves ordering of
>* the cheapest partial path, while here we add an explicit sort to
>* get match the useful ordering.
>
>because of this code in generate_gather_paths():
>cheapest_partial_path = linitial(rel->partial_pathlist);
>rows =
>        cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
>simple_gather_path = (Path *)
>        create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
>                                           NULL, rowsp);
>add_path(rel, simple_gather_path);
>
>but we can cleanup the comment a bit: fix the grammar issue in the
>last line and fix the reference to gather merge path (it's a gather
>path).
>
>I've included that in the same patch.
>

OK, makes sense.

>I also noticed that in create_incremental_sort_path we have this:
>/* XXX comparison_cost shouldn't be 0? */
>but I guess that's part of what you're reviewing tomorrow.
>

Right, one of the bits.

>> time for a review, let me know. I'm not yet sure if I'll commit this as
>> a single change, or in three separate commits.
>
>I don't love the idea of committing it as a single patch, but at least
>the first two I think probably go together. Otherwise we're
>introducing a "fix" with no proven impact that will slow down planning
>(even if only in a small way) only to intend to condition that on a
>GUC in the next commit.
>
>But I think you could potentially make an argument for keeping the
>additional paths separate...but it's not absolutely necessary IMO.
>

OK. I've been actually wondering whether to move the add_partial_path
after the main patch, for exactly this reason.

>> James, can you review the proposed extra fix and merge the fixes into
>> the main patches?
>
>I've reviewed it, and it looks correct, so merged into the main series.
>
>Summary:
>The attached series includes a couple of XXX fixes or comment cleanup
>as noted above. I believe there are two more XXXs that needs to be
>answered before we merge ("do we need to consider adding a projection"
>and "what is the comparison cost for incremental sort").
>

Thanks!


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)
Next
From: Amit Kapila
Date:
Subject: Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)