Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20200405130110.2bdxaz4ed5jx2eae@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (James Coleman <jtc331@gmail.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Thu, Apr 02, 2020 at 09:40:45PM -0400, James Coleman wrote:
>On Thu, Apr 2, 2020 at 8:46 PM James Coleman <jtc331@gmail.com> wrote:
>>
>> On Thu, Apr 2, 2020 at 8:20 PM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>> > ...
>> > 5) Overall, I think the costing is OK. I'm sure we'll find cases that
>> > will need improvements, but that's fine. However, we now have
>> >
>> > - cost_tuplesort (used to be cost_sort)
>> > - cost_full_sort
>> > - cost_incremental_sort
>> > - cost_sort
>> >
>> > I find it a bit confusing that we have cost_sort and cost_full_sort. Why
>> > don't we just keep using the dummy path in label_sort_with_costsize?
>> > That seems to be the only external caller outside costsize.c. Then we
>> > could either make cost_full_sort static or get rid of it entirely.
>>
>> This another area of the patch I haven't really modified.
>
>See attached for a cleanup of this; it removed cost_fullsort so
>label_sort_with_costsize is back to how it was.
>
>I've directly merged this into the patch series; if you'd like to see
>the diff I can send that along.
>

Thanks. Attached is v54 of the patch, with some minor changes. The main
two changes are in add_partial_path_precheck(), firstly to also consider
startup_cost, as discussed before. The second change (in 0003) is a bit
of an experiment to make add_partial_precheck() cheaper by only calling
compare_pathkeys after checking the costs first (which should be cheaper
than the function call). add_path_precheck already does it in that order
anyway.

I noticed is that compare_path_costs_fuzzily and add_path_precheck both
check consider_startup/consider_param_startup to decide whether to look
at startup_cost. add_partial_path_precheck probably should od that too.


Right now I'm running a battery of benchmarks to see if/how this affects
planner performance. Initially the results were rather noisy, but after
pinning the processes to processes (using taskset) and fixing frequency
(using cpupower) it's much better. The intermediate results seem pretty
fine (the results are withing 0.5% of the master, in both directions).
I'll share the final results.

Overall, I think this is pretty close to committable, and I'm planning
to get it committed on Monday unless someone objects.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgbench - add pseudo-random permutation function
Next
From: Mikael Kjellström
Date:
Subject: Re: backup manifests and contemporaneous buildfarm failures