Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20190909215538.42hrv6j4z5ztcft4@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Wed, Sep 04, 2019 at 09:17:10PM +0200, Tomas Vondra wrote:
>On Wed, Sep 04, 2019 at 11:37:48AM +0200, Rafia Sabih wrote:
>>On Tue, 30 Jul 2019 at 02:17, Tomas Vondra <tomas.vondra@2ndquadrant.com>
>>wrote:
>>
>>>On Sun, Jul 21, 2019 at 01:34:22PM +0200, Tomas Vondra wrote:
>>>>
>>>> ...
>>>>
>>>>I wonder if we're approaching this wrong. Maybe we should not reverse
>>>>engineer queries for the various places, but just start with a set of
>>>>queries that we want to optimize, and then identify which places in the
>>>>planner need to be modified.
>>>>
>>>
>>>I've decided to do a couple of experiments, trying to make my mind about
>>>which modified places matter to diffrent queries. But instead of trying
>>>to reverse engineer the queries, I've taken a different approach - I've
>>>compiled a list of queries that I think are sensible and relevant, and
>>>then planned them with incremental sort enabled in different places.
>>>
>>>I don't have any clear conclusions at this point - it does show some of
>>>the places don't change plan for any of the queries, although there may
>>>be some additional query where it'd make a difference.
>>>
>>>But I'm posting this mostly because it might be useful. I've initially
>>>planned to move changes that add incremental sort paths to separate
>>>patches, and then apply/skip different subsets of those patches. But
>>>then I realized there's a better way to do this - I've added a bunch of
>>>GUCs, one for each such place. This allows doing this testing without
>>>having to rebuild repeatedly.
>>>
>>>I'm not going to post the patch(es) with extra GUCs here, because it'd
>>>just confuse the patch tester, but it's available here:
>>>
>>>  https://github.com/tvondra/postgres/tree/incremental-sort-20190730
>>>
>>>There are 10 GUCs, one for each place in planner where incremental sort
>>>paths are constructed. By default all those are set to 'false' so no
>>>incremental sort paths are built. If you do
>>>
>>>  SET devel_create_ordered_paths = on;
>>>
>>>it'll start creating the paths in non-parallel in create_ordered_paths.
>>>Then you may enable devel_create_ordered_paths_parallel to also consider
>>>parallel paths, etc.
>>>
>>>The list of queries (synthetic, but hopefully sufficiently realistic)
>>>and a couple of scripts to collect the plans is in this repository:
>>>
>>>  https://github.com/tvondra/incremental-sort-tests-2
>>>
>>>There's also a spreadsheet with a summary of results, with a visual
>>>representation of which GUCs affect which queries.
>>>
>>Wow, that sounds like an elaborate experiment. But where is this
>>spreadsheet you mentioned ?
>>
>
>It seems I forgot to push the commit containing the spreadsheet with
>results. I'll fix that tomorrow.
>

OK, I've pushed the commit with the spreadsheet. The single sheet lists
the synthetic queries, and hashes of plans with different flags enables
(parallel query, force incremental sort, and the new developer GUCs
mentioned before). Only a single developer flag is set to true (or none
of them).

The columns at the end simply say whether the plan differs from the plan
generated by master (no patches). TRUE means "same as master" while
FALSE means "different plan.

The "patched" column means all developer GUCs disabled, so it's expected
to produce the same plan as master (and it is). And then there's one
column for each developer GUC. If the column is just TRUE it means the
GUC does not affect any of the synthetic queries. There are 4 of them:

- devel_add_paths_to_grouping_rel_parallel
- devel_create_partial_grouping_paths
- devel_gather_grouping_paths
- devel_standard_join_search

The places controlled by those GUCs are either useless, or the query
affected by them is not included in the list of queries.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Replication & recovery_min_apply_delay
Next
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] CLUSTER command progress monitor