Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date
Msg-id 20190709121125.5bpaxgtd5wl7aghy@development
Whole thread Raw
In response to Re: [PATCH] Incremental sort (was: PoC: Partial sort)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [PATCH] Incremental sort (was: PoC: Partial sort)
List pgsql-hackers
On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:
>
> ...
>
>Notice that cost of the second plan is almost double the first one. That
>means 0004 does not even generate the first plan, i.e. there are cases
>where we don't try to add the explicit sort before passing the path to
>generate_gather_paths().
>
>And I think I know why is that - while gather_grouping_paths() tries to
>add explicit sort below the gather merge, there are other places that
>call generate_gather_paths() that don't do that. In this case it's
>probably apply_scanjoin_target_to_paths() which simply builds
>
>  parallel (seq|index) scan + gather merge
>
>and that's it. The problem is likely the same - the code does not know
>which pathkeys are "interesting" at that point. We probably need to
>teach planner to do this.
>

I've looked into this again, and yes - that's the reason. I've added
generate_useful_gather_paths() which is a wrapper on top of
generate_gather_paths(). It essentially does what 0001 patch did directly
in generate_gather_paths() so it's more like create_grouping_paths().

And that does the trick - we now generate the cheaper paths, and I don't
see any crashes in regression tests etc.

I still suspect we already have code doing similar checks whether pathkeys
might be useful somewhere. I've looked into pathkeys.c and elsewhere but
no luck.

Attached is a slightly modified patch series:

1) 0001 considers incremental sort paths in various places (adds the new
generate_useful_gather_paths and modifies places calling create_sort_path)

2) 0002 and 0003 are fixes I mentioned before

3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
tries to nudge the optimizer into using incremental sort by essentially
making it free (i.e. using startup/total costs of the subpath). I've found
this useful when trying to force incremental sorts into plans where it may
not be the best strategy.

I won't have time to hack on this over the next ~2 weeks, but I'll try to
respond to questions when possible.

>
>FWIW tweaking all the create_sort_path() places to also consider adding
>incremental sort is a bit tedious and invasive, and it almost doubles
>the amount of repetitive code. It's OK for experiment like this, but we
>should try handling this in a nicer way (move to a separate function
>that does both, or something like that).
>

This definitely needs more work. We need to refactor it in some way, e.g.
have a function that would consider both explicit sort (on the cheapest
path) and incremental sort (on all paths), and call it from all those
places. Haven't tried it, though.

There's also a couple more places where we do create_sort_path() and don't
consider incremental sort yet - window functions, distinct etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment

pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Next
From: Thomas Munro
Date:
Subject: Re: [HACKERS] Cached plans and statement generalization