Re: MergeAppend could consider sorting cheapest child path - Mailing list pgsql-hackers

From Alexander Pyhalov
Subject Re: MergeAppend could consider sorting cheapest child path
Date
Msg-id 1b95a0d224dbb6884c64b44b8995da32@postgrespro.ru
Whole thread Raw
In response to Re: MergeAppend could consider sorting cheapest child path  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
Andrei Lepikhov писал(а) 2025-04-29 16:52:
> On 4/25/25 17:13, Andrei Lepikhov wrote:
>> On 4/25/25 11:16, Alexander Pyhalov wrote:
>>> Usually, sorted cheapest_total_path will be cheaper than sorted 
>>> fractional/startup path at least by startup cost (as after sorting it 
>>> includes total_cost of input path). But we ignore this case when 
>>> selecting cheapest_startup and cheapest_fractional subpaths. As 
>>> result selected cheapest_startup and cheapest_fractional can be not 
>>> cheapest for startup or selecting a fraction of rows.
>> I don't know what you mean by that. The cheapest_total_path is 
>> considered when we chose optimal cheapest_total path. The same works 
>> for the fractional path - get_cheapest_fractional_path gives us the 
>> most optimal fractional path and probes cheapest_total_path too.
>> As above, not sure about min-startup case for now. I can imagine 
>> MergeAppend over sophisticated subquery: non-sorted includes highly 
>> parameterised JOINs and the alternative (with pathkeys) includes 
>> HashJoin, drastically increasing startup cost. It is only a theory, of 
>> course. So, lets discover how min-startup works.
> After a second thought I have caught your idea. I agree that for a 
> fractional path it have no sense to choose any other path except a 
> cheapest total one.
> There are the modified patch in the attachment.
> 
> Also, to be more objective, I propose to use examples in argumentation 
> - something like in attached test2.sql script.

Hi.
I've looked through new patch and found minor inconsistencies in 
get_cheapest_path_for_pathkeys_ext() and 
get_cheapest_fractional_path_for_pathkeys_ext().

In get_cheapest_fractional_path_for_pathkeys_ext() we check that 
base_path is not NULL
        path = get_cheapest_fractional_path_for_pathkeys(rel->pathlist, 
pathkeys,
                                                                          
                                required_outer, fraction);

        base_path = rel->cheapest_total_path;

        /* Stop here if the path doesn't satisfy necessary conditions */
        if (!base_path || !bms_is_subset(PATH_REQ_OUTER(base_path), 
required_outer))
                return path;

But it seems, base_path can't be NULL (as add_paths_to_append_rel() is 
called after set_rel_pathlist() for childrels).
However,  path can. Can we do these two functions 
get_cheapest_path_for_pathkeys_ext() and 
get_cheapest_fractional_path_for_pathkeys_ext()
more similar?

Also we check base_path for required_outer and require_parallel_safe, 
but if cheapest path for pathkeys is NULL, these checks are not 
performed. Luckily, they seen to be no-op anyway due to 
cheapest_total->param_info == NULL and function arguments being NULL 
(required_outer) and false (require_parallel_safe). Should we do 
something about this? Don't know, perhaps, remove these misleading 
arguments?

Now, if we return cheapest_total_path from 
get_cheapest_fractional_path_for_pathkeys_ext() if cheapest paths for 
pathkeys don't exist, do the following lines


                                 /*
                                  * If we found no path with matching 
pathkeys, use the
                                  * cheapest total path instead.
                                  *
                                  * XXX We might consider partially 
sorted paths too (with an
                                  * incremental sort on top). But we'd 
have to build all the
                                  * incremental paths, do the costing 
etc.
                                  */
                                 if (!cheapest_fractional)
                                         cheapest_fractional = 
cheapest_total;

become no-op? And we do return non-null path from  
get_cheapest_fractional_path_for_pathkeys_ext(), as it seems we return 
either cheapest_total_path or cheapest fractional path from 
get_cheapest_fractional_path_for_pathkeys_ext().

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Next
From: Tom Lane
Date:
Subject: Re: allow changing autovacuum_max_workers without restarting