Re: make add_paths_to_append_rel aware of startup cost - Mailing list pgsql-hackers

From Andy Fan
Subject Re: make add_paths_to_append_rel aware of startup cost
Date
Msg-id CAKU4AWoksFSs_6K5ZCzz68Oq+3uZb5g6sb8-dXJ6cChgYQm-yw@mail.gmail.com
Whole thread Raw
In response to Re: make add_paths_to_append_rel aware of startup cost  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: make add_paths_to_append_rel aware of startup cost
List pgsql-hackers
Hi David,
  Thanks for taking a look at this!

On Fri, Sep 15, 2023 at 3:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 7 Sept 2023 at 04:37, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Currently add_paths_to_append_rel overlooked the startup cost for creating
> append path, so it may have lost some optimization chances.  After a glance,
> the following 4 identifiers can be impacted.

> - We shouldn't do the optimization if there are still more tables to join,
>   the reason is similar to has_multiple_baserels(root) in
>   set_subquery_pathlist. But the trouble here is we may inherit multiple
>   levels to build an appendrel, so I have to keep the 'top_relids' all the
>   time and compare it with PlannerInfo.all_baserels. If they are the same,
>   then it is the case we want to optimize.

I think you've likely gone to the trouble of trying to determine if
there are any joins pending because you're considering using a cheap
startup path *instead* of the cheapest total path and you don't want
to do that when some join will cause all the rows to be read thus
making the plan more expensive if a cheap startup path was picked.

Yes, that's true.  However it is not something we can't resolve, one
of the solutions is just like what I did in the patch.  but currently the
main stuff which confuses me is if it is the right thing to give up the
optimization if it has more tables to join (just like set_subquery_pathlist
did). 
 
Instead of doing that, why don't you just create a completely new
AppendPath containing all the cheapest_startup_paths and add that to
the append rel. You can optimise this and only do it when
rel->consider_startup is true.

Does the attached do anything less than what your patch does?

We can work like this,  but  there is one difference from what
my current patch does. It is cheapest_startup_path vs cheapest
fraction path.  For example if we have the following 3 paths with
all of the estimated rows is 100 and the tuple_fraction is 10. 

Path 1:  startup_cost = 60,  total_cost = 80  -- cheapest total cost. 
Path 2:  startup_cost = 10,  total_cost = 1000  -- cheapest startup cost
Path 3:  startup_cost = 20,  total_cost = 90  -- cheapest fractional cost 

So with the patch you propose,  Path 1 & Path 3 are chosen to build
append path.  but with my current patch,  Only path 3 is kept.  IIUC, 
path 3 should be the best one in this case.

We might also argue why Path 3 is kept in the first place (the children
level), I think pathkey might be one option.  and even path 3 is 
discarded somehow, I think only if it is the best one, we should
keep it ideally. 

Another tiny factor of this is your propose isn't consistent with
what set_subquery_pathlist which uses cheapest fractional cost
and my proposal isn't consistent plain rel which uses cheapest
startup cost.  We can't say which one is better, though. 

If my above analysis is correct,  I think the best way to handle this
is if there is no more tables to join, we use cheapest fraction cost
for all the kinds of relations, including plain relation, append rel,
subquery and so on.  If we have more tables to join,  we use
cheapest startup cost.  On the implementation side, I want to use
RelOptInfo.tuple_fraction instead of RelOptInfo.consider_startup. 
tuple_fraction = -1 means startup cost should not be considered.
tuple_fraction = 0 means cheapest startup cost should be used. 
tuple_franction > 0 means cheapest fraction cost should be used.

I still don't pay enough attention to consider_param_startup in
RelOptInfo,  I'm feeling the above strategy will not generate
too much overhead to the planner for now while it can provides
a better plan sometimes. 

--
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Yogesh Sharma
Date:
Subject: Re: Have better wording for snapshot file reading failure
Next
From: Tom Lane
Date:
Subject: Re: JSON Path and GIN Questions