On Mon, 18 Sept 2023 at 22:55, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Here is an updated version to show what's in my mind.
My current thoughts on this are that the fractional cost part adds
quite a bit more complexity than the minimalistic approach of just
also considering the cheapest startup path.
There's also quite a bit I don't like about the extra code you've added:
1. RelOptInfo.tuple_fraction is not given a default value in locations
where we do makeNode(RelOptInfo);
2. This is very poorly documented and badly named. Also seems to have
a typo "stopper"
+ /* Like order by, group by, distinct and so. */
+ bool has_stoper_op;
With that, nobody has a hope of knowing if some new operation should
set that value to true or false.
I think it needs to define the meaning, which I think (very roughly)
is "does the query require any additional upper-planner operations
which could require having to read more tuples from the final join
relation than the number of tuples which are read from the final upper
rel."
3. get_fractional_path_cost() goes to the trouble of setting
total_rows then does not use it.
4. I don't see why it's ok to take the total_rows from the first Path
in the list in get_cheapest_fractional_path_ext(). What if another
Path has some other value?
But overall, I'm more inclined to just go with the more simple "add a
cheap unordered startup append path if considering cheap startup
plans" version. I see your latest patch does both. So, I'd suggest two
patches as I do see the merit in keeping this simple and cheap. If we
can get the first part in and you still find cases where you're not
getting the most appropriate startup plan based on the tuple fraction,
then we can reconsider what extra complexity we should endure in the
code based on the example query where we've demonstrated the planner
is not choosing the best startup path appropriate to the given tuple
fraction.
David