Home > mailing lists

Thread: Considering fractional paths in Append node

Considering fractional paths in Append node

From

Nikita Malakhov

Date:

16 October 2024, 21:24:10

Hi hackers!

A colleague of mine, Andrei Lepikhov, has found interesting behavior

in path cost calculation for Append node - when evaluating the cheapest

path it does not take into account fractional path costs.

We've prepared a patch that forces add_paths_to_append_rel function

to consider non-parametrized fractional path costs.

The effect is easily seen in one of standard PG tests:

Vanilla (current master):

explain analyze
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
union all
(values(1)) limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1 loops=1)
-> Append (cost=0.00..2415.09 rows=11 width=4) (actual time=6.308..6.310 rows=1 loops=1)
-> Nested Loop (cost=0.00..2415.03 rows=10 width=4) (actual time=6.307..6.308 rows=1 loops=1)
Join Filter: (t1.tenthous = t2.tenthous)
Rows Removed by Join Filter: 4210
-> Seq Scan on tenk1 t1 (cost=0.00..445.00 rows=10000 width=8) (actual time=0.004..0.057 rows=422 loops=1)
-> Materialize (cost=0.00..470.05 rows=10 width=4) (actual time=0.000..0.014 rows=10 loops=422)
Storage: Memory Maximum Storage: 17kB
-> Seq Scan on tenk2 t2 (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..5.535 rows=10 loops=1)
Filter: (thousand = 0)
Rows Removed by Filter: 9990
-> Result (cost=0.00..0.01 rows=1 width=4) (never executed)
Planning Time: 0.324 ms
Execution Time: 6.336 ms
(14 rows)

Patched, the same test:

explain analyze
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
union all
(values(1)) limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1 loops=1)
-> Append (cost=0.29..1383.12 rows=11 width=4) (actual time=0.104..0.105 rows=1 loops=1)
-> Nested Loop (cost=0.29..1383.05 rows=10 width=4) (actual time=0.104..0.104 rows=1 loops=1)
-> Seq Scan on tenk2 t2 (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..0.076 rows=1 loops=1)
Filter: (thousand = 0)
Rows Removed by Filter: 421
-> Index Scan using tenk1_thous_tenthous on tenk1 t1 (cost=0.29..91.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
Index Cond: (tenthous = t2.tenthous)
-> Result (cost=0.00..0.01 rows=1 width=4) (never executed)
Planning Time: 0.334 ms
Execution Time: 0.130 ms
(11 rows)

Hope this optimization could be useful.

Regards,

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/

Re: Considering fractional paths in Append node

From

Andy Fan

Date:

17 October 2024, 03:05:41

Nikita Malakhov <hukutoc@gmail.com> writes:

Helll Nikita,

> Hi hackers!
>
> Sorry, I've forgot to attach the patch itself. Please check it out.

Could you check if [1] is related to this subject? I think the hard part
would be which tuple_fraction to use during adding
add_paths_to_append_rel since root->tuple_fraction is on subquery level,
while the add_paths_to_append_rel is only on RelOptInfo level.

[1]
https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com

-- 
Best Regards
Andy Fan

Re: Considering fractional paths in Append node

From

Andrei Lepikhov

Date:

17 October 2024, 06:57:50

On 10/17/24 07:05, Andy Fan wrote:
> Nikita Malakhov <hukutoc@gmail.com> writes:
> 
> Helll Nikita,
> 
>> Hi hackers!
>>
>> Sorry, I've forgot to attach the patch itself. Please check it out.
> 
> Could you check if [1] is related to this subject? I think the hard part
> would be which tuple_fraction to use during adding
> add_paths_to_append_rel since root->tuple_fraction is on subquery level,
> while the add_paths_to_append_rel is only on RelOptInfo level.
> 
> [1]
> https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com
Yes, this thread is connected to the code Nikita has proposed.
It is almost related to partitioned cases, of course.
I'm not sure about partitionwise joins - it looks overcomplicated for 
the moment. We just see cases when SeqScan is preferred to IndexScan. 
One logical way is to add into the Append node an alternative fractional 
path and, if at the top level some Limit, IncrementalSort or another 
fractional-friendly node will request only a small part of the data, the 
optimiser will have the option to choose the fractional path.

-- 
regards, Andrei Lepikhov

Re: Considering fractional paths in Append node

From

Nikita Malakhov

Date:

17 October 2024, 14:10:04

Hi,

Andy, your words make sense, but to me it seems that in add_paths_to_append_rel

we have no other options than tuple_fraction from root, and rows (if any) in paths

we take into account, please correct me if I am wrong.

Thank you!

Also, on top of that I have an idea of pruning unnecessary partitions

in generate_orderedappend_paths() when we have a valid LIMIT value.

I'm currently checking if it is working correctly in multiple cases,

so 'll send it after we deal with this issue.

Regards,

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/

Re: Considering fractional paths in Append node

From

Andy Fan

Date:

18 October 2024, 02:36:37

Nikita Malakhov <hukutoc@gmail.com> writes:

> Hi,
>
> Andy, your words make sense, but to me it seems that in add_paths_to_append_rel
> we have no other options than tuple_fraction from root, and rows (if any) in paths
> we take into account, please correct me if I am wrong.

One of the option might be applying your logic only if we can prove the
tuple_fraction from root is same as the tuple_fraction, similar with
what I did before. But it is proved that is too complex. 

> Thank you!
>
> Also, on top of that I have an idea of pruning unnecessary partitions
> in generate_orderedappend_paths() when we have a valid LIMIT value.
> I'm currently checking if it is working correctly in multiple cases,
> so 'll send it after we deal with this issue.

-- 
Best Regards
Andy Fan

Re: Considering fractional paths in Append node

From

Andy Fan

Date:

18 October 2024, 03:54:48

Nikita Malakhov <hukutoc@gmail.com> writes:


> The effect is easily seen in one of standard PG tests:
> Vanilla (current master):
> explain analyze
> select t1.unique1 from tenk1 t1
> inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
>    union all
> (values(1)) limit 1;
>                                                           QUERY PLAN
  
 
>     
>
------------------------------------------------------------------------------------------------------------------------------
>
>  Limit  (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1 loops=1)
>    ->  Append  (cost=0.00..2415.09 rows=11 width=4) (actual time=6.308..6.310 rows=1 loops=1)
>          ->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4) (actual time=6.307..6.308 rows=1 loops=1)
>                Join Filter: (t1.tenthous = t2.tenthous)
>                Rows Removed by Join Filter: 4210
>                ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8) (actual time=0.004..0.057 rows=422
> loops=1)
>                ->  Materialize  (cost=0.00..470.05 rows=10 width=4) (actual time=0.000..0.014 rows=10 loops=422)
>                      Storage: Memory  Maximum Storage: 17kB
>                      ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..5.535 rows=10
> loops=1)
>                            Filter: (thousand = 0)
>                            Rows Removed by Filter: 9990
>          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
>  Planning Time: 0.324 ms
>  Execution Time: 6.336 ms
> (14 rows)
>
> Patched, the same test:
> explain analyze
> select t1.unique1 from tenk1 t1
> inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
>    union all
> (values(1)) limit 1;
>                                                                     QUERY PLAN
  
 
>                         
>
--------------------------------------------------------------------------------------------------------------------------------------------------
>
>  Limit  (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1 loops=1)
>    ->  Append  (cost=0.29..1383.12 rows=11 width=4) (actual time=0.104..0.105 rows=1 loops=1)
>          ->  Nested Loop  (cost=0.29..1383.05 rows=10 width=4) (actual time=0.104..0.104 rows=1 loops=1)
>                ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..0.076 rows=1
loops=1)
>                      Filter: (thousand = 0)
>                      Rows Removed by Filter: 421
>                ->  Index Scan using tenk1_thous_tenthous on tenk1 t1  (cost=0.29..91.30 rows=1 width=8) (actual
> time=0.026..0.026 rows=1 loops=1)
>                      Index Cond: (tenthous = t2.tenthous)
>          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
>  Planning Time: 0.334 ms
>  Execution Time: 0.130 ms
> (11 rows)
>

This is a nice result. After some more thoughts, I'm feeling the startup
cost calculation on seq scan looks a more important one to address. 

Bad Plan: Append  (cost=0.00..2415.09 ..) shows us the "startup cost" is 0.
Good plan: Append  (cost=0.29..1383.12 ..) show us the "startup cost" is
0.29.

The major reason of this is we calculate the "startup cost" for
"SeqScan" and "Index scan" with different guidances. For the "Index
scan", the startup cost is "the cost to retrieve the first tuple",
however for "SeqScan", it is not, as we can see the startup cost for
query "SELECT * FROM tenk2 WHERE thousand = 0" has a 0 startup_cost. 

In my understading, "startup cost" means the cost to retrieve the first
tuple *already*, but at [1], Tom said:

"""
I think that things might work out better if we redefined the startup
cost as "estimated cost to retrieve the first tuple", rather than its
current very-squishy definition as "cost to initialize the scan". 
"""

The above statement makes me confused. If we take the startup cost as
cost to retrieve cost for the first tuple, we can do the below quick hack,

@@ -355,8 +355,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
     }
 
     path->disabled_nodes = enable_seqscan ? 0 : 1;
-    path->startup_cost = startup_cost;
     path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
+    path->startup_cost = startup_cost +   (cpu_run_cost + disk_run_cost) * (1 - path->rows / baserel->tuples);
 }

We get plan:

regression=# explain                   
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0 union all select 1 limit 1
;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Limit  (cost=470.12..514.00 rows=1 width=4)
   ->  Append  (cost=470.12..952.79 rows=11 width=4)
         ->  Hash Join  (cost=470.12..952.73 rows=10 width=4)
               Hash Cond: (t1.tenthous = t2.tenthous)
               ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8)
               ->  Hash  (cost=470.00..470.00 rows=10 width=4)
                     ->  Seq Scan on tenk2 t2  (cost=469.53..470.00 rows=10 width=4)
                           Filter: (thousand = 0)
         ->  Result  (cost=0.00..0.01 rows=1 width=4)
(9 rows)

set enable_hashjoin to off;

regression=# explain                    
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0 union all select 1 limit 1
;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Limit  (cost=469.81..542.66 rows=1 width=4)
   ->  Append  (cost=469.81..1271.12 rows=11 width=4)
         ->  Nested Loop  (cost=469.81..1271.05 rows=10 width=4)
               ->  Seq Scan on tenk2 t2  (cost=469.53..470.00 rows=10 width=4)
                     Filter: (thousand = 0)
               ->  Index Scan using tenk1_thous_tenthous on tenk1 t1  (cost=0.29..80.09 rows=1 width=8)
                     Index Cond: (tenthous = t2.tenthous)
         ->  Result  (cost=0.00..0.01 rows=1 width=4)
(8 rows)

Looks we still have some other stuff to do, but we have seen the desired
plan has a closer cost to estimated best plan than before. 


[1]
https://www.postgresql.org/message-id/flat/3783591.1721327902%40sss.pgh.pa.us#09d6471fc59b35fa4aca939e49943c2c 
-- 
Best Regards
Andy Fan

Re: Considering fractional paths in Append node

From

Andrei Lepikhov

Date:

18 October 2024, 04:42:26

On 10/18/24 07:54, Andy Fan wrote:
> Nikita Malakhov <hukutoc@gmail.com> writes:
>> The effect is easily seen in one of standard PG tests:
> """
> I think that things might work out better if we redefined the startup
> cost as "estimated cost to retrieve the first tuple", rather than its
> current very-squishy definition as "cost to initialize the scan".
> """
> The above statement makes me confused. If we take the startup cost as
> cost to retrieve cost for the first tuple, we can do the below quick hack,
Promising way to go. Of course even in that case IndexScan usually gives 
way to SeqScan (because of the additional heap fetch). And only 
IndexOnlyScan may overcome it. Moreover, SeqScan first tuple cost is 
contradictory issue - who knows, how much tuples it will filter before 
the first tuple will be produced?

> 
> Looks we still have some other stuff to do, but we have seen the desired
> plan has a closer cost to estimated best plan than before.
But our patch is about some different stuff: adding one more Append 
strategy (and, as I realised recently, one promising MergeAppend too) we 
give a chance upper fraction-friendly node to decide which plan is 
better. Right now, AFAIK, only LIMIT node can profit from that (maybe 
IncrementalSort if we include MergeAppend). But it may open a door to 
improve other nodes too.

-- 
regards, Andrei Lepikhov

Re: Considering fractional paths in Append node

From

Nikita Malakhov

Date:

18 October 2024, 18:40:06

Hi!

Andy, one quick question - what do you think on using root->limit_tuples as a guidance on how many

rows we have to consider in plans cost?

Regards,

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/

Re: Considering fractional paths in Append node

From

Andy Fan

Date:

19 October 2024, 03:03:57

Nikita Malakhov <hukutoc@gmail.com> writes:

Hi Nikita,

> Hi!
>
> Andy, one quick question - what do you think on using
> root->limit_tuples as a guidance on how many 
> rows we have to consider in plans cost?

Within my exprerience, the committer probabaly dislikes the idea of
"ignoring  the difference of tuple_fraction between subquery level and
RelOptInfo level" and the committers dislikes a complex soluation like
what I did in [1]. Maybe you can try to figure out completed and simple
soluation to make everyone happy, but I have no idea about on this right
now. 

[1]
https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com

-- 
Best Regards
Andy Fan

Re: Considering fractional paths in Append node

From

Nikita Malakhov

Date:

28 October 2024, 22:50:26

Hi,

Andy, thank you, I've checked this thread out along with run-time partition pruning.

I've spend some time hovering on the tuple_fraction field usage and would disagree

with you on this topic - it is already used on the RelOptInfo level later on, in

generate_orderedappend_paths()

I mean the following piece:

if (root->tuple_fraction > 0)
{
double path_fraction = (1.0 / root->tuple_fraction);
Path cheapest_consider_fraction;

cheapest_fractional = get_cheapest_fractional_path_for_pathkeys(childrel->pathlist, pathkeys, NULL, path_fraction);

...

function, so it does not seem incorrect to use its value for a single relation in subquery -

I agree that we do not have accurate estimation at this level, but we could use the one

we already have.

I've also tried hard to find an example where this patch could break something,

but without success.

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/

Re: Considering fractional paths in Append node

From

Andy Fan

Date:

29 October 2024, 02:38:31

Nikita Malakhov <hukutoc@gmail.com> writes:

> Hi,
>
> Andy, thank you, I've checked this thread out along with run-time
> partition pruning.
I'm not sure the relationshipe between this topic and run-time
partitioin pruning..

> I've spend some time hovering on the tuple_fraction field usage and would disagree
> with you on this topic - it is already used on the RelOptInfo level later on, in
> generate_orderedappend_paths()

Looks you are right that root->tuple_fraction has been used on RelOptInfo
level in the generate_orderedappend_paths(). But we also tried to
use not in the RelOptInfo level such as set_subquery_pathlist. See..

"""
/*
 * We can safely pass the outer tuple_fraction down to the subquery if the
 * outer level has no joining, aggregation, or sorting to do. Otherwise
 * we'd better tell the subquery to plan for full retrieval. (XXX This
 * could probably be made more intelligent ...)
 */
"""

I'm not sure the "more intelligent" would be just use it directly. 

So I'm not saying we can't do this, just that the facts are:
(a).  root->tuple_fraction are not exactly same as RelOptInfo's
tuple_fraction.
(b).  We have used root->tuple_fraction in RelOptInfo in some cases and
also tried to not use it in some other case (and only use it under some
situation similar like what I did before).

Looks different committers have different opinion on this. 

-- 
Best Regards
Andy Fan

Re: Considering fractional paths in Append node

From

Andrei Lepikhov

Date:

06 December 2024, 09:48:48

On 11/2/24 01:18, Nikita Malakhov wrote:
> I've corrected failing test and created a patch at Commitfest:
> https://commitfest.postgresql.org/51/5361/ <https:// 
> commitfest.postgresql.org/51/5361/>
I have played around with this feature, which looks promising for such a 
tiny change. It provides a 'bottom boundary' recommendation for 
appending subpaths, participating in the 'fractional branch' of paths.
As I see it works consistently with the plans, created for plain tables 
filled with similar data.
According to the proposal to change SeqScan logic, IMO, Andy is right. 
But it is a separate improvement because it wouldn't work in the case of 
LIMIT 10 or 100, as the newly added regression tests demonstrate.
I think this feature gives sensible profit for partitionwise paths. 
Pushing this knowledge into subpaths could help postgres_fdw to reduce 
network traffic.

-- 
regards, Andrei Lepikhov

Re: Considering fractional paths in Append node

From

Alexander Korotkov

Date:

02 March, 11:46:09

Hi, Andy!

On Fri, Oct 18, 2024 at 3:55 AM Andy Fan <zhihuifan1213@163.com> wrote:
>
> Nikita Malakhov <hukutoc@gmail.com> writes:
>
>
> > The effect is easily seen in one of standard PG tests:
> > Vanilla (current master):
> > explain analyze
> > select t1.unique1 from tenk1 t1
> > inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
> >    union all
> > (values(1)) limit 1;
> >                                                           QUERY PLAN
> >
> >
------------------------------------------------------------------------------------------------------------------------------
> >
> >  Limit  (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1 loops=1)
> >    ->  Append  (cost=0.00..2415.09 rows=11 width=4) (actual time=6.308..6.310 rows=1 loops=1)
> >          ->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4) (actual time=6.307..6.308 rows=1 loops=1)
> >                Join Filter: (t1.tenthous = t2.tenthous)
> >                Rows Removed by Join Filter: 4210
> >                ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8) (actual time=0.004..0.057 rows=422
> > loops=1)
> >                ->  Materialize  (cost=0.00..470.05 rows=10 width=4) (actual time=0.000..0.014 rows=10 loops=422)
> >                      Storage: Memory  Maximum Storage: 17kB
> >                      ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..5.535
rows=10
> > loops=1)
> >                            Filter: (thousand = 0)
> >                            Rows Removed by Filter: 9990
> >          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
> >  Planning Time: 0.324 ms
> >  Execution Time: 6.336 ms
> > (14 rows)
> >
> > Patched, the same test:
> > explain analyze
> > select t1.unique1 from tenk1 t1
> > inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
> >    union all
> > (values(1)) limit 1;
> >                                                                     QUERY PLAN
> >
> >
--------------------------------------------------------------------------------------------------------------------------------------------------
> >
> >  Limit  (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1 loops=1)
> >    ->  Append  (cost=0.29..1383.12 rows=11 width=4) (actual time=0.104..0.105 rows=1 loops=1)
> >          ->  Nested Loop  (cost=0.29..1383.05 rows=10 width=4) (actual time=0.104..0.104 rows=1 loops=1)
> >                ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..0.076 rows=1
loops=1)
> >                      Filter: (thousand = 0)
> >                      Rows Removed by Filter: 421
> >                ->  Index Scan using tenk1_thous_tenthous on tenk1 t1  (cost=0.29..91.30 rows=1 width=8) (actual
> > time=0.026..0.026 rows=1 loops=1)
> >                      Index Cond: (tenthous = t2.tenthous)
> >          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
> >  Planning Time: 0.334 ms
> >  Execution Time: 0.130 ms
> > (11 rows)
> >
>
> This is a nice result. After some more thoughts, I'm feeling the startup
> cost calculation on seq scan looks a more important one to address.
>
> Bad Plan: Append  (cost=0.00..2415.09 ..) shows us the "startup cost" is 0.
> Good plan: Append  (cost=0.29..1383.12 ..) show us the "startup cost" is
> 0.29.
>
> The major reason of this is we calculate the "startup cost" for
> "SeqScan" and "Index scan" with different guidances. For the "Index
> scan", the startup cost is "the cost to retrieve the first tuple",
> however for "SeqScan", it is not, as we can see the startup cost for
> query "SELECT * FROM tenk2 WHERE thousand = 0" has a 0 startup_cost.
>
> In my understading, "startup cost" means the cost to retrieve the first
> tuple *already*, but at [1], Tom said:
>
> """
> I think that things might work out better if we redefined the startup
> cost as "estimated cost to retrieve the first tuple", rather than its
> current very-squishy definition as "cost to initialize the scan".
> """
>
> The above statement makes me confused. If we take the startup cost as
> cost to retrieve cost for the first tuple, we can do the below quick hack,
>
> @@ -355,8 +355,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
>         }
>
>         path->disabled_nodes = enable_seqscan ? 0 : 1;
> -       path->startup_cost = startup_cost;
>         path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
> +       path->startup_cost = startup_cost +   (cpu_run_cost + disk_run_cost) * (1 - path->rows / baserel->tuples);
>  }

You can check I've already proposed similar change years before.
https://www.postgresql.org/message-id/CAPpHfdvfDAXPXhFQT3Ww%3DkZ4tpyAsD07_oz8-fh0%3Dp6eckEBKQ%40mail.gmail.com
It appears that according to the current model startup_cost is not the
cost of the first tuple.  It should be the cost of preparation work,
while extraction of tuples should be distributed uniformly through
total_cost - startup_cost.

------
Regards,
Alexander Korotkov
Supabase

Re: Considering fractional paths in Append node

From

Alexander Korotkov

Date:

02 March, 11:53:24

Hi, Andrei!

On Fri, Dec 6, 2024 at 10:13 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 12/6/24 13:48, Andrei Lepikhov wrote:
> > On 11/2/24 01:18, Nikita Malakhov wrote:
> >> I've corrected failing test and created a patch at Commitfest:
> >> https://commitfest.postgresql.org/51/5361/ <https://
> >> commitfest.postgresql.org/51/5361/>
> > I have played around with this feature, which looks promising for such a
> > tiny change. It provides a 'bottom boundary' recommendation for
> > appending subpaths, participating in the 'fractional branch' of paths.
> > As I see it works consistently with the plans, created for plain tables
> > filled with similar data.
> > According to the proposal to change SeqScan logic, IMO, Andy is right.
> > But it is a separate improvement because it wouldn't work in the case of
> > LIMIT 10 or 100, as the newly added regression tests demonstrate.
> > I think this feature gives sensible profit for partitionwise paths.
> > Pushing this knowledge into subpaths could help postgres_fdw to reduce
> > network traffic.
> >
> See the attached patch: regression tests added; *_ext function removed -
> I think we wouldn't back-patch it into minor releases.

Thank you for revising this patch.  I've a couple of questions for you.
1. You revise get_cheapest_fractional_path() to reject parametrized
paths in all the cases.  Are you sure this wouldn't affect other
callers of this function?
2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
do you think we could limit this to some simple cases?  For instance,
check that RelOptInfo is the final result relation for given root.

Links.
https://www.postgresql.org/message-id/871q0fvbje.fsf%40163.com

------
Regards,
Alexander Korotkov
Supabase

Re: Considering fractional paths in Append node

From

Andrei Lepikhov

Date:

03 March, 14:04:06

On 2/3/2025 09:53, Alexander Korotkov wrote:
> Hi, Andrei!
> 
> On Fri, Dec 6, 2024 at 10:13 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>
>> On 12/6/24 13:48, Andrei Lepikhov wrote:
>>> On 11/2/24 01:18, Nikita Malakhov wrote:
>>>> I've corrected failing test and created a patch at Commitfest:
>>>> https://commitfest.postgresql.org/51/5361/ <https://
>>>> commitfest.postgresql.org/51/5361/>
>>> I have played around with this feature, which looks promising for such a
>>> tiny change. It provides a 'bottom boundary' recommendation for
>>> appending subpaths, participating in the 'fractional branch' of paths.
>>> As I see it works consistently with the plans, created for plain tables
>>> filled with similar data.
>>> According to the proposal to change SeqScan logic, IMO, Andy is right.
>>> But it is a separate improvement because it wouldn't work in the case of
>>> LIMIT 10 or 100, as the newly added regression tests demonstrate.
>>> I think this feature gives sensible profit for partitionwise paths.
>>> Pushing this knowledge into subpaths could help postgres_fdw to reduce
>>> network traffic.
>>>
>> See the attached patch: regression tests added; *_ext function removed -
>> I think we wouldn't back-patch it into minor releases.
> 
> Thank you for revising this patch.  I've a couple of questions for you.
> 1. You revise get_cheapest_fractional_path() to reject parametrized
> paths in all the cases.  Are you sure this wouldn't affect other
> callers of this function?
Yes, in my opinion, it may not affect any caller for now. As you can 
see, the routine is used for upper-rels only. Such RelOptInfos can't 
contain parameterised paths for now. I already attempted to change the 
parameterisation code and let queries pull parameterisation from 
subqueries and subplans, but it is too invasive. It seems this logic 
will stay as it is for a long time.
I added a comment to explain this logic.
What's more, since all subpaths of an Append must have the same 
parameterisation, we simplify the case and just don't discover this option.

> 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
> do you think we could limit this to some simple cases?  For instance,
> check that RelOptInfo is the final result relation for given root.
I believe that using tuple_fraction is not an issue. Instead, it serves 
as a flag that allows the upper-level optimisation to consider 
additional options. The upper-level optimiser has more variants to 
examine and will select the optimal path based on the knowledge 
available at that level. Therefore, we're not introducing a mistake 
here; we're simply adding extra work in the narrow case. However, having 
only the bottom-up planning process, I don't see how we could avoid this 
additional workload.

-- 
regards, Andrei Lepikhov

Attachment

v2-0001-Teach-Append-to-consider-tuple_fraction-when-accu.patch

Re: Considering fractional paths in Append node

From

Alena Rybakina

Date:

03 March, 14:17:18

Hi! Thank you for your work on this subject.

I agree with your code but one phrase in commit message was confusing 
for me:

"This change is dedicated to more active usage of IndexScan and 
parameterised
NestLoop paths in partitioned cases under an Append node, as it already 
works
with plain tables."

As I understood this thread is about the optimization that allows 
considering Index Scan if we need the limited number of rows and
later the commit message contains it. I didn't understand completely the 
sentence above to be honest.


-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Considering fractional paths in Append node

From

Alena Rybakina

Date:

03 March, 19:34:04

On 03.03.2025 14:17, Alena Rybakina wrote:
> Hi! Thank you for your work on this subject.
>
> I agree with your code but one phrase in commit message was confusing 
> for me:
>
> "This change is dedicated to more active usage of IndexScan and 
> parameterised
> NestLoop paths in partitioned cases under an Append node, as it 
> already works
> with plain tables."
>
> As I understood this thread is about the optimization that allows 
> considering Index Scan if we need the limited number of rows and
> later the commit message contains it. I didn't understand completely 
> the sentence above to be honest.
>
>
Sorry for noise, I understood it and I think everything is fine there.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Considering fractional paths in Append node

From

Alexander Korotkov

Date:

05 March, 05:27:20

On Mon, Mar 3, 2025 at 1:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> On 2/3/2025 09:53, Alexander Korotkov wrote:
> > Hi, Andrei!
> >
> > On Fri, Dec 6, 2024 at 10:13 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>
> >> On 12/6/24 13:48, Andrei Lepikhov wrote:
> >>> On 11/2/24 01:18, Nikita Malakhov wrote:
> >>>> I've corrected failing test and created a patch at Commitfest:
> >>>> https://commitfest.postgresql.org/51/5361/ <https://
> >>>> commitfest.postgresql.org/51/5361/>
> >>> I have played around with this feature, which looks promising for such a
> >>> tiny change. It provides a 'bottom boundary' recommendation for
> >>> appending subpaths, participating in the 'fractional branch' of paths.
> >>> As I see it works consistently with the plans, created for plain tables
> >>> filled with similar data.
> >>> According to the proposal to change SeqScan logic, IMO, Andy is right.
> >>> But it is a separate improvement because it wouldn't work in the case of
> >>> LIMIT 10 or 100, as the newly added regression tests demonstrate.
> >>> I think this feature gives sensible profit for partitionwise paths.
> >>> Pushing this knowledge into subpaths could help postgres_fdw to reduce
> >>> network traffic.
> >>>
> >> See the attached patch: regression tests added; *_ext function removed -
> >> I think we wouldn't back-patch it into minor releases.
> >
> > Thank you for revising this patch.  I've a couple of questions for you.
> > 1. You revise get_cheapest_fractional_path() to reject parametrized
> > paths in all the cases.  Are you sure this wouldn't affect other
> > callers of this function?
> Yes, in my opinion, it may not affect any caller for now. As you can
> see, the routine is used for upper-rels only. Such RelOptInfos can't
> contain parameterised paths for now. I already attempted to change the
> parameterisation code and let queries pull parameterisation from
> subqueries and subplans, but it is too invasive. It seems this logic
> will stay as it is for a long time.
> I added a comment to explain this logic.
> What's more, since all subpaths of an Append must have the same
> parameterisation, we simplify the case and just don't discover this option.

Makes sense.  I have tries to add Assert(!path->param_info) to
get_cheapest_fractional_path() without the patch, and tests get
passed.

> > 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
> > do you think we could limit this to some simple cases?  For instance,
> > check that RelOptInfo is the final result relation for given root.
> I believe that using tuple_fraction is not an issue. Instead, it serves
> as a flag that allows the upper-level optimisation to consider
> additional options. The upper-level optimiser has more variants to
> examine and will select the optimal path based on the knowledge
> available at that level. Therefore, we're not introducing a mistake
> here; we're simply adding extra work in the narrow case. However, having
> only the bottom-up planning process, I don't see how we could avoid this
> additional workload.

Yes, but if we can assume root->tuple_fraction applies to result of
Append, it's strange we apply the same tuple fraction to all the child
rels.  Latter rels should less likely be used at all and perhaps
should have less tuple_fraction.

------
Regards,
Alexander Korotkov
Supabase

Re: Considering fractional paths in Append node

From

Andrei Lepikhov

Date:

05 March, 09:31:39

On 5/3/2025 03:27, Alexander Korotkov wrote:
> On Mon, Mar 3, 2025 at 1:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
>>> do you think we could limit this to some simple cases?  For instance,
>>> check that RelOptInfo is the final result relation for given root.
>> I believe that using tuple_fraction is not an issue. Instead, it serves
>> as a flag that allows the upper-level optimisation to consider
>> additional options. The upper-level optimiser has more variants to
>> examine and will select the optimal path based on the knowledge
>> available at that level. Therefore, we're not introducing a mistake
>> here; we're simply adding extra work in the narrow case. However, having
>> only the bottom-up planning process, I don't see how we could avoid this
>> additional workload.
> 
> Yes, but if we can assume root->tuple_fraction applies to result of
> Append, it's strange we apply the same tuple fraction to all the child
> rels.  Latter rels should less likely be used at all and perhaps
> should have less tuple_fraction.
Of course, it may happen. But I'm not sure it is a common rule.
Using LIMIT, we usually select data according to specific clauses. 
Imagine, we need TOP-100 ranked goods. Appending partitions of goods, we 
will use the index on the 'rating' column. But who knows how top-rated 
goods are spread across partitions? Maybe a single partition contains 
all of them? So, we need to select 100 top-rated goods from each partition.
Hence, applying the same limit to each partition seems reasonable, right?

-- 
regards, Andrei Lepikhov

Re: Considering fractional paths in Append node

From

Alexander Korotkov

Date:

05 March, 14:20:04

On Wed, Mar 5, 2025 at 8:32 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> On 5/3/2025 03:27, Alexander Korotkov wrote:
> > On Mon, Mar 3, 2025 at 1:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> >>> 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
> >>> do you think we could limit this to some simple cases?  For instance,
> >>> check that RelOptInfo is the final result relation for given root.
> >> I believe that using tuple_fraction is not an issue. Instead, it serves
> >> as a flag that allows the upper-level optimisation to consider
> >> additional options. The upper-level optimiser has more variants to
> >> examine and will select the optimal path based on the knowledge
> >> available at that level. Therefore, we're not introducing a mistake
> >> here; we're simply adding extra work in the narrow case. However, having
> >> only the bottom-up planning process, I don't see how we could avoid this
> >> additional workload.
> >
> > Yes, but if we can assume root->tuple_fraction applies to result of
> > Append, it's strange we apply the same tuple fraction to all the child
> > rels.  Latter rels should less likely be used at all and perhaps
> > should have less tuple_fraction.
> Of course, it may happen. But I'm not sure it is a common rule.
> Using LIMIT, we usually select data according to specific clauses.
> Imagine, we need TOP-100 ranked goods. Appending partitions of goods, we
> will use the index on the 'rating' column. But who knows how top-rated
> goods are spread across partitions? Maybe a single partition contains
> all of them? So, we need to select 100 top-rated goods from each partition.
> Hence, applying the same limit to each partition seems reasonable, right?

Ok, I didn't notice add_paths_to_append_rel() is used for MergeAppend
as well.  I thought again about regular Append.  If can have required
number of rows from the first few children relations, the error of
tuple fraction shouldn't influence plans much, and other children
relations wouldn't be used at all.  But if we don't, we unlikely get
prediction of selectivity accurate enough to predict which exact
children relations are going to be used.  So, usage root tuple
fraction for every child relation would be safe.  So, this approach
makes sense to me.

------
Regards,
Alexander Korotkov
Supabase

Re: Considering fractional paths in Append node

From

Alexander Korotkov

Date:

09 March, 14:44:58

On Wed, Mar 5, 2025 at 1:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Wed, Mar 5, 2025 at 8:32 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > On 5/3/2025 03:27, Alexander Korotkov wrote:
> > > On Mon, Mar 3, 2025 at 1:04 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > >>> 2. As usage of root->tuple_fraction RelOptInfo it has been criticized,
> > >>> do you think we could limit this to some simple cases?  For instance,
> > >>> check that RelOptInfo is the final result relation for given root.
> > >> I believe that using tuple_fraction is not an issue. Instead, it serves
> > >> as a flag that allows the upper-level optimisation to consider
> > >> additional options. The upper-level optimiser has more variants to
> > >> examine and will select the optimal path based on the knowledge
> > >> available at that level. Therefore, we're not introducing a mistake
> > >> here; we're simply adding extra work in the narrow case. However, having
> > >> only the bottom-up planning process, I don't see how we could avoid this
> > >> additional workload.
> > >
> > > Yes, but if we can assume root->tuple_fraction applies to result of
> > > Append, it's strange we apply the same tuple fraction to all the child
> > > rels.  Latter rels should less likely be used at all and perhaps
> > > should have less tuple_fraction.
> > Of course, it may happen. But I'm not sure it is a common rule.
> > Using LIMIT, we usually select data according to specific clauses.
> > Imagine, we need TOP-100 ranked goods. Appending partitions of goods, we
> > will use the index on the 'rating' column. But who knows how top-rated
> > goods are spread across partitions? Maybe a single partition contains
> > all of them? So, we need to select 100 top-rated goods from each partition.
> > Hence, applying the same limit to each partition seems reasonable, right?
>
> Ok, I didn't notice add_paths_to_append_rel() is used for MergeAppend
> as well.  I thought again about regular Append.  If can have required
> number of rows from the first few children relations, the error of
> tuple fraction shouldn't influence plans much, and other children
> relations wouldn't be used at all.  But if we don't, we unlikely get
> prediction of selectivity accurate enough to predict which exact
> children relations are going to be used.  So, usage root tuple
> fraction for every child relation would be safe.  So, this approach
> makes sense to me.

I've slightly revised the commit message and comments.  I'm going to
push this if no objections.

------
Regards,
Alexander Korotkov
Supabase

Attachment

v3-0001-Teach-Append-to-consider-tuple_fraction-when-accu.patch

Re: Considering fractional paths in Append node

From

Nikita Malakhov

Date:

09 March, 21:26:47

Hi!

No objections. Alexander, thank you!

Nikita Malakhov

Postgres Professional

The Russian Postgres Company

https://postgrespro.ru/