Re: Considering fractional paths in Append node - Mailing list pgsql-hackers
From: Alexander Korotkov
Subject: Re: Considering fractional paths in Append node
Date:
Msg-id: CAPpHfdvJk1=2Wagn=E4cGQVL=mSux=pgXeGcupwpYQSuLMxrhg@mail.gmail.com
In response to: Re: Considering fractional paths in Append node (Andy Fan <zhihuifan1213@163.com>)
List: pgsql-hackers
Hi, Andy!

On Fri, Oct 18, 2024 at 3:55 AM Andy Fan <zhihuifan1213@163.com> wrote:
>
> Nikita Malakhov <hukutoc@gmail.com> writes:
>
> > The effect is easily seen in one of standard PG tests:
> > Vanilla (current master):
> > explain analyze
> > select t1.unique1 from tenk1 t1
> > inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
> > union all
> > (values(1)) limit 1;
> >                                                           QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1 loops=1)
> >    ->  Append  (cost=0.00..2415.09 rows=11 width=4) (actual time=6.308..6.310 rows=1 loops=1)
> >          ->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4) (actual time=6.307..6.308 rows=1 loops=1)
> >                Join Filter: (t1.tenthous = t2.tenthous)
> >                Rows Removed by Join Filter: 4210
> >                ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8) (actual time=0.004..0.057 rows=422 loops=1)
> >                ->  Materialize  (cost=0.00..470.05 rows=10 width=4) (actual time=0.000..0.014 rows=10 loops=422)
> >                      Storage: Memory  Maximum Storage: 17kB
> >                      ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..5.535 rows=10 loops=1)
> >                            Filter: (thousand = 0)
> >                            Rows Removed by Filter: 9990
> >          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
> >  Planning Time: 0.324 ms
> >  Execution Time: 6.336 ms
> > (14 rows)
> >
> > Patched, the same test:
> > explain analyze
> > select t1.unique1 from tenk1 t1
> > inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
> > union all
> > (values(1)) limit 1;
> >                                                                     QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1 loops=1)
> >    ->  Append  (cost=0.29..1383.12 rows=11 width=4) (actual time=0.104..0.105 rows=1 loops=1)
> >          ->  Nested Loop  (cost=0.29..1383.05 rows=10 width=4) (actual time=0.104..0.104 rows=1 loops=1)
> >                ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..0.076 rows=1 loops=1)
> >                      Filter: (thousand = 0)
> >                      Rows Removed by Filter: 421
> >                ->  Index Scan using tenk1_thous_tenthous on tenk1 t1  (cost=0.29..91.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
> >                      Index Cond: (tenthous = t2.tenthous)
> >          ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
> >  Planning Time: 0.334 ms
> >  Execution Time: 0.130 ms
> > (11 rows)
>
> This is a nice result. After some more thought, I feel the startup
> cost calculation for seq scans is a more important thing to address.
>
> Bad plan:  Append (cost=0.00..2415.09 ..) shows us the "startup cost" is 0.
> Good plan: Append (cost=0.29..1383.12 ..) shows us the "startup cost" is
> 0.29.
>
> The major reason for this is that we calculate the "startup cost" for
> "SeqScan" and "Index Scan" under different guidance. For the "Index
> Scan", the startup cost is "the cost to retrieve the first tuple";
> however, for "SeqScan" it is not, as we can see that the query
> "SELECT * FROM tenk2 WHERE thousand = 0" has a 0 startup_cost.
>
> In my understanding, "startup cost" means the cost to retrieve the first
> tuple *already*, but at [1], Tom said:
>
> """
> I think that things might work out better if we redefined the startup
> cost as "estimated cost to retrieve the first tuple", rather than its
> current very-squishy definition as "cost to initialize the scan".
> """
>
> The above statement makes me confused. If we take the startup cost as
> the cost to retrieve the first tuple, we can do the below quick hack:
>
> @@ -355,8 +355,8 @@ cost_seqscan(Path *path, PlannerInfo *root,
>      }
>
>      path->disabled_nodes = enable_seqscan ? 0 : 1;
> -    path->startup_cost = startup_cost;
>      path->total_cost = startup_cost + cpu_run_cost + disk_run_cost;
> +    path->startup_cost = startup_cost + (cpu_run_cost + disk_run_cost) * (1 - path->rows / baserel->tuples);
>  }

You can check that I already proposed a similar change years ago:
https://www.postgresql.org/message-id/CAPpHfdvfDAXPXhFQT3Ww%3DkZ4tpyAsD07_oz8-fh0%3Dp6eckEBKQ%40mail.gmail.com

It appears that, according to the current model, startup_cost is not the
cost of the first tuple. It should be the cost of the preparation work,
while the extraction of tuples should be distributed uniformly through
total_cost - startup_cost.

------
Regards,
Alexander Korotkov
Supabase
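For readers following the arithmetic, the Limit costs in the two plans above are consistent with the interpolation model described in this message: fetching the first k of N expected rows is charged the startup cost plus a proportional share of (total_cost - startup_cost). A minimal sketch of that idea (a hypothetical helper for illustration, not the actual planner code):

```c
/*
 * Illustrative sketch only -- a hypothetical helper, not PostgreSQL code.
 * Under the current model, the cost of fetching the first rows_wanted of
 * path_rows expected rows is the startup cost plus a proportional share
 * of the run cost (total_cost - startup_cost).
 */
static double
fractional_cost_sketch(double startup_cost, double total_cost,
                       double path_rows, double rows_wanted)
{
    if (path_rows <= 0.0 || rows_wanted >= path_rows)
        return total_cost;
    return startup_cost +
        (total_cost - startup_cost) * (rows_wanted / path_rows);
}

/*
 * Plugging in the Append costs from the plans above (11 expected rows, LIMIT 1):
 *   vanilla: 0.00 + (2415.09 - 0.00) * 1/11 ~= 219.55  (the Limit shows 0.00..219.55)
 *   patched: 0.29 + (1383.12 - 0.29) * 1/11 ~= 126.00  (the Limit shows 0.29..126.00)
 * A subplan that reports startup_cost = 0 therefore looks disproportionately
 * cheap under LIMIT, which is the effect discussed in the thread.
 */
```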
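To make concrete why the SeqScan's startup cost comes out as zero, here is a simplified paraphrase of the cost_seqscan() logic under the default cost parameters (a sketch with assumed constants, omitting parallel workers, parameterization, and the tablespace lookup; the numeric checks in the comments are back-of-the-envelope estimates against the plans above):

```c
/*
 * Simplified paraphrase of cost_seqscan() -- not the real function.
 * The key point: the filter's selectivity never feeds into startup_cost,
 * so even a highly selective seq scan reports startup_cost ~ 0.
 */
static void
seqscan_cost_sketch(double tuples, double pages,
                    double qual_startup, double qual_per_tuple,
                    double *startup_cost, double *total_cost)
{
    double cpu_tuple_cost = 0.01;   /* default GUC value, assumed */
    double seq_page_cost = 1.0;     /* default GUC value, assumed */
    double cpu_run_cost = (cpu_tuple_cost + qual_per_tuple) * tuples;
    double disk_run_cost = seq_page_cost * pages;

    *startup_cost = qual_startup;   /* preparation work only; usually 0 */
    *total_cost = *startup_cost + cpu_run_cost + disk_run_cost;
}

/*
 * Rough check against the plans above (tenk1/tenk2 hold 10000 tuples and,
 * judging from the 445.00 estimate for the unfiltered tenk1 scan, about
 * 345 pages; the filter costs one operator, ~0.0025, per tuple):
 *   tenk1, no filter:      345 + 0.01 * 10000             = 445.00
 *   tenk2, thousand = 0:   345 + (0.01 + 0.0025) * 10000  = 470.00
 * In both cases startup_cost stays 0 no matter how few rows pass the filter.
 */
```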