Re: make add_paths_to_append_rel aware of startup cost - Mailing list pgsql-hackers

From David Rowley
Subject Re: make add_paths_to_append_rel aware of startup cost
Date
Msg-id CAApHDvogv5MBw6JV82Yb6gAa0mp64HQtX--TSK_wZOn9x2fP4Q@mail.gmail.com
Whole thread Raw
In response to Re: make add_paths_to_append_rel aware of startup cost  (Andy Fan <zhihuifan1213@163.com>)
Responses Re: make add_paths_to_append_rel aware of startup cost
Re: make add_paths_to_append_rel aware of startup cost
List pgsql-hackers
On Thu, 15 Feb 2024 at 21:42, Andy Fan <zhihuifan1213@163.com> wrote:
> I found the both plans have the same cost, I can't get the accurate
> cause of this after some hours research, but it is pretty similar with
> 7516056c584e3, so I uses a similar strategy to stable it. is it
> acceptable?

It's pretty hard to say.  I can only guess why this test would be
flapping like this. I see it's happened before on mylodon, so probably
not a cosmic ray.  It's not like add_path() chooses a random path when
the costs are the same, so I wondered if something similar is going on
here that was going on that led to f03a9ca4. In particular, see [1].

On master, I set a breakpoint in try_nestloop_path() to break on
"outerrel->relid==1 && innerrel->relid==2". I see the total Nested
Loop cost comes out the same with the join order reversed.

Which is:

 ->  Nested Loop  (cost=0.00..1500915.00 rows=10000 width=4)

Doing the same with your patch applied, I get:

->  Nested Loop  (cost=0.00..600925.00 rows=4000 width=4)

and forcing the join order to swap with the debugger, I see:

->  Nested Loop  (cost=0.00..600940.00 rows=4000 width=4)

So there's a difference now, but it's quite small. If it was a problem
like we had on [1], then since tenk1 and tenk2 have 345 pages (on my
machine), if relpages is down 1 or 2 pages, we'll likely get more of a
costing difference than 600925 vs 600940.

If I use:

explain
select t1.unique1 from tenk1 t1
inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0
union all
(values(1)) limit 1;

I get:

->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4)

and with the join order reversed, I get:

 ->  Nested Loop  (cost=0.00..2440.00 rows=10 width=4)

I'd be more happy using this one as percentage-wise, the cost
difference is much larger.  I don't quite have the will to go through
proving what the actual problem is here. I think [1] already proved
the relpages problem can (or could) happen.

I checked that the t2.thounsand = 0 query still tests the cheap
startup paths in add_paths_to_append_rel() and it does. If I flip
startup_subpaths_valid to false in the debugger, the plan flips to:

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=470.12..514.00 rows=1 width=4)
   ->  Append  (cost=470.12..952.79 rows=11 width=4)
         ->  Hash Join  (cost=470.12..952.73 rows=10 width=4)
               Hash Cond: (t1.tenthous = t2.tenthous)
               ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=10000 width=8)
               ->  Hash  (cost=470.00..470.00 rows=10 width=4)
                     ->  Seq Scan on tenk2 t2  (cost=0.00..470.00
rows=10 width=4)
                           Filter: (thousand = 0)
         ->  Result  (cost=0.00..0.01 rows=1 width=4)

So, if nobody has any better ideas, I'm just going to push the " and
t2.thousand = 0" adjustment.

David

[1] https://www.postgresql.org/message-id/4174.1563239552%40sss.pgh.pa.us



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Memory consumed by paths during partitionwise join planning
Next
From: Robert Haas
Date:
Subject: Re: Add system identifier to backup manifest