Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Date:
Msg-id: 414317.1700158568@sss.pgh.pa.us
In response to: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 (Richard Guo <guofenglinux@gmail.com>)
Responses: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
List: pgsql-hackers
Richard Guo <guofenglinux@gmail.com> writes:
> I think the second plan (patched) makes more sense.  In the first plan
> (unpatched), the HashAggregate node actually does not reduce the
> number of rows because it groups by 'unique1', but the planner does not know
> that because it lacks statistics for Vars referencing the CTE.
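
A quick way to reproduce that default guess (a sketch, assuming an
unpatched server and the standard regression database; not part of the
original exchange):

regression=# explain with x as MATERIALIZED (select unique1 from tenk1 b)
select unique1 from x group by unique1;

The HashAggregate above the CTE scan comes out at rows=200 -- the
planner's default ndistinct guess -- even though x really contains
10000 distinct values.  That's the same misestimate visible in the
first plan below.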

Yeah.  It's faster in reality too:

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=692.29..692.30 rows=1 width=8) (actual time=15.186..15.188 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.028..0.754 rows=10000 loops=1)
           Heap Fetches: 0
   ->  Nested Loop  (cost=225.28..409.50 rows=5000 width=0) (actual time=3.652..14.733 rows=10000 loops=1)
         ->  HashAggregate  (cost=225.00..227.00 rows=200 width=4) (actual time=3.644..4.510 rows=10000 loops=1)
               Group Key: x.unique1
               Batches: 1  Memory Usage: 929kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.030..1.932 rows=10000 loops=1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..0.90 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=10000)
               Index Cond: (unique1 = x.unique1)
               Heap Fetches: 0
 Planning Time: 0.519 ms
 Execution Time: 15.479 ms
(14 rows)

vs

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1028.07..1028.08 rows=1 width=8) (actual time=4.578..4.579 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.011..0.751 rows=10000 loops=1)
           Heap Fetches: 0
   ->  Hash Semi Join  (cost=325.28..732.78 rows=10000 width=0) (actual time=2.706..4.305 rows=10000 loops=1)
         Hash Cond: (a.unique1 = x.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.011..0.676 rows=10000 loops=1)
               Heap Fetches: 0
         ->  Hash  (cost=200.00..200.00 rows=10000 width=4) (actual time=2.655..2.655 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.012..1.963 rows=10000 loops=1)
 Planning Time: 0.504 ms
 Execution Time: 4.821 ms
(13 rows)

Now, what you get if you remove MATERIALIZED is faster yet:

regression=# explain analyze with x as (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=715.57..715.58 rows=1 width=8) (actual time=2.681..2.682 rows=1 loops=1)
   ->  Merge Semi Join  (cost=0.57..690.57 rows=10000 width=0) (actual time=0.016..2.408 rows=10000 loops=1)
         Merge Cond: (a.unique1 = b.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.007..0.696 rows=10000 loops=1)
               Heap Fetches: 0
         ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.007..0.655 rows=10000 loops=1)
               Heap Fetches: 0
 Planning Time: 0.160 ms
 Execution Time: 2.718 ms
(9 rows)

I poked into that and found that the reason we don't get a mergejoin
with the materialized CTE is that the upper planner invocation doesn't
know that the CTE's output is sorted, so it thinks a separate sort
step would be needed.
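
One way to see that directly (a sketch, assuming the same regression
database; I have not re-run it here): turn off the other join methods
and the planner adds an explicit Sort over the CTE scan, even though
the Index Only Scan feeding the CTE already emits sorted output:

regression=# set enable_hashjoin = off;
regression=# set enable_nestloop = off;
regression=# explain with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
regression=# reset enable_hashjoin;
regression=# reset enable_nestloop;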

So you could argue that there's more to do here, but I'm hesitant
to go further.  Part of the point of MATERIALIZED is to be an
optimization fence, so breaking down that fence is something to be
wary of.  Maybe we shouldn't even take this patch --- but on
balance I think it's an OK compromise.

            regards, tom lane


