Home > mailing lists

Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Date	November 16, 2023 21:16:08
Msg-id	414317.1700158568@sss.pgh.pa.us Whole thread Raw
In response to	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
List	pgsql-hackers

Tree view

Richard Guo <guofenglinux@gmail.com> writes:
> I think the second plan (patched) makes more sense.  In the first plan
> (unpatched), the HashAggregate node actually does not reduce the the
> number of rows because it groups by 'unique1', but planner does not know
> that because it lacks statistics for Vars referencing the CTE.

Yeah.  It's faster in reality too:

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                  QUERY PLAN
                      

----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=692.29..692.30 rows=1 width=8) (actual time=15.186..15.188 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.028..0.754rows=10000 loops=1) 
           Heap Fetches: 0
   ->  Nested Loop  (cost=225.28..409.50 rows=5000 width=0) (actual time=3.652..14.733 rows=10000 loops=1)
         ->  HashAggregate  (cost=225.00..227.00 rows=200 width=4) (actual time=3.644..4.510 rows=10000 loops=1)
               Group Key: x.unique1
               Batches: 1  Memory Usage: 929kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.030..1.932 rows=10000 loops=1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..0.90 rows=1 width=4) (actual time=0.001..0.001
rows=1loops=10000) 
               Index Cond: (unique1 = x.unique1)
               Heap Fetches: 0
 Planning Time: 0.519 ms
 Execution Time: 15.479 ms
(14 rows)

vs

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1028.07..1028.08 rows=1 width=8) (actual time=4.578..4.579 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.011..0.751rows=10000 loops=1) 
           Heap Fetches: 0
   ->  Hash Semi Join  (cost=325.28..732.78 rows=10000 width=0) (actual time=2.706..4.305 rows=10000 loops=1)
         Hash Cond: (a.unique1 = x.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.011..0.676rows=10000 loops=1) 
               Heap Fetches: 0
         ->  Hash  (cost=200.00..200.00 rows=10000 width=4) (actual time=2.655..2.655 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.012..1.963 rows=10000 loops=1)
 Planning Time: 0.504 ms
 Execution Time: 4.821 ms
(13 rows)

Now, what you get if you remove MATERIALIZED is faster yet:

regression=# explain analyze with x as (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
                                                                    QUERY PLAN
                          

--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=715.57..715.58 rows=1 width=8) (actual time=2.681..2.682 rows=1 loops=1)
   ->  Merge Semi Join  (cost=0.57..690.57 rows=10000 width=0) (actual time=0.016..2.408 rows=10000 loops=1)
         Merge Cond: (a.unique1 = b.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.007..0.696rows=10000 loops=1) 
               Heap Fetches: 0
         ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..270.29 rows=10000 width=4) (actual
time=0.007..0.655rows=10000 loops=1) 
               Heap Fetches: 0
 Planning Time: 0.160 ms
 Execution Time: 2.718 ms
(9 rows)

I poked into that and found that the reason we don't get a mergejoin
with the materialized CTE is that the upper planner invocation doesn't
know that the CTE's output is sorted, so it thinks a separate sort
step would be needed.

So you could argue that there's more to do here, but I'm hesitant
to go further.  Part of the point of MATERIALIZED is to be an
optimization fence, so breaking down that fence is something to be
wary of.  Maybe we shouldn't even take this patch --- but on
balance I think it's an OK compromise.

            regards, tom lane

pgsql-hackers by date:

From: Robert Haas
Date: 16 November 2023, 20:50:43
Subject: Re: trying again to get incremental backup

From: Robert Haas
Date: 16 November 2023, 22:36:11
Subject: Re: trying again to get incremental backup

Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 - Mailing list pgsql-hackers

Previous

Next