Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct - Mailing list pgsql-bugs

From Andres Freund
Subject Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Date
Msg-id 201201120153.02587.andres@anarazel.de
Whole thread Raw
In response to Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
>  (In cases where we know it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
One of those case which looks relatively easy is that CTEs currently work as a
kind of 'statistics barrier' here. Specifically I wonder why:

test_raptelan=# EXPLAIN WITH foo AS (SELECT * FROM b WHERE id < 5000) SELECT *
FROM a WHERE a.id IN (SELECT id FROM foo);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=302.02..1876.30 rows=2550000 width=11)
   CTE foo
     ->  Index Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=10)
           Index Cond: (id < 5000)
   ->  HashAggregate  (cost=117.97..119.97 rows=200 width=4)
         ->  CTE Scan on foo  (cost=0.00..104.86 rows=5243 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..7.85 rows=1 width=11)
         Index Cond: (id = foo.id)


plans differently than

test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT id FROM b WHERE
id < 5000 OFFSET 0);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=560.41..17426.03 rows=5243 width=11)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..160013.81 rows=5100000
width=11)
   ->  Sort  (cost=560.40..573.51 rows=5243 width=4)
         Sort Key: b.id
         ->  Limit  (cost=0.00..184.06 rows=5243 width=4)
               ->  Index Only Scan using b_pkey on b  (cost=0.00..184.06
rows=5243 width=4)
                     Index Cond: (id < 5000)


Couldn't the CTE pass a vardata from inside to the outside?

Andres

pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Next
From: Tom Lane
Date:
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct