Thread: Have we tried to treat CTE as SubQuery in planner?

Have we tried to treat CTE as SubQuery in planner?

From
Andy Fan
Date:
Hi: 

Take the following example: 

insert into cte1  select i, i from generate_series(1, 1000000)i;
create index on cte1(a);

explain 
with cte1 as  (select * from cte1)
select * from c where a = 1;

It needs to do seq scan on the above format, however it is pretty
quick if we change the query to  
select * from (select * from cte1) c where a = 1;  

I know how we treat cte and subqueries differently currently,
I just don't know why we can't treat cte as a subquery, so lots of
subquery related technology can apply to it.  Do we have any 
discussion about this?  

Thanks

--
Best Regards
Andy Fan

Re: Have we tried to treat CTE as SubQuery in planner?

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> Take the following example:

> insert into cte1  select i, i from generate_series(1, 1000000)i;
> create index on cte1(a);

> explain
> with cte1 as  (select * from cte1)
> select * from c where a = 1;

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

This example seems both confused and out of date.  Since we changed
the rules on materializing CTEs (in 608b167f9), I get

regression=# create table c as select i as a, i from generate_series(1, 1000000)i;
SELECT 1000000
regression=# create index on c(a);
CREATE INDEX
regression=# explain
regression-# with cte1 as (select * from c)
regression-# select * from cte1 where a = 1;
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on c  (cost=95.17..4793.05 rows=5000 width=8)
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on c_a_idx  (cost=0.00..93.92 rows=5000 width=0)
         Index Cond: (a = 1)
(4 rows)

            regards, tom lane



Re: Have we tried to treat CTE as SubQuery in planner?

From
Julien Rouhaud
Date:
On Sat, Nov 14, 2020 at 2:04 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi:
>
> Take the following example:
>
> insert into cte1  select i, i from generate_series(1, 1000000)i;
> create index on cte1(a);
>
> explain
> with cte1 as  (select * from cte1)
> select * from c where a = 1;
>
> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;
>
> I know how we treat cte and subqueries differently currently,
> I just don't know why we can't treat cte as a subquery, so lots of
> subquery related technology can apply to it.  Do we have any
> discussion about this?

See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=608b167f9f9c4553c35bb1ec0eab9ddae643989b



Re: Have we tried to treat CTE as SubQuery in planner?

From
Jesse Zhang
Date:
Hi,

On Fri, Nov 13, 2020 at 10:04 PM Andy Fan wrote:
>
> Hi:
>
> Take the following example:
>
> insert into cte1  select i, i from generate_series(1, 1000000)i;
> create index on cte1(a);
>
> explain
> with cte1 as  (select * from cte1)
> select * from c where a = 1;
>

ITYM:

EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

I'm also guessing your table DDL is:

CREATE TABLE cte1 (a int, b int);

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

Does it? On HEAD, I got the following plan:

(without stats):
 Bitmap Heap Scan on foo
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on foo_a_idx
         Index Cond: (a = 1)

(with stats):
 Index Scan using foo_a_idx on foo
   Index Cond: (a = 1)


>
> I know how we treat cte and subqueries differently currently,
> I just don't know why we can't treat cte as a subquery, so lots of
> subquery related technology can apply to it.  Do we have any
> discussion about this?

This was brought up a few times, the most recent one I can recall was a
little bit over two years ago [1]

[1] https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk



Re: Have we tried to treat CTE as SubQuery in planner?

From
Andy Fan
Date:


On Sat, Nov 14, 2020 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> Take the following example:

> insert into cte1  select i, i from generate_series(1, 1000000)i;
> create index on cte1(a);

> explain
> with cte1 as  (select * from cte1)
> select * from c where a = 1;

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

This example seems both confused and out of date.  Since we changed
the rules on materializing CTEs (in 608b167f9), I get

 
Sorry,  I should have tested it again on the HEAD,  and 608b167f9 is exactly
the thing  I mean. 

regression=# create table c as select i as a, i from generate_series(1, 1000000)i;
SELECT 1000000
regression=# create index on c(a);
CREATE INDEX
regression=# explain
regression-# with cte1 as (select * from c)
regression-# select * from cte1 where a = 1;
                                QUERY PLAN                               
--------------------------------------------------------------------------
 Bitmap Heap Scan on c  (cost=95.17..4793.05 rows=5000 width=8)
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on c_a_idx  (cost=0.00..93.92 rows=5000 width=0)
         Index Cond: (a = 1)
(4 rows)

                        regards, tom lane


--
Best Regards
Andy Fan

Re: Have we tried to treat CTE as SubQuery in planner?

From
Andy Fan
Date:


On Sat, Nov 14, 2020 at 2:44 PM Jesse Zhang <sbjesse@gmail.com> wrote:
Hi,

On Fri, Nov 13, 2020 at 10:04 PM Andy Fan wrote:
>
> Hi:
>
> Take the following example:
>
> insert into cte1  select i, i from generate_series(1, 1000000)i;
> create index on cte1(a);
>
> explain
> with cte1 as  (select * from cte1)
> select * from c where a = 1;
>

ITYM:

EXPLAIN
WITH c AS (SELECT * FROM cte1)
SELECT * FROM c WHERE a = 1;

I'm also guessing your table DDL is:

CREATE TABLE cte1 (a int, b int);

> It needs to do seq scan on the above format, however it is pretty
> quick if we change the query to
> select * from (select * from cte1) c where a = 1;

Does it? On HEAD, I got the following plan:

 
You understand me correctly,  just too busy recently and make
me make mistakes like this. Sorry about that:( 


 
(without stats):
 Bitmap Heap Scan on foo
   Recheck Cond: (a = 1)
   ->  Bitmap Index Scan on foo_a_idx
         Index Cond: (a = 1)

(with stats):
 Index Scan using foo_a_idx on foo
   Index Cond: (a = 1)


 
>
> I know how we treat cte and subqueries differently currently,
> I just don't know why we can't treat cte as a subquery, so lots of
> subquery related technology can apply to it.  Do we have any
> discussion about this?

This was brought up a few times, the most recent one I can recall was a
little bit over two years ago [1]

[1] https://postgr.es/m/87sh48ffhb.fsf@news-spur.riddles.org.uk

And I should have searched "CTE" at least for a while..


--
Best Regards
Andy Fan