Re: Our trial to TPC-DS but optimizer made unreasonable plan - Mailing list pgsql-hackers

From Qingqing Zhou
Subject Re: Our trial to TPC-DS but optimizer made unreasonable plan
Date
Msg-id CAJjS0u0XWkBE5MfLqwWkKZZ20AZP+5jjsi+EuUOCmJGLDqX0oA@mail.gmail.com
Whole thread Raw
In response to Re: Our trial to TPC-DS but optimizer made unreasonable plan  (Qingqing Zhou <zhouqq.postgres@gmail.com>)
Responses Re: Our trial to TPC-DS but optimizer made unreasonable plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, Aug 19, 2015 at 10:32 AM, Qingqing Zhou
<zhouqq.postgres@gmail.com> wrote:
> On Tue, Aug 18, 2015 at 5:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> BTW, did you register the patch on the upcoming commit-fest?
>>
> Not yet, it is in WIP status.
>

While I am working on the patch, I found some issues and resort help
here. Patch attached.

Here is an example:

postgres=# explain WITH q AS (
    WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on p.i>=p1.i)
SELECT * FROM q WHERE i <= 5;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Nested Loop  (cost=0.58..5980.16 rows=133333 width=8)
   ->  Index Scan using ai on a  (cost=0.29..8.36 rows=4 width=8)
         Index Cond: (i <= 5)
   ->  Index Only Scan using ai on a a_1  (cost=0.29..1159.62
rows=33333 width=4)
         Index Cond: (i <= a.i)
(5 rows)

So far so good. But if we add other references of the CTE q (m1->m,
m->q), we still have some extra CTE scans:

postgres=# explain WITH q AS (
    WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on
p.i>=p1.i), m as (select * from q), m1 as (select * from m)
SELECT * FROM m1 WHERE i <= 5;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 CTE Scan on m  (cost=158365985.66..233365985.65 rows=1111111111 width=8)
   Filter: (i <= 5)
   CTE q
     ->  Nested Loop  (cost=0.29..91699319.00 rows=3333333333 width=8)
           ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=8)
           ->  Index Only Scan using ai on a a_1  (cost=0.29..583.65
rows=33333 width=4)
                 Index Cond: (i <= a.i)
   CTE m
     ->  CTE Scan on q  (cost=0.00..66666666.66 rows=3333333333 width=8)
(9 rows)

Above two queries essentially the same, but the second one is a
non-optimal plan. The reason is that how my patch works: it put a
substitution in front of SS_process_ctes():

   /*
  * If there is a WITH list, process each WITH query and build an initplan
! * SubPlan structure for it. Before we process ctes, try to subsitute with
! * subqueries to benefits from global optimization.
  */
  if (parse->cteList)
+ {
+ substitute_ctes_with_subqueries(root);
  SS_process_ctes(root);
+ }

AFAICS, the substitution only handles cteList within a query block, so
it does not go across the subquery boundary. I can see this is an
issue but can't see a nice way to fix it. Anybody has some recipe?

Regards,
Qingqing

Attachment

pgsql-hackers by date:

Previous
From: Fabrízio de Royes Mello
Date:
Subject: Re: Is this a bug?
Next
From: Andres Freund
Date:
Subject: Re: Custom Scans and private data