Thread: BUG #6359: excessively inlining subquery leads to slow queries

BUG #6359: excessively inlining subquery leads to slow queries

From
maxim.boguk@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      6359
Logged by:          Maksym Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.1.2
Operating system:   Ubuntu linux
Description:=20=20=20=20=20=20=20=20

Sometime Postgres inline subrequest even if it produce slower plan (and that
slow plan have higher actual cost than non-inlined plan):

test case:

drop table if exists t1;
drop table if exists t2;

create table t1 as select id from generate_series(1,1) as g(id);
create table t2 as select id from generate_series(1,1000) as g(id);
alter table t1 add primary key (id);
alter table t2 add primary key (id);


analyze t1;
analyze t2;

--fast non-inlined plan
explain (verbose, analyze)
select
    id,
=20=20=20
t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_=
id+t2_id+t2_id+t2_id+t2_id
FROM
(
   select  t1.id,
           (select t2.id from t2 where t2.id=3Dt1.id) as t2_id
   from t1
   offset 0
) as t;

=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20
                          QUERY PLAN
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
-------------------------------------------------------------------
 Subquery Scan on t  (cost=3D0.00..0.65 rows=3D1 width=3D8) (actual
time=3D0.066..0.069 rows=3D1 loops=3D1)
   Output: t.id, ((((((((((((((((t.t2_id + t.t2_id) + t.t2_id) + t.t2_id) +
t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) +
t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id)
   ->  Limit  (cost=3D0.00..0.60 rows=3D1 width=3D4) (actual time=3D0.053..=
0.056
rows=3D1 loops=3D1)
         Output: t1.id, ((SubPlan 1))
         ->  Seq Scan on public.t1  (cost=3D0.00..0.60 rows=3D1 width=3D4) =
(actual
time=3D0.052..0.053 rows=3D1 loops=3D1)
               Output: t1.id, (SubPlan 1)
               SubPlan 1
                 ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0=
.49
rows=3D1 width=3D4) (actual time=3D0.025..0.028 rows=3D1 loops=3D1)
                       Output: t2.id
                       Index Cond: (t2.id =3D t1.id)
 Total runtime: 0.161 ms
(11 rows)



--slow inlined plan
explain (verbose, analyze)
select
    id,
=20=20=20
t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_=
id+t2_id+t2_id+t2_id+t2_id
FROM
(
   select  t1.id,
           (select t2.id from t2 where t2.id=3Dt1.id) as t2_id
   from t1
--   offset 0
) as t;

=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20
                                                                 QUERY PLAN=
=20
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=
=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
---------------------------------------------------------------------------=
---------------------------------------------------------------------
 Seq Scan on public.t1  (cost=3D0.00..8.44 rows=3D1 width=3D4) (actual
time=3D0.180..0.181 rows=3D1 loops=3D1)
   Output: t1.id, (((((((((((((((((SubPlan 1) + (SubPlan 2)) + (SubPlan 3))
+ (SubPlan 4)) + (SubPlan 5)) + (SubPlan 6)) + (SubPlan 7)) + (SubPlan 8)) +
(SubPlan 9)) + (SubPlan 10)) + (SubPlan 11)) + (SubPlan 12)) + (SubPlan 13))
+ (SubPlan 14)) + (SubPlan 15)) + (SubPlan 16)) + (SubPlan 17))
   SubPlan 1
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.025..0.028 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 2
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.006..0.007 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 3
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 4
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 5
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 6
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 7
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.004 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 8
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 9
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 10
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 11
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 12
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 13
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 14
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 15
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 16
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.004..0.005 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
   SubPlan 17
     ->  Index Scan using t2_pkey on public.t2  (cost=3D0.00..0.49 rows=3D1
width=3D4) (actual time=3D0.005..0.006 rows=3D1 loops=3D1)
           Output: public.t2.id
           Index Cond: (public.t2.id =3D t1.id)
 Total runtime: 0.466 ms
(71 rows)

The inlined plan uses 3x more time and have 10x higher cost.

I found that problem in much more longer analytical query where subrequest
is slow and complicated.

Re: BUG #6359: excessively inlining subquery leads to slow queries

From
Robert Haas
Date:
On Mon, Dec 26, 2011 at 8:50 AM,  <maxim.boguk@gmail.com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: =A0 =A0 =A06359
> Logged by: =A0 =A0 =A0 =A0 =A0Maksym Boguk
> Email address: =A0 =A0 =A0maxim.boguk@gmail.com
> PostgreSQL version: 9.1.2
> Operating system: =A0 Ubuntu linux
> Description:
>
> Sometime Postgres inline subrequest even if it produce slower plan (and t=
hat
> slow plan have higher actual cost than non-inlined plan):

I'm pretty sure this has been discussed before.  In the case where a
non-trivial expression is going to get inlined in multiple places, it
would probably be better to replace it with a placeholder referencing
a subplan that evaluates it just once.  But, we don't currently try to
do that.  In practice, this only bites us occasionally, because it's
actually fairly rare to write a query that references the same
subquery output twice, let alone 17 times.  Also, detecting the
scenario where we shouldn't inline probably isn't free, so we'd have
to weigh the benefit of helping some people quite a bit against the
cost of slowing down query planning slightly for everyone else.  All
in all, it's a complicated problem...  so I'd classify this more as an
unimplemented feature than a bug.

That having been said, I hope we'll get around to improving it at some
point.  Even though in most cases inlining is a huge win, there are
certainly boundary cases where it's not, and this is one of them.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company