Thread: BUG #4290: wrong double subselect with aggregate function

BUG #4290: wrong double subselect with aggregate function

From
"Andreas"
Date:
The following bug has been logged online:

Bug reference:      4290
Logged by:          Andreas
Email address:      postgresql@elbrief.de
PostgreSQL version: 8.3.3
Operating system:   Linux
Description:        wrong double subselect with aggregate function
Details:

select version() ;
                                     version
----------------------------------------------------------------------------
------
 PostgreSQL 8.3.3 on i486-pc-linux-gnu, compiled by GCC cc (Debian 4.3.1-1)
4.3.1

create table a (
  id serial primary key
) ;

create table b (
  id serial primary key
  , aid int not null references a
) ;

create table c (
  id serial primary key
  , aid int not null references a
) ;

create table d (
  id serial primary key
  , bid int not null references b
  , cid int references b
) ;

insert into a ( id ) values ( default ) , ( default ) ;
insert into b ( aid ) values ( 1 ) , ( 2 ) ;
insert into c ( aid ) values ( 1 ) , ( 1 ) , ( 2 ) , ( 2 ) ;
insert into d ( bid ) values ( 1 ) , ( 2 ) ;
select
 ( select min( c.id ) from c where c.aid = ( select b.aid from b where b.id
= d.bid ) ) as min_c_id
  , ( select b.aid from b where b.id = d.bid ) as b_aid
  , ( select min( c.id ) from c where c.aid = 1 ) as min_c_id_1
  , ( select min( c.id ) from c where c.aid = 2 ) as min_c_id_2
from d ;

 min_c_id | b_aid | min_c_id_1 | min_c_id_2
----------+-------+------------+------------
        1 |     1 |          1 |          3
        1 |     2 |          1 |          3

I expected for min_c_id in the second row 3.

Best Regards
Andreas

Re: BUG #4290: wrong double subselect with aggregate function

From
Alvaro Herrera
Date:
Andreas wrote:

>  min_c_id | b_aid | min_c_id_1 | min_c_id_2
> ----------+-------+------------+------------
>         1 |     1 |          1 |          3
>         1 |     2 |          1 |          3
>
> I expected for min_c_id in the second row 3.

Which it is if you turn enable_indexscan to off ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: BUG #4290: wrong double subselect with aggregate function

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Andreas wrote:
>> I expected for min_c_id in the second row 3.

> Which it is if you turn enable_indexscan to off ...

I think one of the subplans isn't getting re-evaluated when it should
be, which would be a botch in the extParam/parParam signaling.  Not sure
yet if it's a planner or executor bug, but more likely the former.

            regards, tom lane

Re: BUG #4290: wrong double subselect with aggregate function

From
Tom Lane
Date:
I wrote:
> I think one of the subplans isn't getting re-evaluated when it should
> be, which would be a botch in the extParam/parParam signaling.  Not sure
> yet if it's a planner or executor bug, but more likely the former.

So, stripping the problem query down to its essentials, we have

regression=# explain select
regression-#   ( select min(c.id) from c
regression(#     where c.aid = ( select b.aid from b where b.id = d.bid ) ) as min_c_id
regression-# from d ;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Seq Scan on d  (cost=0.00..31202.28 rows=1940 width=4)
   SubPlan
     ->  Result  (cost=16.06..16.07 rows=1 width=0)
           InitPlan
             ->  Index Scan using b_pkey on b  (cost=0.00..8.27 rows=1 width=4)
                   Index Cond: (id = $0)
             ->  Limit  (cost=0.00..7.79 rows=1 width=4)
                   ->  Index Scan using c_pkey on c  (cost=0.00..85.70 rows=11 width=4)
                         Filter: ((id IS NOT NULL) AND (aid = $1))
(9 rows)

Looking into the plan tree details, the first InitPlan has
          :extParam (b 0)
          :allParam (b 0)
which correctly indicates that it depends on the value of $0
(which represents d.bid passed down from the outer query).
The second InitPlan has
          :extParam (b)
          :allParam (b 1)
which indicates that it depends on the value of $1, which is the output
if the first InitPlan.  I think it's probably a bug that the extParam
doesn't include 1, since the first InitPlan isn't a sub-InitPlan of
the second one (it could be, but we choose not to do things that way).
However I'm not sure that that has any real consequences in this
example.

The *real* problem here is that the second InitPlan doesn't show any
dependency on $0.  This means that when we re-evaluate the subquery at
the second row of d, we correctly mark the first InitPlan as needing
recalculation (because we know $0 changed), but the second InitPlan is
thought not to need recalculation, and so we just use its previous
result as-is.  The outer subquery is merely demanding $2 (the result
of the second InitPlan) and doesn't know that this isn't up-to-date.

Clearly, if an InitPlan depends on the result of a sibling InitPlan,
it had better be made to depend on all the extParams of that sibling
too.

This seems like a fairly serious bug, and one that affects all branches
back to 8.1 (before that, InitPlans couldn't depend on their siblings,
or at least so claims a comment in subselect.c).  I'm surprised we've
not seen reports of it before.

            regards, tom lane