Thread: inner query bug

inner query bug

From
Andrew Holm-Hansen
Date:
I was advised to send an e-mail after a discussion with neilc on
#postgresql.

The failure occurs randomly, never on the first time (at least that I've
seen) and usually only after doing a bunch of other stuff first (big
joins, views, etc). In one case, the backend terminated and restarted,
this has only happened once.

P4 w/1024 GB ram.  RH9
uname -a
Linux gecko.paycheckadv.com 2.6.0-0.test9.1.90 #1 Tue Nov 18 09:34:47
EST 2003 i686 i686 i386 GNU/Linux

                    
psql output follows:

SC2test7=# select * from txstatus,tx txx where txstatus.txid = txx.txid
AND txstatus.statuschangetime = (select max(txstatus.statuschangetime)
from txstatus where txstatus.txid = txx.txid); FATAL:  terminating
connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
SC2test7=# \d txstatus
no connection to the server
The connection to the server was lost. Attempting reset: Succeeded.
SC2test7=# \d txstatus
                                            Table "public.txstatus"
      Column      |           Type
|                            Modifiers
------------------+--------------------------+------------------------------------------------------------------
 txstatusid       | integer                  | not null default
nextval('public.txstatus_txstatusid_seq'::text)
 statusid         | integer                  | not null
 txid             | integer                  | not null
 statuschangetime | timestamp with time zone | not null
 employeeid       | integer                  | not null
Indexes: pk_txstatus_txstatusid primary key btree (txstatusid),
         idx_txstatus_statuschangetime btree (statuschangetime)
Foreign Key constraints: fk_txstatus_statusid FOREIGN KEY (statusid)
REFERENCES status(statusid),
                         fk_txstatus_txid FOREIGN KEY (txid) REFERENCES
tx(txid),
                         fk_txstatus_employeeid FOREIGN KEY (employeeid)
REFERENCES employee(employeeid)

                    
SC2test7=# \d tx
                                          Table "public.tx"
      Column       |           Type           |
Modifiers
-------------------+--------------------------+------------------------------------------------------
 txid              | integer                  | not null default
nextval('public.tx_txid_seq'::text)
 txtypeid          | integer                  | not null
 statusid          | integer                  | not null
 amount            | numeric(6,2)             | not null
 servicecharge     | numeric(6,2)             | not null
 customeraccountid | integer                  |
 checksid          | integer                  |
 rent              | numeric(6,2)             |
 scheduledpayments | integer                  |
 txtimestamp       | timestamp with time zone | not null
Indexes: pk_tx_txid primary key btree (txid),
         idx_tx_statusid btree (statusid),
         idx_tx_txtypeid btree (txtypeid)
Foreign Key constraints: fk_tx_txtypeid FOREIGN KEY (txtypeid)
REFERENCES txtype(txtypeid),
                         fk_tx_statusid FOREIGN KEY (statusid)
REFERENCES status(statusid),
                         fk_tx_customeraccountid FOREIGN KEY
(customeraccountid) REFERENCES customeraccount(customeraccountid),
                         fk_tx_checksid FOREIGN KEY (checksid)
REFERENCES checks(checksid)

                                                                              
select * from txstatus,tx txx where txstatus.txid = txx.txid AND
txstatus.statuschangetime = (select max(txstatus.statuschangetime) from
txstatus where txstatus.txid = txx.txid);
ERROR:  variable not found in subplan target list

Re: inner query bug

From
Tom Lane
Date:
Andrew Holm-Hansen <andrew@einer.org> writes:
> ERROR:  variable not found in subplan target list

What PG version is this?  We've fixed a number of bugs with that
symptom, IIRC ...

            regards, tom lane

Re: inner query bug

From
Neil Conway
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> What PG version is this?  We've fixed a number of bugs with that
> symptom, IIRC ...

When we were discussing this on IRC, I made sure to check it was a
recent version of PG -- I believe it is 7.4.0. The last reported
instance I could see of a bug with this symptom was for 7.4 beta 1 (at
which point we thought we'd fixed it).

-Neil

Re: inner query bug

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> When we were discussing this on IRC, I made sure to check it was a
> recent version of PG -- I believe it is 7.4.0. The last reported
> instance I could see of a bug with this symptom was for 7.4 beta 1 (at
> which point we thought we'd fixed it).

Yeah, I don't believe it is the same bug.  That particular error message
suggests that some part of the planner has dropped the ball in building
target lists for plan nodes, but that could cover a lot of territory.

The fact that Andrew is finding the bug difficult to reproduce reliably
is odd too; maybe an uninitialized-variable problem?

            regards, tom lane

Re: inner query bug

From
Tom Lane
Date:
Andrew Holm-Hansen <andrew@einer.org> writes:
> SC2test7=# select * from txstatus,tx txx where txstatus.txid = txx.txid
> AND txstatus.statuschangetime = (select max(txstatus.statuschangetime)
> from txstatus where txstatus.txid = txx.txid); FATAL:  terminating
> connection due to administrator command

The "due to administrator command" bit leaves me still scratching my
head a little.  AFAICS that message can only appear in response to a
SIGTERM signal.  You might want to root around and see if there is
anything in your system that might try to SIGTERM backends unexpectedly.

> select * from txstatus,tx txx where txstatus.txid = txx.txid AND
> txstatus.statuschangetime = (select max(txstatus.statuschangetime) from
> txstatus where txstatus.txid = txx.txid);
> ERROR:  variable not found in subplan target list

This behavior, however, I do understand now; including the fact that
it's not very repeatable.  Your test case involved enough tables (after
view expansion) to result in GEQO planning being invoked, which would
cause the selected plan to vary somewhat randomly.  The failure would
occur if a SubPlan got used in a clause of a hash join --- and the
symptoms would be different depending on whether it was on the inner or
outer side of the join.  This problem is new in 7.4 because prior
releases wouldn't consider hash join for join clauses any more
complicated than "var = var".

I plan to apply the attached minimal patch to 7.4 branch, and a more
extensive cleanup to HEAD.

            regards, tom lane

*** src/backend/executor/nodeHashjoin.c.orig    Thu Sep 25 02:57:59 2003
--- src/backend/executor/nodeHashjoin.c    Tue Nov 25 13:59:45 2003
***************
*** 417,423 ****
       */
      hjstate->hj_InnerHashKeys = (List *)
          ExecInitExpr((Expr *) hashNode->hashkeys,
!                      innerPlanState(hjstate));
      ((HashState *) innerPlanState(hjstate))->hashkeys =
          hjstate->hj_InnerHashKeys;

--- 417,423 ----
       */
      hjstate->hj_InnerHashKeys = (List *)
          ExecInitExpr((Expr *) hashNode->hashkeys,
!                      (PlanState *) hjstate);
      ((HashState *) innerPlanState(hjstate))->hashkeys =
          hjstate->hj_InnerHashKeys;

*** src/backend/optimizer/plan/createplan.c.orig    Wed Aug 27 08:44:12 2003
--- src/backend/optimizer/plan/createplan.c    Tue Nov 25 14:00:42 2003
***************
*** 1093,1104 ****
      hashclauses = order_qual_clauses(root, hashclauses);

      /*
!      * Extract the inner hash keys (right-hand operands of the
!      * hashclauses) to put in the Hash node.
       */
      innerhashkeys = NIL;
      foreach(hcl, hashclauses)
!         innerhashkeys = lappend(innerhashkeys, get_rightop(lfirst(hcl)));

      /* We don't want any excess columns in the hashed tuples */
      disuse_physical_tlist(inner_plan, best_path->jpath.innerjoinpath);
--- 1093,1106 ----
      hashclauses = order_qual_clauses(root, hashclauses);

      /*
!      * Extract the inner hash keys (right-hand operands of the hashclauses)
!      * to put in the Hash node.  Must do a deep copy in case there are
!      * subplans in the hash keys.
       */
      innerhashkeys = NIL;
      foreach(hcl, hashclauses)
!         innerhashkeys = lappend(innerhashkeys,
!                                 copyObject(get_rightop(lfirst(hcl))));

      /* We don't want any excess columns in the hashed tuples */
      disuse_physical_tlist(inner_plan, best_path->jpath.innerjoinpath);

Re: inner query bug

From
Andrew Holm-Hansen
Date:
I tried copying the text of the patch into a file and running patch on
it, but apparently I'm somehow impaired.  Manually patched the relevant
files, and it appears to be working just fine.

Thanks for all your help, it was really much more than I expected.

Andrew Holm-Hansen
andrew@einer.org

On Tue, 2003-11-25 at 13:11, Tom Lane wrote:
> Andrew Holm-Hansen <andrew@einer.org> writes:
> > SC2test7=# select * from txstatus,tx txx where txstatus.txid = txx.txid
> > AND txstatus.statuschangetime = (select max(txstatus.statuschangetime)
> > from txstatus where txstatus.txid = txx.txid); FATAL:  terminating
> > connection due to administrator command
>
> The "due to administrator command" bit leaves me still scratching my
> head a little.  AFAICS that message can only appear in response to a
> SIGTERM signal.  You might want to root around and see if there is
> anything in your system that might try to SIGTERM backends unexpectedly.
>
> > select * from txstatus,tx txx where txstatus.txid = txx.txid AND
> > txstatus.statuschangetime = (select max(txstatus.statuschangetime) from
> > txstatus where txstatus.txid = txx.txid);
> > ERROR:  variable not found in subplan target list
>
> This behavior, however, I do understand now; including the fact that
> it's not very repeatable.  Your test case involved enough tables (after
> view expansion) to result in GEQO planning being invoked, which would
> cause the selected plan to vary somewhat randomly.  The failure would
> occur if a SubPlan got used in a clause of a hash join --- and the
> symptoms would be different depending on whether it was on the inner or
> outer side of the join.  This problem is new in 7.4 because prior
> releases wouldn't consider hash join for join clauses any more
> complicated than "var = var".
>
> I plan to apply the attached minimal patch to 7.4 branch, and a more
> extensive cleanup to HEAD.
>
>             regards, tom lane
>
> *** src/backend/executor/nodeHashjoin.c.orig    Thu Sep 25 02:57:59 2003
> --- src/backend/executor/nodeHashjoin.c    Tue Nov 25 13:59:45 2003
> ***************
> *** 417,423 ****
>        */
>       hjstate->hj_InnerHashKeys = (List *)
>           ExecInitExpr((Expr *) hashNode->hashkeys,
> !                      innerPlanState(hjstate));
>       ((HashState *) innerPlanState(hjstate))->hashkeys =
>           hjstate->hj_InnerHashKeys;
>
> --- 417,423 ----
>        */
>       hjstate->hj_InnerHashKeys = (List *)
>           ExecInitExpr((Expr *) hashNode->hashkeys,
> !                      (PlanState *) hjstate);
>       ((HashState *) innerPlanState(hjstate))->hashkeys =
>           hjstate->hj_InnerHashKeys;
>
> *** src/backend/optimizer/plan/createplan.c.orig    Wed Aug 27 08:44:12 2003
> --- src/backend/optimizer/plan/createplan.c    Tue Nov 25 14:00:42 2003
> ***************
> *** 1093,1104 ****
>       hashclauses = order_qual_clauses(root, hashclauses);
>
>       /*
> !      * Extract the inner hash keys (right-hand operands of the
> !      * hashclauses) to put in the Hash node.
>        */
>       innerhashkeys = NIL;
>       foreach(hcl, hashclauses)
> !         innerhashkeys = lappend(innerhashkeys, get_rightop(lfirst(hcl)));
>
>       /* We don't want any excess columns in the hashed tuples */
>       disuse_physical_tlist(inner_plan, best_path->jpath.innerjoinpath);
> --- 1093,1106 ----
>       hashclauses = order_qual_clauses(root, hashclauses);
>
>       /*
> !      * Extract the inner hash keys (right-hand operands of the hashclauses)
> !      * to put in the Hash node.  Must do a deep copy in case there are
> !      * subplans in the hash keys.
>        */
>       innerhashkeys = NIL;
>       foreach(hcl, hashclauses)
> !         innerhashkeys = lappend(innerhashkeys,
> !                                 copyObject(get_rightop(lfirst(hcl))));
>
>       /* We don't want any excess columns in the hashed tuples */
>       disuse_physical_tlist(inner_plan, best_path->jpath.innerjoinpath);