* Tom Lane <tgl@sss.pgh.pa.us> [001109 18:30] wrote:
> I said:
> > OK, after digging some more, it seems that the critical requirement
> > is that the cursor's query contain a hash join.
>
> Here's the deal:
>
> test7=# set enable_mergejoin to off;
> SET VARIABLE
> test7=# begin;
> BEGIN
> -- I've previously checked that this produces a hash join plan:
> test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
> SELECT
> test7=# fetch 1 from c;
> f1 | f1
> ----+----
> 1 | 1
> (1 row)
>
> test7=# abort;
> NOTICE: trying to delete portal name that does not exist.
> pqReadData() -- backend closed the channel unexpectedly.
> This probably means the backend terminated abnormally
> before or while processing the request.
>
> This happens with either 7.0.2 or 7.0.3 (probably with anything back to
> 6.5, if not before). It does *not* happen with current development tip.
>
> The problem is that two "portal" structures are used. One holds the
> overall query plan and execution state for the cursor, and the other
> holds the hash table for the hash join. During abort, the portal
> manager tries to delete both of them. BUT: deleting the query plan
> causes query cleanup to be executed, which among other things deletes
> the hash join's table. Then the portal manager tries to delete the
> already-deleted second portal, which leads first to the above notice
> and then to Assert failure (and probably would lead to coredump if
> you didn't have Asserts on). Alternatively, it might try to delete
> the hash join portal first, which would leave the query cleanup code
> deleting an already-deleted portal, and doubtless still crashing.
>
> Current sources don't show the problem because hashtables aren't kept
> in portals anymore.
>
> I've thought for some time that CollectNamedPortals is a horrid kluge,
> and really ought to be rewritten. Hadn't seen it actually do the wrong
> thing before, but now...
>
> I guess the immediate question is do we want to hold up 7.0.3 release
> for a fix? This bug is clearly ancient, so I'm not sure it's
> appropriate to go through a fire drill to fix it for 7.0.3.
> Comments?
I dunno; having the database crash because an errant client disconnected
without shutting down, or needed to abort a transaction, looks like
a showstopper.
We do track CVS and wouldn't have a problem shifting to 7_0_3_PATCHES,
but I'm not sure if the rest of the userbase is going to have much
fun.
It seems to be a serious problem; I think people wouldn't mind
waiting for you to squash this one.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."